Cost Analysis

Cost Analysis#

Understanding token costs for agent-based kernel generation.

Token Consumption#

Agent methods consume more tokens than direct LLM sampling.

Method	Tokens per Success
Pass@5	~50K
Claude Code (normal)	~500K
AKO4ALL	~5.19M

Cost Factors#

Iterative Debugging#

Agents may perform multiple iterations:

Each iteration generates new code
Execution feedback increases context
Error messages increase prompt size

Model Selection#

Model	Relative Cost
GPT-4o	Medium
Opus-4.6	High
Qwen3.5	Low
GLM-5.0	Medium

Operator Complexity#

Operator Type	Average Iterations
ATen (Simple)	2-5
ATen (Complex)	5-10
vLLM	10-20
cuBLAS	10-30

Cost Estimation#

Quick Estimation#

# First run in debug mode (8 operators)
bash test_ops.sh --debug --device-count 1

# Check token usage
cat agent_bench/runs/<run_name>/results.json | grep tokens

Extrapolation#

Full run cost ≈ (debug tokens / 8) × 210

Cost Optimization#

Reduce Operators#

# Test only specific operators
bash test_ops.sh add,softmax,mul --device-count 1

Use Cheaper Methods#

# naive_cc uses fewer tokens than normal_cc
bash test_ops.sh add -m naive_cc --device-count 1

Set Timeout#

# Limit time per operator
bash test_ops.sh add --timeout 300 --device-count 1

Budget Planning#

Based on KernelGenBench experiments:

Scale	Estimated Tokens	Estimated Cost (Opus)
Debug (8 operators)	~5M	~$50
ATen (110 operators)	~500M	~$5,000
Full (210 operators)	~1B	~$10,000
Full AKO4ALL	~5B	~$50,000

Warning

Large-scale agent evaluation can consume billions of tokens. Be sure to test with debug mode first and plan your budget accordingly.