Cost Analysis#

Understanding token costs for agent-based kernel generation.

Token Consumption#

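Agent methods consume far more tokens per successful kernel than direct LLM sampling: in the measurements below, roughly 10× more for Claude Code and roughly 100× more for AKO4ALL, relative to Pass@5.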

| Method | Tokens per Success |
|---|---|
| Pass@5 | ~50K |
| Claude Code (normal) | ~500K |
| AKO4ALL | ~5.19M |
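
To make the gap concrete, the short sketch below expresses each method's token usage as a multiple of the Pass@5 baseline. The figures are copied from the table above; the function name `relative_to_baseline` is only illustrative.

```python
# Approximate tokens per successful kernel, copied from the table above.
TOKENS_PER_SUCCESS = {
    "Pass@5": 50_000,
    "Claude Code (normal)": 500_000,
    "AKO4ALL": 5_190_000,
}


def relative_to_baseline(tokens, baseline="Pass@5"):
    """Return each method's token usage as a multiple of the baseline."""
    base = tokens[baseline]
    return {method: count / base for method, count in tokens.items()}


ratios = relative_to_baseline(TOKENS_PER_SUCCESS)
for method, ratio in ratios.items():
    print(f"{method}: {ratio:.1f}x")
```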

Cost Factors#

Iterative Debugging#

Agents may run several iterations per operator, and each iteration adds tokens:

  • Each iteration generates new code

  • Execution feedback increases context

  • Error messages increase prompt size
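
The effect of a growing context can be sketched with a simple model. It assumes every iteration resends the whole context and each iteration appends a fixed number of tokens (new code, execution feedback, error messages). The sizes are hypothetical, and the model ignores prompt caching and context truncation, so it illustrates the trend rather than describing how any particular agent bills tokens.

```python
def cumulative_input_tokens(iterations, base_prompt, added_per_iteration):
    """Total input tokens when each iteration resends the full context.

    Iteration i (0-based) sends base_prompt + i * added_per_iteration tokens,
    so the total grows quadratically with the number of iterations.
    """
    return sum(base_prompt + i * added_per_iteration for i in range(iterations))


# Hypothetical sizes, chosen only for illustration.
BASE_PROMPT = 2_000
ADDED_PER_ITERATION = 3_000

for n in (2, 5, 10, 20):
    total = cumulative_input_tokens(n, BASE_PROMPT, ADDED_PER_ITERATION)
    print(f"{n:>2} iterations: {total:,} input tokens")
```

Under these assumptions, doubling the iteration count more than doubles the input tokens, which is why operators that need many iterations dominate the bill.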

Model Selection#

| Model | Relative Cost |
|---|---|
| GPT-4o | Medium |
| Opus-4.6 | High |
| Qwen3.5 | Low |
| GLM-5.0 | Medium |

Operator Complexity#

| Operator Type | Average Iterations |
|---|---|
| ATen (Simple) | 2-5 |
| ATen (Complex) | 5-10 |
| vLLM | 10-20 |
| cuBLAS | 10-30 |

Cost Estimation#

Quick Estimation#

# First run in debug mode (8 operators)
bash test_ops.sh --debug --device-count 1

# Check token usage
grep tokens agent_bench/runs/<run_name>/results.json
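
For a single total instead of raw `grep` output, a small script can add up the token counts. The schema of `results.json` is not documented here, so this sketch makes no assumption about its layout: it walks the whole JSON tree and sums every numeric value whose key contains `tokens`. The helper names and the sample data are hypothetical. If the real file stores both per-call and aggregate counts, this approach would double count and the key filter would need tightening.

```python
import json
from pathlib import Path


def sum_token_fields(node):
    """Recursively add up numeric values whose key contains 'tokens'."""
    if isinstance(node, dict):
        total = 0
        for key, value in node.items():
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if "tokens" in key.lower() and is_number:
                total += value
            else:
                total += sum_token_fields(value)
        return total
    if isinstance(node, list):
        return sum(sum_token_fields(item) for item in node)
    return 0


def total_tokens(results_path):
    """Load a results file and return the summed token count."""
    return sum_token_fields(json.loads(Path(results_path).read_text()))


# Made-up data in an assumed shape; the real results.json may differ.
sample = {
    "add": {"input_tokens": 1200, "output_tokens": 300},
    "mul": {"usage": [{"tokens": 500}]},
}
print(sum_token_fields(sample))  # 2000
```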

Extrapolation#

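Full-run tokens ≈ (debug tokens / 8) × 210

Here 8 is the number of operators in debug mode and 210 is the number of operators in the full run; multiply the token estimate by your model's price per token to get a cost. The estimate assumes the 8 debug operators are representative of the full set. Harder operators need more iterations, so treat the result as a rough lower bound.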
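
The same extrapolation as a small function, for use in a planning script. The operator counts (8 and 210) come from this page; the debug-run total of 4,000,000 tokens is a hypothetical input, not a measured value.

```python
DEBUG_OPERATORS = 8    # operators run in debug mode, per this page
FULL_OPERATORS = 210   # operators in the full run, per this page


def extrapolate_full_run(debug_tokens, debug_ops=DEBUG_OPERATORS, full_ops=FULL_OPERATORS):
    """Linearly scale the debug run's token usage to the full operator set."""
    return debug_tokens / debug_ops * full_ops


estimate = extrapolate_full_run(4_000_000)  # hypothetical debug-run total
print(f"{estimate:,.0f} tokens")  # 105,000,000 tokens
```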

Cost Optimization#

Reduce Operators#

# Test only specific operators
bash test_ops.sh add,softmax,mul --device-count 1

Use Cheaper Methods#

# naive_cc uses fewer tokens than normal_cc
bash test_ops.sh add -m naive_cc --device-count 1

Set Timeout#

# Limit time per operator
bash test_ops.sh add --timeout 300 --device-count 1

Budget Planning#

Based on KernelGenBench experiments:

| Scale | Estimated Tokens | Estimated Cost (Opus) |
|---|---|---|
| Debug (8 operators) | ~5M | ~$50 |
| ATen (110 operators) | ~500M | ~$5,000 |
| Full (210 operators) | ~1B | ~$10,000 |
| Full AKO4ALL | ~5B | ~$50,000 |
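
Every row in the table above works out to the same blended rate of about $10 per million tokens. The sketch below derives that rate from the rows and reuses it to price an arbitrary token count. The rate is only what these estimates imply, not a published price; actual cost depends on the input/output token mix and on the provider's current pricing. The function names are illustrative.

```python
# (scale, estimated tokens, estimated cost in USD), copied from the table above.
BUDGET_ROWS = [
    ("Debug (8 operators)", 5_000_000, 50),
    ("ATen (110 operators)", 500_000_000, 5_000),
    ("Full (210 operators)", 1_000_000_000, 10_000),
    ("Full AKO4ALL", 5_000_000_000, 50_000),
]


def usd_per_million_tokens(tokens, cost_usd):
    """Blended price implied by a (tokens, cost) pair."""
    return cost_usd / (tokens / 1_000_000)


implied_rates = {
    scale: usd_per_million_tokens(tokens, cost)
    for scale, tokens, cost in BUDGET_ROWS
}


def estimate_cost_usd(tokens, rate_per_million=10.0):
    """Price a token count at an assumed blended rate (USD per million tokens)."""
    return tokens / 1_000_000 * rate_per_million


for scale, rate in implied_rates.items():
    print(f"{scale}: ${rate:.2f} per million tokens")
print(f"105M tokens: ${estimate_cost_usd(105_000_000):,.0f}")
```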

Warning

Large-scale agent evaluation can consume billions of tokens. Be sure to test with debug mode first and plan your budget accordingly.