Setup

Configure your environment for the Agent Track evaluation.

Option B: Separate Environments

If Claude Code is installed in a different environment from the Python environment used for evaluation, copy the example configuration:

cp agent_bench/config.example.yaml agent_bench/config.yaml

Edit config.yaml and set paths.python to the interpreter of the evaluation environment:

paths:
  python: /path/to/envs/kernelgenbench/bin/python

When running, prepend the directory containing the Claude Code executable to PATH:

export PATH="/path/to/claude_tool/bin:$PATH"
cd agent_bench && bash test_ops.sh add --device-count 1
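Before launching test_ops.sh, it can help to confirm that the prepended directory actually makes the agent CLI resolvable. The sketch below is a hypothetical POSIX sh helper (resolve_with_prefix is not part of agent_bench): it prints the path the shell would use for a command once the given directory sits at the front of PATH, and fails if nothing is found.

```shell
# Hypothetical helper (not part of agent_bench): print where CMD resolves once
# DIR is prepended to PATH, mirroring `export PATH="DIR:$PATH"`.
# Usage: resolve_with_prefix CMD DIR
resolve_with_prefix() {
  if [ "$#" -ne 2 ]; then
    echo "usage: resolve_with_prefix CMD DIR" >&2
    return 2
  fi
  # Run in a subshell so the PATH change does not leak into the caller's shell.
  (
    PATH="$2:$PATH"
    command -v "$1"
  )
}

# Example: check that the Claude Code CLI is reachable before running the benchmark.
# resolve_with_prefix claude /path/to/claude_tool/bin || echo "claude not found"
```

Because DIR is searched first, a claude executable there takes precedence over any other claude found later on PATH, which is the behavior the export above relies on.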

Configuration Fields

Field          Description
paths.python   Python interpreter with torch, vllm, and kernelgenbench installed
agent.bin      Path to the agent CLI executable (default: claude)
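The paths.python interpreter must be able to import torch, vllm, and kernelgenbench. One way to check this is to ask that interpreter to import each package in turn. The function below is a hypothetical POSIX sh sketch, not part of agent_bench; when no module names are passed, it checks the three packages listed for paths.python.

```shell
# Hypothetical preflight helper (not part of agent_bench): check that the
# interpreter configured as paths.python can import the required packages.
# Usage: check_python_env PYTHON [MODULE...]
check_python_env() {
  if [ "$#" -lt 1 ]; then
    echo "usage: check_python_env PYTHON [MODULE...]" >&2
    return 2
  fi
  _py=$1
  shift
  # With no modules given, default to the packages listed for paths.python.
  if [ "$#" -eq 0 ]; then
    set -- torch vllm kernelgenbench
  fi
  if [ ! -f "$_py" ] || [ ! -x "$_py" ]; then
    echo "not an executable file: $_py" >&2
    return 1
  fi
  # Import each module separately so the error names the one that is missing.
  for _m in "$@"; do
    if ! "$_py" -c "import $_m" >/dev/null 2>&1; then
      echo "cannot import '$_m' with $_py" >&2
      return 1
    fi
  done
  echo "ok: $_py imports $*"
}

# Example, using the same path as in config.yaml:
# check_python_env /path/to/envs/kernelgenbench/bin/python
```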

API Credentials

Make sure the API key for your model provider is exported:

# Anthropic Claude
export ANTHROPIC_API_KEY=your_key

# OpenAI / OpenAI-compatible
export OPENAI_API_KEY=your_key
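Because the agent process inherits these variables, they need to be exported, not merely assigned in the current shell. The hypothetical helper below (require_env is not part of agent_bench) checks them with printenv, which only sees exported variables, and reports any that are missing or empty without printing their values.

```shell
# Hypothetical helper (not part of agent_bench): fail if any named variable is
# not exported or is empty. Values are never printed.
# Usage: require_env VAR [VAR...]
require_env() {
  _missing=0
  for _var in "$@"; do
    # printenv only sees exported variables, i.e. what child processes inherit.
    if [ -z "$(printenv "$_var")" ]; then
      echo "$_var is not exported (or is empty)" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Example: require_env ANTHROPIC_API_KEY   (or OPENAI_API_KEY, depending on the provider)
```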

Verify Setup

cd agent_bench

# Quick test with a single operator
bash test_ops.sh add --device-count 1