Commands#
CLI commands for running the LLM Track evaluation. Every example invokes scripts/generate_kernel_and_verify.py.
Basic Usage#
Single Operator Test#
python scripts/generate_kernel_and_verify.py \
    --op-name aten::add \
    --single-test \
    --server-type openai \
    --model-name gpt-4o \
    --max-rounds 3
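To exercise several operators one at a time, the single-operator command above can be wrapped in a loop. The following is a minimal sketch that reuses only the flags shown on this page. Only `aten::add` appears here, so the other operator names are illustrative guesses and may not exist in the benchmark. The `DRY_RUN` variable and the `run` and `single_test` helpers are hypothetical names introduced for this sketch; commands are printed by default and executed only with `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch: run the single-operator test for several operators in turn.
# Prints commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build and run the single-operator command for one operator name.
single_test() {
  run python scripts/generate_kernel_and_verify.py \
      --op-name "$1" \
      --single-test \
      --server-type openai \
      --model-name gpt-4o \
      --max-rounds 3
}

# aten::add comes from this page; aten::mul and aten::relu are assumed examples.
for op in aten::add aten::mul aten::relu; do
  single_test "$op"
done
```

Reviewing the printed commands before setting `DRY_RUN=0` is a cheap way to catch a mistyped operator name before any model calls are made.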
Full Benchmark#
python scripts/generate_kernel_and_verify.py \
    --server-type openai \
    --model-name gpt-4o \
    --max-rounds 10
Dataset Selection#
Full Dataset (NVIDIA)#
python scripts/generate_kernel_and_verify.py \
    --dataset KernelGenBench \
    --server-type openai \
    --model-name gpt-4o
ATen Only (All Platforms)#
python scripts/generate_kernel_and_verify.py \
    --dataset KernelGenBench-aten \
    --server-type openai \
    --model-name gpt-4o
Specific Operator Sources#
# vLLM operators only
python scripts/generate_kernel_and_verify.py \
    --dataset KernelGenBench-vllm \
    --server-type openai

# cuBLAS operators only
python scripts/generate_kernel_and_verify.py \
    --dataset KernelGenBench-cublas \
    --server-type openai
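When results are wanted for each operator source separately, the per-dataset commands can be driven from one loop. This is a rough sketch using only the dataset names listed in this section. `DRY_RUN` and `run_dataset` are hypothetical names introduced here, and the sketch prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: run the benchmark once per operator-source dataset.
# Prints commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute the benchmark command for one dataset name.
run_dataset() {
  # Rebuild the positional parameters as the full command line.
  set -- python scripts/generate_kernel_and_verify.py \
      --dataset "$1" \
      --server-type openai
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Dataset names are the ones listed in this section.
for ds in KernelGenBench-aten KernelGenBench-vllm KernelGenBench-cublas; do
  run_dataset "$ds"
done
```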
Server Types#
OpenAI#
python scripts/generate_kernel_and_verify.py \
    --server-type openai \
    --model-name gpt-4o
Anthropic#
python scripts/generate_kernel_and_verify.py \
    --server-type anthropic \
    --model-name claude-opus-4-6
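Because the model name has to match the server type, a wrapper script can keep the two paired. The sketch below maps each server type to the model name shown above and rejects anything else. `model_for_server` and the `SERVER_TYPE` environment variable are hypothetical names used only for illustration, and the script may accept server types or models beyond the two listed here.

```shell
#!/bin/sh
# Sketch: choose the model name from the server type, using the pairs shown above.

# Print the model name for a server type; fail on anything not listed here.
model_for_server() {
  case "$1" in
    openai)    echo "gpt-4o" ;;
    anthropic) echo "claude-opus-4-6" ;;
    *)         echo "unknown server type: $1" >&2; return 1 ;;
  esac
}

# SERVER_TYPE is a hypothetical environment variable used only by this sketch.
server="${SERVER_TYPE:-openai}"

# Print the resulting command; remove the leading `echo` to execute it.
if model="$(model_for_server "$server")"; then
  echo python scripts/generate_kernel_and_verify.py \
      --server-type "$server" \
      --model-name "$model"
fi
```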
Advanced Options#
Enable Reflection#
Pass --reflection to enable feedback from previous rounds:
python scripts/generate_kernel_and_verify.py \
    --server-type openai \
    --model-name gpt-4o \
    --reflection
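To compare runs with and without reflection, the flag can be toggled from a small helper so the rest of the command stays identical. This is a minimal sketch; `bench_cmd` and its `on`/`off` argument are hypothetical names. It only prints the two commands and does not run them.

```shell
#!/bin/sh
# Sketch: print the benchmark command with and without --reflection.

# Print the command; pass "on" as the first argument to include --reflection.
bench_cmd() {
  flag=""
  if [ "$1" = "on" ]; then
    flag="--reflection"
  fi
  # $flag is left unquoted on purpose so an empty value adds no argument.
  echo python scripts/generate_kernel_and_verify.py \
      --server-type openai \
      --model-name gpt-4o \
      $flag
}

bench_cmd off
bench_cmd on
```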
Resume from Checkpoint#
python scripts/generate_kernel_and_verify.py \
    --resume-from output/pass_at_k/previous_run/
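A mistyped checkpoint path is easy to miss, so a wrapper can check the path before resuming. The sketch below assumes the checkpoint is a directory, as the trailing slash in the example suggests. `resume_run` and `DRY_RUN` are hypothetical names, and the command is printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: resume only when the checkpoint directory actually exists.

# Print (or, with DRY_RUN=0, execute) the resume command for a directory.
resume_run() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "checkpoint directory not found: $dir" >&2
    return 1
  fi
  set -- python scripts/generate_kernel_and_verify.py --resume-from "$dir"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Path taken from the example above.
resume_run output/pass_at_k/previous_run/ || echo "nothing to resume"
```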
Debug Mode#
Pass --debug to test with only 8 operators:
python scripts/generate_kernel_and_verify.py \
    --debug \
    --server-type openai
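One way to use debug mode is as a smoke test before a full benchmark. The sketch below runs the debug command first and starts the full run only if it succeeds. It assumes the script exits with a non-zero status on failure, which this page does not confirm. `run`, `smoke_then_full`, and `DRY_RUN` are hypothetical names, and commands are printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: run the 8-operator debug pass first, then the full benchmark.
# Assumes the script exits non-zero on failure (not confirmed by this page).
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop before the full benchmark if the debug pass fails.
smoke_then_full() {
  run python scripts/generate_kernel_and_verify.py \
      --debug \
      --server-type openai || return 1
  run python scripts/generate_kernel_and_verify.py \
      --server-type openai \
      --model-name gpt-4o \
      --max-rounds 10
}

smoke_then_full
```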