Getting Started#
This guide helps you quickly set up KernelGenBench and run your first evaluation.
Prerequisites#
Before installing KernelGenBench, ensure you have:
Installation#
NVIDIA#
git clone https://github.com/flagos-ai/KernelGenBench.git
cd KernelGenBench
pip install -r requirements/requirements_nvidia.txt
pip install -e .
vllm==0.13.0 will automatically install compatible versions of torch and triton.
Domestic Chips (Ascend / MUSA / Hygon / Iluvatar / MetaX)#
On domestic chips, torch and the chip-specific runtime (e.g., torch_npu, torch_musa) are pre-installed in the vendor container image. Use the vendor-provided Docker image to start a container, then install KernelGenBench inside it:
# Start the vendor container (example for Ascend NPU)
docker run -it --rm --network host \
--device=/dev/davinci0 --device=/dev/davinci_manager \
ascend/pytorch:latest bash
# Inside the container, clone and install
git clone https://github.com/flagos-ai/KernelGenBench.git
cd KernelGenBench
pip install -r requirements/requirements_ascend.txt
pip install -e .
# For other chips, replace the requirements file:
# Hygon DCU: requirements/requirements_hygon.txt
# MUSA: requirements/requirements_musa.txt
# Iluvatar: requirements/requirements_iluvatar.txt
# MetaX: requirements/requirements_metax.txt
Note: Do NOT install vllm on non-NVIDIA platforms — it is NVIDIA-only.
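The per-platform requirements files listed above can be gathered into a single lookup, which may be convenient in a setup script. This is only an illustrative sketch: the platform labels used as dictionary keys and the `requirements_file` helper are assumptions made for this example and are not part of KernelGenBench. Only the file paths are taken from this guide.

```python
# Requirements file per platform, using the paths given in this guide.
# The keys ("nvidia", "ascend", ...) are hypothetical labels for this sketch.
REQUIREMENTS = {
    "nvidia": "requirements/requirements_nvidia.txt",
    "ascend": "requirements/requirements_ascend.txt",
    "hygon": "requirements/requirements_hygon.txt",
    "musa": "requirements/requirements_musa.txt",
    "iluvatar": "requirements/requirements_iluvatar.txt",
    "metax": "requirements/requirements_metax.txt",
}


def requirements_file(platform: str) -> str:
    """Return the requirements file for a platform label (case-insensitive)."""
    key = platform.strip().lower()
    if key not in REQUIREMENTS:
        raise ValueError(
            f"unknown platform {platform!r}; expected one of {sorted(REQUIREMENTS)}"
        )
    return REQUIREMENTS[key]


if __name__ == "__main__":
    # Print the install command for one platform instead of running it.
    print("pip install -r", requirements_file("Ascend"))
```

Raising on an unknown label, rather than falling back to a default, avoids silently installing the NVIDIA requirements (and with them vllm) on a platform where they do not belong.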
Configure API Credentials#
Set up your LLM provider credentials:
# Anthropic Claude
export ANTHROPIC_API_KEY=your_key
# OpenAI / OpenAI-compatible
export OPENAI_API_KEY=your_key
export OPENAI_BASE_URL=http://your-endpoint/v1 # Optional, for custom endpoints
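Before starting a long run, it can help to confirm that the variables above are actually visible to Python. The following is a minimal sketch using only the standard library; the helper names `credential_status` and `has_provider_key` are made up for this example and are not part of KernelGenBench.

```python
import os

# Environment variables named in this guide.
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")
OPTIONAL_VARS = ("OPENAI_BASE_URL",)


def credential_status(env=None):
    """Map each variable name to True if it is set to a non-empty value."""
    env = os.environ if env is None else env
    names = PROVIDER_KEYS + OPTIONAL_VARS
    return {name: bool(env.get(name, "").strip()) for name in names}


def has_provider_key(env=None):
    """True if at least one provider API key is set."""
    status = credential_status(env)
    return any(status[name] for name in PROVIDER_KEYS)


if __name__ == "__main__":
    # Report presence only; never print the key values themselves.
    for name, is_set in credential_status().items():
        print(f"{name}: {'set' if is_set else 'not set'}")
    if not has_provider_key():
        print("No provider API key found; export one before running an evaluation.")
```

The check only tests that a value is present. It does not contact the provider, so an expired or mistyped key will still pass.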
Install Claude Code CLI (for Agent Track)#
If you plan to use the Agent Track, install the Claude Code CLI:
npm install -g @anthropic-ai/claude-code
Verify Installation#
Test that KernelGenBench is installed correctly:
python -c "import kernelgenbench; print('KernelGenBench installed successfully')"
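If the import fails, it is not always obvious which package is missing. The sketch below reports whether each module named in this guide can be located, without importing it. The `is_importable` helper is a name chosen for this example, and the module list is an assumption based on the packages mentioned above (triton, for instance, may legitimately be absent on some platforms).

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """True if the top-level module can be located (it is not imported)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken or partially initialised packages.
        return False


if __name__ == "__main__":
    # Modules mentioned in this guide; adjust the list for your platform.
    for name in ("kernelgenbench", "torch", "triton"):
        print(f"{name}: {'found' if is_importable(name) else 'missing'}")
```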
Run Your First Evaluation#
Quick Test (Single Operator)#
Test with a single operator to verify everything works:
python scripts/generate_kernel_and_verify.py \
--op-name aten::add \
--single-test \
--server-type openai \
--model-name gpt-4o \
--max-rounds 3
Full Evaluation#
Run a complete evaluation on all operators:
# Full evaluation (210 operators)
python scripts/generate_kernel_and_verify.py \
--server-type openai \
--model-name gpt-4o \
--max-rounds 10
# Non-NVIDIA chips (ATen only, 110 operators)
python scripts/generate_kernel_and_verify.py \
--dataset KernelGenBench-aten \
--server-type openai \
--model-name gpt-4o \
--max-rounds 10
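When the same evaluation is launched repeatedly with different models or datasets, the command line can be assembled from Python. The sketch below is one possible wrapper, not an official API: `build_command` and `run` are hypothetical helpers, and the only flags used are the ones shown in the commands above. It assumes the script is run from the repository root.

```python
import subprocess
import sys

# Script path as used in this guide, relative to the repository root.
SCRIPT = "scripts/generate_kernel_and_verify.py"


def build_command(server_type, model_name, max_rounds,
                  dataset=None, op_name=None, single_test=False):
    """Assemble the argument list for one evaluation run."""
    cmd = [sys.executable, SCRIPT]
    if dataset:
        cmd += ["--dataset", dataset]
    if op_name:
        cmd += ["--op-name", op_name]
    if single_test:
        cmd.append("--single-test")
    cmd += [
        "--server-type", server_type,
        "--model-name", model_name,
        "--max-rounds", str(max_rounds),
    ]
    return cmd


def run(cmd):
    """Launch the evaluation and raise if the script exits with an error."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Show the quick single-operator test from this guide without running it.
    cmd = build_command("openai", "gpt-4o", 3,
                        op_name="aten::add", single_test=True)
    print(" ".join(cmd))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with values such as `aten::add`.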
Datasets#
| Dataset | Operators | Description |
|---|---|---|
| | 210 | Full set (ATen + vLLM + cuBLAS, NVIDIA) |
| | 160 | ATen + vLLM (NVIDIA, no cuBLAS) |
| | 110 | ATen operators only |
| | 50 | vLLM operators only (NVIDIA only) |
| | 50 | cuBLAS operators only (NVIDIA only) |
Note: On non-NVIDIA chips, the default dataset is automatically set to KernelGenBench-aten because vLLM and cuBLAS operators require NVIDIA GPUs.
Hardware Detection#
KernelGenBench automatically detects your hardware platform:
# Check detected device
python -c "from runtime import get_device_type; print(get_device_type())"
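In a script, the same check can be wrapped so that a missing installation produces a clear message instead of a traceback. This is a sketch under assumptions: `detect_device` is a hypothetical helper, and the only thing taken from this guide is that the `runtime` module exposes `get_device_type()`. The value it returns is not documented here, so the sketch passes it through unchanged.

```python
import importlib


def detect_device(module_name="runtime"):
    """Return get_device_type() from the given module, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    getter = getattr(module, "get_device_type", None)
    return getter() if callable(getter) else None


if __name__ == "__main__":
    device = detect_device()
    if device is None:
        print("Device detection unavailable; is KernelGenBench installed?")
    else:
        print(device)
```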
Troubleshooting#
| Issue | Solution |
|---|---|
| | Run |
| CUDA out of memory | Reduce |
| API authentication errors | Verify your API keys are set correctly |
| vLLM installation conflicts on non-NVIDIA platforms | Do not install vLLM; use vendor container images |
Next Steps#
Overview - Learn what KernelGenBench is
Features - Explore all features
LLM Track - Detailed LLM evaluation
Agent Track - Detailed Agent evaluation
FAQ - Frequently asked questions