Getting Started#

This guide helps you quickly set up KernelGenBench and run your first evaluation.

Prerequisites#

Before installing KernelGenBench, ensure you have:

  • Python 3.10 or higher

  • CUDA 11.0+ (for NVIDIA platforms)

  • API credentials for your chosen LLM provider

Installation#

NVIDIA#

git clone https://github.com/flagos-ai/KernelGenBench.git
cd KernelGenBench
pip install -r requirements/requirements_nvidia.txt
pip install -e .

vllm==0.13.0 will automatically install compatible versions of torch and triton.

Domestic Chips (Ascend / MUSA / Hygon / Iluvatar / MetaX)#

On domestic chips, torch and the chip-specific runtime (e.g., torch_npu, torch_musa) are pre-installed in the vendor container image. Use the vendor-provided Docker image to start a container, then install KernelGenBench inside it:

# Start the vendor container (example for Ascend NPU)
docker run -it --rm --network host \
    --device=/dev/davinci0 --device=/dev/davinci_manager \
    ascend/pytorch:latest bash

# Inside the container, clone and install
git clone https://github.com/flagos-ai/KernelGenBench.git
cd KernelGenBench
pip install -r requirements/requirements_ascend.txt
pip install -e .

# For other chips, replace the requirements file:
#   Hygon DCU:  requirements/requirements_hygon.txt
#   MUSA:       requirements/requirements_musa.txt
#   Iluvatar:   requirements/requirements_iluvatar.txt
#   MetaX:      requirements/requirements_metax.txt

Note: Do NOT install vllm on non-NVIDIA platforms — it is NVIDIA-only.

Configure API Credentials#

Set up your LLM provider credentials:

# Anthropic Claude
export ANTHROPIC_API_KEY=your_key

# OpenAI / OpenAI-compatible
export OPENAI_API_KEY=your_key
export OPENAI_BASE_URL=http://your-endpoint/v1  # Optional, for custom endpoints

Install Claude Code CLI (for Agent Track)#

If you plan to use the Agent Track, install the Claude Code CLI:

npm install -g @anthropic-ai/claude-code

Verify Installation#

Test that KernelGenBench is installed correctly:

python -c "import kernelgenbench; print('KernelGenBench installed successfully')"

Run Your First Evaluation#

Quick Test (Single Operator)#

Test with a single Operator to verify everything works:

python scripts/generate_kernel_and_verify.py \
    --op-name aten::add \
    --single-test \
    --server-type openai \
    --model-name gpt-4o \
    --max-rounds 3

Full Evaluation#

Run a complete evaluation on all operators:

# Full evaluation (210 operators)
python scripts/generate_kernel_and_verify.py \
    --server-type openai \
    --model-name gpt-4o \
    --max-rounds 10

# Non-NVIDIA chips (ATen only, 110 operators)
python scripts/generate_kernel_and_verify.py \
    --dataset KernelGenBench-aten \
    --server-type openai \
    --model-name gpt-4o \
    --max-rounds 10

Datasets#

Dataset

Operators

Description

KernelGenBench

210

Full set (ATen + vLLM + cuBLAS, NVIDIA)

KernelGenBench-nocublas

160

ATen + vLLM (NVIDIA, no cuBLAS)

KernelGenBench-aten

110

ATen operators only

KernelGenBench-vllm

50

vLLM operators only (NVIDIA only)

KernelGenBench-cublas

50

cuBLAS operators only (NVIDIA only)

Note

On non-NVIDIA chips, the default dataset is automatically set to KernelGenBench-aten because vLLM and cuBLAS operators require NVIDIA GPUs.

Hardware Detection#

KernelGenBench automatically detects your hardware platform:

# Check detected device
python -c "from runtime import get_device_type; print(get_device_type())"

Troubleshooting#

Issue

Solution

ModuleNotFoundError: No module named 'kernelgenbench'

Run pip install -e . in the project root

CUDA out of memory

Reduce --device-count or use smaller batch sizes

API authentication errors

Verify your API keys are set correctly

vLLM installation conflicts on non-NVIDIA platforms

Do not install vLLM; use vendor container images

Next Steps#