Quick Start

This guide walks you through installing KernelGenBench and running your first evaluation.

Prerequisites

Before installing KernelGenBench, make sure you have:

Python 3.10 or later
CUDA 11.0+ (NVIDIA platforms)
API credentials for your chosen LLM provider
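A minimal pre-flight check for the Python requirement. It is a standalone sketch: the helper name `python_version_ok` is hypothetical and not part of KernelGenBench. It only inspects the interpreter version and does not check CUDA or credentials.

```python
import sys

# Documented minimum interpreter version for KernelGenBench.
MIN_PYTHON = (3, 10)


def python_version_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_version_ok():
        print(f"Python {current} meets the 3.10+ requirement")
    else:
        print(f"Python {current} is too old; install Python 3.10 or later")
```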

Installation

NVIDIA

git clone https://github.com/flagos-ai/KernelGenBench.git
cd KernelGenBench
pip install -r requirements/requirements_nvidia.txt
pip install -e .

Installing vllm==0.13.0 automatically pulls in compatible versions of torch and triton.

Domestic Chips (Ascend / MUSA (Moore Threads) / Hygon / Iluvatar / MetaX)

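On domestic chips, torch and the chip-specific runtime (e.g. torch_npu, torch_musa) come preinstalled in the vendor's container image. Start a container from the vendor-provided Docker image, then install KernelGenBench inside it: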

# Start the vendor container (example for Ascend NPU)
docker run -it --rm --network host \
    --device=/dev/davinci0 --device=/dev/davinci_manager \
    ascend/pytorch:latest bash

# Inside the container, clone and install
git clone https://github.com/flagos-ai/KernelGenBench.git
cd KernelGenBench
pip install -r requirements/requirements_ascend.txt
pip install -e .

# For other chips, replace the requirements file:
#   Hygon DCU:  requirements/requirements_hygon.txt
#   MUSA:       requirements/requirements_musa.txt
#   Iluvatar:   requirements/requirements_iluvatar.txt
#   MetaX:      requirements/requirements_metax.txt

Note: Do not install vllm on non-NVIDIA platforms; it supports NVIDIA only.
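The install steps differ mainly in which requirements file is passed to pip. A small lookup table makes that choice explicit. This is a sketch only: the platform labels and the helpers `requirements_file` and `vllm_allowed` are hypothetical, and the file paths are the only part taken from the steps above.

```python
# Hypothetical lookup: the platform labels (keys) are illustrative and are not
# values produced by KernelGenBench; the file paths come from the install steps.
REQUIREMENTS_FILES = {
    "nvidia": "requirements/requirements_nvidia.txt",
    "ascend": "requirements/requirements_ascend.txt",
    "hygon": "requirements/requirements_hygon.txt",
    "musa": "requirements/requirements_musa.txt",
    "iluvatar": "requirements/requirements_iluvatar.txt",
    "metax": "requirements/requirements_metax.txt",
}


def requirements_file(platform):
    """Return the requirements file for `platform` (case- and whitespace-insensitive)."""
    key = platform.strip().lower()
    if key not in REQUIREMENTS_FILES:
        raise ValueError(
            f"unknown platform {platform!r}; expected one of {sorted(REQUIREMENTS_FILES)}"
        )
    return REQUIREMENTS_FILES[key]


def vllm_allowed(platform):
    """vllm is NVIDIA-only, so every vendor platform must skip it."""
    return platform.strip().lower() == "nvidia"


if __name__ == "__main__":
    for name in ("nvidia", "ascend"):
        print(f"{name}: pip install -r {requirements_file(name)}")
```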

Configure API Credentials

Set the credentials for your LLM provider:

# Anthropic Claude
export ANTHROPIC_API_KEY=your_key

# OpenAI / OpenAI-compatible
export OPENAI_API_KEY=your_key
export OPENAI_BASE_URL=http://your-endpoint/v1  # Optional, for custom endpoints
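Before you launch a run, you can confirm that Python can see these variables. This sketch reads only the variable names listed above. The helper names are hypothetical, and the code does not check the keys against the provider.

```python
import os

# Environment variable names from the credential setup above.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set to a non-blank value."""
    if env is None:
        env = os.environ
    return sorted(name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip())


def openai_base_url(env=None):
    """Return OPENAI_BASE_URL if set, else None (the variable is optional)."""
    if env is None:
        env = os.environ
    value = env.get("OPENAI_BASE_URL", "").strip()
    return value or None


if __name__ == "__main__":
    providers = configured_providers()
    print("Configured providers:", ", ".join(providers) or "none")
    print("Custom OpenAI endpoint:", openai_base_url() or "not set")
```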

Install Claude Code CLI (for Agent Track)

If you plan to use the Agent Track, install the Claude Code CLI:

npm install -g @anthropic-ai/claude-code

Verify Installation

Check that KernelGenBench is installed correctly:

python -c "import kernelgenbench; print('KernelGenBench installed successfully')"
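If the package is missing, the one-liner above fails with a traceback. The variant below reports the problem instead and points to the fix given in the troubleshooting table. It uses only the standard library: `importlib.util.find_spec` locates a top-level module without executing it. The helper name `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if top-level `module_name` is found on sys.path, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("kernelgenbench"):
        print("KernelGenBench installed successfully")
    else:
        print("kernelgenbench not found; run `pip install -e .` in the project root")
```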

Run Your First Evaluation

Quick Test (Single Operator)

Run a single operator to verify that everything works:

python scripts/generate_kernel_and_verify.py \
    --op-name aten::add \
    --single-test \
    --server-type openai \
    --model-name gpt-4o \
    --max-rounds 3

Full Evaluation

Run a full evaluation over every operator in the dataset:

# Full evaluation (210 operators)
python scripts/generate_kernel_and_verify.py \
    --server-type openai \
    --model-name gpt-4o \
    --max-rounds 10

# Non-NVIDIA chips (ATen only, 110 operators)
python scripts/generate_kernel_and_verify.py \
    --dataset KernelGenBench-aten \
    --server-type openai \
    --model-name gpt-4o \
    --max-rounds 10
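If you script several runs, for example to sweep over models, it can help to build the command in Python. This sketch uses only the flags that appear in the commands above. `build_eval_command` is a hypothetical helper, and it uses `sys.executable` in place of a bare `python`. The command is printed rather than executed. Run it from the repository root if you uncomment the `subprocess.run` line.

```python
import shlex
import sys

SCRIPT = "scripts/generate_kernel_and_verify.py"


def build_eval_command(model_name, server_type="openai", max_rounds=10,
                       dataset=None, op_name=None, single_test=False):
    """Assemble argv for generate_kernel_and_verify.py from the documented flags."""
    argv = [sys.executable, SCRIPT]
    if op_name is not None:
        argv += ["--op-name", op_name]
    if single_test:
        argv.append("--single-test")
    if dataset is not None:
        argv += ["--dataset", dataset]
    argv += ["--server-type", server_type,
             "--model-name", model_name,
             "--max-rounds", str(max_rounds)]
    return argv


if __name__ == "__main__":
    # Non-NVIDIA full run, mirroring the second command above.
    cmd = build_eval_command("gpt-4o", dataset="KernelGenBench-aten")
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)
```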

Datasets

Dataset	Operators	Description
`KernelGenBench`	210	Full set (ATen + vLLM + cuBLAS, NVIDIA)
`KernelGenBench-nocublas`	160	ATen + vLLM (NVIDIA, no cuBLAS)
`KernelGenBench-aten`	110	ATen operators only
`KernelGenBench-vllm`	50	vLLM operators only (NVIDIA only)
`KernelGenBench-cublas`	50	cuBLAS operators only (NVIDIA only)

Note

On non-NVIDIA chips, the default dataset is automatically set to KernelGenBench-aten, because the vLLM and cuBLAS operators require an NVIDIA GPU.
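The selection rule above can be written as a short sketch. This is not KernelGenBench's actual code. The boolean `is_nvidia` stands in for whatever the hardware detection reports, and the function name `pick_dataset` is hypothetical. That the NVIDIA default is the full 210-operator set is an assumption, inferred from the full-evaluation command, which passes no `--dataset` flag.

```python
# Operator counts copied from the dataset table.
DATASET_SIZES = {
    "KernelGenBench": 210,
    "KernelGenBench-nocublas": 160,
    "KernelGenBench-aten": 110,
    "KernelGenBench-vllm": 50,
    "KernelGenBench-cublas": 50,
}

# Datasets containing vLLM or cuBLAS operators, which need an NVIDIA GPU.
NVIDIA_ONLY = {
    "KernelGenBench",
    "KernelGenBench-nocublas",
    "KernelGenBench-vllm",
    "KernelGenBench-cublas",
}


def pick_dataset(is_nvidia, requested=None):
    """Sketch of the documented rule: full set on NVIDIA, ATen-only elsewhere."""
    if requested is None:
        return "KernelGenBench" if is_nvidia else "KernelGenBench-aten"
    if requested not in DATASET_SIZES:
        raise ValueError(f"unknown dataset {requested!r}")
    if not is_nvidia and requested in NVIDIA_ONLY:
        raise ValueError(f"{requested} needs an NVIDIA GPU (vLLM/cuBLAS operators)")
    return requested


# The table's counts are consistent: 110 ATen + 50 vLLM + 50 cuBLAS = 210.
assert DATASET_SIZES["KernelGenBench"] == sum(
    DATASET_SIZES[name]
    for name in ("KernelGenBench-aten", "KernelGenBench-vllm", "KernelGenBench-cublas")
)
```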

Hardware Detection

KernelGenBench automatically detects your hardware platform:

# Check detected device
python -c "from runtime import get_device_type; print(get_device_type())"

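Troubleshooting

Problem	Solution
`ModuleNotFoundError: No module named 'kernelgenbench'`	Run `pip install -e .` in the project root
CUDA out of memory	Reduce `--device-count` or use a smaller batch size
API authentication error	Check that your API key is set correctly
vLLM installation conflicts on non-NVIDIA platforms	Do not install vLLM; use the vendor container image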

Next Steps

Overview - learn what KernelGenBench is
Features - explore all features
LLM Track - detailed LLM evaluation
Agent Track - detailed agent evaluation
FAQ - answers to common questions