Multi-Chip Support

KernelGenBench supports six hardware platforms, with automatic device detection and a unified execution pipeline.

The device type is detected automatically at runtime:

# Check detected device
python -c "from runtime import get_device_type; print(get_device_type())"

All platforms use the same command; the framework handles device differences automatically:

# Same command works on all platforms
python scripts/generate_kernel_and_verify.py \
    --server-type openai \
    --model-name gpt-4o

Platform	Default dataset
NVIDIA	KernelGenBench (210 operators)
Other	KernelGenBench-aten (110 operators)
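
The mapping in the table above can be sketched as a small lookup. This is only an illustration: the function name `default_dataset` and the device-type string `"nvidia"` are assumptions made for this example and are not taken from the framework's code. Only the dataset names and operator counts come from the table.

```python
# Hypothetical sketch: pick the default dataset from a detected device type.
# The dataset names and operator counts come from the table above; the
# function name and the "nvidia" device-type string are assumptions.

DATASET_SIZES = {
    "KernelGenBench": 210,
    "KernelGenBench-aten": 110,
}


def default_dataset(device_type: str) -> str:
    """Return the default dataset name for a device type string."""
    if device_type.strip().lower() == "nvidia":
        return "KernelGenBench"
    # Every other platform falls back to the ATen-only subset.
    return "KernelGenBench-aten"


if __name__ == "__main__":
    for device in ("nvidia", "other"):
        name = default_dataset(device)
        print(f"{device}: {name} ({DATASET_SIZES[name]} operators)")
```

Keeping the rule in one function means the rest of a pipeline never needs to branch on the platform when loading operators.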

On non-NVIDIA platforms, vLLM and cuBLAS operators are unavailable.
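
One way to picture this restriction is a filter over an operator list. The sketch below is hypothetical: the `source` field, the operator names, and the function `available_operators` are invented for illustration and do not describe how the framework actually stores or filters operators.

```python
# Hypothetical sketch: drop vLLM and cuBLAS operators on non-NVIDIA platforms.
# The record layout ({"name", "source"}) and all names here are assumptions.

NVIDIA_ONLY_SOURCES = {"vllm", "cublas"}


def available_operators(operators, device_type):
    """Return the operators that can run on the given device type."""
    if device_type.strip().lower() == "nvidia":
        return list(operators)
    return [op for op in operators if op["source"] not in NVIDIA_ONLY_SOURCES]


if __name__ == "__main__":
    sample = [
        {"name": "example_aten_op", "source": "aten"},
        {"name": "example_vllm_op", "source": "vllm"},
        {"name": "example_cublas_op", "source": "cublas"},
    ]
    print([op["name"] for op in available_operators(sample, "other")])
```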

L3 analysis is supported only on NVIDIA because of tool availability.

Numerical tolerances are adjusted automatically per platform to accommodate differences in floating-point implementations.
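
A minimal sketch of per-platform tolerances follows, assuming a lookup table keyed by device type. The tolerance values, the `"nvidia"` key, and the function names are all placeholders chosen for illustration. The framework's real thresholds are not given in this document.

```python
import math

# Hypothetical per-platform tolerances. These numbers are illustrative
# placeholders, not the values the framework uses.
TOLERANCES = {
    "nvidia": {"rtol": 1e-5, "atol": 1e-6},
}
# Assumed looser fallback for every other platform.
DEFAULT_TOLERANCE = {"rtol": 1e-3, "atol": 1e-4}


def tolerance_for(device_type):
    """Look up the tolerance pair for a device type string."""
    return TOLERANCES.get(device_type.strip().lower(), DEFAULT_TOLERANCE)


def outputs_match(actual, expected, device_type):
    """Compare two flat sequences of floats under the platform's tolerance."""
    if len(actual) != len(expected):
        return False
    tol = tolerance_for(device_type)
    return all(
        math.isclose(a, e, rel_tol=tol["rtol"], abs_tol=tol["atol"])
        for a, e in zip(actual, expected)
    )


if __name__ == "__main__":
    reference = [1.0, 2.0]
    candidate = [1.0, 2.0005]
    # The same pair of outputs can pass on one platform and fail on another.
    print(outputs_match(candidate, reference, "nvidia"))
    print(outputs_match(candidate, reference, "other"))
```

The point of the design is that the verification call stays identical on every platform, while only the looked-up thresholds change.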

Non-NVIDIA platforms have:

The framework injects platform-specific code templates: