Multi-Chip Support

KernelGenBench supports six hardware platforms, with automatic device detection and a unified execution pipeline.

Platform	Description	Notes
NVIDIA	A100 GPUs	Primary baseline
Ascend NPU	Huawei AI accelerators	—
MUSA	Moore Threads GPUs	—
Hygon DCU	Hygon data center accelerators	—
Iluvatar	Iluvatar AI chips	—
MetaX	MUXI accelerators	—

Device type is automatically detected at runtime:

# Check detected device
python -c "from runtime import get_device_type; print(get_device_type())"
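How detection works is not spelled out here. As a minimal sketch, assuming each vendor's management CLI on `PATH` identifies its platform, a detector could probe tools in a fixed order and return the first match. The probe order, the tool names, and the returned labels (`"nvidia"`, `"ascend"`, and so on) are assumptions, and `detect_device_type` is a hypothetical stand-in, not the implementation of `runtime.get_device_type`.

```python
import shutil
from typing import Callable, Optional

# Hypothetical probe table: (device label, vendor management CLI assumed to be
# on PATH for that platform). The first tool found wins, so order matters.
# Labels, tools, and order are illustrative assumptions.
PROBES = [
    ("nvidia", "nvidia-smi"),
    ("ascend", "npu-smi"),
    ("musa", "mthreads-gmi"),
    ("dcu", "hy-smi"),
    ("iluvatar", "ixsmi"),
    ("metax", "mx-smi"),
]


def detect_device_type(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the label of the first platform whose probe tool is found."""
    for label, tool in PROBES:
        if which(tool) is not None:
            return label
    raise RuntimeError("no supported accelerator detected")
```

Passing `which` as a parameter keeps the probe testable without real hardware: `detect_device_type(which=lambda tool: "/usr/bin/npu-smi" if tool == "npu-smi" else None)` returns `"ascend"`.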

All platforms use the same commands; the framework handles device differences automatically:

# Same command works on all platforms
python scripts/generate_kernel_and_verify.py \
    --server-type openai \
    --model-name gpt-4o

The default dataset depends on the platform:

Platform	Default Dataset
NVIDIA	KernelGenBench (210 operators)
Others	KernelGenBench-aten (110 operators)
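Dataset selection by platform can be sketched as a lookup with a fallback. Only the dataset names and operator counts come from the table; the function `resolve_dataset`, its optional `requested` override, and the device labels are hypothetical.

```python
from typing import Optional

# Dataset names and operator counts are taken from the table above.
# The lookup structure and device labels are illustrative assumptions.
DEFAULT_DATASETS = {
    "nvidia": "KernelGenBench",  # 210 operators
}
FALLBACK_DATASET = "KernelGenBench-aten"  # 110 operators, all other platforms


def resolve_dataset(device_type: str, requested: Optional[str] = None) -> str:
    """Return an explicitly requested dataset, else the platform default."""
    if requested is not None:
        return requested
    return DEFAULT_DATASETS.get(device_type, FALLBACK_DATASET)
```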

On non-NVIDIA platforms, vLLM and cuBLAS operators are unavailable.
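One way to express this restriction is a filter over operator metadata. This sketch assumes each operator carries a hypothetical `source` tag naming where its reference implementation comes from; the `Operator` record, the tag values, and `available_operators` are illustrative, not KernelGenBench's data model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator record; `source` is e.g. "aten", "vllm", "cublas"."""
    name: str
    source: str


# Sources that, per the text above, are unavailable off NVIDIA.
NVIDIA_ONLY_SOURCES = {"vllm", "cublas"}


def available_operators(ops: Iterable[Operator], device_type: str) -> List[Operator]:
    """Keep every operator on NVIDIA; drop vLLM/cuBLAS-backed ones elsewhere."""
    if device_type == "nvidia":
        return list(ops)
    return [op for op in ops if op.source not in NVIDIA_ONLY_SOURCES]
```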

L3 profiling is supported only on NVIDIA, because the profiling tools it relies on are available only for NVIDIA GPUs.
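A pipeline can gate the L3 stage on both the platform and the presence of the profiler. The text does not name the tool; this sketch assumes NVIDIA Nsight Compute's `ncu` command, and `can_run_l3_profiling` is a hypothetical helper.

```python
import shutil
from typing import Callable, Optional

# Assumption: L3 profiling uses NVIDIA Nsight Compute's `ncu` CLI.
PROFILER_TOOL = "ncu"


def can_run_l3_profiling(device_type: str,
                         which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """True only on NVIDIA and only if the profiler binary is on PATH."""
    return device_type == "nvidia" and which(PROFILER_TOOL) is not None
```

Returning a boolean lets the pipeline skip the L3 stage cleanly instead of failing on platforms where the tool does not exist.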

Numerical tolerances are automatically adjusted per platform to account for different floating-point implementations.
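A per-platform tolerance check can be sketched with the same rule `numpy.allclose` uses, `|actual - expected| <= atol + rtol * |expected|`. The `(rtol, atol)` numbers below are placeholders chosen only to show a looser default off NVIDIA (the NVIDIA pair matches numpy's defaults); they are not KernelGenBench's actual tolerances, and `outputs_match` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Placeholder (rtol, atol) pairs for illustration only, NOT KernelGenBench's
# real values. Platforms without an entry fall back to a looser default.
TOLERANCES: Dict[str, Tuple[float, float]] = {
    "nvidia": (1e-5, 1e-8),
}
DEFAULT_TOLERANCE: Tuple[float, float] = (1e-3, 1e-5)


def outputs_match(actual: Sequence[float], expected: Sequence[float],
                  device_type: str) -> bool:
    """Elementwise |a - e| <= atol + rtol * |e|, as in numpy.allclose."""
    rtol, atol = TOLERANCES.get(device_type, DEFAULT_TOLERANCE)
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))
```

With these placeholders, an output of `1.0005` against a reference of `1.0` fails on `"nvidia"` but passes on any other platform label.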

Non-NVIDIA platforms have:

The framework injects platform-specific code templates:
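As a hypothetical sketch of what the injection step might look like, a framework could prepend a per-platform preamble to each generated kernel. The preamble text, `inject_template`, and the device names are assumptions: `torch_npu` (Ascend) and `torch_musa` (Moore Threads) are PyTorch extension packages with their own device names, while DCU, Iluvatar, and MetaX are assumed here to expose a CUDA-compatible `"cuda"` device.

```python
from string import Template

# Hypothetical preamble; ${...} braces keep placeholder names unambiguous.
PREAMBLE = Template(
    "import torch\n"
    "${extra_import}"
    'DEVICE = torch.device("${device}")\n'
)

# Assumed per-platform settings (device name and any extension import).
PLATFORM_SETTINGS = {
    "nvidia": {"device": "cuda", "extra_import": ""},
    "ascend": {"device": "npu", "extra_import": "import torch_npu\n"},
    "musa": {"device": "musa", "extra_import": "import torch_musa\n"},
    "dcu": {"device": "cuda", "extra_import": ""},
    "iluvatar": {"device": "cuda", "extra_import": ""},
    "metax": {"device": "cuda", "extra_import": ""},
}


def inject_template(kernel_source: str, device_type: str) -> str:
    """Prepend the platform-specific preamble to generated kernel code."""
    preamble = PREAMBLE.substitute(PLATFORM_SETTINGS[device_type])
    return preamble + "\n" + kernel_source
```

Keeping platform differences in a data table means the generation and verification code stays identical across platforms; only the lookup key changes.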