Multi-Chip Support#
KernelGenBench supports six hardware platforms with automatic device detection and unified execution pipeline.
Supported Platforms#
Platform |
Description |
Notes |
|---|---|---|
NVIDIA |
A100 GPUs |
Primary baseline |
Ascend NPU |
Huawei AI accelerators |
— |
MUSA |
Moore Threads GPUs |
— |
Hygon DCU |
Hygon data center accelerators |
— |
Iluvatar |
Iluvatar AI chips |
— |
MetaX |
MUXI accelerators |
— |
Auto-Detection#
Device type is automatically detected at runtime:
# Check detected device
python -c "from runtime import get_device_type; print(get_device_type())"
Unified Commands#
All platforms use the same commands — the framework handles device differences automatically:
# Same command works on all platforms
python scripts/generate_kernel_and_verify.py \
--server-type openai \
--model-name gpt-4o
Platform-Specific Behavior#
Dataset Selection#
Platform |
Default Dataset |
|---|---|
NVIDIA |
KernelGenBench (210 operators) |
Others |
KernelGenBench-aten (110 operators) |
On non-NVIDIA platforms, vLLM and cuBLAS operators are unavailable.
Anti-Hack Layers#
Layer |
NVIDIA |
Non-NVIDIA |
|---|---|---|
L1: AST Static Scan |
✓ |
✓ |
L2: Ghost Replay |
✓ |
✓ |
L3: Hardware Profiling |
✓ |
✗ |
L3 profiling is NVIDIA-only due to tool availability.
Tolerance Settings#
Numerical tolerances are automatically adjusted per platform to account for different floating-point implementations.
Cross-Platform Challenges#
Compiler Maturity#
Non-NVIDIA platforms have:
Less mature Triton compilers
Incomplete backend support
Different memory models
Performance Impact#
Non-NVIDIA platforms require ~2× more tokens and time
Cross-platform degradation can be severe
Platform-specific optimizations needed
Hardware-Specific Templates#
The framework injects platform-specific code templates:
Import statements
Runtime configurations
Memory constraints
Device-specific constants