What is KernelGenBench?
KernelGenBench is a benchmark framework for evaluating LLM- and agent-based Triton kernel generation across multiple hardware platforms. It is a component of FlagOS, a unified, open-source AI system software stack.

Purpose
KernelGenBench provides a standardized way to measure how effectively AI models can generate GPU kernel code. The generated Triton kernels serve as drop-in replacements for production use, enabling direct evaluation of real-world applicability.
Problem Addressed
The benchmark addresses a critical gap in the AI ecosystem: while LLMs show promise in automating kernel development, there has been no comprehensive way to evaluate their effectiveness across diverse operator sources and heterogeneous hardware platforms.
Components
KernelGenBench consists of two complementary sub-benchmarks:
| Sub-benchmark | Description |
|---|---|
| | Multi-Source evaluation with 210 operators |
| | Multi-Chip evaluation across 6 hardware platforms |