Glossary
This section defines technical terminology used throughout the KernelGenBench documentation.
- Agent
A coding agent that autonomously generates, executes, and iterates on code based on feedback. In KernelGenBench, agents such as Claude Code and OpenCode debug and optimize kernels iteratively, using execution results as feedback.
- ATen
PyTorch’s native tensor library, providing fundamental operations for deep learning. KernelGenBench includes 110 ATen operators derived from real model training traces.
- CUDA
NVIDIA’s proprietary parallel computing platform and programming model for GPU acceleration. CUDA is deeply tied to NVIDIA hardware architecture.
- cuBLAS
NVIDIA’s closed-source Basic Linear Algebra Subprograms library, highly optimized for NVIDIA GPUs. KernelGenBench includes 50 cuBLAS operators, which set a demanding performance target for generated kernels.
- GEMM
General Matrix Multiplication, a fundamental linear algebra operation. cuBLAS includes numerous GEMM variants across different precisions and batching modes.
- Kernel
A function that executes on a GPU or other accelerator, typically written in CUDA or Triton. Kernels directly determine computational performance and must be optimized for specific hardware.
- KernelGenBench
A comprehensive benchmark framework for evaluating LLM and agent-based Triton kernel generation across multiple hardware platforms. Part of the FlagOS ecosystem.
- KernelGenBench-aten
A dataset subset containing 110 PyTorch ATen operators, used for cross-platform evaluation on all supported hardware.
- KernelGenBench-cublas
A dataset subset containing 50 cuBLAS operators, available only on NVIDIA platforms due to library dependencies.
- KernelGenBench-nocublas
A dataset subset containing 160 operators (110 ATen + 50 vLLM), used for evaluation on NVIDIA hardware without a cuBLAS dependency.
- KernelGenBench-MS
The Multi-Source sub-benchmark evaluating 210 operators from three sources (ATen, vLLM, cuBLAS) on NVIDIA hardware.
- KernelGenBench-MC
The Multi-Chip sub-benchmark evaluating 110 ATen operators across six hardware platforms to measure performance portability.
- KernelGenBench-vllm
A dataset subset containing 50 vLLM operators, available only on NVIDIA platforms.
- LLM
Large Language Model, an AI model trained on vast amounts of text data. In KernelGenBench, LLMs are evaluated on their ability to generate GPU kernels.
- Operator
A reusable computational unit in deep learning frameworks. Operators define “what” to compute (e.g., torch.add), while kernels define “how” to execute it on hardware.
- Pass@K
An evaluation metric measuring whether at least one correct solution exists among K generated samples. Pass@1 tests single-generation capability; Pass@5 allows up to five attempts.
- PagedAttention
A memory-efficient attention mechanism used in vLLM for LLM inference. Part of the vLLM operator subset in KernelGenBench.
- Speedup
The performance ratio of a generated kernel relative to the baseline implementation; a value above 1 means the generated kernel is faster. Speedups are aggregated as a geometric mean across test cases and operators.
- Triton
An open-source programming language for GPU kernels that abstracts low-level details while maintaining high performance. Triton code is portable across GPU architectures for which a Triton backend is available.
- vLLM
A high-throughput LLM inference engine with custom CUDA kernels. KernelGenBench includes 50 vLLM operators representing production inference workloads.
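The Pass@K and Speedup entries above can be illustrated with a minimal Python sketch. It is not KernelGenBench's own implementation: the function names, the input layout, and the sample data are hypothetical. It also assumes that speedup is computed as baseline time divided by generated-kernel time, and that the geometric mean is taken per operator first and then across operators; the glossary does not specify either detail.

```python
import math


def pass_at_k(sample_results, k):
    """Return True if at least one of the first k samples is correct.

    sample_results is a list of booleans, one per generated sample.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return any(sample_results[:k])


def pass_rate(results_by_operator, k):
    """Fraction of operators solved by at least one of the first k samples."""
    if not results_by_operator:
        raise ValueError("no operators given")
    solved = sum(pass_at_k(r, k) for r in results_by_operator.values())
    return solved / len(results_by_operator)


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def aggregate_speedup(timings_by_operator):
    """Geometric mean over operators of each operator's per-case geometric mean.

    timings_by_operator maps an operator name to a list of
    (baseline_ms, generated_ms) pairs, one pair per test case.
    Assumption: speedup = baseline time / generated-kernel time.
    """
    per_operator = [
        geometric_mean(baseline / generated for baseline, generated in cases)
        for cases in timings_by_operator.values()
    ]
    return geometric_mean(per_operator)


if __name__ == "__main__":
    # Hypothetical data: correctness of five samples per operator.
    results = {
        "add": [False, True, False, False, False],
        "mul": [False, False, False, False, False],
    }
    print("Pass@1:", pass_rate(results, 1))
    print("Pass@5:", pass_rate(results, 5))

    # Hypothetical timings in milliseconds: (baseline, generated).
    timings = {
        "add": [(2.0, 1.0), (8.0, 1.0)],
        "mul": [(1.0, 1.0)],
    }
    print("Speedup:", aggregate_speedup(timings))
```

A geometric mean is the usual choice for averaging ratios: a 2x speedup and a 0.5x slowdown average to 1.0, rather than to the 1.25 an arithmetic mean would report.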
Acronyms
| Acronym | Full Name |
|---|---|
| AST | Abstract Syntax Tree |
| ATen | A Tensor Library |
| BLAS | Basic Linear Algebra Subprograms |
| CUDA | Compute Unified Device Architecture |
| DCU | Data Center Accelerator |
| GEMM | General Matrix Multiplication |
| GPU | Graphics Processing Unit |
| LLM | Large Language Model |
| MUSA | Moore Threads Unified System Architecture |
| NPU | Neural Processing Unit |
Hardware Platforms
| Platform | Vendor | Description |
|---|---|---|
| NVIDIA | NVIDIA | A100 GPUs, primary evaluation baseline |
| Ascend | Huawei | Neural Processing Units |
| MUSA | Moore Threads | GPU architecture |
| Hygon | Hygon | Data Center Accelerators |
| Iluvatar | Iluvatar | AI accelerators |
| MetaX | MUXI | GPU accelerators |