Glossary


This section defines technical terminology used throughout the KernelGenBench documentation.

Agent: A coding agent that autonomously generates, executes, and iterates on code based on feedback. In KernelGenBench, agents such as Claude Code and OpenCode debug and optimize kernels by iterating on execution feedback.

ATen: PyTorch’s native tensor library, providing fundamental operations for deep learning. KernelGenBench includes 110 ATen operators derived from real model training traces.

CUDA: NVIDIA’s proprietary parallel computing platform and programming model for GPU acceleration. CUDA is deeply tied to NVIDIA hardware architecture.

cuBLAS: NVIDIA’s closed-source Basic Linear Algebra Subprograms library, highly optimized for NVIDIA GPUs. KernelGenBench includes 50 cuBLAS operators representing extreme performance challenges.

GEMM: General Matrix Multiplication, a fundamental linear algebra operation. cuBLAS includes numerous GEMM variants across different precisions and batching modes.
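As an illustration of the operation itself, the standard BLAS form of GEMM computes C ← αAB + βC. The sketch below is a plain-Python reference for small matrices stored as nested lists. The function name `gemm` and its argument order are chosen for this example only and are not taken from cuBLAS or KernelGenBench.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as nested lists.

    a is m x k, b is k x n, c is m x n. This is a readable reference,
    not an optimized implementation.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape m x n")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

result = gemm(2.0, a, b, 0.5, c)
print(result)  # [[38.5, 44.5], [86.5, 100.5]]
```

A GPU kernel for the same operation computes the same values but tiles the loops over `i`, `j`, and `p` across many threads, which is where most of the optimization effort goes.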

Kernel: A function that executes on a GPU or other accelerator, typically written in CUDA or Triton. Kernel quality directly determines computational performance, so kernels are usually tuned for specific hardware.

KernelGenBench: A comprehensive benchmark framework for evaluating LLM-based and agent-based Triton kernel generation across multiple hardware platforms. Part of the FlagOS ecosystem.

KernelGenBench-aten: A dataset subset containing 110 PyTorch ATen operators, used for cross-platform evaluation on all supported hardware.

KernelGenBench-cublas: A dataset subset containing 50 cuBLAS operators, available only on NVIDIA platforms due to library dependencies.

KernelGenBench-nocublas: A dataset subset containing 160 operators (ATen + vLLM), used for NVIDIA evaluation without cuBLAS dependency.

KernelGenBench-MS: The Multi-Source sub-benchmark evaluating 210 operators from three sources (ATen, vLLM, cuBLAS) on NVIDIA hardware.

KernelGenBench-MC: The Multi-Chip sub-benchmark evaluating 110 ATen operators across six hardware platforms to measure performance portability.

KernelGenBench-vllm: A dataset subset containing 50 vLLM operators, available only on NVIDIA platforms.
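The operator counts quoted in the dataset and sub-benchmark entries are simple sums of the three source subsets. The short check below restates that arithmetic. The dictionary and variable names are illustrative only and do not correspond to any KernelGenBench API.

```python
# Operator counts per source subset, as stated in the glossary entries.
SUBSET_SIZES = {"aten": 110, "vllm": 50, "cublas": 50}

# KernelGenBench-nocublas combines the ATen and vLLM subsets.
nocublas_total = SUBSET_SIZES["aten"] + SUBSET_SIZES["vllm"]

# KernelGenBench-MS covers all three sources on NVIDIA hardware.
ms_total = sum(SUBSET_SIZES.values())

# KernelGenBench-MC uses only the ATen subset, evaluated on every platform.
mc_total = SUBSET_SIZES["aten"]

print(nocublas_total, ms_total, mc_total)  # 160 210 110
```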

LLM: Large Language Model, an AI model trained on vast amounts of text data. In KernelGenBench, LLMs are evaluated on their ability to generate GPU kernels.

Operator: A reusable computational unit in deep learning frameworks. Operators define “what” to compute (e.g., torch.add), while kernels define “how” to execute on hardware.

Pass@K: An evaluation metric that counts a task as solved if at least one of K generated samples is correct. Pass@1 measures single-attempt capability; Pass@5 allows up to five attempts.
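A minimal sketch of this metric follows, taking the definition literally: an operator counts as solved if any of its first K samples is correct, and the score is the fraction of solved operators. The glossary does not say whether KernelGenBench uses this direct count or a statistical estimator, so treat the function `pass_at_k` and the sample data as hypothetical.

```python
def pass_at_k(samples_by_operator, k):
    """Fraction of operators with at least one correct sample among the first k.

    samples_by_operator maps an operator name to a list of booleans,
    one per generated sample, in generation order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not samples_by_operator:
        raise ValueError("no operators to score")
    solved = sum(1 for samples in samples_by_operator.values() if any(samples[:k]))
    return solved / len(samples_by_operator)


# Hypothetical correctness results for five samples per operator.
results = {
    "add": [True, False, False, False, False],
    "softmax": [False, False, True, False, False],
    "gemm_fp16": [False, False, False, False, False],
    "layer_norm": [False, True, False, False, False],
}

pass1 = pass_at_k(results, 1)
pass5 = pass_at_k(results, 5)
print(pass1, pass5)  # 0.25 0.75
```

Only `add` is solved on the first attempt, while three of the four operators are solved within five attempts, so Pass@5 is never lower than Pass@1.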

PagedAttention: A memory-efficient attention mechanism used in vLLM for LLM inference. Part of the vLLM operator subset in KernelGenBench.

Speedup: The performance improvement ratio of a generated kernel over the baseline implementation, aggregated as a geometric mean across test cases and operators.
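The aggregation can be sketched as follows. Two points are assumptions made for this example rather than statements from the documentation: the per-case speedup is taken as baseline time divided by generated-kernel time, and the geometric mean is applied first within each operator and then across operators. The timing values are invented.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def speedup(baseline_ms, generated_ms):
    """Assumed definition: values above 1.0 mean the generated kernel is faster."""
    return baseline_ms / generated_ms


# Hypothetical (baseline_ms, generated_ms) timings per test case.
timings = {
    "add": [(2.0, 1.0), (8.0, 4.0)],      # generated kernel is 2x faster
    "softmax": [(1.0, 2.0), (3.0, 6.0)],  # generated kernel is 2x slower
}

per_operator = {
    name: geometric_mean(speedup(b, g) for b, g in cases)
    for name, cases in timings.items()
}
overall = geometric_mean(per_operator.values())

print(per_operator)
print(overall)
```

With these numbers the overall speedup comes out at approximately 1.0, because a 2x gain and a 2x loss cancel under a geometric mean. An arithmetic mean of the same two values would report 1.25, which is why ratios are usually averaged geometrically.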

Triton: An open-source programming language for GPU kernels that abstracts low-level details while maintaining high performance. Triton code is portable across different GPU architectures.

vLLM: A high-throughput LLM inference engine with custom CUDA kernels. KernelGenBench includes 50 vLLM operators representing production inference workloads.

Acronym	Full Name
AST	Abstract Syntax Tree
ATen	A Tensor Library
BLAS	Basic Linear Algebra Subprograms
CUDA	Compute Unified Device Architecture
DCU	Deep Computing Unit
GEMM	General Matrix Multiplication
GPU	Graphics Processing Unit
LLM	Large Language Model
MUSA	Moore Threads Unified System Architecture
NPU	Neural Processing Unit

Platform	Vendor	Description
NVIDIA	NVIDIA	A100 GPUs, primary evaluation baseline
Ascend	Huawei	Neural Processing Units
MUSA	Moore Threads	GPU architecture
Hygon	Hygon	Data Center Accelerators
Iluvatar	Iluvatar	AI accelerators
MetaX	MUXI	GPU accelerators