Features
FlagGems-vLLM provides the following key features:
Operators have undergone deep performance tuning: each operator is optimized for throughput and latency across multiple hardware backends.
Triton kernel call optimization: kernel launch overhead is minimized through specialized Triton kernel patterns and autotuning.
Flexible multi-backend support mechanism: the library supports a variety of GPU hardware platforms, so operators can run efficiently across the supported devices.
Support for common vLLM operators: includes optimized implementations of operators frequently used in vLLM inference, such as moe_align_block_size, grouped_topk, fused_moe, flash_mla, and more.
Relationship with FlagGems and vllm-plugin-fl
The three repositories are used together but have different responsibilities:
FlagGems: the general-purpose FlagGems operator library. It provides common PyTorch/Triton operator replacements and exposes flag_gems.enable() / flag_gems.use_gems() to register operators into PyTorch dispatch.
FlagGems-vllm: this repository. It contains vLLM-scenario operator implementations and tests/benchmarks that are aligned with the corresponding FlagGems implementations where the same operator exists. It exposes operators through the flaggems_vllm Python package, for example flaggems_vllm.grouped_topk, flaggems_vllm.fused_experts_impl, and flaggems_vllm.moe_align_block_size.
vllm-plugin-fl: the vLLM plugin layer. It uses FlagGems as the global operator backend by importing FlagGems and calling flag_gems.enable(). For vLLM-specific fused kernels that are not enabled through PyTorch dispatch, it explicitly imports and calls operators from FlagGems-vllm.
In a typical vLLM plugin environment, the call flow is:
vLLM
  -> vllm-plugin-fl
       -> flag_gems.enable() for general FlagGems operator registration
       -> flaggems_vllm.<operator>() for vLLM-specific fused operators
This means FlagGems and FlagGems-vllm are complementary: FlagGems provides the common operator backend, while FlagGems-vllm provides vLLM-oriented kernels and compatibility tests/benchmarks used by vllm-plugin-fl.
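The two-step call flow above can be sketched in Python as follows. This is an illustrative sketch, not the actual vllm-plugin-fl code: the helper names enable_general_backend and load_vllm_operators are hypothetical, and only flag_gems.enable() and the three flaggems_vllm operator names come from the text above. The operators' signatures are not described here, so the sketch only looks them up and does not call them. Both helpers fall back gracefully when a package is not installed.

```python
import importlib
import importlib.util

# Operator names quoted in the section above; their signatures are not
# documented there, so this sketch never calls them.
VLLM_OPERATOR_NAMES = ("grouped_topk", "fused_experts_impl", "moe_align_block_size")


def enable_general_backend():
    """Step 1 (hypothetical helper): register general FlagGems operators.

    Returns True if flag_gems was found and flag_gems.enable() was called,
    False if the package is not installed.
    """
    if importlib.util.find_spec("flag_gems") is None:
        return False
    flag_gems = importlib.import_module("flag_gems")
    flag_gems.enable()  # registers FlagGems operators into PyTorch dispatch
    return True


def load_vllm_operators(names=VLLM_OPERATOR_NAMES):
    """Step 2 (hypothetical helper): look up vLLM-specific fused operators.

    These are not reached through PyTorch dispatch, so the caller must
    import them explicitly. Operators that cannot be found map to None.
    """
    if importlib.util.find_spec("flaggems_vllm") is None:
        return {name: None for name in names}
    flaggems_vllm = importlib.import_module("flaggems_vllm")
    return {name: getattr(flaggems_vllm, name, None) for name in names}


if __name__ == "__main__":
    print("FlagGems enabled:", enable_general_backend())
    for name, op in load_vllm_operators().items():
        print(name, "->", "available" if op is not None else "not installed")
```

The split mirrors the division of responsibilities: the first helper changes global PyTorch dispatch once, while the second returns explicit callables that a plugin would invoke directly at the points where vLLM needs a fused kernel.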