Features#
Unified multi-chip backend
Leverages FlagGems (unified operator library) and FlagCX (unified communication library) to provide chip-agnostic inference capabilities. The same model can run on different hardware without code modifications.
Flexible operator dispatch
Implements a priority-based dispatch system that selects between FlagGems, vendor-specific, and PyTorch reference implementations. Operators can be configured per-backend with automatic fallback on failure.
Platform auto-detection
Automatically detects hardware and loads platform-specific configuration. Supports NVIDIA GPU, Ascend NPU, T-head-Zhenwu, Iluvatar, MetaX, Moore Threads, Tsingmicro, and Hygon chips.
Extensible vendor backend
Supports built-in vendor backends (CUDA, Ascend), external plugin packages via setuptools entry points, and environment-based plugin modules.