FlagTree 0.2.0 release
Highlights
FlagTree retains the capabilities of the previous version while integrating new backends, expanding support for Triton versions, and adding hardware-aware optimization capabilities. The project is still in its early stages. Its goals are to stay compatible with existing adaptation solutions for various AI chip backends, unify their code in a single repository, build a platform for joint development, and deliver multi-backend support quickly from that one repository.
New features
Added multi-backend support
Currently supported backends include triton_shared cpu, iluvatar, xpu (klx), mthreads, metax, aipu (arm npu), ascend npu & cpu, tsingmicro, and cambricon.
Each new backend retains the capabilities introduced in the previous version: cross-platform compilation and rapid verification, plugin-based modules for highly backend-specific code, CI/CD, and quality management.
FlagTree is jointly developing common middleware-layer extensions with backend vendors and is open-sourcing standardized PyTorch backend extensions to support Triton / FlagTree practices.
Dual compilation path support
FlagTree supports both the TritonGPU and Linalg compilation paths. It provides multiple integration paradigms for non-GPGPU backends and adds the FLIR repository, which supports Linalg dialect extensions and MLIR extensions for backend compilation.
Added support for more Triton versions
Currently supported Triton versions include 3.0.x, 3.1.x, 3.2.x, and 3.3.x.
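As an illustration only, a project that builds on FlagTree might check whether a Triton version string falls in one of the release lines listed above. The helper name `is_supported_triton` and the `SUPPORTED_TRITON_LINES` set are hypothetical, not part of FlagTree; the sketch assumes only the major and minor numbers matter, matching the "x.y.x" notation.

```python
# Hypothetical: Triton release lines (major, minor) taken from the 0.2.0 notes.
SUPPORTED_TRITON_LINES = {(3, 0), (3, 1), (3, 2), (3, 3)}


def is_supported_triton(version: str) -> bool:
    """Return True if a version string such as "3.2.0" is in a supported line.

    Illustrative helper, not a FlagTree API. Patch numbers and suffixes
    (e.g. "3.1.0+git1234") are ignored; only major.minor is compared.
    """
    parts = version.split(".")
    if len(parts) < 2:
        return False  # not a dotted version string
    try:
        major, minor = int(parts[0]), int(parts[1])
    except ValueError:
        return False  # major or minor is not numeric
    return (major, minor) in SUPPORTED_TRITON_LINES
```

In practice the version string could come from `triton.__version__`; `is_supported_triton("3.2.0")` returns `True`, while `is_supported_triton("3.4.0")` returns `False`.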
Hardware-aware optimization support
FlagTree provides guided programming interfaces for hardware features, whether common across backends or specific to one. Through compatible extensions, developers can add guidance information at the frontend, which gives more flexibility in operator writing and performance tuning.
Joint development with the FlagGems operator library
FlagTree collaborates with the FlagGems operator library on related features, including version adaptation, backend interfaces, registration mechanisms, and test modifications.
Looking ahead
GPGPU backend code will be integrated, decoupling backend-specific changes from TritonGPU. Non-GPGPU backends will be integrated horizontally on top of FLIR, with a unified design for common passes.
Triton adaptation upgrade guides will be provided for backend vendors, following the path 3.0 -> 3.1 -> 3.2 -> 3.3.
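The upgrade path 3.0 -> 3.1 -> 3.2 -> 3.3 reads as a stepwise sequence. Assuming a vendor adapts one minor version at a time rather than jumping directly to the target, the hops it would work through can be listed with a small sketch like this one. `UPGRADE_PATH` and `upgrade_steps` are hypothetical names for illustration, not part of FlagTree or its guides.

```python
# Hypothetical: ordered Triton adaptation path from the roadmap above.
UPGRADE_PATH = ["3.0", "3.1", "3.2", "3.3"]


def upgrade_steps(current: str, target: str) -> list[tuple[str, str]]:
    """Return the one-version-at-a-time hops from current to target.

    Assumes each hop is adapted before the next one starts. Raises
    ValueError for versions not on the path or for a downgrade.
    """
    start = UPGRADE_PATH.index(current)  # ValueError if not on the path
    end = UPGRADE_PATH.index(target)
    if end < start:
        raise ValueError("target must not be older than current")
    return [(UPGRADE_PATH[i], UPGRADE_PATH[i + 1]) for i in range(start, end)]
```

For example, `upgrade_steps("3.0", "3.3")` yields `[("3.0", "3.1"), ("3.1", "3.2"), ("3.2", "3.3")]`, and a backend already on 3.2 would have a single remaining hop to 3.3.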
CI/CD will add functional testing with the FlagGems operator library.
C++ runtime functionality will be integrated to reduce runtime overhead outside kernels to a level on par with CUDA.