FlagTree 0.2.0 release

Highlights

FlagTree 0.2.0 retains the capabilities of the previous version, continues to integrate new backends, expands the range of supported Triton versions, and adds hardware-aware optimization capabilities. The project is still in its early stages. Its goals are to remain compatible with the existing adaptation solutions for the various AI chip backends, unify them in one code repository, build a platform for collaborative development, and deliver multi-backend support quickly from a single repository.

New features

  • Added multi-backend support

Currently supported backends include triton_shared cpu, iluvatar, xpu (klx), mthreads, metax, aipu (arm npu), ascend npu & cpu, tsingmicro, cambricon, with bold indicating newly added ones.
Each new backend retains the capabilities introduced in the previous version: cross-platform compilation with rapid verification, plugin-based modules for highly differentiated backend code, CI/CD, and quality management.
FlagTree is jointly developing common middleware-layer extensions with backend vendors and open-sourcing standardized PyTorch backend extensions to support Triton / FlagTree practices.

  • Dual compilation path support

Supports both the TritonGPU and Linalg compilation paths. Provides multiple integration paradigms for non-GPGPU backends, and adds FLIR repository support for Linalg Dialect extensions and for MLIR extensions used in backend compilation.

  • Added support for Triton versions

Currently supported Triton versions include 3.0.x, 3.1.x, 3.2.x, 3.3.x, with bold indicating newly added ones.
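
The version list above can be turned into a simple compatibility check. The following is a minimal sketch, not part of FlagTree: the names `SUPPORTED_SERIES`, `triton_series`, and `is_supported` are hypothetical, and the check assumes that any patch release within a listed series counts as supported.

```python
import re

# Release series listed as supported in the FlagTree 0.2.0 notes.
# Assumption: every patch release within a listed series is acceptable.
SUPPORTED_SERIES = {(3, 0), (3, 1), (3, 2), (3, 3)}


def triton_series(version):
    """Return (major, minor) parsed from a string such as '3.2.0+git1234'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version):
    """True if the version falls in one of the listed 3.0.x to 3.3.x series."""
    return triton_series(version) in SUPPORTED_SERIES
```

In practice the version string could come from `triton.__version__`, for example `is_supported(triton.__version__)`.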

  • Hardware-aware optimization support

Provides guided programming interfaces for hardware features, whether common across backends or specific to one. Guidance information is added at the frontend through compatible extensions, giving operator authors more flexibility in writing and performance-tuning kernels.
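
To illustrate the idea of frontend guidance delivered through compatible extensions, here is a purely hypothetical sketch in plain Python. It does not use or reproduce FlagTree's actual hint interface: the decorator `with_hints`, the helper `hints_for`, the backend names, and the hint names are all invented for illustration. The point it shows is that hints are optional metadata, so a backend that does not recognize a hint can ignore it and the kernel still behaves the same.

```python
from typing import Callable, Dict, Set

# Hypothetical table of hint names each backend understands.
# None of these names are taken from FlagTree.
BACKEND_HINTS: Dict[str, Set[str]] = {
    "backend_a": {"use_shared_memory", "prefetch"},
    "backend_b": {"prefetch"},
}


def with_hints(**hints) -> Callable:
    """Attach optional tuning hints to a kernel without changing its behavior."""
    def decorate(fn: Callable) -> Callable:
        fn.hints = dict(hints)  # stored as plain metadata on the function
        return fn
    return decorate


def hints_for(fn: Callable, backend: str) -> dict:
    """Keep only the hints the backend recognizes; the rest are dropped."""
    known = BACKEND_HINTS.get(backend, set())
    return {k: v for k, v in getattr(fn, "hints", {}).items() if k in known}


@with_hints(use_shared_memory=True, prefetch=2)
def add_kernel(x, y):
    # Stand-in for a real kernel body.
    return x + y
```

Under these assumptions, `hints_for(add_kernel, "backend_b")` keeps only the `prefetch` hint, and a backend absent from the table receives no hints at all while the kernel itself is unchanged.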

  • Joint construction with FlagGems operator library

FlagTree collaborates with the FlagGems operator library on related features: version adaptation, backend interfaces, registration mechanisms, and test modifications.

Looking ahead

GPGPU backend code will be integrated, decoupling backend-specific changes from TritonGPU. Non-GPGPU backends will be horizontally integrated on the FLIR foundation, with a unified design for common passes.
Triton version upgrade guides will be provided for backend vendors, covering the adaptation path 3.0 -> 3.1 -> 3.2 -> 3.3.
CI/CD will add functional testing of the FlagGems operator library.
C++ Runtime functionality will be integrated to reduce runtime overhead outside of kernels to a level on par with CUDA.