Features#
PyTorch-Plugin-FL provides the following capabilities:
Automatic device registration#
Registers FlagGems Triton operators as the dispatch implementations for the flagos device. Once the plugin is imported, tensor operations on device="flagos" run on FlagGems Triton kernels with no further code changes.
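The registration idea can be illustrated with a toy dispatch table in plain Python. This is only a conceptual sketch: the `register` and `dispatch` helpers are hypothetical names invented here, the list-based "kernel" is a stand-in for a FlagGems Triton kernel, and PyTorch's real dispatcher works quite differently under the hood.

```python
from typing import Callable, Dict, Tuple

# Toy dispatch table keyed by (operator name, device type).
# Illustrative only; this is not how PyTorch stores its kernels.
_REGISTRY: Dict[Tuple[str, str], Callable] = {}


def register(op: str, device: str) -> Callable[[Callable], Callable]:
    """Decorator that records `fn` as the implementation of `op` on `device`."""
    def decorator(fn: Callable) -> Callable:
        _REGISTRY[(op, device)] = fn
        return fn
    return decorator


def dispatch(op: str, device: str, *args):
    """Look up the implementation registered for (op, device) and call it."""
    try:
        impl = _REGISTRY[(op, device)]
    except KeyError:
        raise NotImplementedError(
            f"{op!r} has no implementation for device {device!r}"
        ) from None
    return impl(*args)


# Registration happens as a side effect of importing the module,
# which mirrors the "once imported" behavior described above.
@register("add", "flagos")
def _add_flagos(a, b):
    # Stand-in for a FlagGems Triton kernel (hypothetical).
    return [x + y for x, y in zip(a, b)]


result = dispatch("add", "flagos", [1, 2], [3, 4])
```

The caller only names the operator and the device; which function actually runs is decided by what was registered at import time.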
Configurable backend routing#
Select either the FlagGems backend or the native vendor backend (CUDA/MACA/Ascend) for each operator individually. The backends.conf configuration file controls which backend each operator uses, and environment variables can override that choice for individual operators.
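A minimal sketch of how such per-operator routing could be resolved is shown below. The source does not document the backends.conf syntax or the environment variable names, so everything concrete here is an assumption: the `op = backend` line format, the `flaggems`/`native` values, the `FLAGOS_BACKEND_<OP>` variable prefix, and the `flaggems` default are all hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

VALID_BACKENDS = {"flaggems", "native"}   # assumed value names
ENV_PREFIX = "FLAGOS_BACKEND_"            # hypothetical variable prefix


def parse_backends_conf(text: str) -> Dict[str, str]:
    """Parse assumed 'op = backend' lines; '#' starts a comment."""
    routes: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank and comment-only lines
        op, sep, backend = line.partition("=")
        op, backend = op.strip(), backend.strip().lower()
        if not sep or not op or backend not in VALID_BACKENDS:
            raise ValueError(f"invalid backends.conf line: {raw!r}")
        routes[op] = backend
    return routes


def resolve_backend(
    op: str,
    routes: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
    default: str = "flaggems",  # assumed default, not confirmed by the source
) -> str:
    """Environment override wins, then the config file, then the default."""
    env = os.environ if env is None else env
    override = env.get(ENV_PREFIX + op.upper())
    if override is not None:
        backend = override.strip().lower()
        if backend not in VALID_BACKENDS:
            raise ValueError(f"invalid backend override for {op!r}: {override!r}")
        return backend
    return routes.get(op, default)


conf_text = """
# route matrix multiply to the vendor library
mm = native
add = flaggems
"""
routes = parse_backends_conf(conf_text)
```

With these assumptions, `resolve_backend("mm", routes, env={})` follows the file, while passing `env={"FLAGOS_BACKEND_MM": "flaggems"}` overrides it for that one operator.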
Multi-platform support#
Supports three hardware platforms:
| Platform | Backend | Notes |
|---|---|---|
| NVIDIA CUDA | CUDA 12.8 + FlagGems Triton | Full FlagGems support |
| MACA (MetaX) | MACA cu-bridge + shim | Import |
| Huawei Ascend | ACL NN API | FlagGems disabled; native kernels only |
Complete device management API#
Provides a full PyTorch-compatible device interface:
- Stream management
- Event synchronization
- RNG state
- AMP (Automatic Mixed Precision)
- Device context management
- Memory allocator (device and pinned)