Features


PyTorch-Plugin-FL provides the following capabilities:

Automatic device registration

On import, the plugin registers FlagGems Triton operators as the dispatch implementations for the flagos device. From then on, all tensor operations on device="flagos" use FlagGems Triton kernels without further code changes.
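A minimal usage sketch of the behavior described above, assuming the plugin is importable as `torch_fl` (the module name used in the platform table) and that copying a flagos tensor back to the CPU with `.cpu()` is supported. The helper names `plugin_available` and `run_flagos_demo` are hypothetical, and the function simply returns `None` when the plugin or PyTorch is not installed.

```python
import importlib.util


def plugin_available() -> bool:
    """Return True if both torch_fl and torch can be found on this system."""
    return (
        importlib.util.find_spec("torch_fl") is not None
        and importlib.util.find_spec("torch") is not None
    )


def run_flagos_demo():
    """Add two tensors on the flagos device, or return None if unavailable."""
    if not plugin_available():
        return None

    # Import the plugin first; the platform table requires this order on MACA,
    # and it is assumed to be harmless on the other platforms.
    import torch_fl  # noqa: F401  (imported for its registration side effect)
    import torch

    x = torch.ones(4, device="flagos")
    y = x + x  # expected to dispatch to a FlagGems Triton kernel
    return y.cpu().tolist()


if __name__ == "__main__":
    print(run_flagos_demo())
```

Nothing in the tensor code refers to FlagGems directly; the only plugin-specific parts are the import and the device string.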

Configurable backend routing

Choose between FlagGems and the native vendor backend (CUDA/MACA/Ascend) on a per-operator basis. The backends.conf configuration file maps operators to backends, and environment variables can override the setting for individual operators.
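The resolution order described above (environment variable first, then the configuration file) can be illustrated with a small model. This is only a sketch: the actual syntax of backends.conf and the names of the override variables are not specified here, so the `operator = backend` line format, the `default` key, the backend names `flaggems` and `vendor`, and the `FL_BACKEND_` prefix are all assumptions made for illustration.

```python
import os

# Hypothetical file contents: one "operator = backend" pair per line.
SAMPLE_CONF = """
# fallback for operators that are not listed
default = flaggems
mm = vendor
softmax = flaggems
"""

VALID_BACKENDS = {"flaggems", "vendor"}  # assumed backend names
ENV_PREFIX = "FL_BACKEND_"               # hypothetical variable prefix


def parse_conf(text):
    """Parse 'operator = backend' lines, ignoring blanks and '#' comments."""
    routes = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        op, sep, backend = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        backend = backend.strip().lower()
        if backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        routes[op.strip()] = backend
    return routes


def resolve_backend(op, routes, environ=None):
    """Pick the backend for one operator; an environment override wins."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_PREFIX + op.upper())
    if override is not None:
        backend = override.strip().lower()
        if backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        return backend
    # Fall back to the file entry, then to the file-wide default.
    return routes.get(op, routes.get("default", "flaggems"))


if __name__ == "__main__":
    routes = parse_conf(SAMPLE_CONF)
    for name in ("mm", "softmax", "add"):
        print(name, "->", resolve_backend(name, routes))
```

Passing `environ` explicitly keeps the lookup easy to test without touching the real process environment.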

Multi-platform support

Supports three hardware platforms:

| Platform | Backend | Notes |
| --- | --- | --- |
| NVIDIA CUDA | CUDA 12.8 + FlagGems Triton | Full FlagGems support |
| MACA (MetaX) | MACA cu-bridge + shim | Import `torch_fl` before `torch` |
| Huawei Ascend | ACL NN API | FlagGems disabled; native kernels only |
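Because the MACA row depends on import order, an application may want to detect the wrong order early. The check below is a hypothetical helper, not part of the plugin: it only inspects `sys.modules` to see whether `torch` was loaded before `torch_fl`.

```python
import sys


def torch_imported_first(modules=None) -> bool:
    """Return True if torch is already loaded but torch_fl is not.

    On MACA this indicates the wrong import order, since torch_fl
    must be imported before torch.
    """
    modules = sys.modules if modules is None else modules
    return "torch" in modules and "torch_fl" not in modules


if __name__ == "__main__":
    if torch_imported_first():
        print("warning: torch was imported before torch_fl")
    else:
        print("import order looks fine so far")
```

Calling this at program start, before any plugin import, gives a clear message instead of a harder-to-diagnose failure later.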

Complete device management API

Provides a full PyTorch-compatible device interface:

Stream management
Event synchronization
RNG state
AMP (Automatic Mixed Precision)
Device context management
Memory allocator (device and pinned)