Application integration

FlagCX integrates with upper-layer applications such as PyTorch and PaddlePaddle. The table below lists the supported frameworks and the communication operations available in each. The batch_XXX and XXX_coalesced operations use group primitives.

| Framework | PyTorch | PaddlePaddle |
|---|---|---|
| send | ✓ | ✓ |
| recv | ✓ | ✓ |
| all_gather | ✓ | ✓ |
| all_gather_into_tensor_coalesced | ✓ (in order, no aggregation) | ☓ |
| all_reduce | ✓ | ✓ |
| all_reduce_coalesced | ✓ (in order, no aggregation) | ☓ |
| all_to_all | ✓ | ✓ |
| all_to_all_single | ✓ | ✓ |
| barrier | ✓ | ✓ |
| batch_isend_irecv | ✓ | ✓ |
| broadcast | ✓ | ✓ |
| gather | ✓ | ✓ |
| reduce | ✓ | ✓ |
| reduce_scatter | ✓ | ✓ |
| reduce_scatter_tensor_coalesced | ✓ (in order, no aggregation) | ☓ |
| scatter | ✓ | ✓ |
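For scripts that need to choose a code path per framework, the support table above can be transcribed into a small lookup. The following is a minimal sketch: the names `SUPPORT`, `is_supported`, and `unsupported_ops` are hypothetical helpers written for illustration and are not part of FlagCX.

```python
# Support matrix transcribed from the table above.
# All names here are hypothetical and are NOT part of the FlagCX API.

ALL_OPS = [
    "send", "recv", "all_gather", "all_gather_into_tensor_coalesced",
    "all_reduce", "all_reduce_coalesced", "all_to_all", "all_to_all_single",
    "barrier", "batch_isend_irecv", "broadcast", "gather", "reduce",
    "reduce_scatter", "reduce_scatter_tensor_coalesced", "scatter",
]

# Operations the table marks as supported for PyTorch only.
PYTORCH_ONLY = {
    "all_gather_into_tensor_coalesced",
    "all_reduce_coalesced",
    "reduce_scatter_tensor_coalesced",
}

SUPPORT = {
    op: {"PyTorch": True, "PaddlePaddle": op not in PYTORCH_ONLY}
    for op in ALL_OPS
}


def is_supported(op, framework):
    """Return True if the table lists `op` as supported for `framework`."""
    try:
        return SUPPORT[op][framework]
    except KeyError:
        raise ValueError(f"unknown op or framework: {op!r}, {framework!r}") from None


def unsupported_ops(framework):
    """Return the sorted list of ops the table marks unsupported for `framework`."""
    if framework not in ("PyTorch", "PaddlePaddle"):
        raise ValueError(f"unknown framework: {framework!r}")
    return sorted(op for op, row in SUPPORT.items() if not row[framework])
```

A caller could, for example, check `is_supported("all_reduce_coalesced", "PaddlePaddle")` and fall back to issuing individual `all_reduce` calls when it returns `False`.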

PyTorch support is provided by the FlagCX Torch plugin, which integrates natively with the PyTorch distributed backend. The plugin has been validated across the communication backends listed below and on multiple hardware platforms, including multi-chip heterogeneous environments.

| FlagCX Backend | NCCL | IXCCL | CNCL | MCCL | XCCL | DUCCL | HCCL | MUSACCL | RCCL | TCCL | ECCL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PyTorch Support | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

Tip

To use heterogeneous cross-chip communication with the PyTorch DDP FlagCX backend, run the same PyTorch version on all nodes. Mismatched versions may cause initialization failures during process group setup.
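One way to catch a version mismatch before launching a job is to compare the version string each node reports (for example, the value of `torch.__version__`). The sketch below is illustrative only: the function name and the node labels are hypothetical, and how the strings are collected is left to the launcher. It assumes that vendor build suffixes such as `+cu121` may legitimately differ between chips and therefore ignores them by default; set `strip_local=False` if your deployment requires exact matches.

```python
from collections import defaultdict


def find_version_mismatches(versions_by_node, strip_local=True):
    """Group nodes by reported PyTorch version.

    Returns an empty dict when every node reports the same version,
    otherwise a dict mapping each distinct version to its sorted node names.
    With strip_local=True, anything after '+' (the local build tag) is
    ignored. This is an assumption about what counts as "identical".
    """
    groups = defaultdict(list)
    for node, version in versions_by_node.items():
        key = version.split("+", 1)[0] if strip_local else version
        groups[key].append(node)
    if len(groups) <= 1:
        return {}
    return {version: sorted(nodes) for version, nodes in groups.items()}


if __name__ == "__main__":
    # Hypothetical node names and versions, for illustration only.
    reported = {"node-a": "2.3.0+cu121", "node-b": "2.3.0", "node-c": "2.2.1"}
    mismatches = find_version_mismatches(reported)
    if mismatches:
        print("PyTorch versions differ across nodes:", mismatches)
```

Running a check like this in the job launcher, before process group setup, surfaces the problem as a readable message instead of an initialization failure.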