FlagCX Environment Variables
This document provides a comprehensive reference for all environment variables used in FlagCX.
Debug and Logging

| Variable | Default | Description |
|---|---|---|
|  | None | Set debug logging level. Values: VERSION, WARN, INFO, ABORT, TRACE |
|  | INIT,ENV | Comma-separated list of debug subsystems. Prefix with ^ to invert. Values: INIT, COLL, P2P, SHM, NET, GRAPH, TUNING, ENV, ALLOC, CALL, PROXY, NVLS, BOOTSTRAP, REG, ALL |
|  | stdout | Output file for debug logs. Supports %h (hostname), %p (pid) placeholders |
|  | 0 | When set to 1, enables extended debug info on warnings |
|  | 0 | When set to 1, enables setting thread names for debugging |

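The subsystem list and the log-file placeholders described above can be illustrated with a small Python sketch. This is not FlagCX's own parser: the function names `parse_subsystems` and `expand_log_path` are hypothetical, and the sketch assumes that a leading ^ inverts the whole list (every subsystem except the ones named).

```python
import os
import socket

# Subsystem names listed in the table above; ALL acts as a wildcard.
SUBSYSTEMS = ["INIT", "COLL", "P2P", "SHM", "NET", "GRAPH", "TUNING",
              "ENV", "ALLOC", "CALL", "PROXY", "NVLS", "BOOTSTRAP", "REG"]


def parse_subsystems(spec="INIT,ENV"):
    """Return the set of enabled subsystems for a comma-separated spec.

    Assumption: a leading '^' inverts the whole list.
    """
    invert = spec.startswith("^")
    names = {n.strip().upper() for n in spec.lstrip("^").split(",") if n.strip()}
    if "ALL" in names:
        names = set(SUBSYSTEMS)
    unknown = names - set(SUBSYSTEMS)
    if unknown:
        raise ValueError(f"unknown subsystem(s): {sorted(unknown)}")
    return set(SUBSYSTEMS) - names if invert else names


def expand_log_path(pattern):
    """Replace %h with the hostname and %p with the current process ID."""
    return (pattern.replace("%h", socket.gethostname())
                   .replace("%p", str(os.getpid())))
```

Using both placeholders in the file name gives each process on each host its own log file, which avoids interleaved output in multi-rank jobs.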
Communication Mode

| Variable | Default | Description |
|---|---|---|
|  | 0 | When set to 1, uses host communication mode |
|  | 0 | When set to 1, enables uniRunner mode |
|  | None | Specifies the communication ID for bootstrap. When set, rank 0 will create the root |
|  | None | Override the host identifier string for host hashing |
|  | None | Comma-separated list of cluster split counts (e.g., 2,4,8), enabling hybridRunner mode |

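As an illustration of the comma-separated split-count format (e.g., 2,4,8), the following Python sketch parses such a value. The helper name `parse_cluster_splits` is hypothetical, and the rule that every count must be a positive integer is an assumption rather than documented FlagCX behavior.

```python
def parse_cluster_splits(value):
    """Parse a string such as '2,4,8' into a list of integers.

    Returns None when the variable is unset or empty.
    Assumption: every count must be a positive integer.
    """
    if value is None or not value.strip():
        return None
    counts = [int(part) for part in value.split(",")]
    if any(count <= 0 for count in counts):
        raise ValueError(f"split counts must be positive: {value!r}")
    return counts
```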
Buffer and Memory

| Variable | Default | Description |
|---|---|---|
|  | 67108864 (64MB) | Network buffer size in bytes |
|  | 4194304 (4MB) | Network chunk size in bytes |
|  | 67108864 (64MB) | P2P buffer size in bytes |
|  | 16777216 (16MB) | P2P chunk size in bytes |
|  | 32 | Capacity of semaphore buffer pool |
|  | 128 | Kernel FIFO capacity |
|  | 128 | Reduce operation FIFO capacity |
|  | 0 | When set to 1, enables memory allocation via device adaptor |
|  | 0 | When set to 1, enables DMA-BUF support for memory registration |

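The byte values in the table can be checked against their MB labels with a few lines of Python. The sketch also shows how many chunks fit in one buffer under the assumption that a buffer is divided evenly into chunks; the table does not state that relationship, and the dictionary keys and the helper `chunks_per_buffer` are illustrative names only.

```python
MIB = 1024 * 1024

# Default sizes from the table above, keyed by illustrative (hypothetical) names.
DEFAULTS = {
    "net_buffer": 67108864,
    "net_chunk": 4194304,
    "p2p_buffer": 67108864,
    "p2p_chunk": 16777216,
}


def chunks_per_buffer(buffer_bytes, chunk_bytes):
    """Number of chunks in a buffer, assuming an even split (an assumption)."""
    if chunk_bytes <= 0 or buffer_bytes % chunk_bytes:
        raise ValueError("buffer size must be a positive multiple of chunk size")
    return buffer_bytes // chunk_bytes


sizes_in_mib = {name: size // MIB for name, size in DEFAULTS.items()}
net_chunks = chunks_per_buffer(DEFAULTS["net_buffer"], DEFAULTS["net_chunk"])
p2p_chunks = chunks_per_buffer(DEFAULTS["p2p_buffer"], DEFAULTS["p2p_chunk"])
```

When overriding these sizes, keeping the buffer size a multiple of the chunk size is a conservative choice under the same assumption.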
Proxy and Runtime

| Variable | Default | Description |
|---|---|---|
|  | 0 | When set to 1, enables runtime proxy mode |
|  | 8 | Frequency of append operation in progress loop |
|  | 0 | When set to 1, disables P2P transport |
|  | 0 | When set to 1, disables P2P scheduling optimization |
|  | None | Path to device function library for async kernel loading |

Topology Configuration

| Variable | Default | Description |
|---|---|---|
|  | None | Path to XML topology file for network/GPU topology |
|  | None | Path to dump discovered topology as XML |
|  | None | Path to inter-server routing configuration file |
|  | 0 | When set to 1, disables topology detection |

Tuner Configuration

| Variable | Default | Description |
|---|---|---|
|  | 0 | When set to 1, enables the internal tuner |
|  | None | Specifies a communicator tag to use from config list |
|  | 5 | Number of loops for tuner search (minimum 5) |
|  | None | Current tuner configuration ID (for FlagScale tuning) |
|  | None | Best tuner configuration ID (for FlagScale tuning) |
|  | None | Set to 1 when tuning is complete (set by system) |
|  | None | Path to tune file |
|  | None | Tune group index |
|  | 0 | When set to 1, enables tuning with FlagScale |
|  | 0 | When set to 1, uses a single communicator for tuning (note: no FLAGCX_ prefix) |

HybridRunner Configuration

| Variable | Default | Description |
|---|---|---|
|  | Sequential | C2C algorithm selection. Values: RING_PIPELINED, XML_INPUT |
|  | None | Directory path to export algorithm XML files |
|  | None | Prefix for exported algorithm XML files |
|  | None | Directory path to import algorithm XML files |
|  | None | Prefix for imported algorithm XML files |
|  | None | Granularity for C2C algorithm search |

UniRunner Configuration

| Variable | Default | Description |
|---|---|---|
|  | 1024 | Size of P2P event pool |
|  | 1 | Number of slices for uniRunner |
|  | 32 | Number of threads per block for uniRunner |
|  | 1 | Number of blocks for uniRunner |
|  | 0 | When set to 1, uses local reduction in uniRunner |
|  | 0 | When set to 1, uses ring allgather in uniRunner |
|  | 0 | When set to 1, uses sliced allreduce in uniRunner |
|  | 0 | Number of reduction slices for uniRunner (0 = auto) |
|  | 65536 | Reduction slice size in bytes for uniRunner |

Network Configuration

InfiniBand (IB) Settings

| Variable | Default | Description |
|---|---|---|
|  | 0 | When set to 1, disables InfiniBand |
|  | None | Specifies which IB HCA devices to use |
|  | -1 | GID index for RoCE. -1 means auto-detect |
|  | 2 | RoCE version number to use |
|  | 18 | IB timeout value (exponential, actual timeout = 4.096us * 2^value) |
|  | 7 | Number of IB retry attempts |
|  | 0 | IB partition key index |
|  | 0 | When set to 1, enables inline data for small messages |
|  | 0 | IB Service Level |
|  | 0 | IB Traffic Class |
|  | 8192 | Threshold (bytes) above which adaptive routing is enabled |
|  | 2 | PCI relaxed ordering mode. 0=off, 1=on, 2=auto |
|  | -2 | Adaptive routing setting. -2=auto, -1=off, 0+=on with value |
|  | 1 | When set to 1, merges Virtual Functions |
|  | 1 | When set to 1, merges multiple NICs into one logical device |
|  | 1 | Number of Queue Pairs per connection |
|  | 0 | When set to 1, splits data across QPs |
|  | None | Address family for IB. Values: AF_IB, AF_INET, AF_INET6 |
|  | None | IP address range for IB connections |
|  | 0 | When set to 1, disables GDR flush operations |
|  | 0 | When set to 1, splits data across QPs for IBUC |

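The exponential IB timeout encoding in the table (actual timeout = 4.096us * 2^value) is easier to reason about with a worked calculation. The Python sketch below applies that formula directly; the function name `ib_timeout_seconds` is illustrative and not part of FlagCX.

```python
def ib_timeout_seconds(value):
    """Convert the exponential IB timeout setting to seconds.

    Formula from the table: timeout = 4.096 us * 2**value.
    """
    if value < 0:
        raise ValueError("timeout exponent must be non-negative")
    return 4.096e-6 * (2 ** value)


default_timeout = ib_timeout_seconds(18)  # about 1.07 seconds
doubled_timeout = ib_timeout_seconds(19)  # each +1 step doubles the timeout
```

With the default of 18, each transport-level timeout is roughly 1.07 seconds, and every increment of the setting doubles it.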
IB Retransmission

| Variable | Default | Description |
|---|---|---|
|  | 0 | When set to 1, enables software retransmission |
|  | 5000 | Minimum RTO timeout in microseconds |
|  | 10 | Maximum number of retransmission retries |
|  | 16 | ACK interval for retransmission |
|  | 16 | Maximum outstanding requests |

Socket Network

| Variable | Default | Description |
|---|---|---|
|  | Auto | Socket address family. Values: AF_INET (IPv4), AF_INET6 (IPv6) |
|  | Auto | Network interface name(s) to use. Prefix with ^ to exclude, = for exact match |
|  | -2 | Number of sockets per thread (-2=auto) |
|  | -2 | Number of socket threads (-2=auto) |
|  | 0 | When set to 1, forces socket network instead of IB |

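The interface filter syntax (^ to exclude, = for exact match) can be sketched in Python as follows. Several details here are assumptions not stated in the table: that names are comma-separated, that an unprefixed name matches by prefix, and that the ^ and = prefixes apply to the whole list. The function `select_interfaces` is hypothetical and only models the matching rules.

```python
def select_interfaces(spec, available):
    """Filter interface names according to a spec string.

    Assumed semantics (not confirmed by the table):
      'eth'      keep names starting with 'eth'
      '=eth0'    keep only the exact name 'eth0'
      '^docker'  drop names starting with 'docker'
      '^=lo'     drop only the exact name 'lo'
    """
    exclude = spec.startswith("^")
    body = spec[1:] if exclude else spec
    exact = body.startswith("=")
    body = body[1:] if exact else body
    patterns = [p for p in body.split(",") if p]

    def matches(name):
        return any(name == p if exact else name.startswith(p) for p in patterns)

    # Keep matching names normally, or non-matching names when excluding.
    return [name for name in available if matches(name) != exclude]


interfaces = ["eth0", "eth1", "lo", "docker0"]
kept = select_interfaces("^lo,docker", interfaces)
```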
UCX Network

| Variable | Default | Description |
|---|---|---|
|  | 0 | When set to 1, disables UCX network |
|  | None | UCX transport layers to use. Falls back to UCX_TLS if not set |
|  | 1 | When set to 1, disables UCX CUDA support |

Gloo Network

| Variable | Default | Description |
|---|---|---|
|  | 0 | When set to 1, disables IB for Gloo transport |

Plugin Configuration

| Variable | Default | Description |
|---|---|---|
|  | None | Path to device adaptor plugin shared library |
|  | None | Path to network adaptor plugin shared library |
|  | None | Path to CCL adaptor plugin shared library |

Miscellaneous

| Variable | Default | Description |
|---|---|---|
|  | 0 | (Commented out) When set to 1, ignores CPU affinity |

Notes

- Boolean variables generally use 0 for false/disabled and 1 for true/enabled.
- A default of -2 typically indicates “auto-detect” behavior.
- The FLAGCX_ prefix is automatically added to variable names when using the FLAGCX_PARAM macro.
- Some variables may only take effect at initialization time.
- Debug logging can significantly impact performance; use with caution in production.
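The conventions in these notes (0/1 booleans, a default used when the variable is unset, and the automatically added FLAGCX_ prefix) can be modeled with a short Python sketch. It only mirrors the behavior described here and is not the FLAGCX_PARAM macro itself; `read_int_param`, `read_bool_param`, and the parameter name `EXAMPLE_FLAG` are hypothetical, and falling back to the default on a non-integer value is an assumption.

```python
import os

PREFIX = "FLAGCX_"


def read_int_param(name, default, environ=None):
    """Read PREFIX + name as an integer, falling back to the default.

    Assumption: unset, empty, or non-integer values yield the default.
    """
    env = os.environ if environ is None else environ
    raw = env.get(PREFIX + name)
    if raw is None or raw == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def read_bool_param(name, default=0, environ=None):
    """Treat the value 1 as enabled and anything else as disabled."""
    return read_int_param(name, default, environ) == 1


# EXAMPLE_FLAG is a made-up name used only to show the prefixing.
enabled = read_bool_param("EXAMPLE_FLAG", environ={"FLAGCX_EXAMPLE_FLAG": "1"})
```

Because some variables may only take effect at initialization time, values like these should be set in the environment before the process starts rather than changed while it runs.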