Tests#
Performance Test#
Performance tests are maintained in test/perf.
cd test/perf
make [USE_NVIDIA | USE_ILUVATAR_COREX | USE_CAMBRICON | USE_METAX | USE_MUSA | USE_KUNLUNXIN | USE_DU | USE_ASCEND | USE_AMD | USE_TSM | USE_ENFLAME]=1
mpirun --allow-run-as-root -np 8 ./test_allreduce -b 128K -e 4G -f 2
The default MPI install path is /usr/local/mpi. To build against a different MPI installation, specify its path with:
make MPI_HOME=<MPI path>
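As a minimal sketch of the MPI_HOME override, the helper below picks an MPI path and checks that it looks like an MPI install before it is handed to make. The function name resolve_mpi_home is hypothetical, and the check assumes that mpirun lives at <MPI path>/bin/mpirun, which is a common layout but is not stated by the build system.

```shell
# Hypothetical helper (not part of the repository): print the MPI path to
# pass to make. Falls back to the documented default, /usr/local/mpi.
resolve_mpi_home() {
  local candidate="${1:-/usr/local/mpi}"
  # Assumption: a usable install exposes an executable bin/mpirun.
  if [ -x "$candidate/bin/mpirun" ]; then
    echo "$candidate"
  else
    echo "no mpirun found under $candidate/bin" >&2
    return 1
  fi
}
```

The result could then be used as `mpi_home=$(resolve_mpi_home /opt/openmpi) && make USE_NVIDIA=1 MPI_HOME="$mpi_home"`, where /opt/openmpi is only an example path.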
All tests support the same set of arguments:
Sizes to scan
-b <min>: minimum size in bytes to start with. Default: 1M.
-e <max>: maximum size in bytes to end at. Default: 1G.
-f <increment factor>: multiplication factor between sizes. Default: 2.
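To make the size scan concrete, the sketch below lists the sizes that a given -b/-e/-f combination would visit. It assumes the scan grows geometrically from the minimum and includes the end size when it is hit exactly; the helper name scan_sizes is hypothetical and sizes are passed in plain bytes rather than with K/M/G suffixes.

```shell
# Hypothetical helper: print the message sizes (in bytes) that a scan with
# -b <min> -e <max> -f <factor> would visit.
# Assumption: sizes grow geometrically and the end size is inclusive.
scan_sizes() {
  local size=$1 max=$2 factor=$3
  # A non-positive start or a factor of 1 or less would never terminate.
  if [ "$size" -le 0 ] || [ "$factor" -le 1 ]; then
    return 1
  fi
  while [ "$size" -le "$max" ]; do
    echo "$size"
    size=$((size * factor))
  done
}

# Sizes for "-b 128K -e 4G -f 2", written out in bytes.
scan_sizes $((128 * 1024)) $((4 * 1024 * 1024 * 1024)) 2
```

Under these assumptions, the scan from 128K to 4G with a factor of 2 visits 16 sizes.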
Performance
-w <warmup iterations>: number of warmup iterations (not timed). Default: 5.
-n <iterations>: number of iterations. Default: 20.
Test operation
-R <0/1/2>: enable local buffer registration on send/recv buffers. Default: 0.
-s <OCT/DEC/HEX>: specify MPI communication split mode. Default: 0.
Utils
-p <0/1>: print buffer info. Default: 0.
-h: print help message. Default: disabled.
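Options from the different groups can be combined in a single invocation. The sketch below only prints the resulting command instead of running it; perf_cmd is a hypothetical dry-run helper, and the option values are arbitrary examples rather than recommended settings.

```shell
# Hypothetical dry-run helper: print the mpirun command line for a test
# binary instead of executing it.
perf_cmd() {
  local np=$1 binary=$2
  shift 2
  echo "mpirun --allow-run-as-root -np $np ./$binary $*"
}

# Scan 1M to 1G, with 10 warmup and 50 timed iterations, printing buffer info.
perf_cmd 8 test_allreduce -b 1M -e 1G -f 2 -w 10 -n 50 -p 1
```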
Device API Test#
Device API tests are maintained in test/kernel/. There are four test binaries:
| Binary | What it tests |
|---|---|
| test_intranode | Intra-node AllReduce via Device API. Correctness + bandwidth benchmarking. |
| test_internode_twosided | Inter-node two-sided AlltoAll (FIFO-based; Window-based with -R 2). |
| test_internode_onesided | Inter-node one-sided AlltoAll (put+signal+wait pattern). Requires -R 1 or -R 2. |
| test_device_api | Correctness suite for 10 one-sided Device API kernels. Requires -R 1 or -R 2. |
Build:
cd test/kernel
make USE_NVIDIA=1 # or other backend flag
As with the performance tests, the build accepts MPI_HOME=<path> to set the MPI install path.
Run examples:
# Intra-node AllReduce (single node, 8 GPUs)
mpirun --allow-run-as-root -np 8 -x FLAGCX_USE_HETERO_COMM=1 -x FLAGCX_MEM_ENABLE=1 ./test_intranode -b 1M -e 64M -f 2
# Inter-node two-sided AlltoAll (multi-node)
mpirun --allow-run-as-root -np 16 -x FLAGCX_USE_HETERO_COMM=1 -x FLAGCX_MEM_ENABLE=1 ./test_internode_twosided -b 1M -e 64M -f 2 -R 1
# Inter-node one-sided AlltoAll (requires -R 1 or -R 2)
mpirun --allow-run-as-root -np 16 -x FLAGCX_USE_HETERO_COMM=1 -x FLAGCX_MEM_ENABLE=1 ./test_internode_onesided -b 1M -e 64M -f 2 -R 2
# Device API correctness test (requires -R 1 or -R 2)
mpirun --allow-run-as-root -np 16 -x FLAGCX_USE_HETERO_COMM=1 -x FLAGCX_MEM_ENABLE=1 ./test_device_api -b 1M -e 64M -f 2 -R 2
Arguments are the same as Performance Test (-b, -e, -f, -w, -n, -R, -p, -s).
Registration modes (-R):
-R 0: Raw device memory (default). No explicit registration.
-R 1: IPC mode (flagcxMemAlloc + flagcxCommRegister).
-R 2: Window mode (flagcxMemAlloc + flagcxCommWindowRegister).
One-sided tests (test_internode_onesided, test_device_api) require -R 1 or -R 2.
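The registration rule above can be checked before launching a job. The following is a sketch of such a pre-flight check; check_reg_mode is a hypothetical wrapper-side helper, not something the test binaries provide, and it encodes only what is stated here: valid modes are 0, 1, and 2, and the two one-sided binaries cannot run with mode 0.

```shell
# Hypothetical pre-flight check (not part of the test binaries): reject
# -R values that a given binary cannot use, as described in the text.
check_reg_mode() {
  local binary=$1 mode=$2
  case "$mode" in
    0|1|2) ;;
    *) echo "unknown -R value: $mode" >&2; return 2 ;;
  esac
  case "$binary" in
    test_internode_onesided|test_device_api)
      # One-sided tests need registered buffers (IPC or Window mode).
      if [ "$mode" -eq 0 ]; then
        echo "$binary requires -R 1 or -R 2" >&2
        return 1
      fi
      ;;
  esac
  return 0
}
```

A launcher script could call `check_reg_mode test_device_api 2` and start mpirun only if the check succeeds.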