FlagFFT User Guide
Use the C API
FlagFFT exposes a cuFFT-compatible C API in include/flagfft.h.
Create Plans
flagfftPlan1d(plan, nx, type, batch)
flagfftPlan2d(plan, nx, ny, type)
flagfftPlan3d(plan, nx, ny, nz, type) // NOT_SUPPORTED
flagfftPlanMany(plan, rank, n, inembed, istride, idist,
onembed, ostride, odist, type, batch)
Execute Transforms
// Complex-to-Complex (single & double precision)
flagfftExecC2C(plan, idata, odata, direction)
flagfftExecZ2Z(plan, idata, odata, direction)
// Real-to-Complex (forward)
flagfftExecR2C(plan, idata, odata)
flagfftExecD2Z(plan, idata, odata)
// Complex-to-Real (inverse)
flagfftExecC2R(plan, idata, odata)
flagfftExecZ2D(plan, idata, odata)
Manage Plans
flagfftSetStream(plan, stream) // Attach a CUDA stream
flagfftDestroy(plan) // Free plan resources
flagfftGetPlanDescription(plan) // Human-readable plan summary
Data Types
| FlagFFT Type | C Type | Description |
|---|---|---|
| | | Single-precision complex |
| | | Double-precision complex |
| | | Single-precision real |
| | | Double-precision real |
Transform Types
| Type Constant | Transform |
|---|---|
| | Complex → Complex |
| | Double Complex → Double Complex |
| | Real → Complex |
| | Double Real → Double Complex |
| | Complex → Real |
| | Double Complex → Double Real |
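The real-to-complex and complex-to-real rows rest on a standard DFT property: the spectrum of a real signal is conjugate-symmetric, so only n/2 + 1 complex outputs are independent. cuFFT stores exactly that many outputs for R2C; because FlagFFT describes its API as cuFFT-compatible, the same layout is assumed here, although this guide does not state it. The sketch below uses a naive pure-Python DFT (not FlagFFT code) to show that the half spectrum is enough to rebuild the full one.

```python
import cmath

def naive_dft(x):
    """Unnormalized forward DFT, O(n^2); for illustration only."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def r2c_output_length(n):
    """Complex outputs kept for a length-n real input (assumed cuFFT-style layout)."""
    return n // 2 + 1

def rebuild_full(half, n):
    """Recover all n bins from the first n//2 + 1 using X[k] = conj(X[n - k])."""
    full = list(half)
    for k in range(len(half), n):
        full.append(half[n - k].conjugate())
    return full

signal = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -0.75]   # 7 real samples
spectrum = naive_dft(signal)
half = spectrum[:r2c_output_length(len(signal))]   # 4 complex values
rebuilt = rebuild_full(half, len(signal))
max_diff = max(abs(a - b) for a, b in zip(rebuilt, spectrum))
print(len(half), max_diff)
```

When sizing buffers for an R2C plan, this is the reason the complex side is shorter than the real side.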
Supported Features
| Feature | Status |
|---|---|
| Rank-1 arbitrary-length C2C, Z2Z | Cooley-Tukey + Bluestein/Rader |
| Rank-1 arbitrary-length R2C, D2Z (forward) | Supported |
| Rank-1 arbitrary-length C2R, Z2D (inverse) | Supported |
| Rank-1 roundtrip (R2C → C2R, D2Z → Z2D) | Supported |
| Rank-2 contiguous row-major C2C, Z2Z | RTRT decomposition |
| Rank-2 contiguous row-major R2C, D2Z, C2R, Z2D | Supported |
| Batched transforms | Supported |
| In-place and out-of-place | Supported |
| CUDA stream attachment | Supported |
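Arbitrary-length support is listed as Cooley-Tukey plus Bluestein/Rader. For readers unfamiliar with Bluestein's algorithm, the sketch below shows the textbook idea: a DFT of any length n is rewritten as a convolution with a chirp, and that convolution is evaluated with power-of-two FFTs. This is a pure-Python illustration, not FlagFFT's kernel code; the function names are made up for the example, and Rader's algorithm (a different technique for prime lengths) is not shown.

```python
import cmath

def fft_pow2(x, sign=-1):
    """Recursive radix-2 FFT; len(x) must be a power of two. sign=+1 gives the
    unnormalized inverse."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], sign)
    odd = fft_pow2(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def bluestein_dft(x):
    """Unnormalized forward DFT of arbitrary length via Bluestein's chirp trick."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:          # convolution length, rounded up to a power of two
        m *= 2
    # chirp[k] = exp(-i*pi*k^2/n); k^2 is reduced mod 2n to keep the angle small
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m
    for k in range(n):
        b[k] = chirp[k].conjugate()
        if k:
            b[m - k] = chirp[k].conjugate()   # wrap negative indices
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], sign=+1)
    return [chirp[k] * conv[k] / m for k in range(n)]

def naive_dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

data = [complex(i % 5, -(i % 3)) for i in range(13)]   # prime length 13
err = max(abs(p - q) for p, q in zip(bluestein_dft(data), naive_dft(data)))
print(err)
```

The trade-off is extra work (three power-of-two FFTs of length at least 2n - 1) in exchange for handling lengths that do not factor into small radices.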
Planned / Not Yet Supported
| Feature | Status |
|---|---|
| Rank-3 transforms ( | Returns |
| More rank-2 execution algorithms | Only RTRT currently |
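This guide does not expand the abbreviation RTRT. One common reading is "row FFTs, transpose, row FFTs, transpose", and the sketch below follows that assumption: it builds a 2D DFT from 1D row transforms and transposes, then checks the result against the direct 2D definition. It is a pure-Python illustration with a naive DFT, not a description of FlagFFT's actual kernels.

```python
import cmath

def dft(x):
    """Unnormalized forward 1D DFT, O(n^2); stands in for a fast row kernel."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def fft2_rtrt(m):
    """2D DFT as Row transform, Transpose, Row transform, Transpose (assumed meaning)."""
    step1 = [dft(row) for row in m]        # R: transform along each row
    step2 = transpose(step1)               # T: columns become rows
    step3 = [dft(row) for row in step2]    # R: transform along the former columns
    return transpose(step3)                # T: restore row-major orientation

def dft2_direct(m):
    """Direct evaluation of the 2D DFT definition, for comparison."""
    rows, cols = len(m), len(m[0])
    return [[sum(m[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]

grid = [[complex(r + c, r - c) for c in range(4)] for r in range(3)]   # 3 x 4 input
fast, ref = fft2_rtrt(grid), dft2_direct(grid)
err2d = max(abs(fast[u][v] - ref[u][v]) for u in range(3) for v in range(4))
print(err2d)
```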
Use the Native CLI
flagfft-cli is a native benchmark and verification tool. Build it with -DFLAGFFT_BUILD_CLI=ON.
Benchmark FFT Performance
flagfft-cli bench [OPTIONS]
| Option | Default | Description |
|---|---|---|
| | | Transform rank: |
| | | Transform type: |
| | required | Transform size(s), comma-separated: |
| | | Batch size |
| | | |
| | | |
| | | Warmup iterations |
| | | Measurement iterations |
| | - | Output results as JSON |
| | - | Print the execution plan decomposition path (use with |
Examples:
# Benchmark 1D C2C FFT of size 4096, batch 256
flagfft-cli bench --api c2c --shape 4096 --batch 256
# Benchmark 2D Z2Z FFT
flagfft-cli bench --rank 2 --api z2z --shape 256x256
# Compare multiple sizes with JSON output
flagfft-cli bench --api r2c --shape 1024,2048,4096,8192 --json
# Print the kernel execution plan
flagfft-cli bench --api c2c --shape 997 --print-path --json
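When sweeping many configurations from a script, it can help to build the command line programmatically. The helper below is a hypothetical convenience wrapper, not part of FlagFFT; it uses only the flags that appear in the examples above (--rank, --api, --shape, --batch, --json) and assumes flagfft-cli is on PATH when the command is eventually run.

```python
import shlex

def bench_command(api, shapes, batch=None, rank=None, json_output=False,
                  exe="flagfft-cli"):
    """Build an argv list for `flagfft-cli bench` (hypothetical helper)."""
    argv = [exe, "bench"]
    if rank is not None:
        argv += ["--rank", str(rank)]
    # Multiple sizes are passed as one comma-separated --shape value
    argv += ["--api", api, "--shape", ",".join(str(s) for s in shapes)]
    if batch is not None:
        argv += ["--batch", str(batch)]
    if json_output:
        argv.append("--json")
    return argv

cmd = bench_command("r2c", [1024, 2048, 4096, 8192], json_output=True)
print(shlex.join(cmd))
# prints: flagfft-cli bench --api r2c --shape 1024,2048,4096,8192 --json

# To execute, pass the list to subprocess.run(cmd, check=True).
```

Passing a list to subprocess.run avoids shell quoting issues; shlex.join is used here only to display the command.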
Auto-Tune (planned)
flagfft-cli tune [OPTIONS]
Currently a placeholder; exits with FLAGFFT_NOT_SUPPORTED.
Exit Codes
| Code | Meaning |
|---|---|
| | Passed |
| | Failed / invalid arguments |
| | Runtime error |
| | Skipped / unsupported |
Run Tests
FlagFFT has three layers of testing: a unified Python test runner, C++ unit tests (Google Test), and Python codegen tests (pytest).
Use the Unified Test Runner
tools/run_tests.py is the primary entry point for running the full test suite. It orchestrates both accuracy tests (C++ ctest binaries comparing FlagFFT output against cuFFT) and performance benchmarks (flagfft-cli bench).
Usage
python tools/run_tests.py [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| | - | Comma-separated operator IDs to test |
| | - | Path to file with one operator ID per line |
| | - | Skip operators lexicographically before this value |
| | | Comma-separated stages to include ( |
| | | Test combination: |
| | | Comma-separated GPU IDs or |
| | | Directory for summary and per-operator result files |
| | | Path to CMake build directory |
| | - | Run only accuracy tests |
| | - | Run only performance (benchmark) tests |
| | | Per-test subprocess timeout in seconds |
| | | Benchmark warmup iterations |
| | | Benchmark measurement iterations |
| | - | Save stdout/stderr of each test to log files |
| | | Color mode: |
| | - | Verbose output |
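Two of the rows above describe operator selection: a file with one operator ID per line, and a start value that skips every operator lexicographically before it. The sketch below illustrates those semantics under stated assumptions: blank lines in the file are ignored, and the start value itself is kept. It is not the runner's own code, and the IDs other than c2c_1d and r2c_1d are made up for the example.

```python
def parse_ops_file(text):
    """One operator ID per line; blank lines are ignored (assumption)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def skip_before(op_ids, start):
    """Drop IDs that sort lexicographically before `start`; keep `start` itself."""
    return [op for op in op_ids if op >= start]

# Hypothetical file contents; only c2c_1d and r2c_1d appear in this guide
ops_text = "c2c_1d\nc2r_1d\n\nr2c_1d\nz2z_1d\n"
ops = parse_ops_file(ops_text)
remaining = skip_before(ops, "r2c_1d")
print(remaining)
# prints: ['r2c_1d', 'z2z_1d']
```

A start value is mainly useful for resuming an interrupted run without repeating operators that already finished.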
Combination Presets
| Preset | Description |
|---|---|
| | Quick smoke test: Cooley-Tukey sizes, batch 1, scale 1.0 |
| | Quick smoke test: Bluestein/Rader sizes, batch 1, scale 1.0 |
| | Full 1D: all CT sizes × all batches × all scales |
| | Quick 2D: selected 2D sizes, batch {1,4}, scale 1.0 |
| | Full 2D: selected 2D sizes × all batches × all scales |
Examples
# Quick smoke test (default)
python tools/run_tests.py
# Full test suite on GPU 0
python tools/run_tests.py --combination full --gpus 0
# Full suite across 4 GPUs
python tools/run_tests.py --combination full --gpus 0,1,2,3
# Accuracy only, specific operators
python tools/run_tests.py --combination full --ops c2c_1d,r2c_1d --accuracy-only
# Performance benchmarks only
python tools/run_tests.py --combination full --performance-only
Output
Console: Real-time progress with per-GPU status
results/summary.json: Top-level summary with timestamp, env, config, result, and summary sections
results/{op_id}/accuracy_result.json: Per-operator accuracy details
results/{op_id}/performance_result.json: Per-operator benchmark details
Exit code is 0 if all accuracy tests passed, 1 if any failed.
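A CI job may want to check the summary file in addition to the exit code. The sketch below loads a summary and verifies that the five top-level sections named above are present. The contents of each section are not documented in this guide, so the stand-in file uses empty placeholders; the helper name is hypothetical.

```python
import json
import pathlib
import tempfile

REQUIRED_SECTIONS = ("timestamp", "env", "config", "result", "summary")

def load_summary(path):
    """Load a summary JSON file and check the documented top-level sections."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    missing = [key for key in REQUIRED_SECTIONS if key not in data]
    if missing:
        raise KeyError(f"summary is missing sections: {missing}")
    return data

# Stand-in file for demonstration; real section contents are not shown in this guide
with tempfile.TemporaryDirectory() as tmp:
    demo_path = pathlib.Path(tmp) / "summary.json"
    demo_path.write_text(json.dumps({key: {} for key in REQUIRED_SECTIONS}),
                         encoding="utf-8")
    summary = load_summary(demo_path)
    print(sorted(summary))
```

In real use, point load_summary at results/summary.json under the configured output directory.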
Run C++ Tests
Built with -DFLAGFFT_BUILD_TESTS=ON. Each test binary compares FlagFFT output against cuFFT using normwise relative error metrics (rel_l2, rel_linf).
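The guide names the metrics rel_l2 and rel_linf but does not give formulas. A common definition of a normwise relative error divides the norm of the difference by the norm of the reference; the sketch below assumes that definition, with cuFFT output playing the role of the reference. The exact formulas and tolerances used by the test binaries may differ.

```python
import math

def rel_l2(actual, reference):
    """||actual - reference||_2 / ||reference||_2 (assumed definition)."""
    num = math.sqrt(sum(abs(a - r) ** 2 for a, r in zip(actual, reference)))
    den = math.sqrt(sum(abs(r) ** 2 for r in reference))
    return num / den

def rel_linf(actual, reference):
    """max|actual - reference| / max|reference| (assumed definition)."""
    num = max(abs(a - r) for a, r in zip(actual, reference))
    den = max(abs(r) for r in reference)
    return num / den

reference = [3 + 4j, 0j]          # stand-in for cuFFT output
actual = [3 + 4j, 0.005j]         # stand-in for FlagFFT output
print(rel_l2(actual, reference), rel_linf(actual, reference))   # both about 1e-3
```

Normwise metrics scale the error by the overall magnitude of the reference, so a small absolute error in a near-zero bin does not dominate the result. Both functions divide by zero if the reference is all zeros.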
Test Structure
| Test Pattern | Coverage |
|---|---|
| | Plan lifecycle, error codes, unsupported API contracts |
| | Rank-2 C2C/Z2Z correctness |
| | C2C forward/inverse, Cooley-Tukey/Bluestein, single/multi-batch |
| | Double-precision complex |
| | Float real → complex |
| | Double real → complex |
| | Complex → float real |
| | Double complex → double real |
| | Real roundtrip validation |
| | Double real roundtrip |
Suffix key: s = single-batch, b = multi-batch; ct = Cooley-Tukey, bs = Bluestein/Rader.
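The suffix key can be applied mechanically when filtering or reporting on test binaries. The sketch below decodes the last two underscore-separated fields of a name such as test_exec_c2c_fwd_ct_s. It assumes every suffixed test name ends with an algorithm field followed by a batch field, which this guide shows for only one example name.

```python
ALGORITHM = {"ct": "Cooley-Tukey", "bs": "Bluestein/Rader"}
BATCH = {"s": "single-batch", "b": "multi-batch"}

def decode_test_name(name):
    """Split a test binary name into its stem and decoded suffixes (assumed layout)."""
    parts = name.split("_")
    algo, batch = parts[-2], parts[-1]
    return {
        "stem": "_".join(parts[:-2]),
        "algorithm": ALGORITHM[algo],
        "batch": BATCH[batch],
    }

info = decode_test_name("test_exec_c2c_fwd_ct_s")
print(info)
# prints: {'stem': 'test_exec_c2c_fwd', 'algorithm': 'Cooley-Tukey', 'batch': 'single-batch'}
```

Names that do not follow this layout raise KeyError, which makes unexpected patterns visible instead of silently mislabeling them.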
Run Individual Tests
# Run a specific test
./build/ctest/test_exec_c2c_fwd_ct_s
# With custom parameters
./build/ctest/test_exec_c2c_fwd_ct_s --nx 4096 --batch 64 --direction forward
# Run all ctest tests
cd build && ctest --output-on-failure
Each test binary accepts: --nx, --batch, --direction, --scale, --json-file.
Run Python Tests
Tests for the flagfft_codegen Python package. They require the package to be installed (pip install .).
# Run all Python tests
pytest tests/python/ -v
# Run only codegen-marked tests
pytest tests/python/ -v -m codegen
Tests cover codelet structure, kernel source generation, JIT CSV parsing, and Bluestein/reshape/R2C metadata. Tests that require Triton/TLE are automatically skipped when dependencies are unavailable.
Configure Tests
The test parameter space is defined in conf/:
conf/operators.yaml: 14 operator definitions (1D/2D × C2C/Z2Z/R2C/D2Z/C2R/Z2D, plus roundtrip)
conf/test_matrix.yaml: Parameter space with 11 smooth sizes (CT), 4 prime/composite sizes (Bluestein), 3 batch sizes, 3 scale factors, 6 combination rules
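The counts above give a sense of how large a full sweep is. The sketch below forms the Cartesian product of the Cooley-Tukey sizes, batch sizes, and scale factors, assuming that a full 1D combination covers all 11 smooth sizes. Only the counts come from this guide; the values themselves are placeholders, since the contents of test_matrix.yaml are not listed here.

```python
from itertools import product

# Placeholder values: only the counts (11, 3, 3) come from this guide
ct_sizes = [f"ct_size_{i}" for i in range(11)]
batches = [f"batch_{i}" for i in range(3)]
scales = [f"scale_{i}" for i in range(3)]

# Every (size, batch, scale) combination for one operator
full_1d = list(product(ct_sizes, batches, scales))
print(len(full_1d))
# prints: 99
```

The total is per operator, so running several operators multiplies it accordingly.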