FlagTensor User Guide
Use FlagTensor
FlagTensor integrates directly with PyTorch. Import the package and call operators on CUDA tensors:
import torch
import flagtensor
# Element-wise operations
x = torch.randn(1024, device="cuda", dtype=torch.float32)
y = flagtensor.abs(x)
z = flagtensor.relu(x)
w = flagtensor.sigmoid(x)
# Binary operations
a = torch.randn(1024, device="cuda")
b = torch.randn(1024, device="cuda")
c = flagtensor.add(a, b)
# Tensor contraction
m = torch.randn(64, 32, device="cuda")
n = torch.randn(32, 48, device="cuda")
r = flagtensor.gett(m, n)
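To make the expected results concrete, here is a minimal pure-Python sketch of the conventional mathematical definitions behind a few of the operators used above. It does not call FlagTensor. The helper names (`relu`, `sigmoid`, `contract_2d`) are illustrative, and the assumption that `gett` on two 2-D operands contracts the shared inner dimension like a matrix product is inferred only from the shapes in the example (64x32 and 32x48), not confirmed by the source.

```python
import math


def relu(v):
    """Conventional ReLU: negative values are clamped to zero."""
    return max(v, 0.0)


def sigmoid(v):
    """Conventional logistic sigmoid, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def contract_2d(m, n):
    """Contract the shared inner dimension of two nested-list matrices.

    Assumption: this mirrors what a GEMM-like contraction does for 2-D inputs.
    """
    inner = len(n)
    assert all(len(row) == inner for row in m), "inner dimensions must match"
    cols = len(n[0])
    return [
        [sum(row[k] * n[k][j] for k in range(inner)) for j in range(cols)]
        for row in m
    ]


# Element-wise operations on a tiny input
x = [-2.0, 0.0, 3.0]
y = [abs(v) for v in x]
z = [relu(v) for v in x]
w = [sigmoid(v) for v in x]

# Binary operation
c = [p + q for p, q in zip(x, [1.0, 1.0, 1.0])]

# Contraction: (2 x 2) with (2 x 3) gives (2 x 3)
r = contract_2d([[1.0, 2.0], [3.0, 4.0]],
                [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]])
```

Under the same assumption, contracting a 64x32 operand with a 32x48 operand would yield a 64x48 result. Small references like these can be handy for spot-checking an operator by hand before running the full test suite.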
Operator List
The complete operator registry is maintained in conf/operators.yaml in the FlagTensor repository.
| Category | Operators | Status |
|---|---|---|
| Unary | abs, acos, acosh, asin, asinh, atan, atanh, ceil, conj, cos, cosh, exp, floor, identity, log, mish, neg, rcp, relu, sigmoid, sin, sinh, soft_plus, soft_sign, sqrt, swish, tan, tanh | stable |
| Binary | add, max, min, mul | stable |
| Contraction | gett, tgett, ttgt, tensor_contraction_trinary, trinary_generic | stable |
| Sparse | block_sparse_tensor_contraction | experimental |
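As a rough illustration of how a registry like this can be queried by status, the sketch below encodes the table in a plain Python dictionary and filters it. The dictionary layout and the helper names (`operators_with_status`, `category_of`) are hypothetical and chosen for this example only; they are not the schema of conf/operators.yaml, which this guide does not show.

```python
# Operator names and statuses copied from the operator table.
# The dict layout is an illustrative assumption, not the real YAML schema.
REGISTRY = {
    "Unary": {
        "status": "stable",
        "operators": [
            "abs", "acos", "acosh", "asin", "asinh", "atan", "atanh", "ceil",
            "conj", "cos", "cosh", "exp", "floor", "identity", "log", "mish",
            "neg", "rcp", "relu", "sigmoid", "sin", "sinh", "soft_plus",
            "soft_sign", "sqrt", "swish", "tan", "tanh",
        ],
    },
    "Binary": {"status": "stable", "operators": ["add", "max", "min", "mul"]},
    "Contraction": {
        "status": "stable",
        "operators": [
            "gett", "tgett", "ttgt",
            "tensor_contraction_trinary", "trinary_generic",
        ],
    },
    "Sparse": {
        "status": "experimental",
        "operators": ["block_sparse_tensor_contraction"],
    },
}


def operators_with_status(registry, status):
    """Return a sorted list of all operators whose category has the given status."""
    return sorted(
        op
        for entry in registry.values()
        if entry["status"] == status
        for op in entry["operators"]
    )


def category_of(registry, name):
    """Return the category that lists the operator, or None if it is unknown."""
    for category, entry in registry.items():
        if name in entry["operators"]:
            return category
    return None


stable_ops = operators_with_status(REGISTRY, "stable")
experimental_ops = operators_with_status(REGISTRY, "experimental")
```

With the table as listed, `stable_ops` holds the unary, binary, and contraction operators, and `experimental_ops` holds only `block_sparse_tensor_contraction`.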
Run Tests
# Single operator correctness test
pytest tests/unary/test_abs.py -v
# Record test results as JSON (using CPU-FP64 reference)
pytest tests/unary/test_abs.py --ref cpu --record json --output results.json
# Multi-GPU test runner (from YAML registry)
python tools/run_tests.py --stages stable --gpus 0,1
# Extract operator marks
python tools/get_marks.py --stage stable --output ops.txt
# Benchmark with recording
pytest benchmark/test_unary_perf.py -m abs \
--mode kernel --level core --record log
# Parse benchmark summary
python tools/summary_for_plot.py result-*.log
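The record command above mentions a CPU-FP64 reference. The sketch below illustrates the general idea of such a check: compute the expected values in double precision, compute the candidate values in a lower precision, and compare them under a tolerance. It is a standalone illustration, not FlagTensor's test harness. The helper names and the `1e-6` tolerance are assumptions made for this example, and the FP32 result is simulated by rounding through `struct` rather than produced on a GPU.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def max_relative_error(candidate, reference, eps=1e-12):
    """Largest element-wise relative error; eps guards against division by zero."""
    return max(
        abs(c - r) / max(abs(r), eps) for c, r in zip(candidate, reference)
    )


inputs = [0.1 * i for i in range(1, 11)]

# Reference path: exp evaluated entirely in FP64.
reference = [math.exp(v) for v in inputs]

# Candidate path: inputs and outputs rounded to FP32 to mimic a float32 kernel.
candidate = [to_float32(math.exp(to_float32(v))) for v in inputs]

RTOL = 1e-6  # hypothetical tolerance chosen for this example
err = max_relative_error(candidate, reference)
passed = err <= RTOL
```

Comparing against a higher-precision reference keeps the reference's own rounding error well below the tolerance, so a failure points at the candidate rather than at the baseline.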