Requirements
Hardware
NVIDIA GPU with CUDA support (for Triton execution and cuTensor baseline comparison).
Software
| Dependency | Notes |
|---|---|
| Python 3.8+ | |
| PyTorch 2.6.0 | With CUDA support |
| FlagTree | FlagOS-maintained Triton fork |
| cuTensor | For baseline comparison |
| pytest | Test runner |
| PyYAML | Operator registry |
| matplotlib | Visualization |
| openpyxl | XLSX report generation |
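
A quick way to confirm that an environment meets the requirements above is a small check script. The following is a minimal sketch, not part of the project: the helper names `python_ok` and `check_modules` are hypothetical, and the import names are assumptions (in particular, FlagTree is assumed to install under the usual `triton` module name). cuTensor is not checked, because the name of its Python binding is not stated here.

```python
import importlib.util
import sys

# Display name -> import name. These import names are assumptions:
# FlagTree is assumed to be importable as `triton` because it is a Triton fork.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "FlagTree": "triton",
    "pytest": "pytest",
    "PyYAML": "yaml",
    "matplotlib": "matplotlib",
    "openpyxl": "openpyxl",
}


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def check_modules(modules):
    """Map each display name to True if its module can be located."""
    found = {}
    for label, name in modules.items():
        try:
            # find_spec locates the module without importing it.
            found[label] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found[label] = False
    return found


def main():
    print(f"Python >= 3.8: {python_ok()}")
    status = check_modules(REQUIRED_MODULES)
    for label, ok in status.items():
        print(f"{label}: {'found' if ok else 'MISSING'}")
    # Only query CUDA if PyTorch can actually be imported.
    if status["PyTorch"]:
        import torch
        print(f"PyTorch version: {torch.__version__}")
        print(f"CUDA available: {torch.cuda.is_available()}")


if __name__ == "__main__":
    main()
```

Running the script prints one line per dependency. A `MISSING` entry or `CUDA available: False` suggests the environment is not yet ready for Triton execution.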