Requirements

Hardware

NVIDIA GPU with CUDA support (for Triton execution and cuTensor baseline comparison).
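One quick way to confirm the hardware requirement is to ask PyTorch whether it can see a CUDA device. The sketch below is illustrative only and is not part of the project: the helper name `cuda_status` is hypothetical, and it assumes PyTorch is importable as `torch`.

```python
def cuda_status():
    """Return a short description of CUDA availability as seen by PyTorch."""
    try:
        import torch  # assumed to be installed per the Software requirements
    except ImportError:
        return "torch-missing"
    if not torch.cuda.is_available():
        return "no-cuda"
    # Report the name of the first visible CUDA device.
    return "cuda:" + torch.cuda.get_device_name(0)


if __name__ == "__main__":
    print(cuda_status())
```

A result starting with `cuda:` suggests the GPU and driver are visible to PyTorch; it does not by itself confirm that Triton or cuTensor will run.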

Software

| Dependency | Notes |
|---|---|
| Python 3.8+ | |
| PyTorch 2.6.0 | With CUDA support |
| FlagTree | FlagOS-maintained Triton fork |
| cuTensor | For baseline comparison |
| pytest | Test runner |
| PyYAML | Operator registry |
| matplotlib | Visualization |
| openpyxl | XLSX report generation |
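The software requirements can be checked from Python before running anything else. The following is a minimal sketch, not a project utility: the function names are hypothetical, and the distribution names in `ASSUMED_DISTRIBUTIONS` are assumptions about how each dependency is published. FlagTree and cuTensor are left out because their package names are not stated here.

```python
import sys
from importlib import metadata

# Assumed distribution names; adjust them to match your environment.
ASSUMED_DISTRIBUTIONS = ["torch", "pytest", "PyYAML", "matplotlib", "openpyxl"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def python_ok(minimum=(3, 8)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", python_ok())
    for name, version in installed_versions(ASSUMED_DISTRIBUTIONS).items():
        print(f"{name}: {version or 'not installed'}")
```

The script only reports what is installed; comparing the reported PyTorch version against 2.6.0 is left to the reader.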