Requirements

Hardware

- NVIDIA GPU with CUDA support (for GPU-specific tests and benchmarks)

Software

- Python 3.8 or later
- PyTorch (with CUDA support)
- Triton
- CuPy (cupy-cuda12x)
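
A quick way to confirm that an environment meets the requirements listed above is a small check script. The sketch below is illustrative only: the helper names `check_environment` and `cuda_available` are hypothetical, and the import names `torch`, `triton`, and `cupy` are assumed to correspond to the PyTorch, Triton, and CuPy packages. It reports only whether each module can be found and whether PyTorch sees a CUDA device. It does not verify specific package versions.

```python
import importlib.util
import sys

# Assumed import names for the packages listed in the requirements.
REQUIRED_MODULES = ("torch", "triton", "cupy")
MIN_PYTHON = (3, 8)


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each requirement name to True (met) or False."""
    report = {"python": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the script still runs without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'MISSING'}")
    print(f"CUDA device: {'ok' if cuda_available() else 'not detected'}")
```

If the script reports a missing module, installing the corresponding package (for example `cupy-cuda12x` for CuPy on CUDA 12.x) should resolve it. A missing CUDA device affects only the GPU-specific tests and benchmarks.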