Coverage for src/flag_gems/runtime/backend/_thead/__init__.py: 0%

4 statements  

coverage.py v7.6.9, created at 2026-06-05 07:36 +0800

"""
T-Head Zhenwu (真武) PPU Backend Configuration

Product: Zhenwu PPU (真武处理器)
- Model: Zhenwu 810E (supports up to 16 cards with ICN interconnect)
- Architecture: Proprietary T-Head AI accelerator architecture
- SDK: PPU SDK v2.0.0+

Key Features:
- Full CUDA API compatibility (cuda runtime & driver APIs)
- Triton support: 2.3.x, 3.0.x - 3.4.x with AIU extensions
- Accelerated libraries: acdnn, acblas, acfft, acsolver, acrand, acsparse
- Multi-card support: ICN interconnect, MIG (up to 8 instances), SRIOV
- Device management: ppu-smi tool (similar to nvidia-smi)

Hardware Capabilities:
- Tensor Core support with extended PTX instructions
- Dynamic frequency scaling (200MHz ~ max frequency)
- Support for FP16/BF16/FP32/INT8 precision
- High-bandwidth memory with optimized access patterns

PyTorch Integration:
- Uses torch.cuda interface (CUDA-compatible API)
- Compatible with existing CUDA-based PyTorch code
- No special torch.ppu module required

Reference:
- Official Documentation: https://help.aliyun.com/zh/document_detail/3011255.html
"""

from backend_utils import VendorInfoBase

vendor_info = VendorInfoBase(
    vendor_name="thead",
    # PPU uses CUDA-compatible API, accessed via torch.cuda
    device_name="cuda",
    # PPU device management tool (similar to nvidia-smi)
    device_query_cmd="ppu-smi",
    # Use standard CUDA dispatch key
    dispatch_key=None,
    # PPU has custom Triton backend with AIU extensions
    # The compiler supports Triton 2.3.x and 3.0.x - 3.4.x (see module docstring)
    triton_extra_name=None,  # Uses standard CUDA path with PPU-specific compiler
)

# Operators that should use PyTorch native implementation
# Based on PPU SDK capabilities and performance characteristics
CUSTOMIZED_UNUSED_OPS = (
    # PPU has strong acceleration library support (acdnn, acblas, etc.)
    # Most operators should benefit from FlagGems optimization
    # This list can be tuned based on benchmarking results
)

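# "*" is not a wildcard inside __all__; list the public names explicitly
__all__ = ["vendor_info", "CUSTOMIZED_UNUSED_OPS"]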
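
The module above only declares configuration, so it may help to see one way such values could be consumed. The following is a minimal sketch, not FlagGems code: `VendorInfo` is a hypothetical stand-in that mirrors only the keyword arguments passed to `VendorInfoBase` above, and `query_tool_on_path` and `enabled_ops` are illustrative helper names that do not appear in the original.

```python
import shutil
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class VendorInfo:
    """Hypothetical stand-in for VendorInfoBase (fields copied from the call above)."""

    vendor_name: str
    device_name: str
    device_query_cmd: str
    dispatch_key: Optional[str] = None
    triton_extra_name: Optional[str] = None


def query_tool_on_path(info: VendorInfo) -> bool:
    """Return True if the vendor's device query command is found on PATH."""
    return shutil.which(info.device_query_cmd) is not None


def enabled_ops(all_ops: Iterable[str], unused_ops: Tuple[str, ...]) -> List[str]:
    """Keep the ops that are not listed in unused_ops, preserving input order."""
    skip = set(unused_ops)
    return [op for op in all_ops if op not in skip]


# Same values as the vendor_info declaration above.
info = VendorInfo(vendor_name="thead", device_name="cuda", device_query_cmd="ppu-smi")

# Empty, as in the module above: no operator is excluded.
CUSTOMIZED_UNUSED_OPS: Tuple[str, ...] = ()

print(info.device_name)  # cuda
print(enabled_ops(["add", "mm", "softmax"], CUSTOMIZED_UNUSED_OPS))
```

With an empty `CUSTOMIZED_UNUSED_OPS`, every operator name passes through the filter unchanged; adding a name such as `"mm"` to the tuple would drop it from the result. The operator names here are placeholders chosen for illustration.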