Coverage for src/flag_gems/runtime/backend/_thead/__init__.py: 0%
4 statements
coverage.py v7.6.9, created at 2026-06-05 07:36 +0800
"""
T-Head Zhenwu (真武) PPU Backend Configuration

Product: Zhenwu PPU (真武处理器)
- Model: Zhenwu 810E (supports up to 16 cards with ICN interconnect)
- Architecture: Proprietary T-Head AI accelerator architecture
- SDK: PPU SDK v2.0.0+

Key Features:
- Full CUDA API compatibility (cuda runtime & driver APIs)
- Triton support: 2.3.x, 3.0.x - 3.4.x with AIU extensions
- Accelerated libraries: acdnn, acblas, acfft, acsolver, acrand, acsparse
- Multi-card support: ICN interconnect, MIG (up to 8 instances), SRIOV
- Device management: ppu-smi tool (similar to nvidia-smi)

Hardware Capabilities:
- Tensor Core support with extended PTX instructions
- Dynamic frequency scaling (200MHz ~ max frequency)
- Support for FP16/BF16/FP32/INT8 precision
- High-bandwidth memory with optimized access patterns

PyTorch Integration:
- Uses torch.cuda interface (CUDA-compatible API)
- Compatible with existing CUDA-based PyTorch code
- No special torch.ppu module required

Reference:
- Official Documentation: https://help.aliyun.com/zh/document_detail/3011255.html
"""

from backend_utils import VendorInfoBase

vendor_info = VendorInfoBase(
    vendor_name="thead",
    # PPU uses CUDA-compatible API, accessed via torch.cuda
    device_name="cuda",
    # PPU device management tool (similar to nvidia-smi)
    device_query_cmd="ppu-smi",
    # Use standard CUDA dispatch key
    dispatch_key=None,
    # PPU has custom Triton backend with AIU extensions
    # The compiler supports Triton 2.3.x and 3.0.x - 3.4.x
    triton_extra_name=None,  # Uses standard CUDA path with PPU-specific compiler
)

# Operators that should use PyTorch native implementation
# Based on PPU SDK capabilities and performance characteristics
CUSTOMIZED_UNUSED_OPS = (
    # PPU has strong acceleration library support (acdnn, acblas, etc.)
    # Most operators should benefit from FlagGems optimization
    # This list can be tuned based on benchmarking results
)

__all__ = ["*"]
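
The configuration above can be read as two pieces of data: a vendor record and a tuple of operators to leave to PyTorch. The following is a minimal sketch of how such data might be consumed, not the FlagGems implementation. `VendorInfoSketch`, `query_tool_available`, and `enabled_ops` are hypothetical names introduced for illustration; the real `VendorInfoBase` lives in `backend_utils`, and probing `PATH` for `ppu-smi` is only an assumed detection heuristic.

```python
import shutil
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class VendorInfoSketch:
    """Hypothetical stand-in mirroring the fields passed to VendorInfoBase."""

    vendor_name: str
    device_name: str
    device_query_cmd: str
    dispatch_key: Optional[str] = None
    triton_extra_name: Optional[str] = None


def query_tool_available(info: VendorInfoSketch) -> bool:
    """Return True if the vendor's query tool is found on PATH.

    Assumption: the presence of the tool is taken as a hint that the
    vendor's SDK is installed. The real detection logic may differ.
    """
    return shutil.which(info.device_query_cmd) is not None


def enabled_ops(all_ops: Iterable[str], unused_ops: Iterable[str]) -> List[str]:
    """Drop every operator named in unused_ops, preserving the input order."""
    skip = set(unused_ops)
    return [op for op in all_ops if op not in skip]


# Same field values as the vendor_info record in the listing.
thead = VendorInfoSketch(
    vendor_name="thead",
    device_name="cuda",
    device_query_cmd="ppu-smi",
)

# An empty tuple, as in the listing, leaves every operator enabled.
CUSTOMIZED_UNUSED_OPS = ()
```

With an empty `CUSTOMIZED_UNUSED_OPS`, `enabled_ops(["add", "mul"], CUSTOMIZED_UNUSED_OPS)` returns the input list unchanged; adding a name to the tuple would remove that operator from the result.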