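FlagGems Experimental Operators
This section lists the experimental operators in FlagGems that achieve an average speedup of 0.8x or higher relative to the corresponding native PyTorch implementations.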
Performance Overview
Total operators: 142
Average speedup range: 0.81x to 7.23x
Test environment: Hopper GPU
Filter criterion: average speedup ≥ 0.8x
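The filter criterion and the ordering of the ranked list can be sketched as follows. This is a minimal illustration that assumes each operator already has a single average speedup. The helper `rank_operators` and the sample operator names are hypothetical and are not part of the FlagGems API.

```python
from typing import Dict, List, Tuple

MIN_AVG_SPEEDUP = 0.8  # filter criterion from the overview above


def rank_operators(results: Dict[str, float],
                   threshold: float = MIN_AVG_SPEEDUP) -> List[Tuple[str, float]]:
    """Keep operators whose average speedup meets the threshold, fastest first.

    `results` maps a (hypothetical) operator name to its average speedup.
    """
    kept = [(name, s) for name, s in results.items() if s >= threshold]
    # Sort by speedup descending; break ties by name for a deterministic order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


if __name__ == "__main__":
    # Illustrative placeholder values, not measured FlagGems data.
    demo = {"op_a": 1.12, "op_b": 0.75, "op_c": 2.41, "op_d": 0.80}
    for rank, (name, speedup) in enumerate(rank_operators(demo), start=1):
        print(f"{rank} | {name} | {speedup:.2f}x")
```

Operators below the threshold (here `op_b`) are dropped before ranking, which is why the list that follows contains no entry under 0.8x.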
Operators Ranked by Performance
| No. | Operator | Avg. Speedup | Category |
|---|---|---|---|
| 1 | | 7.23x 🏆 | Internal |
| 2 | | 2.41x 🏆 | Math |
| 3 | | 1.85x ✅ | Other |
| 4 | | 1.79x ✅ | Activation |
| 5 | | 1.64x ✅ | Loss |
| 6 | | 1.47x ✅ | Other |
| 7 | | 1.44x ✅ | Other |
| 8 | | 1.43x ✅ | Other |
| 9 | | 1.41x ✅ | Shape |
| 10 | | 1.40x ✅ | Math |
| 11 | | 1.37x ✅ | Math |
| 12 | | 1.32x ✅ | Other |
| 13 | | 1.27x ✅ | Other |
| 14 | | 1.24x ✅ | Other |
| 15 | | 1.23x ✅ | Other |
| 16 | | 1.20x 📈 | Vision |
| 17 | | 1.18x 📈 | Shape |
| 18 | | 1.17x 📈 | Activation |
| 19 | | 1.17x 📈 | Activation |
| 20 | | 1.16x 📈 | Shape |
| 21 | | 1.16x 📈 | Activation |
| 22 | | 1.14x 📈 | Math |
| 23 | | 1.14x 📈 | Linear Algebra |
| 24 | | 1.13x 📈 | Math |
| 25 | | 1.12x 📈 | Vision |
| 26 | | 1.11x 📈 | Padding |
| 27 | | 1.11x 📈 | Vision |
| 28 | | 1.11x 📈 | Math |
| 29 | | 1.10x 📈 | Activation |
| 30 | | 1.10x 📈 | Activation |
| 31 | | 1.10x 📈 | Vision |
| 32 | | 1.09x 📈 | Math |
| 33 | | 1.09x 📈 | Activation |
| 34 | | 1.09x 📈 | Math |
| 35 | | 1.09x 📈 | Math |
| 36 | | 1.08x 📈 | Math |
| 37 | | 1.07x 📈 | Vision |
| 38 | | 1.06x 📈 | Vision |
| 39 | | 1.06x 📈 | Padding |
| 40 | | 1.06x 📈 | Activation |
| 41 | | 1.05x 📈 | Math |
| 42 | | 1.05x 📈 | Activation |
| 43 | | 1.04x 📈 | Padding |
| 44 | | 1.04x 📈 | Activation |
| 45 | | 1.04x 📈 | Activation |
| 46 | | 1.04x 📈 | Arithmetic |
| 47 | | 1.03x 📈 | Math |
| 48 | | 1.03x 📈 | Activation |
| 49 | | 1.03x 📈 | Activation |
| 50 | | 1.03x 📈 | Activation |
| 51 | | 1.03x 📈 | Activation |
| 52 | | 1.03x 📈 | Math |
| 53 | | 1.03x 📈 | Math |
| 54 | | 1.03x 📈 | Activation |
| 55 | | 1.02x 📈 | Math |
| 56 | | 1.02x 📈 | Math |
| 57 | | 1.02x 📈 | Math |
| 58 | | 1.02x 📈 | Math |
| 59 | | 1.02x 📈 | Math |
| 60 | | 1.02x 📈 | Math |
| 61 | | 1.02x 📈 | Activation |
| 62 | | 1.02x 📈 | Activation |
| 63 | | 1.02x 📈 | Math |
| 64 | | 1.02x 📈 | Math |
| 65 | | 1.01x 📈 | Math |
| 66 | | 1.01x 📈 | Vision |
| 67 | | 1.01x 📈 | Math |
| 68 | | 1.01x 📈 | Math |
| 69 | | 1.01x 📈 | Math |
| 70 | | 1.01x 📈 | Math |
| 71 | | 1.01x 📈 | Math |
| 72 | | 1.01x 📈 | Math |
| 73 | | 1.01x 📈 | Math |
| 74 | | 1.00x 📈 | Math |
| 75 | | 1.00x 📈 | Math |
| 76 | | 1.00x 📈 | Math |
| 77 | | 1.00x 📈 | Math |
| 78 | | 1.00x 📈 | Activation |
| 79 | | 1.00x 📈 | Math |
| 80 | | 1.00x 📈 | Math |
| 81 | | 1.00x 📈 | Loss |
| 82 | | 1.00x 📈 | Math |
| 83 | | 1.00x 📈 | Math |
| 84 | | 1.00x 📈 | Math |
| 85 | | 1.00x 📈 | Math |
| 86 | | 1.00x 📈 | Math |
| 87 | | 1.00x 📈 | Other |
| 88 | | 1.00x 📈 | Math |
| 89 | | 1.00x 📈 | Math |
| 90 | | 1.00x 📈 | Activation |
| 91 | | 1.00x 📈 | Loss |
| 92 | | 1.00x 📈 | Activation |
| 93 | | 1.00x 📈 | Arithmetic |
| 94 | | 1.00x 📈 | Math |
| 95 | | 1.00x 📈 | Math |
| 96 | | 1.00x 📈 | Activation |
| 97 | | 1.00x 📈 | Math |
| 98 | | 1.00x 📈 | Math |
| 99 | | 1.00x 📈 | Math |
| 100 | | 1.00x ⚡ | Math |
| 101 | | 1.00x ⚡ | Internal |
| 102 | | 1.00x ⚡ | Shape |
| 103 | | 1.00x ⚡ | Other |
| 104 | | 1.00x ⚡ | Shape |
| 105 | | 1.00x ⚡ | Internal |
| 106 | | 1.00x ⚡ | Activation |
| 107 | | 1.00x ⚡ | Math |
| 108 | | 1.00x ⚡ | Shape |
| 109 | | 1.00x ⚡ | Activation |
| 110 | | 1.00x ⚡ | Math |
| 111 | | 1.00x ⚡ | Shape |
| 112 | | 1.00x ⚡ | Other |
| 113 | | 1.00x ⚡ | Other |
| 114 | | 1.00x ⚡ | Math |
| 115 | | 0.99x ⚡ | Other |
| 116 | | 0.99x ⚡ | Math |
| 117 | | 0.99x ⚡ | Math |
| 118 | | 0.98x ⚡ | Activation |
| 119 | | 0.97x ⚡ | Math |
| 120 | | 0.97x ⚡ | Arithmetic |
| 121 | | 0.96x ⚡ | Math |
| 122 | | 0.96x ⚡ | Math |
| 123 | | 0.95x ⚡ | Arithmetic |
| 124 | | 0.95x ⚡ | Loss |
| 125 | | 0.92x ⚡ | Activation |
| 126 | | 0.91x ⚡ | Activation |
| 127 | | 0.90x ⚡ | Loss |
| 128 | | 0.90x ⚡ | Padding |
| 129 | | 0.89x ⚡ | Shape |
| 130 | | 0.89x ⚡ | Other |
| 131 | | 0.88x ⚡ | Other |
| 132 | | 0.86x ⚡ | Activation |
| 133 | | 0.86x ⚡ | Math |
| 134 | | 0.86x ⚡ | Math |
| 135 | | 0.86x ⚡ | Other |
| 136 | | 0.86x ⚡ | Math |
| 137 | | 0.82x ⚡ | Math |
| 138 | | 0.82x ⚡ | Activation |
| 139 | | 0.82x ⚡ | Math |
| 140 | | 0.82x ⚡ | Internal |
| 141 | | 0.81x ⚡ | Math |
| 142 | | special ⚡ | Normalization |
Legend:
🏆 Outstanding: speedup ≥ 2.0x
✅ Excellent: speedup ≥ 1.5x
📈 Good: speedup ≥ 1.0x
⚡ Acceptable: speedup ≥ 0.8x
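As a reading aid, the legend's thresholds can be expressed as a lookup that checks the highest tier first. This is a sketch that applies the thresholds exactly as the legend states them; `speedup_tier` is a hypothetical helper, not a FlagGems function.

```python
from typing import Optional

# (minimum speedup, marker) pairs from the legend, ordered highest tier first.
TIERS = [
    (2.0, "🏆"),  # outstanding
    (1.5, "✅"),  # excellent
    (1.0, "📈"),  # good
    (0.8, "⚡"),  # acceptable
]


def speedup_tier(speedup: float) -> Optional[str]:
    """Return the legend marker for a speedup, or None if it is below 0.8x."""
    for threshold, marker in TIERS:
        if speedup >= threshold:
            return marker
    return None  # below the filter criterion, so the operator is not listed


if __name__ == "__main__":
    for s in (7.23, 1.64, 1.02, 0.81, 0.5):
        print(f"{s:.2f}x -> {speedup_tier(s)}")
```

Because it follows the legend as written, this lookup does not reproduce every marker in the table; for example, the table marks 1.23x with ✅.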
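Operator Categories
Activation: activation functions (ReLU, GELU, Sigmoid, etc.)
Arithmetic: basic arithmetic operations (add, mul, div, etc.)
Comparison: comparison operations (eq, ne, gt, lt, etc.)
Internal: internal and utility operations
Linear Algebra: matrix operations (matmul, mv, etc.)
Loss: loss function computation (MSE, Cross-Entropy, etc.)
Math: mathematical functions (sin, cos, exp, log, etc.)
NLP: natural language processing
Other: miscellaneous
Padding: padding operations (reflection_pad, replication_pad, etc.)
Shape: shape manipulation operations
Vision: computer vision operations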
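Since every operator in the ranked table carries one of these category labels, a per-category summary can be derived from the speedup and category columns. The following sketch is illustrative only; `summarize_by_category` is a hypothetical helper, and the sample uses the first six rows of the ranked table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_by_category(rows: Iterable[Tuple[float, str]]) -> Dict[str, Tuple[int, float]]:
    """Group (speedup, category) rows into {category: (operator count, best speedup)}."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for speedup, category in rows:
        groups[category].append(speedup)
    return {category: (len(values), max(values)) for category, values in groups.items()}


if __name__ == "__main__":
    # First six rows of the ranked table as (average speedup, category).
    sample = [(7.23, "Internal"), (2.41, "Math"), (1.85, "Other"),
              (1.79, "Activation"), (1.64, "Loss"), (1.47, "Other")]
    for category, (count, best) in sorted(summarize_by_category(sample).items()):
        print(f"{category}: {count} operator(s), best {best:.2f}x")
```

Non-numeric entries such as the `special` value in row 142 would need to be filtered out before being passed to this helper.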
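Notes
All operators pass correctness tests.
Performance data was collected on a Hopper GPU across a variety of input shapes.
Speedup calculation:
Speedup = PyTorch_time / FlagGems_time; a larger value means FlagGems performs better.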
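The speedup formula and the per-shape averaging can be sketched in plain Python as follows. Two points are assumptions rather than documented FlagGems behavior: the per-shape speedups are combined with an arithmetic mean (the source does not say how the average is taken), and timing uses a median over repeated runs. The helper names are hypothetical. On a real GPU, kernel launches are asynchronous, so each timed callable would need to synchronize (for example with `torch.cuda.synchronize()`) before the timer stops.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def median_time(fn: Callable[[], object], repeats: int = 20, warmup: int = 3) -> float:
    """Median wall-clock time of fn() in seconds, measured after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(pytorch_time: float, flaggems_time: float) -> float:
    """Speedup = PyTorch_time / FlagGems_time; values above 1.0 mean FlagGems is faster."""
    return pytorch_time / flaggems_time


def average_speedup(per_shape_times: Sequence[Tuple[float, float]]) -> float:
    """Arithmetic mean of per-shape speedups (the averaging method is an assumption)."""
    return statistics.fmean(speedup(t_ref, t_new) for t_ref, t_new in per_shape_times)


if __name__ == "__main__":
    # Hypothetical (PyTorch_time, FlagGems_time) pairs in ms for three input shapes.
    times = [(2.0, 1.0), (3.0, 2.0), (1.0, 1.25)]
    print(f"average speedup: {average_speedup(times):.2f}x")  # prints "average speedup: 1.43x"
```

A geometric mean (`statistics.geometric_mean`) is a common alternative for averaging ratios, since it treats a 2x gain and a 0.5x loss symmetrically; which one FlagGems uses is not stated here.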