FlagDNN User Guide
Using FlagDNN
FlagDNN integrates directly with PyTorch. Import the package and call operators on CUDA tensors:
import torch
import flag_dnn
# Create a tensor on CUDA
x = torch.randn(1024, device='cuda')
# Apply the ReLU activation
y = flag_dnn.ops.relu(x)
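One way to sanity-check an installation is to compare the FlagDNN result with PyTorch's own ReLU. The sketch below relies only on the `flag_dnn.ops.relu` call shown above and assumes its output matches `torch.relu` elementwise, which this guide does not state. `check_relu` is a hypothetical helper name; it reports `"skipped"` when PyTorch, FlagDNN, or a CUDA device is unavailable.

```python
def check_relu(n=1024):
    """Compare flag_dnn.ops.relu with torch.relu on a random CUDA tensor.

    Returns "ok" on a match and "skipped" when a dependency or a CUDA
    device is missing. Raises AssertionError on a mismatch.
    """
    try:
        import torch
        import flag_dnn
    except ImportError:
        return "skipped"
    if not torch.cuda.is_available():
        return "skipped"

    x = torch.randn(n, device="cuda")
    y = flag_dnn.ops.relu(x)      # call shown in this guide
    expected = torch.relu(x)      # PyTorch reference result
    # Assumption: both implementations agree within default tolerances.
    torch.testing.assert_close(y, expected)
    return "ok"


print(check_relu())
```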
Operator List
The complete operator registry is maintained in FlagDNN's conf/operators.yaml.
Tensor Operations
identity, reshape, transpose, slice, concatenate, gen_index, binary_select, one_hot, embedding
Neural Network: Activation
relu, gelu, gelu_approx_tanh, silu, swish, leaky_relu, leaky_relu_, prelu, elu, elu_, rrelu, rrelu_, mish, softplus, softsign, softshrink, hardswish, relu6, selu, glu, celu, tanh, sigmoid, sigmoid_backward, logsigmoid, hardtanh, hardtanh_, threshold, threshold_
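The list above contains both `gelu` and `gelu_approx_tanh`. Assuming these follow the usual definitions (the exact Gaussian-CDF form and its tanh approximation, as in PyTorch's `approximate='tanh'` option), the difference can be sketched in plain Python. The function names `gelu_exact` and `gelu_tanh` are illustrative and not part of FlagDNN.

```python
import math


def gelu_exact(x: float) -> float:
    """GELU via the Gaussian CDF: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Print both forms side by side for a few sample inputs.
for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={v:+.1f}  exact={gelu_exact(v):+.6f}  tanh={gelu_tanh(v):+.6f}")
```

The two forms agree closely but not bit for bit, so results from one should not be compared against the other with a zero tolerance.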
Neural Network: Normalization
batch_norm, batchnorm, batchnorm_inference, layernorm, layer_norm, rms_norm, rmsnorm, group_norm
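For readers comparing the layer-norm and RMS-norm entries above, here is a minimal pure-Python sketch of the standard formulas over a single vector. It assumes the FlagDNN operators follow the conventional definitions; the learnable scale and bias are omitted, and the `eps` defaults are illustrative choices, not values taken from FlagDNN.

```python
import math


def layer_norm(x, eps=1e-5):
    """Subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-6):
    """Divide by the root mean square; the mean is not subtracted."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


sample = [1.0, 2.0, 3.0, 4.0]
print(layer_norm(sample))
print(rms_norm(sample))
```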
Neural Network: Softmax
softmax, softmin, log_softmax
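The three operators above are closely related. The sketch below shows the textbook relationships in plain Python for a single vector, assuming FlagDNN uses the conventional definitions: `log_softmax` is computed with the max-subtraction trick for numerical stability, `softmax` is its exponential, and `softmin` is `softmax` of the negated input.

```python
import math


def log_softmax(x):
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(x)
    lse = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - lse for v in x]


def softmax(x):
    """Softmax as the exponential of log-softmax."""
    return [math.exp(v) for v in log_softmax(x)]


def softmin(x):
    """Softmin is softmax applied to the negated input."""
    return softmax([-v for v in x])


print(softmax([1.0, 2.0, 3.0]))
print(softmin([1.0, 2.0, 3.0]))
```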
Neural Network: Pooling
max_pool1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d, adaptive_avg_pool1d, adaptive_avg_pool2d, adaptive_avg_pool3d, adaptive_max_pool1d, adaptive_max_pool2d, adaptive_max_pool3d
Neural Network: Convolution
conv1d, conv2d, conv3d, conv_fprop, conv_dgrad, conv_wgrad, causal_conv1d
Neural Network: Attention
sdpa, sdpa_backward
Neural Network: Other
interpolate
Math: Unary
sqrt, abs, neg, clamp, isinf, isnan, square, rsqrt, positive, log, exp, bitwise_not, ceil, floor, reciprocal, sin, cos, tan, erf
Math: Binary
add, sub, mul, div, pow, mod, max, min, scale, eq, ne, lt, le, gt, ge, minimum, maximum, fmin, fmax
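The list above includes both `minimum`/`maximum` and `fmin`/`fmax`. In PyTorch, the operators with these names differ in NaN handling: `minimum` propagates NaN, while `fmin` returns the non-NaN operand when only one input is NaN. The scalar sketch below illustrates that distinction, on the assumption (not confirmed by this guide) that FlagDNN mirrors the PyTorch semantics.

```python
import math


def minimum(a: float, b: float) -> float:
    """NaN-propagating minimum: any NaN input yields NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return min(a, b)


def fmin(a: float, b: float) -> float:
    """NaN-ignoring minimum: NaN only when both inputs are NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


print(minimum(math.nan, 1.0), fmin(math.nan, 1.0))
```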
Math: Comparison
cmp_eq, cmp_neq, cmp_lt, cmp_le, cmp_gt, cmp_ge
Math: Bitwise
bitwise_and, bitwise_or, bitwise_xor
Math: Logical
logical_and, logical_or, logical_not
Linear Algebra
mv, mm, matmul, dot
Reduction
sum, mean, prod, reduction, cumsum, cumprod, cummin, cummax, any, all
Loss
kl_div, mse_loss, l1_loss
Fused Operators
add_square, rmsnorm_rht_amax