FlagDNN User Guide

Use FlagDNN

FlagDNN integrates directly with PyTorch. Import the package and call operators on CUDA tensors:

import torch
import flag_dnn

# Create a tensor on CUDA
x = torch.randn(1024, device='cuda')

# Apply ReLU activation
y = flag_dnn.ops.relu(x)
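
When trying FlagDNN for the first time, it can be useful to check an operator against PyTorch's own implementation. The sketch below is one way to do that for relu. It assumes that flag_dnn.ops.relu returns a tensor directly comparable with torch.relu(x), which the snippet above suggests but does not state. The helper name flagdnn_relu_matches_torch is hypothetical, and the function returns None instead of failing when PyTorch, FlagDNN, or a CUDA device is unavailable.

```python
import importlib.util


def flagdnn_relu_matches_torch(n=1024):
    """Compare flag_dnn.ops.relu with torch.relu on a random CUDA tensor.

    Returns True or False for the comparison, or None when the check
    cannot run in this environment.
    """
    # Both packages must be installed before anything is imported.
    if importlib.util.find_spec("torch") is None:
        return None
    if importlib.util.find_spec("flag_dnn") is None:
        return None

    import torch

    # The guide calls operators on CUDA tensors, so a GPU is required.
    if not torch.cuda.is_available():
        return None

    import flag_dnn

    x = torch.randn(n, device="cuda")
    y = flag_dnn.ops.relu(x)   # FlagDNN operator, as in the guide
    expected = torch.relu(x)   # PyTorch reference result

    # Assumption: both results share shape and dtype, so allclose applies.
    return bool(torch.allclose(y, expected))


result = flagdnn_relu_matches_torch()
messages = {None: "skipped (missing package or GPU)", True: "match", False: "mismatch"}
print(messages[result])
```

The same pattern should extend to other operators that have a direct PyTorch counterpart, though argument names and defaults would need to be checked case by case.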

Operator List

The complete operator registry is maintained in conf/operators.yaml in the FlagDNN repository.
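
Because the list below can drift from what a given installation exposes, a small check like the following may help. It is a minimal sketch: missing_operators is a hypothetical helper, not part of FlagDNN, and the stand-in namespace exists only so the example runs without the package installed.

```python
from types import SimpleNamespace


def missing_operators(namespace, names):
    """Return the names that are not callable attributes of `namespace`."""
    return [
        name for name in names
        if not callable(getattr(namespace, name, None))
    ]


# Stand-in for an operator namespace (hypothetical, for illustration only).
fake_ops = SimpleNamespace(relu=lambda x: x, gelu=lambda x: x)

print(missing_operators(fake_ops, ["relu", "gelu", "silu"]))  # ['silu']
```

With FlagDNN installed, passing flag_dnn.ops as the namespace would report which of the listed names are absent. This assumes every operator is exposed under flag_dnn.ops in the same way relu is, which this guide does not confirm.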

Tensor Operations

identity, reshape, transpose, slice, concatenate, gen_index, binary_select, one_hot, embedding

Neural Network — Activation

relu, gelu, gelu_approx_tanh, silu, swish, leaky_relu, leaky_relu_, prelu, elu, elu_, rrelu, rrelu_, mish, softplus, softsign, softshrink, hardswish, relu6, selu, glu, celu, tanh, sigmoid, sigmoid_backward, logsigmoid, hardtanh, hardtanh_, threshold, threshold_

Neural Network — Normalization

batch_norm, batchnorm, batchnorm_inference, layernorm, layer_norm, rms_norm, rmsnorm, group_norm

Neural Network — Softmax

softmax, softmin, log_softmax
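
For reference, the three operators are related as follows in their standard definitions: softmin(x) is softmax(-x), and log_softmax(x) is the logarithm of softmax(x) computed in a numerically stable way. The pure-Python sketch below illustrates those textbook definitions on a 1-D list. It is not FlagDNN's implementation, and it is an assumption that the FlagDNN operators follow the same definitions.

```python
import math


def softmax(xs):
    """Standard softmax, shifted by the maximum for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmin(xs):
    """Softmin is softmax applied to the negated inputs."""
    return softmax([-x for x in xs])


def log_softmax(xs):
    """Log-softmax via log-sum-exp, avoiding log of a tiny probability."""
    m = max(xs)
    log_total = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_total for x in xs]


values = [1.0, 2.0, 3.0]
print(softmax(values))
print(softmin(values))
print(log_softmax(values))
```

A reference like this can serve as a small-input oracle when comparing GPU results, keeping in mind that floating-point results on different backends generally agree only within a tolerance.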

Neural Network — Pooling

max_pool1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d, adaptive_avg_pool1d, adaptive_avg_pool2d, adaptive_avg_pool3d, adaptive_max_pool1d, adaptive_max_pool2d, adaptive_max_pool3d

Neural Network — Convolution

conv1d, conv2d, conv3d, conv_fprop, conv_dgrad, conv_wgrad, causal_conv1d

Neural Network — Attention

sdpa, sdpa_backward

Neural Network — Other

interpolate

Math — Unary

sqrt, abs, neg, clamp, isinf, isnan, square, rsqrt, positive, log, exp, bitwise_not, ceil, floor, reciprocal, sin, cos, tan, erf

Math — Binary

add, sub, mul, div, pow, mod, max, min, scale, eq, ne, lt, le, gt, ge, minimum, maximum, fmin, fmax

Math — Comparison

cmp_eq, cmp_neq, cmp_lt, cmp_le, cmp_gt, cmp_ge

Math — Bitwise

bitwise_and, bitwise_or, bitwise_xor

Math — Logical

logical_and, logical_or, logical_not

Linear Algebra

mv, mm, matmul, dot

Reduction

sum, mean, prod, reduction, cumsum, cumprod, cummin, cummax, any, all

Loss

kl_div, mse_loss, l1_loss

Fused

add_square, rmsnorm_rht_amax