Dispatch through environment variables
Plugin behavior is controlled by environment variables, most of which use the SGLANG_FL_* prefix (USE_FLAGGEMS, used in the examples below, is an exception).
Layer 2 - Fused Op Dispatch

| Variable | Default | Description |
|---|---|---|
| | | Master switch: |
| `SGLANG_FL_PREFER` | | Global backend preference: |
| `SGLANG_FL_PER_OP` | - | Per-op backend priority, e.g. |
| `SGLANG_FL_OOT_BLACKLIST` | - | Skip listed ops from OOT dispatch (comma-separated class names) |
| | - | Only dispatch listed ops (mutually exclusive with BLACKLIST) |
| | | |
| | - | Deny specific vendors (comma-separated, e.g. |
| | - | Allow only listed vendors (comma-separated) |
| | - | Path to dispatch log file (records which ops are intercepted) |
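
The blacklist/whitelist rows above can be illustrated with a minimal sketch. This is not the plugin's actual implementation: the function names `parse_names` and `should_dispatch` are hypothetical, and treating "both lists set" as an error is an assumption about what "mutually exclusive" means in practice.

```python
import os


def parse_names(raw):
    """Split a comma-separated string into a set of non-empty, stripped names."""
    return {name.strip() for name in (raw or "").split(",") if name.strip()}


def should_dispatch(op_class, blacklist="", whitelist=""):
    """Return True if op_class should go through OOT dispatch.

    Assumption: setting both lists at once is a configuration error.
    """
    deny = parse_names(blacklist)
    allow = parse_names(whitelist)
    if deny and allow:
        raise ValueError("blacklist and whitelist are mutually exclusive")
    if allow:
        # Whitelist mode: only the listed ops are dispatched.
        return op_class in allow
    # Blacklist mode (or no filter): everything except the listed ops.
    return op_class not in deny


if __name__ == "__main__":
    raw = os.environ.get("SGLANG_FL_OOT_BLACKLIST", "")
    for op in ("RotaryEmbedding", "RMSNorm"):
        print(op, should_dispatch(op, blacklist=raw))
```

With `SGLANG_FL_OOT_BLACKLIST=RotaryEmbedding`, this sketch reports that `RotaryEmbedding` is skipped while other op classes are still dispatched.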
Layer 1 - ATen Replacement (FlagGems)

| Variable | Default | Description |
|---|---|---|
| `USE_FLAGGEMS` | | Master switch: |
| `SGLANG_FL_FLAGOS_WHITELIST` | - | Only listed ATen ops use FlagGems (comma-separated) |
| `SGLANG_FL_FLAGOS_BLACKLIST` | - | Listed ATen ops don't use FlagGems (comma-separated) |
| | | |
| | - | Path to ATen replacement log file |
| | | |

`SGLANG_FL_FLAGOS_WHITELIST` and `SGLANG_FL_FLAGOS_BLACKLIST` are mutually exclusive. `SGLANG_FL_FLAGOS_WHITELIST` takes priority over the YAML `flagos_blacklist`.
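
One way to read this precedence rule is sketched below. The helper names (`split_ops`, `resolve_flaggems_filter`) are hypothetical and the op names are only illustrative. Two behaviors are assumptions rather than documented facts: that setting both environment variables raises an error, and that an environment blacklist replaces (rather than merges with) the YAML `flagos_blacklist`.

```python
def split_ops(raw):
    """Split a comma-separated string into a set of non-empty op names."""
    return {op.strip() for op in (raw or "").split(",") if op.strip()}


def resolve_flaggems_filter(env_whitelist="", env_blacklist="", yaml_blacklist=()):
    """Return (mode, ops), where mode is "whitelist", "blacklist", or "all"."""
    allow = split_ops(env_whitelist)
    deny = split_ops(env_blacklist)
    if allow and deny:
        # Assumption: "mutually exclusive" is enforced as an error.
        raise ValueError("whitelist and blacklist are mutually exclusive")
    if allow:
        # The env whitelist wins; the YAML flagos_blacklist is ignored.
        return "whitelist", allow
    if deny:
        # Assumption: an env blacklist replaces the YAML blacklist.
        return "blacklist", deny
    yaml_deny = set(yaml_blacklist)
    if yaml_deny:
        return "blacklist", yaml_deny
    return "all", set()


# The env whitelist takes priority over a YAML blacklist.
print(resolve_flaggems_filter(env_whitelist="mm,addmm", yaml_blacklist=["softmax"]))
```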
Layer 3 - Distributed Communication

| Variable | Default | Description |
|---|---|---|
| | | Backend: |
| | - | FlagCX installation path (if set, defaults to |
System / Debug

| Variable | Default | Description |
|---|---|---|
| `SGLANG_FL_CONFIG` | - | Path to YAML config file (overrides platform auto-detection) |
| | (auto) | Force platform: |
| | | Dispatch system log level: |
| | (all) | SGLang built-in: filter which plugins to load (comma-separated) |
Examples
# Force all ops to reference backend (pure PyTorch, useful for precision debugging)
SGLANG_FL_PREFER=reference python -m sglang.launch_server \
--model-path Qwen/Qwen2.5-0.5B-Instruct \
--port 30000 --disable-piecewise-cuda-graph
# Per-op: RMSNorm uses vendor, others use flagos
SGLANG_FL_PER_OP="rms_norm=vendor|flagos;silu_and_mul=flagos" \
python -m sglang.launch_server \
--model-path Qwen/Qwen2.5-0.5B-Instruct \
--port 30000 --disable-piecewise-cuda-graph
# Skip RotaryEmbedding from OOT dispatch (fall through to SGLang native CUDA)
SGLANG_FL_OOT_BLACKLIST=RotaryEmbedding python -m sglang.launch_server \
--model-path Qwen/Qwen2.5-0.5B-Instruct \
--port 30000 --disable-piecewise-cuda-graph
# Disable ATen layer, keep only fused op dispatch
USE_FLAGGEMS=0 python -m sglang.launch_server \
--model-path Qwen/Qwen2.5-0.5B-Instruct \
--port 30000 --disable-piecewise-cuda-graph
# Use YAML config with env var override
SGLANG_FL_CONFIG=./my_config.yaml SGLANG_FL_PREFER=reference \
python -m sglang.launch_server \
--model-path Qwen/Qwen2.5-0.5B-Instruct \
--port 30000 --disable-piecewise-cuda-graph
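
To make the `SGLANG_FL_PER_OP` value format concrete, here is a minimal parsing sketch. The grammar is inferred only from the example value above (`;` between ops, `=` between op and backends, `|` between backends in priority order); the function name `parse_per_op` is hypothetical and the plugin's real parser may differ.

```python
def parse_per_op(raw):
    """Parse 'op=backend|backend;op=backend' into {op: [backends by priority]}."""
    result = {}
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing or doubled semicolons
        op, sep, backends = entry.partition("=")
        op = op.strip()
        if not sep or not op:
            raise ValueError(f"malformed per-op entry: {entry!r}")
        order = [b.strip() for b in backends.split("|") if b.strip()]
        if not order:
            raise ValueError(f"no backends listed for op {op!r}")
        result[op] = order
    return result


print(parse_per_op("rms_norm=vendor|flagos;silu_and_mul=flagos"))
# {'rms_norm': ['vendor', 'flagos'], 'silu_and_mul': ['flagos']}
```

Under this reading, `rms_norm` tries the `vendor` backend first and falls back to `flagos`, while `silu_and_mul` uses `flagos` only.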