Dispatch through environment variables

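Plugin behavior is controlled by environment variables. Most use the SGLANG_FL_ prefix; the exceptions are USE_FLAGGEMS, the SGLANG_FLAGGEMS_* logging variables, FLAGCX_PATH, and SGLang's built-in SGLANG_PLUGINS.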

Layer 2: Fused Op Dispatch

| Variable | Default | Description |
| --- | --- | --- |
| SGLANG_FL_OOT_ENABLED | 1 | Master switch: 0 disables Layer 2 (Layer 1 ATen replacement stays active) |
| SGLANG_FL_PREFER | flagos | Global backend preference: flagos, vendor, reference |
| SGLANG_FL_PER_OP | (none) | Per-op backend priority, e.g. rms_norm=vendor\|flagos;silu_and_mul=reference |
| SGLANG_FL_OOT_BLACKLIST | (none) | Exclude listed ops from OOT dispatch (comma-separated class names) |
| SGLANG_FL_OOT_WHITELIST | (none) | Only dispatch listed ops (mutually exclusive with SGLANG_FL_OOT_BLACKLIST) |
| SGLANG_FL_STRICT | 0 | 1 = disable fallback (error if preferred backend unavailable) |
| SGLANG_FL_DENY_VENDORS | (none) | Deny specific vendors (comma-separated, e.g. cuda,ascend) |
| SGLANG_FL_ALLOW_VENDORS | (none) | Allow only listed vendors (comma-separated) |
| SGLANG_FL_DISPATCH_LOG | (none) | Path to dispatch log file (records which ops are intercepted) |
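
The SGLANG_FL_PER_OP value is easy to mistype, so it can help to split it into one line per op before launching. The sketch below assumes, based only on the example in the table, that `;` separates entries, `=` separates an op name from its backends, and `|` separates backends in priority order. The helper name `explain_per_op` is hypothetical and not part of the plugin.

```shell
# Hypothetical helper (not part of the plugin): print one line per op
# from an SGLANG_FL_PER_OP-style value.
# Assumed grammar, inferred from the documented example:
#   entries separated by ';', op and backends by '=', backends by '|'.
explain_per_op() {
    spec=$1
    while [ -n "$spec" ]; do
        entry=${spec%%";"*}                 # text before the first ';'
        case $spec in
            *";"*) spec=${spec#*";"} ;;     # drop that entry and its ';'
            *)     spec= ;;                 # last entry: nothing left
        esac
        [ -n "$entry" ] || continue         # skip empty entries (e.g. trailing ';')
        op=${entry%%=*}                     # op name: text before the first '='
        backends=${entry#*=}                # backend list: text after the first '='
        printf '%s: %s\n' "$op" "$(printf '%s' "$backends" | sed 's/|/ > /g')"
    done
}

explain_per_op "rms_norm=vendor|flagos;silu_and_mul=reference"
```

For the value shown, this prints `rms_norm: vendor > flagos` followed by `silu_and_mul: reference`, reading `>` as "tried before".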

Layer 1: ATen Replacement (FlagGems)

| Variable | Default | Description |
| --- | --- | --- |
| USE_FLAGGEMS | 1 | Master switch: 0 disables all ATen replacement |
| SGLANG_FL_FLAGOS_WHITELIST | (none) | Only listed ATen ops use FlagGems (comma-separated) |
| SGLANG_FL_FLAGOS_BLACKLIST | (none) | Listed ATen ops don't use FlagGems (comma-separated) |
| SGLANG_FLAGGEMS_RECORD | 0 | 1 = record which ATen ops are replaced |
| SGLANG_FLAGGEMS_LOG_PATH | (none) | Path to ATen replacement log file |
| SGLANG_FLAGGEMS_LOG_ONCE | 1 | 1 = log each op only once, 0 = log every call |

SGLANG_FL_FLAGOS_WHITELIST and SGLANG_FL_FLAGOS_BLACKLIST are mutually exclusive. SGLANG_FL_FLAGOS_WHITELIST takes priority over the flagos_blacklist setting in the YAML config.
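
Because each whitelist/blacklist pair is mutually exclusive (the OOT pair in Layer 2 as well as the FlagGems pair here), a small pre-launch check can catch a shell profile that sets both. This is a minimal sketch: the function name `check_exclusive` is hypothetical, and since this page does not say how the plugin itself reacts when both are set, the check simply reports the conflict and returns non-zero.

```shell
# Hypothetical pre-launch check (not part of the plugin): fail if both
# variables of a mutually exclusive whitelist/blacklist pair are non-empty.
check_exclusive() {
    # $1, $2: names of the two environment variables to compare
    eval "first=\${$1:-}; second=\${$2:-}"
    if [ -n "$first" ] && [ -n "$second" ]; then
        echo "error: $1 and $2 are mutually exclusive; unset one of them" >&2
        return 1
    fi
    return 0
}

check_exclusive SGLANG_FL_OOT_WHITELIST SGLANG_FL_OOT_BLACKLIST &&
check_exclusive SGLANG_FL_FLAGOS_WHITELIST SGLANG_FL_FLAGOS_BLACKLIST &&
echo "whitelist/blacklist settings look consistent"
```

Running it in the same shell that will start the server keeps the check and the launch looking at the same environment.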

Layer 3: Distributed Communication

| Variable | Default | Description |
| --- | --- | --- |
| SGLANG_FL_DIST_BACKEND | nccl | Backend: nccl / hccl / flagcx |
| FLAGCX_PATH | (none) | FlagCX installation path (if set, defaults to flagcx backend) |
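
The two variables interact: setting FLAGCX_PATH changes the default backend from nccl to flagcx. The sketch below mirrors that rule so you can see which backend a given environment would select. It rests on an assumption this page does not confirm, namely that an explicitly set SGLANG_FL_DIST_BACKEND wins over FLAGCX_PATH. The function name `effective_dist_backend` is hypothetical.

```shell
# Sketch of the backend selection rule described for Layer 3.
# ASSUMPTION: an explicit SGLANG_FL_DIST_BACKEND wins over FLAGCX_PATH;
# the documentation does not spell out the precedence.
effective_dist_backend() {
    if [ -n "${SGLANG_FL_DIST_BACKEND:-}" ]; then
        echo "$SGLANG_FL_DIST_BACKEND"   # explicit choice
    elif [ -n "${FLAGCX_PATH:-}" ]; then
        echo "flagcx"                    # FLAGCX_PATH set: default becomes flagcx
    else
        echo "nccl"                      # documented default
    fi
}

echo "distributed backend: $(effective_dist_backend)"
```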

System / Debug

| Variable | Default | Description |
| --- | --- | --- |
| SGLANG_FL_CONFIG | (none) | Path to YAML config file (overrides platform auto-detection) |
| SGLANG_FL_PLATFORM | (auto) | Force platform: cuda, ascend (overrides auto-detection) |
| SGLANG_FL_LOG_LEVEL | INFO | Dispatch system log level: DEBUG, INFO, WARNING, ERROR |
| SGLANG_PLUGINS | (all) | SGLang built-in: filter which plugins to load (comma-separated) |
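
When a run behaves unexpectedly, a useful first step is to confirm which of these variables the server process will actually inherit. The helper below is a hypothetical convenience, not something the plugin provides. It relies on `env`, so it lists only exported variables, and the name pattern is taken from the tables above.

```shell
# Hypothetical helper (not part of the plugin): list every variable from the
# tables above that is exported in the current environment, sorted by name.
show_fl_env() {
    env | grep -E '^(SGLANG_FL_[A-Z_]+|SGLANG_FLAGGEMS_[A-Z_]+|USE_FLAGGEMS|FLAGCX_PATH|SGLANG_PLUGINS)=' | sort
}

show_fl_env
```

An empty result means none of the variables are exported, so every setting falls back to its default.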

Examples

# Force all ops to reference backend (pure PyTorch, useful for precision debugging)
SGLANG_FL_PREFER=reference python -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-0.5B-Instruct \
    --port 30000 --disable-piecewise-cuda-graph

# Per-op: rms_norm tries vendor first, then flagos; silu_and_mul uses flagos
SGLANG_FL_PER_OP="rms_norm=vendor|flagos;silu_and_mul=flagos" \
    python -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-0.5B-Instruct \
    --port 30000 --disable-piecewise-cuda-graph

# Exclude RotaryEmbedding from OOT dispatch (falls through to SGLang's native CUDA implementation)
SGLANG_FL_OOT_BLACKLIST=RotaryEmbedding python -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-0.5B-Instruct \
    --port 30000 --disable-piecewise-cuda-graph

# Disable ATen layer, keep only fused op dispatch
USE_FLAGGEMS=0 python -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-0.5B-Instruct \
    --port 30000 --disable-piecewise-cuda-graph

# Use YAML config with env var override
SGLANG_FL_CONFIG=./my_config.yaml SGLANG_FL_PREFER=reference \
    python -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-0.5B-Instruct \
    --port 30000 --disable-piecewise-cuda-graph