Dispatch through a YAML config file
The plugin ships with a sample config file, config/sample.yaml, that lists all available options. Copy it and customize it:
# Copy the sample config (the path is quoted so install locations containing spaces still work)
cp "$(python -c "from sglang_fl.config import _CONFIG_DIR; print(_CONFIG_DIR / 'sample.yaml')")" my_config.yaml
# Edit as needed, then launch with it
SGLANG_FL_CONFIG=./my_config.yaml python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-0.5B-Instruct \
  --port 30000 --disable-piecewise-cuda-graph
If SGLANG_FL_CONFIG is not set, the plugin uses sensible defaults (equivalent to prefer: flagos on CUDA). You only need a YAML file when you want to customize behavior.
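The lookup described above can be sketched in a few lines of Python. This is an illustration only, not the plugin's code: `load_dispatch_config` and `DEFAULT_CONFIG` are hypothetical names, the default dict contains only what this page states, and merging file values over the defaults is an assumption.

```python
import os
from pathlib import Path

# Assumed default: this page only states that an unset SGLANG_FL_CONFIG
# behaves like `prefer: flagos`; the plugin's real defaults may hold more keys.
DEFAULT_CONFIG = {"prefer": "flagos"}


def load_dispatch_config(environ=os.environ):
    """Hypothetical helper: return the config dict dispatch would start from."""
    path = environ.get("SGLANG_FL_CONFIG")
    if not path:
        # No YAML file requested: fall back to the built-in defaults.
        return dict(DEFAULT_CONFIG)

    config_file = Path(path)
    if not config_file.is_file():
        raise FileNotFoundError(f"SGLANG_FL_CONFIG points to a missing file: {path}")

    # PyYAML is imported lazily so the default path needs no extra dependency.
    import yaml

    with config_file.open() as fh:
        loaded = yaml.safe_load(fh) or {}

    # Assumption: keys absent from the file keep their default values.
    return {**DEFAULT_CONFIG, **loaded}
```

Calling `load_dispatch_config({})` returns the defaults, which mirrors the "no YAML file needed" case.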
Config Fields
# Global backend preference: flagos | vendor | reference
prefer: flagos
# Per-op backend priority (ordered list, first available wins)
op_backends:
  rms_norm: [vendor, flagos, reference]
  silu_and_mul: [flagos, vendor, reference]
# Layer 2 fused ops to skip (fall through to SGLang native CUDA)
# Available: SiluAndMul, RMSNorm, RotaryEmbedding
oot_blacklist:
  - RotaryEmbedding
# Layer 1 ATen ops to exclude from FlagGems Triton replacement
flagos_blacklist:
  - mul
  - sub
| Field | Description |
|---|---|
| prefer | Global backend preference: flagos, vendor, or reference |
| op_backends | Per-op ordered backend list (first available wins, can list 1–3 backends) |
| oot_blacklist | Layer 2 fused ops to skip from OOT dispatch (fall through to SGLang native CUDA) |
| flagos_blacklist | Layer 1 ATen ops to exclude from FlagGems replacement (fall through to PyTorch native) |
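To make the "first available wins" rule concrete, here is a minimal Python sketch of how prefer and op_backends could combine. It is not the plugin's implementation: `candidate_order`, `resolve_backend`, and the `available` mapping are hypothetical, and the fallback order used when an op has no op_backends entry is an assumption.

```python
BACKENDS = ("flagos", "vendor", "reference")


def candidate_order(op_name, config):
    """Return the backends to try for op_name, in priority order."""
    per_op = config.get("op_backends") or {}
    if op_name in per_op:
        return list(per_op[op_name])
    # Assumption: without a per-op entry, the global preference is tried
    # first and the remaining backends follow in BACKENDS order.
    prefer = config.get("prefer", "flagos")
    return [prefer] + [b for b in BACKENDS if b != prefer]


def resolve_backend(op_name, config, available):
    """Pick the first candidate backend that is available for op_name."""
    for backend in candidate_order(op_name, config):
        if backend in available.get(op_name, set()):
            return backend
    return None  # nothing usable for this op


config = {
    "prefer": "flagos",
    "op_backends": {"rms_norm": ["vendor", "flagos", "reference"]},
}
# Hypothetical availability table: which backends implement which op.
available = {
    "rms_norm": {"vendor", "flagos", "reference"},
    "silu_and_mul": {"flagos", "reference"},
}
print(resolve_backend("rms_norm", config, available))      # vendor
print(resolve_backend("silu_and_mul", config, available))  # flagos
```

Because a per-op list replaces the global preference for that op only, a single op_backends entry is enough to move one op without affecting the others.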
Common Recipes
Each recipe shows a YAML config and the expected dispatch result. Use the Dispatch Log to verify.
1. Skip RotaryEmbedding from OOT dispatch (fall through to SGLang native CUDA)
# my_config.yaml
prefer: flagos
oot_blacklist:
  - RotaryEmbedding
Expected dispatch log: only SiluAndMul and RMSNorm appear, no RotaryEmbedding.
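The effect of oot_blacklist in this recipe can be pictured as a simple filter over the three fused ops. The sketch below is illustrative: `ops_to_dispatch` is a hypothetical helper, and rejecting unknown names is a choice made here to catch typos, not documented plugin behavior.

```python
# Fused op names listed as available in the config comments.
OOT_OPS = ("SiluAndMul", "RMSNorm", "RotaryEmbedding")


def ops_to_dispatch(config):
    """Return the fused ops that remain eligible for OOT dispatch."""
    blacklist = set(config.get("oot_blacklist") or [])
    unknown = blacklist - set(OOT_OPS)
    if unknown:
        # Sketch-only safeguard: a misspelled op name would otherwise be ignored silently.
        raise ValueError(f"unknown oot_blacklist entries: {sorted(unknown)}")
    return [op for op in OOT_OPS if op not in blacklist]


recipe = {"prefer": "flagos", "oot_blacklist": ["RotaryEmbedding"]}
print(ops_to_dispatch(recipe))  # ['SiluAndMul', 'RMSNorm']
```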
2. Force RMSNorm to use the vendor backend while other ops use flagos
# my_config.yaml
prefer: flagos
op_backends:
  rms_norm: [vendor, flagos, reference]
Expected dispatch log: RMSNorm → vendor(vendor.nvidia), SiluAndMul → flagos(flagos).
3. Use the pure PyTorch reference backend for all ops (useful for precision debugging)
# my_config.yaml
prefer: reference
Expected dispatch log: all ops → reference(reference).
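Before launching with a customized file, it can help to check the parsed config against the field rules listed under Config Fields. The following is a hedged sketch, not part of the plugin: `validate_config` is a hypothetical helper, and treating the four documented fields as the complete set is an assumption.

```python
ALLOWED_BACKENDS = {"flagos", "vendor", "reference"}
# Assumption: the four fields documented on this page are the complete set.
KNOWN_FIELDS = {"prefer", "op_backends", "oot_blacklist", "flagos_blacklist"}


def validate_config(config):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key in config:
        if key not in KNOWN_FIELDS:
            problems.append(f"unknown field: {key}")

    prefer = config.get("prefer", "flagos")
    if prefer not in ALLOWED_BACKENDS:
        problems.append(f"prefer must be one of {sorted(ALLOWED_BACKENDS)}, got {prefer!r}")

    for op, backends in (config.get("op_backends") or {}).items():
        # The field table allows 1-3 backends per op.
        if not isinstance(backends, list) or not 1 <= len(backends) <= 3:
            problems.append(f"op_backends.{op} must be a list of 1-3 backends")
            continue
        bad = [b for b in backends if b not in ALLOWED_BACKENDS]
        if bad:
            problems.append(f"op_backends.{op} has unknown backends: {bad}")

    for field in ("oot_blacklist", "flagos_blacklist"):
        if not isinstance(config.get(field) or [], list):
            problems.append(f"{field} must be a list")
    return problems


# The three recipes above, written as already-parsed dicts.
recipes = [
    {"prefer": "flagos", "oot_blacklist": ["RotaryEmbedding"]},
    {"prefer": "flagos", "op_backends": {"rms_norm": ["vendor", "flagos", "reference"]}},
    {"prefer": "reference"},
]
for recipe in recipes:
    print(validate_config(recipe))  # [] for each recipe
```

A check like this only catches structural mistakes; the Dispatch Log remains the way to confirm which backend each op actually used.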