Configure backend selection#
The dispatch system supports multiple ways to configure backend selection:
User-specified configuration file (YAML) - Complete override
Environment variables - Override specific items
Platform-specific configuration file - Auto-detected defaults
Built-in default values
Configuration priority#
| Priority | Source | Effect |
|---|---|---|
| 1 (highest) | VLLM_FL_CONFIG | User config file, complete override |
| 2 | Environment variables | Override specific items |
| 3 | Platform config file | ascend.yaml / cuda.yaml defaults |
| 4 (lowest) | Built-in defaults | Code-defined default values |
Note
Environment variables can override specific items from the platform config
If the user doesn't set any environment variables, the platform config is used
Users can also modify the platform config files directly
The dispatch system applies configuration in the following order:
VLLM_FL_CONFIG set?
├── Yes ──▶ Use user config file (complete override)
└── No ──▶ For each setting:
    ├── Env var set? ──▶ Use env var value
    └── Not set ──▶ Use platform config value
        └── Not found ──▶ Built-in default
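The resolution order above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the plugin's code: the function name `resolve_setting`, the plain-dict representation of each layer, and the sample values are all assumptions. In particular, falling back to the built-in default when the user config file omits a key is a guess, since the source only says the file is a complete override.

```python
from typing import Any, Mapping, Optional


def resolve_setting(
    key: str,
    user_config: Optional[Mapping[str, Any]],
    env_overrides: Mapping[str, Any],
    platform_config: Mapping[str, Any],
    builtin_defaults: Mapping[str, Any],
) -> Any:
    """Return the value for `key` following the documented priority order."""
    # 1. A user config file (VLLM_FL_CONFIG) is a complete override:
    #    env vars and the platform config are not consulted at all.
    if user_config is not None:
        # Assumption: a key missing from the user file uses the built-in default.
        return user_config.get(key, builtin_defaults.get(key))
    # 2. Otherwise an environment variable wins for this one setting.
    if key in env_overrides:
        return env_overrides[key]
    # 3. Then the auto-detected platform config file.
    if key in platform_config:
        return platform_config[key]
    # 4. Finally the code-defined default.
    return builtin_defaults.get(key)


# Illustrative values only.
platform = {"prefer": "flagos", "flagos_blacklist": ["to_copy", "zeros", "mm"]}
defaults = {"strict": False}
env = {"prefer": "vendor"}

print(resolve_setting("prefer", None, env, platform, defaults))
print(resolve_setting("flagos_blacklist", None, env, platform, defaults))
print(resolve_setting("strict", None, env, platform, defaults))
```

Each setting is resolved independently, which is why overriding `prefer` through the environment leaves the platform blacklist in effect.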
User-specified configuration file (YAML)#
Set the VLLM_FL_CONFIG environment variable to specify a YAML configuration file that completely overrides all other settings:
export VLLM_FL_CONFIG=/path/to/vllm_fl_dispatch.yaml
Example configuration file#
# vllm_fl_dispatch.yaml
# Preferred backend type: flagos, vendor, or reference
prefer: vendor
# Strict mode:
# true = fail immediately on error, no fallback
# false = try next backend on failure (default)
strict: false
# Vendor whitelist (optional)
allow_vendors:
- cuda
# Vendor blacklist (optional)
deny_vendors:
- ascend
# Per-operator backend selection order (optional)
# Only the backends listed will be tried, in the specified order.
op_backends:
  rms_norm:
    - vendor # Try any available vendor first
    - flagos # Then try flagos
    # reference not listed, so it won't be used for rms_norm
  silu_and_mul:
    - vendor:cuda # Only try CUDA, not other vendors
    - flagos
    - reference
# FlagGems operator blacklist (optional)
# These operators will NOT use FlagGems implementation
flagos_blacklist:
- to_copy
- zeros
- mm
# OOT operator blacklist (optional)
# These operators will NOT be registered as OOT replacements
oot_blacklist:
- fused_moe
Token type explanations#
| Token | Description |
|---|---|
| flagos | FlagOS default implementation |
| reference | PyTorch reference implementation |
| vendor | Any available vendor backend (auto-detects hardware) |
| vendor:cuda | Only CUDA vendor backend |
| vendor:ascend | Only Ascend vendor backend |
Note: When using vendor (without specifying a vendor name), the system automatically selects an available vendor backend based on hardware detection.
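To make the token grammar concrete, here is a small sketch of how such tokens could be split into a backend kind and an optional vendor name. The names `BackendToken` and `parse_token`, and the validation rules, are hypothetical and chosen for this example; the plugin may parse tokens differently.

```python
from typing import NamedTuple, Optional

# The three backend kinds named in the configuration file comments.
VALID_KINDS = ("flagos", "vendor", "reference")


class BackendToken(NamedTuple):
    kind: str
    vendor: Optional[str]  # None means "any vendor" or "not applicable"


def parse_token(token: str) -> BackendToken:
    """Split "vendor:cuda" into ("vendor", "cuda"); "flagos" into ("flagos", None)."""
    kind, sep, vendor = token.strip().partition(":")
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown backend kind: {kind!r}")
    if sep and kind != "vendor":
        raise ValueError(f"only 'vendor' accepts a vendor name, got {token!r}")
    if sep and not vendor:
        raise ValueError(f"missing vendor name in {token!r}")
    return BackendToken(kind, vendor if sep else None)


for tok in ("flagos", "reference", "vendor", "vendor:cuda", "vendor:ascend"):
    print(tok, "->", parse_token(tok))
```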
Another op_backends selection example#
op_backends:
  mul:
    - flagos
  silu_and_mul:
    - flagos
    - vendor
    - reference
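The effect of an `op_backends` entry can be illustrated with a short sketch: only implementations matching a listed token are kept, in the listed order. The function `order_candidates` and the `(kind, vendor)` tuples are assumptions made for this example, and the behavior for operators without an entry (here: leave the candidate list unchanged) is also a guess.

```python
from typing import Dict, List, Optional, Tuple

Impl = Tuple[str, Optional[str]]  # (kind, vendor), e.g. ("vendor", "cuda")


def order_candidates(
    op: str,
    op_backends: Dict[str, List[str]],
    available: List[Impl],
) -> List[Impl]:
    """Keep only implementations matching the op's tokens, in token order."""
    tokens = op_backends.get(op)
    if tokens is None:
        # Assumption: no per-operator rule means the list is left untouched.
        return list(available)
    ordered: List[Impl] = []
    for token in tokens:
        kind, _, vendor = token.partition(":")
        for impl in available:
            if impl[0] != kind:
                continue
            if vendor and impl[1] != vendor:
                continue  # "vendor:cuda" must not match other vendors
            if impl not in ordered:
                ordered.append(impl)
    return ordered


op_backends = {
    "mul": ["flagos"],
    "silu_and_mul": ["flagos", "vendor", "reference"],
}
available = [("vendor", "cuda"), ("reference", None), ("flagos", None)]

print(order_candidates("mul", op_backends, available))
print(order_candidates("silu_and_mul", op_backends, available))
```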
Environment variables#
Environment variables can override specific items from the platform config. If a variable is not set, the value from the platform config file is used.
Core Configuration#
| Variable | Default | Description |
|---|---|---|
| | | Global switch. Set |
| VLLM_FL_CONFIG | (none) | Path to YAML config file (complete override) |
| VLLM_FL_PLATFORM | (auto) | Force platform: ascend or cuda |
Backend Selection#
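| Variable | Default | Description |
|---|---|---|
| VLLM_FL_PREFER | | Preferred backend: flagos, vendor, or reference |
| VLLM_FL_STRICT | 0 | Strict mode: 1 = fail immediately on error, 0 = try the next backend on failure |
| VLLM_FL_PER_OP | (none) | Per-operator order (see Examples) |
| | (none) | Vendor whitelist, comma-separated |
| | (none) | Vendor blacklist, comma-separated |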
FlagGems Control#
| Variable | Default | Description |
|---|---|---|
| | | Enable/disable FlagGems |
| VLLM_FL_FLAGOS_WHITELIST | (none) | FlagGems ops whitelist (mutually exclusive with blacklist) |
| VLLM_FL_FLAGOS_BLACKLIST | (none) | FlagGems ops blacklist (mutually exclusive with whitelist) |
Priority: WHITELIST > BLACKLIST (env) > flagos_blacklist (config file)
OOT Operator Control#
| Variable | Default | Description |
|---|---|---|
| | | Enable OOT operator registration |
| | (none) | OOT ops whitelist |
| | (none) | OOT ops blacklist |
Priority: WHITELIST > BLACKLIST (env) > oot_blacklist (config file)
Debug & Logging#
| Variable | Default | Description |
|---|---|---|
| VLLM_FL_LOG_LEVEL | | Log level, e.g. DEBUG |
| | | Enable dispatch debug mode |
Plugins#
| Variable | Default | Description |
|---|---|---|
| | (none) | External plugin modules, comma-separated |
| | (none) | Operator config JSON file path |
Other environment variables#
| Variable | Default | Description |
|---|---|---|
| | (none) | FlagCX library path (enables FlagCX communication backend) |
| | | FlagGems enabled ops list file |
Examples#
# Use platform default config (auto-detected)
# Nothing to set - just run your application
# Override only the prefer setting (other items from platform config)
export VLLM_FL_PREFER=vendor
# Override FlagGems blacklist (overrides config file blacklist)
export VLLM_FL_FLAGOS_BLACKLIST="mm,to_copy,zeros"
# Use whitelist instead (completely ignores any blacklist)
export VLLM_FL_FLAGOS_WHITELIST="silu_and_mul,rms_norm"
# Specify per-operator order
export VLLM_FL_PER_OP="rms_norm=vendor|flagos|reference"
# Use completely custom config file
export VLLM_FL_CONFIG=/path/to/my_config.yaml
# Force specific platform
export VLLM_FL_PLATFORM=ascend
# Enable debug logging
export VLLM_FL_LOG_LEVEL=DEBUG
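The value formats used in these examples (comma-separated operator lists and an `op=backend|backend` order) are easy to parse. The helpers below are a hedged sketch, not the plugin's parser: `parse_csv` and `parse_per_op_entry` are hypothetical names, and only a single `op=...` entry is handled because the source does not show how several operators are combined in one variable.

```python
from typing import List, Tuple


def parse_csv(value: str) -> List[str]:
    """Split a comma-separated value such as "mm,to_copy,zeros"."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_per_op_entry(value: str) -> Tuple[str, List[str]]:
    """Parse one "op=token|token|..." entry into (op, ordered tokens)."""
    op, sep, order = value.partition("=")
    op = op.strip()
    tokens = [tok.strip() for tok in order.split("|") if tok.strip()]
    if not sep or not op or not tokens:
        raise ValueError(f"expected 'op=backend|backend', got {value!r}")
    return op, tokens


print(parse_csv("mm,to_copy,zeros"))
print(parse_per_op_entry("rms_norm=vendor|flagos|reference"))
```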
Whitelist vs Blacklist Priority#
For FlagGems and OOT operators:
WHITELIST (env) ──▶ Completely overrides blacklist
└── Not set ──▶ BLACKLIST (env) ──▶ Overrides config blacklist
    └── Not set ──▶ Config file blacklist
        └── Not set ──▶ Allow all
Note
Whitelist and blacklist environment variables are mutually exclusive (error if both set)
If whitelist is set, it completely ignores any blacklist (env or config)
Environment blacklist overrides config file blacklist (not merged)
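The rules in this note can be expressed as a single decision function. The sketch below assumes the lists have already been parsed; `build_op_filter` is a hypothetical helper name, and `None` stands for "not set".

```python
from typing import Callable, Iterable, Optional


def build_op_filter(
    env_whitelist: Optional[Iterable[str]],
    env_blacklist: Optional[Iterable[str]],
    config_blacklist: Optional[Iterable[str]],
) -> Callable[[str], bool]:
    """Return a predicate telling whether an operator is allowed."""
    if env_whitelist is not None and env_blacklist is not None:
        # The two environment variables are mutually exclusive.
        raise ValueError("whitelist and blacklist env vars cannot both be set")
    if env_whitelist is not None:
        # A whitelist ignores every blacklist, including the config file one.
        allowed = frozenset(env_whitelist)
        return lambda op: op in allowed
    # The env blacklist replaces (does not merge with) the config blacklist.
    source = env_blacklist if env_blacklist is not None else (config_blacklist or ())
    denied = frozenset(source)
    return lambda op: op not in denied


allow = build_op_filter(None, ["custom_op1", "custom_op2"], ["to_copy", "zeros", "mm"])
print(allow("mm"))          # config blacklist is ignored once the env blacklist is set
print(allow("custom_op1"))
```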
Example: Combined environment variables#
# Platform config (ascend.yaml) has:
# prefer: flagos
# flagos_blacklist: [to_copy, zeros, mm, ...]
# User overrides only prefer, blacklist still from config
export VLLM_FL_PREFER=vendor
# Result:
# prefer: vendor (from env)
# flagos_blacklist: [to_copy, zeros, mm, ...] (from config)
# User wants to override blacklist too
export VLLM_FL_PREFER=vendor
export VLLM_FL_FLAGOS_BLACKLIST="custom_op1,custom_op2"
# Result:
# prefer: vendor (from env)
# flagos_blacklist: [custom_op1, custom_op2] (from env, config ignored)
Note
Environment variables override, not merge: Setting an env var replaces the config value entirely
VLLM_FL_PREFER sets preference, not exclusivity: It defines the selection order but will fall back to other backends if the preferred one is unavailable
To force a specific backend: Combine PREFER with DENY_VENDORS or use PER_OP to exclude unwanted backends
VLLM_FL_STRICT=1: Enables strict mode, which fails immediately if the primary implementation fails; no fallback is attempted
Fallback mechanism#
When VLLM_FL_STRICT=0 (default), if the primary implementation fails, the system automatically tries other available implementations:
Op 'rms_norm' using 'default.flagos' (kind=flagos, vendor=None)
[WARNING] Implementation 'default.flagos' failed for op 'rms_norm': ...
Op 'rms_norm' fallback to 'reference.torch' (kind=reference, vendor=None)
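The fallback behavior shown in this log can be sketched as a loop over candidate implementations. This is a simplified model under stated assumptions: `call_with_fallback`, the logger name, and the `(impl_id, callable)` pairs are invented for illustration and do not reflect the plugin's internal API.

```python
import logging
from typing import Any, Callable, Optional, Sequence, Tuple

logger = logging.getLogger("dispatch_sketch")


def call_with_fallback(
    op_name: str,
    impls: Sequence[Tuple[str, Callable[..., Any]]],
    *args: Any,
    strict: bool = False,
    **kwargs: Any,
) -> Any:
    """Try implementations in order; in strict mode the first failure is fatal."""
    if not impls:
        raise RuntimeError(f"no implementation available for op {op_name!r}")
    last_error: Optional[Exception] = None
    for impl_id, fn in impls:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            if strict:
                raise  # strict mode: no fallback is attempted
            logger.warning(
                "Implementation %r failed for op %r: %s", impl_id, op_name, exc
            )
            last_error = exc
    raise RuntimeError(f"all implementations failed for op {op_name!r}") from last_error


def broken(x: float) -> float:
    raise RuntimeError("kernel launch failed")


def reference(x: float) -> float:
    return x * 2


impls = [("default.flagos", broken), ("reference.torch", reference)]
print(call_with_fallback("rms_norm", impls, 3.0))
```

With `strict=True` the same call would re-raise the first error instead of moving on to the reference implementation.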
Platform-specific configuration#
The system automatically detects the hardware and loads the corresponding configuration file from the config/ directory:
| Platform | Config File | Auto-Detection |
|---|---|---|
| Ascend NPU | ascend.yaml | |
| NVIDIA GPU | cuda.yaml | |
You can force a specific platform using the VLLM_FL_PLATFORM environment variable:
export VLLM_FL_PLATFORM=ascend # Force Ascend config
export VLLM_FL_PLATFORM=cuda # Force CUDA config
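As a rough sketch of how the override and the auto-detected platform might combine, the helper below picks a config file name. The function `platform_config_path` is hypothetical, hardware detection is left to the caller because the source does not describe it, and the `<platform>.yaml` naming is inferred from the ascend.yaml / cuda.yaml files mentioned earlier.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def platform_config_path(
    config_dir: str,
    detected_platform: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[Path]:
    """Pick the platform config file, letting VLLM_FL_PLATFORM win over detection."""
    platform = environ.get("VLLM_FL_PLATFORM") or detected_platform
    if platform is None:
        return None  # nothing detected and nothing forced: built-in defaults apply
    return Path(config_dir) / f"{platform}.yaml"


# Detection says CUDA, but the environment forces Ascend.
print(platform_config_path("config", "cuda", {"VLLM_FL_PLATFORM": "ascend"}))
# No override: the detected platform is used.
print(platform_config_path("config", "cuda", {}))
```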
Operator List#
This reference lists all operators supported by vllm-plugin-FL and their backend availability.
Supported operators#
| Operator | Description | FlagGems | Reference | Vendor |
|---|---|---|---|---|
| silu_and_mul | SiLU activation + element-wise multiplication | ✓ | ✓ | ✓ |
| rms_norm | RMS normalization | ✓ | ✓ | ✓ |
| | Rotary position embedding | ✓ | ✓ | ✓ |
| | Attention backend class path | ✓ | - | ✓ |
Backend priorities#
The dispatch system selects operators based on the following priority hierarchy. Priority values are spaced by 50 to allow future insertion of intermediate priorities.
FlagGems (DEFAULT): Priority 150
Vendor-specific: Priority 100
PyTorch Reference: Priority 50
Higher priority values are preferred. When an implementation is unavailable, the system falls back to the next priority level.
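A minimal sketch of priority-based ordering follows. The priority numbers come from the list above; the constant and function names, and the use of the `flagos` / `vendor` / `reference` kind strings as keys, are assumptions for illustration.

```python
from typing import Dict, Iterable, List

# Priority values from the list above, keyed by backend kind.
BACKEND_PRIORITY: Dict[str, int] = {
    "flagos": 150,     # FlagGems (default)
    "vendor": 100,     # vendor-specific
    "reference": 50,   # PyTorch reference
}


def sort_by_priority(kinds: Iterable[str], priorities: Dict[str, int]) -> List[str]:
    """Order backend kinds from highest to lowest priority."""
    return sorted(kinds, key=lambda kind: priorities[kind], reverse=True)


print(sort_by_priority(["reference", "flagos", "vendor"], BACKEND_PRIORITY))

# The gap of 50 leaves room for an intermediate level (hypothetical example).
extended = dict(BACKEND_PRIORITY, experimental=125)
print(sort_by_priority(extended, extended))
```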