Operator dispatch mechanism
This directory implements the operator dispatch mechanism for vllm-plugin-FL. For each operator, the dispatcher selects one of several backend implementations (FlagGems, PyTorch, vendor-specific) based on availability and policy configuration.
Directory structure
dispatch/
├── __init__.py                      # Module entry point, exports public API
├── types.py                         # Core type definitions (OpImpl, BackendImplKind)
├── registry.py                      # Thread-safe operator registry
├── policy.py                        # Selection policy management
├── manager.py                       # Core dispatch manager
├── builtin_ops.py                   # Built-in operator registration
├── ops.py                           # Backend base interface
├── discovery.py                     # Plugin discovery mechanism
├── logger_manager.py                # Centralized logging configuration
├── config/                          # Platform-specific configurations
│   ├── __init__.py                  # Config loader module
│   ├── ascend.yaml                  # Ascend NPU default configuration
│   └── cuda.yaml                    # CUDA default configuration
└── backends/                        # Backend implementations
    ├── base.py                      # Backend abstract base class
    ├── flaggems/                    # FlagGems backend (DEFAULT, priority 150)
    │   ├── flaggems.py              # Backend class
    │   ├── register_ops.py          # Registration function
    │   └── impl/                    # Operator implementations
    │       ├── activation.py
    │       ├── normalization.py
    │       ├── rotary.py
    │       ├── attention.py         # AttentionFLBackend, AttentionFLImpl
    │       ├── mla.py               # MLAFLBackend, MLAFLImpl
    │       └── custom_attention.py  # Attention backend registration
    ├── reference/                   # Reference backend (PyTorch, priority 50)
    └── vendor/                      # Vendor-specific backends (priority 100)
        ├── cuda/                    # NVIDIA CUDA backend
        │   └── impl/
        │       ├── activation.py
        │       ├── normalization.py
        │       └── rotary.py
        └── ascend/                  # Huawei Ascend NPU backend
            └── impl/
                ├── activation.py
                ├── normalization.py
                ├── rotary.py
                ├── attention.py     # AscendAttentionBackend
                └── attention_mask.py  # Attention mask utilities
Core concepts
1. Backend implementation kind (BackendImplKind)
types.py defines the following backend implementation kinds:
DEFAULT: Default implementation (FlagGems), priority 150
VENDOR: Vendor-specific implementation, priority 100
REFERENCE: Reference implementation (PyTorch native), priority 50
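The three kinds and their default priorities can be pictured as an enum plus a priority table. This is a minimal sketch, not the contents of types.py: the string values, the DEFAULT_PRIORITY table and the rank_kinds helper are assumptions made for illustration.

```python
from enum import Enum


class BackendImplKind(Enum):
    """Illustrative stand-in; the real enum in types.py may differ."""
    DEFAULT = "default"      # FlagGems
    VENDOR = "vendor"        # vendor-specific
    REFERENCE = "reference"  # PyTorch native


# Default priorities as listed above (hypothetical table name).
DEFAULT_PRIORITY = {
    BackendImplKind.DEFAULT: 150,
    BackendImplKind.VENDOR: 100,
    BackendImplKind.REFERENCE: 50,
}


def rank_kinds():
    """Return the kinds ordered from most to least preferred."""
    return sorted(BackendImplKind, key=lambda k: DEFAULT_PRIORITY[k], reverse=True)
```

With these values, rank_kinds() yields DEFAULT, then VENDOR, then REFERENCE, which matches the fallback order described in this document.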
2. Operator implementation (OpImpl)
types.py defines the operator implementation type, OpImpl. Each operator implementation contains:
op_name: Operator name (e.g., "silu_and_mul", "rms_norm")
impl_id: Unique implementation identifier (e.g., "default.flagos")
kind: Implementation kind (see BackendImplKind)
fn: The function that implements the operator
vendor: Vendor name (required for the VENDOR kind)
priority: Selection priority (higher value = preferred)
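A record with these fields could look roughly like the dataclass below. It is a sketch under assumptions: the field names come from the list above, but the types, the defaults, the use of a plain string for kind, and the validation in __post_init__ are guesses rather than the definition in types.py. The pure-Python silu_and_mul_py function and the "reference.python" identifier are hypothetical and exist only to give fn something to point at.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class OpImpl:
    """Illustrative stand-in for the OpImpl record described above."""
    op_name: str
    impl_id: str
    kind: str                      # "default", "vendor" or "reference" (string stand-in)
    fn: Callable
    vendor: Optional[str] = None
    priority: int = 0

    def __post_init__(self):
        # Assumed check: the text says vendor is required for the VENDOR kind.
        if self.kind == "vendor" and not self.vendor:
            raise ValueError("vendor is required for VENDOR implementations")


def silu_and_mul_py(x):
    """Hypothetical list-based silu_and_mul: silu(first half) * second half."""
    half = len(x) // 2
    return [a / (1.0 + math.exp(-a)) * b for a, b in zip(x[:half], x[half:])]


impl = OpImpl(
    op_name="silu_and_mul",
    impl_id="reference.python",    # hypothetical identifier
    kind="reference",
    fn=silu_and_mul_py,
    priority=50,
)
```

Making the record frozen keeps registered implementations immutable, which is convenient if a registry is shared across threads.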
3. Selection policy
policy.py defines the selection policy, which controls how an operator implementation is chosen:
prefer: Preferred implementation type
strict: Strict mode; whether to raise an error when the primary implementation fails
per_op_order: Custom selection order for each operator
deny_vendors: List of denied vendors
allow_vendors: Whitelist of allowed vendors
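The policy fields map naturally onto a small immutable record. The sketch below is an assumption about shape only: the class name SelectionPolicy, the field types, the defaults, and the rule that a deny entry wins over an allow entry are illustrative choices, not behavior confirmed by policy.py.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class SelectionPolicy:
    """Illustrative policy record; names and defaults are assumptions."""
    prefer: str = "flagos"
    strict: bool = False
    per_op_order: Dict[str, List[str]] = field(default_factory=dict)
    deny_vendors: FrozenSet[str] = frozenset()
    allow_vendors: Optional[FrozenSet[str]] = None  # None means "no whitelist"

    def vendor_allowed(self, vendor: str) -> bool:
        # Assumed precedence: an explicit deny always wins.
        if vendor in self.deny_vendors:
            return False
        # If a whitelist is set, the vendor must appear in it.
        if self.allow_vendors is not None and vendor not in self.allow_vendors:
            return False
        return True


policy = SelectionPolicy(deny_vendors=frozenset({"ascend"}))
```

Here policy.vendor_allowed("ascend") is False and policy.vendor_allowed("cuda") is True; adding allow_vendors=frozenset({"cuda"}) would additionally reject every vendor other than cuda.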
Architecture overview
Dispatch flow diagram
Cache Check: Check whether the dispatch cache already holds a selection for the operator
Get Implementations: Retrieve all registered implementations from the registry
Vendor Filtering: Filter by the policy's allow/deny lists
Availability Check: Call is_available() to check whether each implementation is available
Priority Sorting: Select the best implementation based on the per-op order or the default order
Cache Result: Cache the selection result to speed up subsequent calls
┌─────────────────────────────────────────────────────────────────┐
│                            User Code                            │
│                   call_op("rms_norm", x, ...)                   │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                            OpManager                            │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 1. Check Cache                                           │   │
│  │ 2. Get Policy (from env or context)                      │   │
│  │ 3. Query Registry for all implementations                │   │
│  │ 4. Filter by vendor allow/deny list                      │   │
│  │ 5. Check availability (is_available())                   │   │
│  │ 6. Sort by priority & selection order                    │   │
│  │ 7. Cache & return selected implementation                │   │
│  └──────────────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                           OpRegistry                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   FlagGems   │  │    Vendor    │  │  Reference   │           │
│  │ Priority: 150│  │ Priority: 100│  │ Priority: 50 │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
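The dispatch steps can be sketched as a small manager class. This is an illustration under assumptions, not the code in manager.py: Impl and MiniManager are hypothetical names, the registry is a plain dict, and the policy is reduced to a deny list passed to the constructor. Only the step order (cache, registry lookup, vendor filter, availability check, priority sort, cache) follows the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Impl:
    """Hypothetical stand-in for a registered implementation."""
    impl_id: str
    fn: Callable
    priority: int
    vendor: Optional[str] = None
    is_available: Callable[[], bool] = lambda: True


class MiniManager:
    def __init__(self, registry, deny_vendors=()):
        self._registry = registry          # op_name -> list of Impl
        self._deny = set(deny_vendors)     # simplified policy
        self._cache = {}

    def resolve(self, op_name):
        if op_name in self._cache:                          # cache check
            return self._cache[op_name]
        candidates = self._registry.get(op_name, [])        # query registry
        candidates = [c for c in candidates
                      if c.vendor not in self._deny]        # vendor filtering
        candidates = [c for c in candidates
                      if c.is_available()]                  # availability check
        if not candidates:
            raise RuntimeError(f"no implementation available for {op_name!r}")
        best = max(candidates, key=lambda c: c.priority)    # highest priority wins
        self._cache[op_name] = best                         # cache result
        return best

    def call_op(self, op_name, *args, **kwargs):
        return self.resolve(op_name).fn(*args, **kwargs)


registry = {
    "rms_norm": [
        Impl("default.flagos", lambda x: "flagos", 150, is_available=lambda: False),
        Impl("vendor.cuda", lambda x: "cuda", 100, vendor="cuda"),
        Impl("reference.torch", lambda x: "reference", 50),
    ]
}
manager = MiniManager(registry)
```

In this toy registry the FlagGems entry reports itself unavailable, so manager.call_op("rms_norm", 1) falls through to the vendor entry; constructing MiniManager(registry, deny_vendors=["cuda"]) would fall through again to the reference entry.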
Priority selection flow
┌─────────────────────────────────────────────────────────────────┐
│                      VLLM_FL_PREFER=flagos                      │
│                       (Default Behavior)                        │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
          ┌────────────────────┴────────────────────┐
          │                                         │
          ▼                                         ▼
   ┌──────────────┐  Available?              ┌──────────────┐  Available?
   │   FlagGems   │─────No──────▶            │    Vendor    │─────No──────▶
   │ Priority: 150│                          │ Priority: 100│
   └──────────────┘                          └──────────────┘
          │                                         │
         Yes                                       Yes
          │                                         │
          ▼                                         ▼
      ✓ Selected                                ✓ Selected
                                                                   ┌──────────────┐
                                                                   │  Reference   │
                                                                   │ Priority: 50 │
                                                                   └──────────────┘
                                                                          │
                                                                         Yes
                                                                          │
                                                                          ▼
                                                                      ✓ Selected
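The fallback chain in this diagram amounts to walking an ordered list and taking the first entry that is available. The sketch below assumes lowercase string labels for the three tiers and a hypothetical select_kind helper; only the flagos, vendor, reference order is taken from the diagram.

```python
# Order shown in the diagram for VLLM_FL_PREFER=flagos.
FLAGOS_ORDER = ("flagos", "vendor", "reference")


def select_kind(available, order=FLAGOS_ORDER):
    """Return the first kind in `order` whose availability flag is true.

    `available` maps a kind label to a bool; missing labels count as unavailable.
    """
    for kind in order:
        if available.get(kind, False):
            return kind
    raise RuntimeError("no available implementation")


# FlagGems missing, vendor and reference present: the vendor tier is selected.
choice = select_kind({"flagos": False, "vendor": True, "reference": True})
```

Passing a different order tuple shows how a per-operator override could change the outcome without touching the availability checks.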
Plugin integration points
┌─────────────────────────────────────────────────────────────────┐
│                        Plugin Discovery                         │
│                                                                 │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐     │
│  │   Built-in     │  │  Entry Points  │  │  Environment   │     │
│  │   backends/    │  │  (setuptools)  │  │ PLUGIN_MODULES │     │
│  │   vendor/      │  │                │  │                │     │
│  └────────┬───────┘  └────────┬───────┘  └────────┬───────┘     │
│           │                   │                   │             │
│           └───────────────────┴───────────────────┘             │
│                               │                                 │
└───────────────────────────────┼─────────────────────────────────┘
                                │
                                ▼
                        ┌───────────────┐
                        │   Registry    │
                        │  register()   │
                        └───────────────┘