Release Notes

v0.1.0

Note

This is a preview release. The version number shown is a pre-release identifier and may change upon final release. Content in this preview is for reference only and does not constitute a commitment or warranty for the final product.

Initial release of sglang-plugin-FL.

Added features
- Three-layer operator replacement architecture for SGLang:
  - Layer 1: ATen operator replacement via FlagGems Triton kernels
  - Layer 2: SGLang fused kernel dispatch (SiluAndMul, RMSNorm, RotaryEmbedding)
  - Layer 3: Distributed communication via CommunicatorFL (FlagCX / torch.distributed)
- Non-intrusive plugin architecture using SGLang entry_points
- Per-operator backend selection with automatic fallback
- YAML configuration and environment variable control
- Bridge layer decoupling framework-specific parameters from standardized op signatures
- Vendor auto-discovery mechanism: the same backends work for both sglang-plugin-FL and vllm-plugin-FL
- Support for NVIDIA CUDA and Huawei Ascend, extensible to other hardware
- Verified models: Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen2.5-14B-Instruct
- Dispatch logging and ATen replacement logging for debugging
- Precision bisection workflow for numerical debugging
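
The per-operator backend selection with automatic fallback listed above can be pictured with a minimal sketch. This is not the plugin's actual API: the registry, the `register` decorator, the `dispatch` function, and the backend names `vendor` and `reference` are all hypothetical, chosen only to illustrate the general pattern of trying a preferred backend and falling back to the next one when it is unavailable.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: operator name -> backend name -> implementation.
REGISTRY: Dict[str, Dict[str, Callable[..., float]]] = {}


def register(op: str, backend: str):
    """Register an implementation of `op` under `backend` (illustrative only)."""
    def decorator(fn: Callable[..., float]) -> Callable[..., float]:
        REGISTRY.setdefault(op, {})[backend] = fn
        return fn
    return decorator


@register("silu", "vendor")
def silu_vendor(x: float) -> float:
    # Stand-in for a hardware-specific kernel that is not available here.
    raise NotImplementedError("vendor kernel unavailable on this device")


@register("silu", "reference")
def silu_reference(x: float) -> float:
    # Plain Python SiLU: x * sigmoid(x).
    return x / (1.0 + math.exp(-x))


def dispatch(op: str, preferred: List[str], *args: float) -> Tuple[str, float]:
    """Try each backend in order and return (backend_used, result)."""
    errors = []
    for backend in preferred:
        impl = REGISTRY.get(op, {}).get(backend)
        if impl is None:
            errors.append(f"{backend}: not registered")
            continue
        try:
            return backend, impl(*args)
        except NotImplementedError as exc:
            # Record the failure and fall back to the next backend.
            errors.append(f"{backend}: {exc}")
    raise RuntimeError(f"no backend could run {op!r}: {errors}")


backend_used, value = dispatch("silu", ["vendor", "reference"], 0.0)
print(backend_used, value)
```

In this sketch the preference order is passed per call. A real deployment would more likely read it from configuration, such as the YAML file or environment variables mentioned in the feature list.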
