Coverage for src/flag_gems/runtime/backend/_arm/int8/__init__.py: 0%

4 statements  

coverage.py v7.6.9, created at 2026-06-05 07:36 +0800

"""ARM CPU INT8 model utilities.

Drop-in Linear replacement with decode-optimized TLE SDOT GEMV + prefill
via torch._int_mm (SVE2 i8mm), plus a helper to replace all nn.Linear
layers in a transformers model from a pre-quantized state dict.

Usage:
    import torch
    from safetensors.torch import load_file
    from transformers import AutoModelForCausalLM
    from flag_gems.runtime.backend._arm.int8 import replace_linears_with_tle_int8

    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", dtype=torch.bfloat16)
    state = load_file("Qwen3-1.7B-W8A8-INT8/model.safetensors")
    replace_linears_with_tle_int8(model, state)
"""

from .quantize_live import quantize_and_replace_linears  # noqa: F401
from .replace import replace_linears_with_tle_int8  # noqa: F401
from .tle_int8_linear import TLEInt8Linear, pack_weights_sdot  # noqa: F401

__all__ = [
    "TLEInt8Linear",
    "pack_weights_sdot",
    "replace_linears_with_tle_int8",
    "quantize_and_replace_linears",
]
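
The module docstring mentions an INT8 (W8A8) replacement for Linear layers. As a rough illustration of the arithmetic such a layer relies on, the sketch below quantizes weights and activations to int8, accumulates in integers, and rescales the result. It is plain Python and is not the TLE SDOT kernel or the `TLEInt8Linear` implementation. The helper names `quantize_symmetric` and `int8_gemv`, the symmetric scaling scheme, and the per-row weight scale are all assumptions made for the example.

```python
# Hypothetical illustration only: these helpers do not exist in flag_gems.

def quantize_symmetric(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_gemv(weight_rows, x):
    """Approximate y = W @ x with int8 operands and integer accumulation."""
    q_x, x_scale = quantize_symmetric(x)  # assumed: one scale for the activation vector
    y = []
    for row in weight_rows:
        # Assumed: one scale per output row. A real deployment would
        # quantize the weights once, offline, rather than on every call.
        q_w, w_scale = quantize_symmetric(row)
        acc = sum(a * b for a, b in zip(q_w, q_x))  # integer dot product
        y.append(acc * w_scale * x_scale)           # rescale back to float
    return y


W = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
x = [1.0, 2.0, -4.0]

y_int8 = int8_gemv(W, x)
y_ref = [sum(w * v for w, v in zip(row, x)) for row in W]  # float reference
```

Comparing `y_int8` with `y_ref` shows the small rounding error introduced by quantization; an optimized kernel would perform the same integer dot product with hardware instructions instead of a Python loop.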