Contents

Experimental Operators
- block_sparse_tensor_contraction
Known Limitations
Performance Notes
Migration Notes
- Directory Structure Transition
- Registry Transition
Future Work

FlagTensor Known Issues#

This document tracks known issues and limitations in the current FlagTensor implementation.

Experimental Operators#

block_sparse_tensor_contraction#

Status: Experimental
Issue: Sparse tensor contraction support is still under active development
Impact: May have limited shape/dtype coverage compared to dense operators
Recommendation: Use for evaluation only; not for production workloads

Known Limitations#

Operator-Specific Numerical Issues#

CI Environment#

GPU Access: CI workflows run on ubuntu-latest (CPU) without GPU access
- Actual GPU validation must be done via Slurm on cluster nodes
- CI correctness/perf jobs currently validate structure and integration, not actual GPU correctness
Memory: CI runners have limited memory; large shape tests are reduced in smoke mode
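Because CI runners have no GPU, GPU-dependent tests have to be skipped there and run on the cluster instead. A minimal sketch of such a guard follows. The helper name `gpu_available` and the `nvidia-smi` probe are illustrative assumptions, not FlagTensor's actual mechanism.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if an NVIDIA GPU appears usable on this machine.

    Hypothetical helper: probes for `nvidia-smi` instead of importing a
    GPU framework, so it also runs on CPU-only CI runners.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one "GPU <n>: ..." line per device.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout
```

In a pytest suite, a check like this could back a marker such as `pytest.mark.skipif(not gpu_available(), reason="requires GPU")`, so the same test files run in CI (skipped) and under Slurm (executed).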

Benchmark Mode Coverage#

kernel mode: Fully supported for most operators
operator mode: Supported for a subset of operators
wrapper mode: Limited support; mainly for operators where wrapper-level optimization is beneficial

Dtype Coverage#

float16: Fully supported across operators
float32: Fully supported across operators
bfloat16: Supported across unary and contraction operators; verified in correctness tests
complex64/complex128: Supported only for the conj operator. Triton’s type system does not natively support complex dtypes, so all other operators reject complex inputs.
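The complex-dtype rule above can be sketched as a small guard. This is an illustration only: `check_complex_support` and the string dtype names are hypothetical, and the sketch does not model per-category coverage of the real dtypes.

```python
# Hypothetical names mirroring the rule above; not FlagTensor's API.
COMPLEX_DTYPES = {"complex64", "complex128"}
COMPLEX_CAPABLE_OPS = {"conj"}


def check_complex_support(op_name: str, dtype: str) -> None:
    """Raise TypeError if a complex dtype is passed to a non-conj operator.

    Real dtypes pass through unchecked; their coverage is not modeled here.
    """
    if dtype in COMPLEX_DTYPES and op_name not in COMPLEX_CAPABLE_OPS:
        raise TypeError(f"{op_name} does not accept complex input ({dtype})")


check_complex_support("conj", "complex64")  # accepted
check_complex_support("add", "float32")     # accepted (real dtype)
```

Calling `check_complex_support("add", "complex128")` raises `TypeError`, which matches the documented behavior that operators other than conj reject complex inputs.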

Shape Coverage#

Small shapes: (1024,), (4096,) - covered in correctness and smoke benchmark
Medium shapes: (128, 128), (32, 64, 16) - covered in correctness tests
Large shapes: Up to 2^24 elements - covered in full benchmark runs
Contraction shapes: Specialized shapes for layout/chain validation
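For a sense of scale, the element counts behind these shapes can be worked out directly. The sketch below uses only the shapes and the 2^24 limit quoted above; the helper names and the byte sizes table are illustrative.

```python
import math

# Shapes quoted in the coverage list above.
SHAPES = [(1024,), (4096,), (128, 128), (32, 64, 16)]
LARGE_LIMIT = 2 ** 24  # upper bound on elements in full benchmark runs

# Bytes per element for the real dtypes listed under Dtype Coverage.
ITEMSIZE = {"float16": 2, "bfloat16": 2, "float32": 4}


def numel(shape):
    """Total number of elements in a tensor of the given shape."""
    return math.prod(shape)


def nbytes(shape, dtype):
    """Storage needed for one dense tensor of `shape` and `dtype`."""
    return numel(shape) * ITEMSIZE[dtype]


for shape in SHAPES:
    print(shape, numel(shape))
print("large limit:", LARGE_LIMIT, "elements")
```

The medium shapes hold 16,384 and 32,768 elements, while the large limit of 16,777,216 elements corresponds to 64 MiB per float32 tensor and 32 MiB per float16 or bfloat16 tensor.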

Performance Notes#

Triton Autotuner: Current Triton version uses deprecated warmup/rep parameters
- Deprecation warnings appear in benchmark output
- Does not affect functionality; will be addressed in a future Triton upgrade
cuTensor Baseline: Performance is compared against the cuTensor C API
- Some operators may show a speedup < 1x (slower than the baseline) for certain shapes/dtypes
- This is expected behavior and not necessarily an issue
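To make the speedup figure concrete, here is a minimal sketch. It assumes speedup is defined as baseline latency divided by FlagTensor latency, which this page does not state explicitly; the function name is illustrative.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup of the candidate over the baseline (assumed definition).

    Values above 1.0 mean the candidate is faster than the baseline;
    values below 1.0 mean it is slower.
    """
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / candidate_ms


# Baseline takes 0.8 ms, candidate takes 1.0 ms: candidate is slower.
print(speedup(0.8, 1.0))
```

Under this definition, a reported speedup of 0.8x means the operator took 25% longer than the baseline for that shape and dtype.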

Migration Notes#

Directory Structure Transition#

ctests/: Legacy correctness test directory; being migrated to tests/
benchmark/: Single-operator perf files retained as implementation details; category-level entry points are the formal acceptance interface
tests/: Unified correctness entry with proxy layer for legacy tests
src/flagtensor/testing/: Centralized tolerance/assertion helpers

Registry Transition#

weekly_op_test.txt: Removed; operator list is generated from registry
discover_ops(): Legacy discovery function; being replaced by registry-based filtering
Manual exclusion: --exclude-op flags are still supported, but registry-based filtering is preferred
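The idea of generating the operator list from a registry, with manual exclusions layered on top, can be sketched as follows. The registry layout and the `select_ops` function are hypothetical; the actual registry structure is not described on this page.

```python
# Hypothetical registry: operator name -> category.
REGISTRY = {
    "add": "binary",
    "mul": "binary",
    "gett": "contraction",
    "block_sparse_tensor_contraction": "sparse",
}


def select_ops(registry, category=None, exclude=()):
    """Return sorted operator names, optionally limited to one category
    and with manually excluded names removed."""
    excluded = set(exclude)
    return sorted(
        name
        for name, cat in registry.items()
        if (category is None or cat == category) and name not in excluded
    )


print(select_ops(REGISTRY, category="binary"))
print(select_ops(REGISTRY, exclude=["gett"]))
```

With a single source of truth like this, a manually maintained list such as weekly_op_test.txt is no longer needed, and an exclusion flag reduces to one more filter over the same registry.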

Future Work#

Migrate all correctness tests from ctests/ to tests/ with category organization
- Category directories created (unary/, binary/, contraction/, sparse/)
- Loader supports skipping migrated operators
- Unary operators: 27 migrated
- Binary operators: 4 migrated (add, mul, max, min - all complete)
- Contraction operators: 4 migrated (gett, tgett, ttgt, tensor_contraction_trinary)
- Sparse operators: 1 migrated (block_sparse_tensor_contraction, float16 now active)
Add category-level benchmark entry points (formal acceptance interface)
- test_unary_perf.py
- test_binary_perf.py
- test_contraction_perf.py
- test_sparse_perf.py
Upgrade Triton to remove deprecation warnings
Add GPU runner to CI for actual correctness validation
Expand bfloat16 dtype coverage
Improve wrapper mode coverage
Add acceptance-level performance regression detection


By FlagOS Community

© Copyright 2025-2026, FlagOS Community.

Last updated on Jun 18, 2026.