FlagTensor CI Matrix#
This document describes the CI/CD workflows and their purposes in the FlagTensor acceptance process.
Workflows Overview#
| Workflow | Trigger | Purpose | GPU Required | Output |
|---|---|---|---|---|
| quality-gate.yaml | PR, push, manual | Static quality checks (pre-commit, build, registry) | No | Build artifacts, consistency reports |
| ci.yaml | PR, push, manual | Smoke-level correctness and performance | No | Smoke results, summary |
| weekly.yaml | Manual | Weekly regression on cluster | Yes (via Slurm) | Weekly results, artifacts |
| acceptance.yaml | Manual, weekly schedule | Acceptance-level full coverage | No (structure) / Yes (cluster) | Acceptance results, summary |
Quality Gate Workflow (quality-gate.yaml)#
Jobs#
pre-commit#
Purpose: Run static analysis and formatting checks
Checks: YAML syntax, trailing whitespace, flake8, isort, black, clang-format
Runtime: ~2-3 minutes
Failure Impact: Blocks PR merge
build-check#
Purpose: Verify package can be built and distributed
Steps: Build wheel/sdist, twine check
Runtime: ~1-2 minutes
Failure Impact: Blocks PR merge
registry-consistency#
Purpose: Ensure operator registry is consistent with codebase
Checks: all implementation files exist; all correctness test files exist; all benchmark test files exist; coverage statistics are reported
Runtime: ~30 seconds
Failure Impact: Blocks PR merge
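The registry-consistency checks above can be illustrated with a minimal sketch. The registry layout used here (a mapping from operator name to the relative paths of its implementation, correctness test, and benchmark test files) is an assumption made for illustration; the actual FlagTensor registry format is not described in this document.

```python
from pathlib import Path


def find_missing_files(registry, root):
    """Return {operator: [missing relative paths]} for files absent under root.

    `registry` is a hypothetical mapping: operator name -> {role: relative path}.
    """
    root = Path(root)
    missing = {}
    for op_name, paths in registry.items():
        absent = [rel for rel in paths.values() if not (root / rel).is_file()]
        if absent:
            missing[op_name] = absent
    return missing


def coverage(registry, missing):
    """Fraction of registered operators whose files are all present."""
    if not registry:
        return 0.0
    return (len(registry) - len(missing)) / len(registry)
```

A job built on this idea would fail when `find_missing_files` returns a non-empty mapping and would report `coverage` as the coverage statistic.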
CI Workflow (ci.yaml)#
Jobs#
correctness-smoke#
Purpose: Validate correctness structure and basic functionality
Scope: All active operators (non-blocked)
Mode: Smoke (reduced shapes/dtypes)
Benchmark Mode: Default kernel, configurable
Runtime: ~5-10 minutes (CPU-only structure check)
Output: summary.json, summary.md, per-operator logs
Failure Impact: Warning only (GPU validation done on cluster)
perf-smoke#
Purpose: Validate benchmark structure and CSV generation
Scope: All active operators (non-blocked)
Mode: Smoke (reduced shapes)
Benchmark Mode: Default kernel, configurable
Runtime: ~5-10 minutes (CPU-only structure check)
Output: summary.json, summary.md, benchmark CSVs
Failure Impact: Warning only (GPU validation done on cluster)
Weekly Workflow (weekly.yaml)#
Jobs#
weekly-entry#
Purpose: Full weekly regression on GPU cluster
Scope: All active operators (non-blocked) from registry
Mode: Full (all shapes/dtypes)
GPU Allocation: Configurable via the --gpus parameter
Runtime: 1-2 hours (depends on GPU count)
Output: Weekly results, operator list, artifacts
Failure Impact: Requires investigation
Weekly Parameters#
| Parameter | Default | Description |
|---|---|---|
|  | (generated from registry) | Optional path to operator list file |
|  |  | GPU IDs to use (comma-separated) |
|  |  | Benchmark mode (kernel/operator/wrapper) |
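As a rough sketch of how an entry script might parse these parameters: only `--gpus` is named in this document. The flags `--benchmark-mode` and `--op-list`, and the default values chosen below, are hypothetical and serve only to make the example runnable.

```python
import argparse

BENCHMARK_MODES = ("kernel", "operator", "wrapper")


def parse_gpu_ids(value):
    """Turn a comma-separated string such as '0,1,3' into [0, 1, 3]."""
    try:
        ids = [int(part) for part in value.split(",")]
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid GPU ID list: {value!r}")
    if any(i < 0 for i in ids) or len(set(ids)) != len(ids):
        raise argparse.ArgumentTypeError(
            f"GPU IDs must be unique and non-negative: {value!r}"
        )
    return ids


def build_parser():
    parser = argparse.ArgumentParser(description="weekly entry (sketch)")
    # Default GPU list is an assumption; the document does not state one.
    parser.add_argument("--gpus", type=parse_gpu_ids, default=[0])
    # Flag name is hypothetical; the three modes come from the table above.
    parser.add_argument("--benchmark-mode", choices=BENCHMARK_MODES,
                        default="kernel")
    # Flag name is hypothetical; None means "generate the list from the registry".
    parser.add_argument("--op-list", default=None)
    return parser
```

For example, `build_parser().parse_args(["--gpus", "0,2,3"])` yields a namespace whose `gpus` attribute is the list `[0, 2, 3]`.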
Acceptance Workflow (acceptance.yaml)#
Jobs#
correctness-acceptance#
Purpose: Acceptance-level correctness validation
Scope: All active operators or specific category
Mode: Full (all shapes/dtypes)
Category Filter: Optional (unary/binary/contraction/sparse)
Benchmark Mode: Configurable
Runtime: 30-60 minutes (CPU structure) / 1-2 hours (GPU cluster)
Output: ACCEPTANCE_SUMMARY.md, summary.json, per-operator logs
Failure Impact: Blocks acceptance
perf-acceptance#
Purpose: Acceptance-level performance validation
Scope: All active operators or specific category
Mode: Full (all shapes)
Category Filter: Optional (unary/binary/contraction/sparse)
Benchmark Mode: Configurable
Runtime: 30-60 minutes (CPU structure) / 2-4 hours (GPU cluster)
Output: ACCEPTANCE_SUMMARY.md, summary.json, benchmark CSVs, speedup stats
Failure Impact: Blocks acceptance
Acceptance Parameters#
| Parameter | Default | Description |
|---|---|---|
|  |  | Benchmark mode (kernel/operator/wrapper) |
|  |  | Operator category filter |
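The category filter and the "active (non-blocked)" scope can be sketched as a simple selection step. The record fields `name`, `category`, and `blocked` are assumptions made for illustration, not the actual registry schema.

```python
CATEGORIES = ("unary", "binary", "contraction", "sparse")


def select_operators(registry, category=None):
    """Return sorted names of non-blocked operators, optionally by category.

    `registry` is a hypothetical list of dicts with 'name', 'category',
    and an optional 'blocked' flag.
    """
    if category is not None and category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return sorted(
        op["name"]
        for op in registry
        if not op.get("blocked", False)
        and (category is None or op["category"] == category)
    )
```

Passing `category=None` corresponds to running all active operators; passing one of the four category names narrows the run to that category.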
Acceptance Summary Output#
The acceptance workflow generates detailed summaries including:
Total operators tested
Pass/fail counts and pass rate
Failed operators list
Performance speedup statistics (avg, median, min, max)
Per-operator status table
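The summary fields listed above could be computed along the following lines. This is a sketch under assumptions: the per-operator record shape (`name`, `passed`, `speedup`) and the output keys are hypothetical, and the real summary.json layout may differ.

```python
import statistics


def summarize(results):
    """Aggregate per-operator results into acceptance summary fields.

    `results` is a hypothetical list of dicts with 'name', 'passed' (bool),
    and an optional 'speedup' (float, or None when no benchmark ran).
    """
    total = len(results)
    failed = sorted(r["name"] for r in results if not r["passed"])
    passed = total - len(failed)
    speedups = [r["speedup"] for r in results if r.get("speedup") is not None]
    stats = None
    if speedups:
        stats = {
            "avg": statistics.fmean(speedups),
            "median": statistics.median(speedups),
            "min": min(speedups),
            "max": max(speedups),
        }
    return {
        "total": total,
        "passed": passed,
        "failed": len(failed),
        "pass_rate": passed / total if total else 0.0,
        "failed_operators": failed,
        "speedup": stats,
    }
```

Reporting the median alongside the average is useful here because a single operator with a very large speedup can dominate the mean.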
Cluster GPU Validation#
Because the GitHub Actions runners used by these workflows have no GPU access, GPU validation runs on the cluster through Slurm.
Standard Slurm Template#
srun -N 1 --job-name <job_name> \
--nodelist <node_name> \
--gres=gpu:<gpu_count> \
--cpus-per-task=$((24*gpu_count)) \
--mem=$((242144*gpu_count)) \
docker exec -w /workspace/FlagGems/flagtensor triton_cuda12 \
bash -lc "<command>"
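The template scales CPU and memory with the GPU count: 24 CPUs and 242144 memory units per GPU. A small helper can fill in the placeholders consistently. This is a sketch, not part of the FlagTensor tooling; the function name and its keyword defaults are assumptions taken from the template and the values shown in this document.

```python
import shlex


def build_srun_command(job_name, node_name, gpu_count, command,
                       cpus_per_gpu=24, mem_per_gpu=242144,
                       container="triton_cuda12",
                       workdir="/workspace/FlagGems/flagtensor"):
    """Build the srun argument list from the standard template."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return [
        "srun", "-N", "1",
        "--job-name", job_name,
        "--nodelist", node_name,
        f"--gres=gpu:{gpu_count}",
        # CPU and memory requests scale linearly with the GPU count.
        f"--cpus-per-task={cpus_per_gpu * gpu_count}",
        f"--mem={mem_per_gpu * gpu_count}",
        "docker", "exec", "-w", workdir, container,
        "bash", "-lc", command,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(build_srun_command("demo", "<node_name>", 2, "echo ok")))
```

Returning an argument list rather than a single string avoids quoting mistakes when the inner command contains spaces; `shlex.join` is used only for display.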
Cluster Node#
Primary Node: bjdb-h20-node-038
Container: triton_cuda12
Container Path: /workspace/FlagGems/flagtensor
Artifact Storage#
All workflows upload artifacts for audit and debugging:
| Artifact | Workflow | Retention | Contents |
|---|---|---|---|
|  | ci.yaml | 30 days | Smoke correctness results |
|  | ci.yaml | 30 days | Smoke performance results |
|  | weekly.yaml | 90 days | Weekly regression results |
|  | acceptance.yaml | 90 days | Acceptance correctness results |
|  | acceptance.yaml | 90 days | Acceptance performance results |
|  | quality-gate.yaml | 30 days | Wheel/sdist packages |
CI Status Indicators#
GitHub Step Summary#
Workflows report results to GitHub Step Summary:
Quality Gate: Pre-commit status, build status, registry consistency
CI Smoke: Pass/fail table for correctness and performance
Weekly: Operator count and overall status
Acceptance: Detailed pass/fail statistics, speedup analysis
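GitHub Actions exposes the step summary as a file whose path is given in the `GITHUB_STEP_SUMMARY` environment variable; Markdown appended to that file appears on the run page. The sketch below shows one way a pass/fail table might be written there. The function names and the result mapping are hypothetical and are not taken from the FlagTensor workflows.

```python
import os


def render_summary_table(results):
    """Render {operator: passed} as a Markdown pass/fail table."""
    lines = ["| Operator | Status |", "|---|---|"]
    for name, passed in sorted(results.items()):
        lines.append(f"| {name} | {'pass' if passed else 'fail'} |")
    return "\n".join(lines) + "\n"


def write_step_summary(markdown, env=os.environ):
    """Append Markdown to the step summary file; return False outside CI."""
    path = env.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True
```

Returning `False` when the variable is unset lets the same script run locally without failing.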
Exit Codes#
0: All checks passed
1: One or more checks failed
Other non-zero: Workflow error (e.g., missing dependencies)
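The distinction between a failed check and a workflow error can be sketched as follows. The specific value 2 for workflow errors is an assumption chosen for illustration; this document only states that such errors produce a non-zero code.

```python
def run_checks(checks):
    """Run named checks and map the outcome to an exit code.

    `checks` is a hypothetical mapping: name -> zero-argument callable that
    returns True on pass. An exception counts as a workflow error.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception as exc:
            print(f"workflow error in {name}: {exc}")
            return 2  # assumed code for workflow errors
        if not ok:
            failed.append(name)
    if failed:
        print("failed checks: " + ", ".join(failed))
        return 1
    return 0
```

A script would typically pass the result to `sys.exit` so that the CI step reflects the outcome.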
CI Best Practices#
Always run quality gate before PR merge
Use smoke mode for rapid iteration
Run acceptance on GPU cluster before release
Review the weekly regression results after each run
Keep registry in sync with codebase
Update category benchmark entries when adding operators
CI vs Local Testing#
| Aspect | CI | Local |
|---|---|---|
| Speed | Fast (CPU structure) | Variable |
| GPU Access | No | Yes (via Slurm) |
| Coverage | Smoke | Full |
| Purpose | Structure validation | Functional validation |
| Recommendation | Use for PR checks | Use for acceptance validation |