Anti-Hack Architecture
KernelGenBench employs a three-tier anti-hack mechanism to prevent benchmark evasion and ensure generated kernels actually perform computation.
Overview
The anti-hack architecture guards against "cheating" behaviors where generated code might:
Call pre-existing APIs instead of implementing computation
Bypass Triton compilation
Use hidden caching mechanisms
L1: AST Static Scan
Purpose
Enforce a whitelist-based approach: most torch.* API calls are forbidden.
Only tensor creation, dtype helpers, and constants are allowed.
Method
Parse the generated code into an abstract syntax tree (AST), allow only whitelisted torch APIs, and reject the blocked patterns listed below.
Whitelist (allowed torch APIs):
torch.empty, torch.zeros, torch.randn, torch.range, torch.float16, etc.
Detected patterns (blocked):
| Blocked Pattern | Reason |
|---|---|
|  | Prevents using torch.sum/mean/mm/reductions |
|  | Prevents input sniffing from test harness |
|  | Prevents raw memory access |
| Module-level | Prevents inter-iteration result caching |
|  | Using pre-existing implementations |
|  | Dynamic code execution |
| Import alias / | Catches obfuscation attempts |
Implementation
# Blocked calls are detected via AST parsing
# Any attempt to call blacklisted APIs results in immediate rejection
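The scan described above can be sketched with Python's standard `ast` module. This is a minimal illustration, not KernelGenBench's actual scanner: the whitelist contents, the `BLOCKED_BUILTINS` set, and the `scan_source` name are all assumptions made for the example, and only two of the blocked categories (non-whitelisted torch attributes and dynamic execution) are covered.

```python
import ast

# Assumption: a small illustrative whitelist; the real one is longer.
ALLOWED_TORCH_ATTRS = {"empty", "zeros", "randn", "float16"}
# Assumption: builtins treated here as dynamic code execution.
BLOCKED_BUILTINS = {"eval", "exec", "compile", "__import__"}


def scan_source(source):
    """Return a list of violation messages; an empty list means the scan passed."""
    tree = ast.parse(source)
    torch_aliases = set()
    violations = []

    # Pass 1: record every local name bound to the torch module, so that
    # `import torch as t` cannot hide calls. Submodule imports are rejected.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "torch":
                    torch_aliases.add(alias.asname or "torch")
                elif alias.name.startswith("torch."):
                    violations.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module == "torch" or module.startswith("torch."):
                for alias in node.names:
                    if module != "torch" or alias.name not in ALLOWED_TORCH_ATTRS:
                        violations.append(
                            f"line {node.lineno}: from {module} import {alias.name}"
                        )

    # Pass 2: flag non-whitelisted torch attributes and dynamic execution.
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if node.value.id in torch_aliases and node.attr not in ALLOWED_TORCH_ATTRS:
                violations.append(
                    f"line {node.lineno}: torch.{node.attr} is not whitelisted"
                )
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_BUILTINS:
                violations.append(
                    f"line {node.lineno}: dynamic execution via {node.func.id}()"
                )
    return violations
```

Because the check walks attribute nodes rather than matching strings, a chained access such as `torch.nn.functional.relu` is rejected at its first non-whitelisted segment (`torch.nn`), and aliasing the module does not help the generated code.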
L2: Ghost Replay
Purpose
Verify that the Triton kernel is actually executed, not bypassed.
Method
Execute the kernel normally and capture the outputs
Replace the @triton.jit-decorated function with a no-op in memory
Re-execute with the same inputs
Compare the two sets of outputs
Logic
If outputs are identical, the Triton kernel was never invoked → Cheating detected
If outputs differ, the kernel was actually executed → Valid
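The replay logic can be illustrated without a GPU. The sketch below uses plain Python lists and a hypothetical `ghost_replay` helper; the namespace object, the `entry` callable, and the toy kernels are assumptions made for the example, not KernelGenBench's API. A real ghost would also have to support Triton's `kernel[grid](...)` launch syntax, which this stand-in does not model.

```python
from types import SimpleNamespace


def ghost_replay(namespace, kernel_name, entry, inputs):
    """Return True if the entry point's output depends on the named kernel."""
    real_kernel = getattr(namespace, kernel_name)
    baseline = entry(namespace, *inputs)        # 1. normal run, capture outputs

    def noop(*args, **kwargs):                  # the "ghost": does nothing
        return None

    setattr(namespace, kernel_name, noop)       # 2. swap the kernel in memory
    try:
        ghost = entry(namespace, *inputs)       # 3. replay with the same inputs
    finally:
        setattr(namespace, kernel_name, real_kernel)  # always restore
    return baseline != ghost                    # 4. differ => kernel was used


def double_kernel(xs, out):
    """Toy stand-in for a @triton.jit kernel: writes 2*x into `out`."""
    for i, x in enumerate(xs):
        out[i] = 2 * x


def honest_entry(ns, xs):
    out = [0] * len(xs)
    ns.kernel(xs, out)          # result comes from the kernel
    return out


def cheating_entry(ns, xs):
    return [2 * x for x in xs]  # never touches the kernel


ns = SimpleNamespace(kernel=double_kernel)
print(ghost_replay(ns, "kernel", honest_entry, ([1, 2, 3],)))    # True
print(ghost_replay(ns, "kernel", cheating_entry, ([1, 2, 3],)))  # False
```

One consequence of this design is that the test inputs must produce outputs that differ from the initial contents of the output buffer; an all-zero result written into a zero-initialized buffer would look identical in both runs.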
L3: Hardware Profiling
Purpose
Confirm Triton-specific execution at the hardware level.
Method
Use torch.profiler to verify Triton-specific signatures exist in low-level trace logs.
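A rough sketch of such a check follows. The source does not say which signatures are matched, so this example assumes that a Triton launch appears in the trace under a name that begins with the `@triton.jit` function's name; both helper names are hypothetical. The profiling helper needs PyTorch and a CUDA device, while the name check is plain Python.

```python
def has_triton_kernel(event_names, jit_function_names):
    """True if any profiled event name starts with a known @triton.jit function name.

    Assumption: Triton kernels show up in the trace under (a suffixed form of)
    the jit function's name.
    """
    return any(
        event.startswith(fn)
        for event in event_names
        for fn in jit_function_names
    )


def profile_event_names(fn, *args):
    """Run fn under torch.profiler and return the recorded event names."""
    # Imported lazily so the name check above works without torch installed.
    import torch
    from torch.profiler import ProfilerActivity, profile

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        fn(*args)
        torch.cuda.synchronize()  # make sure the launched kernels have finished
    return [event.key for event in prof.key_averages()]
```

On an NVIDIA machine the two would be combined as `has_triton_kernel(profile_event_names(run_candidate, x), ["add_kernel"])`, where `run_candidate` and `add_kernel` stand in for the generated entry point and its jit function.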
Availability
| Platform | L3 Support |
|---|---|
| NVIDIA | ✓ |
| Non-NVIDIA | ✗ |
Non-NVIDIA platforms rely on L1 and L2 due to absence of equivalent profiling tools.
Validation Flow
Generated Kernel
       │
       ▼
┌─────────────┐
│ L1: AST Scan│─── Fail ──► Reject
└─────────────┘
       │ Pass
       ▼
┌─────────────┐
│ L2: Ghost   │─── Fail ──► Reject
│    Replay   │
└─────────────┘
       │ Pass
       ▼
┌─────────────┐
│ L3: Profile │─── Fail ──► Reject
│  (NVIDIA)   │
└─────────────┘
       │ Pass
       ▼
    Accept
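The flow above amounts to a short-circuiting chain of checks. The sketch below shows one way to express it; the `validate` function and its callable parameters are hypothetical names for this example, and each tier is assumed to be a function that takes the candidate kernel and returns `True` on pass.

```python
def validate(kernel, l1_scan, l2_ghost_replay, l3_profile=None):
    """Run the tiers in order; return ("accept", None) or ("reject", tier_name)."""
    tiers = [("L1", l1_scan), ("L2", l2_ghost_replay)]
    if l3_profile is not None:  # L3 is only available on NVIDIA platforms
        tiers.append(("L3", l3_profile))

    for name, check in tiers:
        if not check(kernel):
            return ("reject", name)  # stop at the first failing tier
    return ("accept", None)


# Non-NVIDIA platform: only L1 and L2 run.
print(validate("src", lambda k: True, lambda k: True))                     # ('accept', None)
# NVIDIA platform where the profile shows no Triton kernel.
print(validate("src", lambda k: True, lambda k: True, lambda k: False))    # ('reject', 'L3')
```

Running the cheap static scan first means that obviously invalid submissions are rejected before any code is executed.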
Why Anti-Hack Matters
Without anti-hack measures, models could:
Achieve high "accuracy" without actual computation
Mask poor kernel generation capability
Invalidate benchmark results
KernelGenBench ensures evaluations reflect true kernel generation ability.