FlagTree 0.6.0 Release
Note
This is a preview release. The version number shown is a pre-release identifier and may change upon final release. Content in this preview is for reference only and does not constitute a commitment or warranty for the final product.
Added Features
3.6.x branch:
TLE-Lite:
Added the tle.cumsum scan and sort op. Supported on NVIDIA.
Added the following pipeline ops: tle.pipe, tle.pipe.reader, tle.pipe.reader.wait, tle.pipe.reader.release, tle.pipe.writer.acquire, tle.pipe.writer.commit, and tle.pipe.writer.close. Supported on NVIDIA.
TLE-Struct:
Added the tle.gpu.warp_specialize execution orchestration op. Supported on NVIDIA.
TLE-Raw:
Added a new method of integrating CUDA kernels into the LLVM inline path for maximum fine-grained control. Supported on NVIDIA.
Upgraded the following backends to Triton 3.6: enflame, hcu, and mthreads.
Added damoacademy as a new backend.
Added Moore Threads as a new backend on the 3.6.x branch, with support for the following TLE primitives:
TLE-Lite:
Added the following ops: tle.load(is_async=True), tl.load/tl.store (for local_ptr), and tl.atomic_add/and/cas/max/min/or/xchg/xor (for local_ptr). Supported on Moore Threads.
TLE-Struct:
Added the following ops: tle.gpu.alloc, tle.gpu.local_ptr, tle.gpu.copy, and tle.gpu.memory_space. Supported on Moore Threads.
3.3.x branch:
Enhanced Features
Enhanced FLIR.