Slashing GPU Kernel Launch Overhead in LLM Inference | Berlin

August 13, 2025 · Berlin

Slashing LLM Kernel Overhead

Explore a Rust multi-process approach that cuts GPU kernel launches through lock-free shared-memory batching, reducing launches by 90% and yielding a 22% speedup.
