$150,000 - $200 | Vancouver, Toronto, Ottawa, Montreal, Calgary, Edmonton, Winnipeg, Halifax | 04/04/2025
$135,000 - $165,000 | Ottawa, Ontario | 01/13/2026
**The Role**
We are seeking a Real-Time Inference Systems Engineer to push the limits of
end-to-end conversational latency.
This is a deeply technical role focused on collapsing voice-to-voice latency
across GPU execution, model inference, and real-time audio pipelines. You will
turn what is normally a serial, jitter-dominated stack into a fully streaming
system fast enough for natural conversation.
If you enjoy operating close to the metal and making systems feel
instantaneous, this role is for you.
**What You Will Work On**
* Deep optimization of GPU inference pipelines for real-time workloads
* Streaming transformer inference for low-latency STT → LLM → TTS systems
* GPU kernel scheduling, execution overlap, and CUDA stream concurrency
* Kernel fusion, quantization, and speculative decoding techniques
* KV-cache management, paging strategies, and memory locality optimization
* Pinned memory, zero-copy transfers, and host/device overlap
* Real-time audio pipelines, jitter buffer control, and streaming I/O
* Converting serial inference stacks into fully overlapped, streaming systems
**What We Are Looking For**
* CUDA, GPU kernels, and performance tuning in production systems
* Low-latency or real-time systems (audio, video, networking, or inference)
* Transformer inference internals and serving optimization
* Streaming systems where milliseconds matter
* Profiling and debugging complex, multi-stage pipelines
**Bonus points for experience with:**
* STT or TTS systems or voice agents
* Real-time audio or media systems
* Distributed inference or edge compute
* Compiler, runtime, or systems-level optimization
**Who You Are**
* You think in timelines, not just throughput
* You care deeply about where every millisecond goes
* You enjoy ambiguity and building systems without existing playbooks
* You are comfortable owning hard, open-ended problems end to end
**Why Join PolarGrid**
* Work on a first-of-its-kind distributed inference platform
* Solve problems that directly shape the future of real-time AI
* Small, elite team with meaningful ownership and autonomy
* Direct influence on product architecture and technical direction
* Competitive compensation and equity