VP of Engineering at PolarGrid

VP of Engineering

💰 $150,000 - $200 🌍 Vancouver, Toronto, Ottawa, Montreal, Calgary, Edmonton, Winnipeg, Halifax 📅 04/04/2025

Real-Time Inference Systems Engineer

💰 $135,000 - $165,000 🌍 Ottawa, Ontario 📅 01/13/2026

Job Description

**The Role**

We are seeking a Real-Time Inference Systems Engineer to push the limits of
end-to-end conversational latency.

This is a deeply technical role focused on collapsing voice-to-voice latency
across GPU execution, model inference, and real-time audio pipelines. You will
turn what is normally a serial, jitter-dominated stack into a fully streaming
system that responds at conversational speed.

If you enjoy operating close to the metal and making systems feel
instantaneous, this role is for you.

**What You Will Work On**

* Deep optimization of GPU inference pipelines for real-time workloads
* Streaming transformer inference for low-latency STT → LLM → TTS systems
* GPU kernel scheduling, execution overlap, and CUDA stream concurrency
* Kernel fusion, quantization, and speculative decoding techniques
* KV-cache management, paging strategies, and memory locality optimization
* Pinned memory, zero-copy transfers, and host/device overlap
* Real-time audio pipelines, jitter buffer control, and streaming I/O
* Converting serial inference stacks into fully overlapped, streaming systems

**What We Are Looking For**

Hands-on experience with:

* CUDA, GPU kernels, and performance tuning in production systems
* Low-latency or real-time systems (audio, video, networking, or inference)
* Transformer inference internals and serving optimization
* Streaming systems where milliseconds matter
* Profiling and debugging complex, multi-stage pipelines

**Bonus points for experience with:**

* STT or TTS systems or voice agents
* Real-time audio or media systems
* Distributed inference or edge compute
* Compiler, runtime, or systems-level optimization

**Who You Are**

* You think in timelines, not just throughput
* You care deeply about where every millisecond goes
* You enjoy ambiguity and building systems without existing playbooks
* You are comfortable owning hard, open-ended problems end to end

**Why Join PolarGrid**

* Work on a first-of-its-kind distributed inference platform
* Solve problems that directly shape the future of real-time AI
* Small, elite team with meaningful ownership and autonomy
* Direct influence on product architecture and technical direction
* Competitive compensation and equity