Join Cohere's Inference team to build the systems that serve our language models to enterprise customers at scale. You'll work on optimizing model serving latency and throughput, building efficient batching systems, and developing the infrastructure that powers Cohere's API.
You'll use cutting-edge hardware (H100s, A100s) and software (TensorRT, vLLM, custom CUDA kernels) to push the boundaries of inference efficiency.