Cohere

Senior Software Engineer, Inference

Hybrid | Full Time | $160K – $250K/yr | AI Infrastructure
Tags: inference, CUDA, vLLM, TensorRT, GPU, C++, model serving

Job Description

Join Cohere's Inference team to build the systems that serve our language models to enterprise customers at scale. You'll work on optimizing model serving latency and throughput, building efficient batching systems, and developing the infrastructure that powers Cohere's API.

You'll work with cutting-edge hardware (H100s, A100s) and software (TensorRT, vLLM, custom CUDA kernels) to push the boundaries of inference efficiency.

Requirements

  • 5+ years of backend engineering experience
  • Strong C++ and Python skills
  • Experience with GPU programming (CUDA, Triton)
  • Familiarity with model serving frameworks (TensorRT, vLLM, Triton Inference Server)
  • Understanding of transformer model architectures
  • Experience with distributed systems

Benefits

  • Competitive Canadian salary
  • Equity in a Series C company
  • Comprehensive health benefits
  • Flexible hybrid work
  • Toronto office in a great location

Job Details

Posted: April 5, 2026
Expires: May 5, 2026

About the Company

Cohere

Toronto, Canada

Cohere provides access to advanced Large Language Models and NLP tools through one easy-to-use API.