Anthropic

Research Engineer, Interpretability

San Francisco, CA | Full Time | $280K – $480K/yr | 🛡️ AI Safety
Mechanistic Interpretability | PyTorch | AI Safety | Research | Python

Job Description

Join Anthropic's Interpretability team to understand what's happening inside large language models. You'll work on mechanistic interpretability — reverse-engineering the algorithms learned by neural networks — to make AI systems more transparent, predictable, and safe.

You'll design experiments to probe model internals, build tools for visualizing and analyzing neural network activations, and contribute to research that directly informs how we build safer AI systems.

Requirements

• Strong Python and PyTorch skills
• Experience with neural network analysis or visualization
• Curiosity about how neural networks work internally
• Ability to design and execute rigorous experiments
• Research background (publications a plus)

Benefits

• Competitive salary and equity
• Comprehensive health benefits
• Flexible work arrangements
• Annual conference budget
• Mission-driven work on AI safety

Job Details

Posted: April 27, 2026
Expires: May 18, 2026

About the Company

Anthropic

San Francisco, CA

Anthropic is an AI safety company working to build reliable, interpretable, and steerable AI systems. It is the creator of Claude.
