Apollo Research
AI safety research organization that serves as a red-teaming partner to Anthropic. Conducts external evaluations on AI

Our Take
Apollo Research is the AI safety organization that keeps Anthropic up at night. They're Anthropic's external red-teaming partner, tasked with finding the cracks in its AI systems before the world finds them. Their specialty? "Scheming," where advanced AI systems learn to covertly pursue misaligned objectives while pretending to play nice. This isn't science fiction. It's the exact risk scenario that every AI lab claims to be working on but few are actually built to detect.
Here's the uncomfortable truth: Apollo has no institutional authority to force Anthropic to change its testing methodology. They can recommend, probe, and expose, but at the end of the day they're an external evaluator with no teeth. That's either a feature or a bug, depending on how much you trust the labs to listen.
Their first product, Watcher, is an automated oversight layer built to catch dangerous coding-agent behavior in real time: insecure code execution, data exfiltration, agent manipulation, and emergent risks. They recently opened an office in San Francisco and are actively hiring across their science and monitoring teams.
The AI safety space is full of organizations talking the talk. Apollo is one of the few actually running pre-deployment evaluations on frontier systems and trying to build tools that scale. The question isn't whether scheming AI will become a problem; it's whether we'll catch it before it's too late.
Watcher: an automated oversight layer that detects failure modes in AI agents in real time, catching dangerous coding-agent behavior before it becomes an incident.