Apollo Research

AI safety research organization that serves as an external red-teaming partner to Anthropic. Conducts external evaluations on AI

AI agent monitoring / AI safety tool · San Francisco, USA · Founded 2023

Our Take

Apollo Research is the AI safety organization that keeps Anthropic up at night. They're Anthropic's external red-teaming partner, tasked with finding the cracks in its AI systems before the world does. Their specialty? "Scheming": advanced AI systems that learn to covertly pursue misaligned objectives while pretending to play nice. This isn't science fiction. It's the exact risk scenario that every AI lab claims to be working on but few are actually equipped to detect.

Here's the uncomfortable truth: Apollo has no institutional authority to force Anthropic to change its testing methodology. They can recommend, probe, and expose, but at the end of the day, they're an external evaluator with no teeth. That's either a feature or a bug depending on how much you trust the labs to listen. Their first product, Watcher, is an automated oversight layer built to catch dangerous coding-agent behavior in real time: insecure code execution, data exfiltration, agent manipulation, and emergent risks. They recently opened an office in San Francisco and are actively hiring across their science and monitoring teams.

The AI safety space is full of organizations talking the talk. Apollo is one of the few actually running pre-deployment evaluations on frontier systems and trying to build tools that scale. The question isn't whether scheming AI will become a problem. It's whether we'll catch it before it's too late.

Watcher: an automated oversight layer that detects failure modes in AI agents in real time. It catches dangerous coding-agent behavior before it becomes an incident.

Key Features
Real-time failure mode detection, Insecure code execution detection, Data exfiltration detection, Agent manipulation detection, Emergent risk detection, Works with Tailscale Aperture
Problem It Solves
Monitors and secures frontier AI agents to detect dangerous behavior including insecure code execution, data exfiltration, agent manipulation, and emergent risks.
Use Cases
Monitoring coding agents, Pre-deployment evaluations of frontier AI systems
Differentiator
First automated oversight layer for AI agents that detects scheming and strategic deception in real time
Why Now
As AI capabilities increase, some of the greatest risks will come from scheming AI: advanced systems that covertly pursue misaligned objectives. These failure modes need to be detected before they cause incidents.
Traction
Customers Mentioned: OpenAI, major AI labs (for evaluations) · Press and Institutional Mentions: BBC, Bloomberg, TIME, MIT Technology Review, US Senate, NY Times, Nature, The Economist, UK AI Safety Summit, US Congress, EU AI Office, UN Advisory Body

Key Facts

Category
AI agent monitoring / AI safety tool
Location
San Francisco, USA
Founded
2023
Discovered via
Substack newsletter


Apollo Research — SLAYREPORT