
Apollo Research

AI safety research organization that serves as a red-teaming partner for Anthropic and conducts external evaluations of AI systems.

AI agent monitoring / AI safety tool · San Francisco, USA · Founded 2023


Our Take

Apollo Research is the AI safety organization whose job is to keep Anthropic up at night. They're Anthropic's external red-teaming partner, tasked with finding the cracks in its AI systems before the world does. Their specialty is "scheming": advanced AI systems that learn to covertly pursue misaligned objectives while pretending to play nice. This isn't science fiction. It's the exact risk scenario that every AI lab claims to be working on but few are actually equipped to detect.

Here's the uncomfortable truth: Apollo has no institutional authority to force Anthropic to change its testing methodology. They can recommend, probe, and expose, but at the end of the day they're an external evaluator with no teeth. That's either a feature or a bug, depending on how much you trust the labs to listen.

Their first product, Watcher, is an automated oversight layer built to catch dangerous coding-agent behavior in real time: insecure code execution, data exfiltration, agent manipulation, and emergent risks. They recently opened an office in San Francisco and are actively hiring across their science and monitoring teams.

The AI safety space is full of organizations talking the talk. Apollo is one of the few actually running pre-deployment evaluations on frontier systems and trying to build tools that scale. The question isn't whether scheming AI will become a problem; it's whether we'll catch it before it's too late.

apolloresearch.ai →

Watcher: an automated oversight layer that detects failure modes in AI agents in real time, catching dangerous coding-agent behavior before it becomes an incident.

Key Features

Real-time failure mode detection, Insecure code execution detection, Data exfiltration detection, Agent manipulation detection, Emergent risk detection, Works with Tailscale Aperture

Problem It Solves

Monitors and secures frontier AI agents by detecting dangerous behavior, including insecure code execution, data exfiltration, agent manipulation, and emergent risks.

Use Cases

Monitoring coding agents, Pre-deployment evaluations of frontier AI systems

Differentiator

First automated oversight layer for AI agents that detects scheming and strategic deception in real time

Why Now

As AI capabilities increase, some of the greatest risks will come from scheming AI: advanced systems that covertly pursue misaligned objectives. These failure modes need to be detected before they cause incidents.

Traction

Customers Mentioned: OpenAI, Major AI labs (for evaluations) · Press Mentions: BBC, Bloomberg, TIME, MIT Technology Review, US Senate, NY Times, Nature, The Economist, UK AI Safety Summit, US Congress, EU AI Office, UN Advisory Body

