Safety · May 7, 2026 · 5 min read

Agents that don't go rogue

Sofia Almeida

Autonomous agents are powerful and terrifying. These are the guardrail patterns we use to ship agents that are aggressive about work and conservative about risk.

An agent with tools is an agent that can do damage. The goal isn't to make agents timid — it's to make them aggressive within a fence you trust.

We lean on three patterns: hard constraints the model literally cannot violate, human-in-the-loop gates on high-stakes actions, and a kill switch that an operator can hit in one click.
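As a rough illustration, the three patterns can be sketched as a single wrapper that every tool call passes through. This is a minimal sketch under stated assumptions, not a description of any particular framework: the tool names, the `Guardrails` class, the `Blocked` exception, and the `approve` callback are all hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical tool names, chosen only for illustration.
ALLOWED_TOOLS = {"read_file", "send_email", "delete_records"}
HIGH_STAKES = {"send_email", "delete_records"}


class Blocked(Exception):
    """Raised when a guardrail refuses an action."""


@dataclass
class Guardrails:
    approve: Callable[[str, dict], bool]  # human-in-the-loop callback (assumed)
    kill_switch: threading.Event          # set by an operator to halt the agent

    def run(self, tool: str, args: dict,
            tools: Dict[str, Callable[..., object]]) -> object:
        # Kill switch: checked before every single action.
        if self.kill_switch.is_set():
            raise Blocked("kill switch engaged")
        # Hard constraint: enforced in code, outside the model's control.
        if tool not in ALLOWED_TOOLS:
            raise Blocked(f"tool not allowed: {tool}")
        # Human gate: high-stakes actions need explicit approval.
        if tool in HIGH_STAKES and not self.approve(tool, args):
            raise Blocked(f"approval denied: {tool}")
        return tools[tool](**args)
```

The point of the design is that the checks live in the code path that executes tools, so the model can request an action but cannot talk its way past the allowlist. A real operator control would set `kill_switch` from a UI; here it is just a `threading.Event`.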

Pair that with thorough logging and you get the best of both worlds: an agent that takes real work off your plate, and a paper trail that lets you sleep at night.
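One way to get that paper trail is to wrap each tool so that every call, successful or not, leaves a structured record. The sketch below is a hypothetical example: the `audited` helper and the in-memory `sink` list are assumptions for illustration, and a real system would write to durable storage instead.

```python
import json
import time
from typing import Callable, List


def audited(tool_name: str, fn: Callable[..., object],
            sink: List[str]) -> Callable[..., object]:
    """Wrap a tool so each call appends one JSON record to `sink`."""
    def wrapper(**kwargs):
        record = {"ts": time.time(), "tool": tool_name, "args": kwargs}
        try:
            result = fn(**kwargs)
            record["status"] = "ok"
            return result
        except Exception as exc:
            # Record the failure, then let it propagate unchanged.
            record["status"] = "error"
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so no call goes unlogged.
            sink.append(json.dumps(record, default=str))
    return wrapper
```

Logging in the `finally` clause means a crashed tool call still shows up in the trail, which is usually the call you most want to see afterwards.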
