Reading the agent traces is how you make the call your eval can't
Picking a model for your agent is a tradeoff between cost, quality, and speed. Here's a free-tier model that made things up, what an eval would and wouldn't have caught, and why reading the agent traces is how you actually decide.