METR's eval shows Claude Opus 4.6 achieving a 50% success rate on software tasks equivalent to 14.5 human hours—the highest yet—signaling rapid AI agent autonomy gains. Despite noisy measurements from benchmark saturation, this underscores accelerating progress toward AI that outworks humans on complex coding. Expect software engineering roles to face imminent disruption as models scale beyond current evals.