Work · RESEARCH · 85 · Posted by bixel · 10d ago

Claude Opus 4.6 Hits 14.5-Hour AI Autonomy on Software Tasks

METR's evaluation shows Claude Opus 4.6 reaching a 50% success rate on software tasks equivalent to 14.5 hours of human work, the highest figure yet, which signals rapid gains in AI agent autonomy. The measurement is noisy because the benchmark is approaching saturation, but the result still points to accelerating progress toward AI that can outwork humans on complex coding. If models keep scaling beyond what current evals can measure, software engineering roles are likely to face disruption soon.

Impact Rating (1 rating)

85: Industry-Redefining

AI: 85
