List of AI News about GPT45
| Time | Details |
|---|---|
| 03:52 | Metaculus Bet Update: GPT-4.5 Nears ‘Weakly General AI’ Milestone, Only Classic Atari Remains. According to Ethan Mollick on X, three of the four proxies in the long-standing Metaculus bet on reaching “weakly general artificial intelligence” have reportedly been met: a Loebner Prize–equivalent weak Turing Test by GPT-4.5, the Winograd Schema Challenge by GPT-3, and 75% SAT performance by GPT-4, leaving only a classic Atari game benchmark outstanding. As reported in Mollick’s post, these claims suggest rapid progress in language understanding and standardized testing, but the degree of independent, peer-reviewed confirmation varies by proxy, and each claim should be verified against the original evaluations. According to prior public benchmarks, models have performed strongly on Winograd-style tasks, OpenAI’s technical documentation reports GPT-4 SAT scores near or above the cited threshold, and Atari performance is a long-standing reinforcement learning yardstick, which highlights a remaining gap in embodied or interactive competence. For businesses, this signals near-term opportunities to productize high-stakes reasoning (test-prep automation, policy Q&A, enterprise knowledge assistants) while monitoring interactive-agent performance in game-like environments as a proxy for tool use, planning, and autonomy. As reported by Metaculus community forecasts, milestone framing can shift timelines and investment focus; organizations should track third-party evaluations and reproducible benchmarks before recalibrating roadmaps. |