HLE AI News List | Blockchain.News

List of AI News about HLE

Time	Details
2026-02-12 21:01	Gemini 3 Deep Think Sets New Benchmark Records: 84.6% ARC-AGI-2, 48.4% HLE, 3455 Codeforces Elo (2026 Analysis). According to Demis Hassabis on X (Twitter), Google DeepMind’s Gemini 3 Deep Think achieved 84.6% on ARC-AGI-2, 48.4% on Humanity’s Last Exam without tools, and a 3455 Elo rating on Codeforces, setting new records in math, science, and reasoning benchmarks. As reported by the post, these scores signal stronger generalization and competitive programming ability, which can translate to higher reliability in enterprise workflows like scientific analysis, code synthesis, and automated testing. According to the announcement, outperforming prior state-of-the-art on ARC-AGI-2 and reaching 3455 Elo positions Gemini 3 Deep Think as a top contender for tasks demanding multi-step reasoning, offering businesses opportunities to cut cycle times in R&D, accelerate software delivery, and reduce inference retries in production LLM pipelines.
