AI benchmark results AI News List | Blockchain.News

List of AI News about AI benchmark results

Time

Details

2025-11-10 18:13

Moonshot’s Kimi K2: China Unveils $4.6M Open-Source AI Model Surpassing GPT-5 in Key Benchmarks

According to @godofprompt, Chinese AI startup Moonshot has released Kimi K2, a 1 trillion-parameter model reportedly trained for $4.6 million, far less than the billions US labs are said to have spent on models such as GPT-5. The post claims Kimi K2 outperformed OpenAI’s flagship on key benchmarks, scoring 44.9% on Humanity’s Last Exam, which the post sets against proprietary models’ results, and 60.2% on agentic browsing tasks versus GPT-5’s 54.9%. The model is said to execute 200-300 tool calls autonomously, which the post presents as an advance in reasoning and automation. Unlike many closed US models, Kimi K2 is released as open source under a modified MIT license. It activates 32B parameters per token, uses native INT4 quantization that the post credits with doubling speed, and offers a 256k context window, which the post argues makes it practical for commercial AI applications on affordable hardware. The post frames the launch as evidence that rapid deployment and open access can rival, or even surpass, high-budget proprietary efforts and create new business opportunities for AI-driven products and services (source: @godofprompt, Nov 10, 2025).

2025-08-01 11:10

AI Model Achieves State-of-the-Art Performance on LiveCodeBench V6 and Humanity’s Last Exam Benchmarks

According to @OpenAI, a new AI model has achieved state-of-the-art results among models evaluated without tool use. It excels on LiveCodeBench V6, a benchmark that rigorously tests competitive code generation, and on Humanity’s Last Exam, which assesses expertise across demanding domains such as science and mathematics. The results indicate progress in solving complex, real-world problems without external tools and point to possible uses in enterprise coding, education, and technical domains (source: OpenAI, 2024).
