GSM1k AI News List | Blockchain.News

List of AI News about GSM1k

Time: 2026-02-04 09:35
Latest Analysis: Phi and Mistral Models Show 13% Accuracy Drop on GSM1k vs GSM8k, Revealing Memorization Issues

According to God of Prompt on Twitter, recent testing shows that Phi and Mistral models lost roughly 13% accuracy when evaluated on the GSM1k benchmark instead of GSM8k, with some model variants dropping by as much as 13.4 percentage points. The analysis attributes the gap to memorization rather than true reasoning: it suggests the models were exposed to the correct answers during training, so their GSM8k scores overstate how well they handle unseen problems. The finding raises concerns about how well these models generalize and how reliable they are for business and research applications.
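
The summary quotes both a "13%" drop and a "13.4 percentage point" drop, and the two are not the same measure. As a rough illustration only, the sketch below computes both from per-benchmark scores. The function names and the counts are hypothetical and chosen for the example; they are not figures from the analysis.

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage in the range 0-100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def accuracy_gap(public_acc: float, held_out_acc: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent).

    public_acc is the score on the widely published benchmark,
    held_out_acc is the score on the newer, unseen benchmark.
    """
    if public_acc <= 0:
        raise ValueError("public_acc must be positive")
    point_drop = public_acc - held_out_acc
    relative_drop = 100.0 * point_drop / public_acc
    return point_drop, relative_drop


# Hypothetical counts for illustration, NOT results from the cited analysis.
gsm8k_acc = accuracy(correct=800, total=1000)
gsm1k_acc = accuracy(correct=666, total=1000)

point_drop, relative_drop = accuracy_gap(gsm8k_acc, gsm1k_acc)
print(f"Percentage-point drop: {point_drop:.2f}")
print(f"Relative drop: {relative_drop:.2f}%")
```

With these made-up counts the same gap reads as about 13.4 percentage points but a noticeably larger relative drop, which is why it matters which measure a headline figure refers to.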
