June 2, 2025

Top AI Models Flunk Graduate-Level History Exam

 Study Finds 
  • Even the most advanced AI models, like GPT-4-Turbo, failed to demonstrate expert-level understanding of global history, scoring only 46% on a rigorous benchmark test designed for graduate-level inquiry.
  • AI models performed better on ancient history than on more recent events, and consistently struggled with regions outside the Western world, especially Sub-Saharan Africa and Oceania, highlighting biases in training data.
  • The study underscores a major limitation of current AI models: while they excel at surface-level fact recall, they lack the deep contextual reasoning and global coverage needed for sophisticated historical analysis.

 
