Large Language Models & Emotional Intelligence: What the Latest Nature Study Reveals
Introduction

Emotional intelligence (EI), the capacity to perceive, understand, manage, and use emotions effectively, has traditionally been regarded as a human domain rooted in social cognition and self-awareness. A recent study published in Nature’s Communications Psychology reveals that advanced large language models (LLMs), when evaluated on standardised human EI measures, not only perform at a high level but also generate test items with psychometric rigour comparable to those created by human experts. This development marks a significant shift in how artificial intelligence engages with affective reasoning and social understanding.

Study Objectives and Design

The research had two primary goals:

1. Evaluate LLM performance on established emotional intelligence tests. The study administered five widely used ability-based EI assessments to multiple state-of-the-art LLMs.

2. Assess the ability of LLMs to generate new EI test items. One model (ChatGPT-4) was instructed to produce new test items for each of the EI instruments. These AI-generated items were then administered to human participants alongside the original items to evaluate psychometric quality.

The LLMs assessed included ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3. Evaluations considered accuracy, item difficulty, internal consistency, and correlation with external measures.
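The first goal, administering test items to models and scoring the responses, can be sketched as a simple evaluation loop. This is an illustrative outline only: `score_model` and the two sample items are hypothetical stand-ins, not the study's actual instruments or API calls.

```python
# Sketch of an evaluation loop for administering multiple-choice EI items
# to a language model and computing accuracy. The items below are
# invented examples in the style of ability-based EI tests; `ask` stands
# in for whatever chat API a given vendor exposes.

from typing import Callable

ITEMS = [
    {
        "prompt": "A colleague is passed over for a promotion they expected. "
                  "Which emotion are they most likely to feel? "
                  "(a) pride (b) disappointment (c) relief",
        "answer": "b",
    },
    {
        "prompt": "Which strategy best helps regulate anger before replying "
                  "to a hostile email? (a) reply immediately "
                  "(b) take a break and reappraise (c) escalate the conflict",
        "answer": "b",
    },
]

def score_model(ask: Callable[[str], str]) -> float:
    """Return the fraction of items the model answers correctly."""
    correct = 0
    for item in ITEMS:
        reply = ask(item["prompt"]).strip().lower()
        if reply.startswith(item["answer"]):
            correct += 1
    return correct / len(ITEMS)

# A toy "model" that always picks option (b), for demonstration only.
print(f"accuracy = {score_model(lambda prompt: 'b'):.0%}")
```

In the study itself, each model's accuracy on the real instruments was then compared against published human validation norms.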
LLM Performance on Emotional Intelligence Assessments

LLMs achieved an average accuracy of approximately 81 per cent across the five established EI tests, significantly above the historical human baseline average of 56 per cent. Performance varied across models but consistently exceeded typical human scores.

Key Observations

- Models demonstrated strong capabilities in tasks involving emotional understanding, scenario appraisal, and selection of emotion regulation strategies.
- Performance patterns suggest that LLMs can reason about emotional contexts and outcomes in structured testing environments.

Interpretation

High performance on standardised EI tests indicates that LLMs have internalised patterns of emotional semantics and situation-based reasoning through training on large corpora. However, test proficiency does not necessarily equate to genuine experiential emotional understanding; rather, it reflects robust pattern recognition and context inference.

AI-Generated Test Items: Psychometric Evaluation

Beyond solving EI tests, the study examined whether LLMs can generate valid test items with properties comparable to those of established instruments.

Findings

- Difficulty equivalence: AI-generated items exhibited levels of difficulty statistically similar to the original human-crafted items.
- Internal consistency: Measures such as Cronbach’s alpha indicated that test reliability remained consistent between original and AI-generated sets.
- Correlation with external measures: Scores on AI-generated items correlated meaningfully with other EI instruments, reinforcing construct validity.
- Human perception: Differences in realism and diversity between original and generated items were detectable but of small magnitude (effect sizes < 0.25).

LLMs can generate EI test items that align closely with psychometric standards. While nuances in item quality and creativity remain distinguishable, AI-generated content passes key statistical thresholds for test construction.
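Two of the psychometric quantities mentioned above, item difficulty and Cronbach's alpha, are straightforward to compute. The following is a minimal sketch on a small made-up response matrix (rows are respondents, columns are items, 1 = correct, 0 = incorrect); the data are illustrative, not from the study.

```python
# Item difficulty = proportion of respondents answering each item correctly.
# Cronbach's alpha = (k / (k - 1)) * (1 - sum(item variances) / total variance),
# a standard estimate of a test's internal consistency.

def item_difficulty(responses):
    """Proportion of respondents answering each item correctly."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]

def cronbach_alpha(responses):
    """Internal-consistency reliability of the item set."""
    k = len(responses[0])

    def variance(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_scores = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(total_scores))

# Five respondents, four items (illustrative data).
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print("difficulty:", item_difficulty(data))
print("alpha:", round(cronbach_alpha(data), 3))
```

The study applied checks of this kind to show that AI-generated item sets matched the original instruments in difficulty and reliability.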
Implications for AI and Emotional Intelligence

The findings carry implications for multiple domains:

1. AI in Affective and Social Contexts

LLMs could be harnessed in applications requiring emotional reasoning:

- Virtual agents for coaching, counselling, or emotional skills training.
- Customer service systems capable of sensitive and nuanced responses.
- Educational tools that adapt feedback based on inferred emotional state.

These systems may lack subjective emotional experience, but their proficiency in structured emotional reasoning tasks supports their utility in affective contexts.
2. Assessment Development and Psychometrics

AI-assisted generation of test items offers potential efficiencies:

- Automated creation and preliminary validation of EI assessments.
- Expansion of item pools with controlled difficulty parameters.
- Supplementation of expert item writers in large-scale testing programmes.

The use of AI in test development should nonetheless be subject to rigorous human oversight and iterative validation.

3. Ethical and Operational Considerations

High LLM performance on EI measures invites critical evaluation of deployment risks:

- Overreliance on AI for emotional engagement without human oversight.
- Misalignment between algorithmic reasoning and lived human experience.
- Misinterpretation of model “understanding” versus learned statistical patterns.

Developers and stakeholders must clearly define the role of EI-capable models and ensure safeguards against misuse.

Conclusion

The study demonstrates that advanced LLMs excel at both solving and creating emotional intelligence tests, with performance that, in many respects, matches or exceeds human baselines. These capabilities expand the paradigm of AI interaction beyond syntactic language tasks into domains historically associated with social cognition. Strategic adoption of these capabilities can enhance digital tools across education, health, and customer engagement, provided that ethical and operational frameworks guide their integration.

References

Schlegel, K., Sommer, N. R., & Mortillaro, M. (2025). Large language models are proficient in solving and creating emotional intelligence tests. Communications Psychology, 3, Article 80. https://www.nature.com/articles/s44271-025-00258-x

FAQ

1. What does the Nature study reveal about emotional intelligence in AI?

The study shows that advanced large language models can solve standardised emotional intelligence tests with significantly higher accuracy than average human benchmarks. They can also generate new test items with comparable psychometric validity.
2. How well do large language models perform compared to humans in emotional intelligence tests?

Across multiple established EI assessments, LLMs achieved approximately 81 per cent accuracy, compared with around 56 per cent for human test-takers in historical validation samples.

3. Can AI truly understand emotions like humans do?
AI does not experience emotions subjectively. Its performance reflects advanced pattern recognition, contextual reasoning, and learned emotional semantics rather than lived emotional awareness.

4. Why is AI-generated emotional intelligence testing useful?

AI-generated test items can accelerate assessment development, expand item pools, and support psychometric research, provided they are validated and reviewed by human experts.

5. What are the real-world applications of emotionally intelligent AI?

Potential applications include emotionally aware customer support systems, coaching tools, educational platforms, mental health screening assistants, and adaptive user experience systems.

DOAGURU INFOSYSTEMS

At DOAGuru Infosystems, this research reinforces a critical truth the industry is beginning to acknowledge: the future of AI is not just technical intelligence, but emotional and contextual intelligence. As organisations increasingly adopt AI-driven platforms, DOAGuru focuses on building systems that balance analytical precision with human-centric understanding, whether in digital marketing, automation, CRM solutions, or intelligent customer engagement. Insights from studies like this guide DOAGuru’s approach to designing AI-powered solutions that communicate intelligently, respond empathetically, and align technology with real human behaviour rather than surface-level automation.