The illusion of judgement in Large Language Models

A study by Sapienza University's Department of Computer Science, published in PNAS, reveals the risks arising from growing confidence in the evaluative capabilities of AI platforms

Large Language Models (LLMs) are increasingly used not only to search for information, but also to understand, evaluate and make decisions in a wide range of fields. However, what appears to be alignment with human or expert judgement may conceal a more profound change: the tendency, especially among younger generations, to delegate not only information retrieval but also the very structure of judgement and, more broadly, the cognitive process itself.

These results, published in the journal PNAS, were obtained by a team of researchers from the Department of Computer Science of Sapienza University, coordinated by Walter Quattrociocchi, through a series of experiments involving six of the most widely used language models and groups of human evaluators, including experts.

In an initial experiment, models and people were given the same task: to judge the credibility of hundreds of news sites. The results showed that the models' judgements resembled those of the experts, at least on the surface, but the underlying processes were different. LLMs rely on lexical associations, not critical reasoning: they identify keywords and frequent signals rather than reading content as a person would. And when they produce an explanation, they are not arguing; they are statistically extending an instruction.
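By way of illustration, a credibility-rating task of this kind can be posed to a model with a simple structured prompt. The sketch below is purely hypothetical and is not the protocol used in the study; `query_llm` is a placeholder for whatever chat-model API one has available.

```python
# Hypothetical sketch: asking a language model to rate the credibility of a news domain.
# `query_llm` is a placeholder, not part of the study's code or any specific vendor API.
import json


def query_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a chat model and return its raw text reply."""
    raise NotImplementedError("Wire this up to the LLM API of your choice.")


def rate_credibility(domain: str) -> dict:
    """Ask the model for a 0-100 credibility score and a one-sentence rationale."""
    prompt = (
        "Rate the credibility of the news outlet at the domain below on a 0-100 scale, "
        "then briefly justify the score.\n"
        f"Domain: {domain}\n"
        'Answer as JSON: {"score": <int>, "rationale": "<one sentence>"}'
    )
    return json.loads(query_llm(prompt))


# Ratings collected this way can then be compared with expert labels,
# e.g. via rank correlation between model scores and expert scores.
```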

Another experiment identified systematic political biases and a structural divergence in the judgement criteria adopted by the models. This happens because the models reflect the dominant patterns in the data on which they were trained: in academic, journalistic and digital environments, certain political positions are more frequently treated critically, and the models learn this pattern and reproduce it without understanding it.

Finally, the study investigated what happens when LLMs behave as agents, that is, when they collect information, select sources, combine responses and make decisions in a modular way.
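In operational terms, an "agentic" evaluation of this kind can be thought of as a small modular pipeline. The outline below is only a schematic sketch with hypothetical function names, not the system built by the authors.

```python
# Hypothetical outline of a modular credibility-assessment agent:
# each step is a separate, replaceable component.


def collect_information(domain: str) -> list[str]:
    """Gather raw material about the outlet (articles, metadata, external mentions)."""
    ...


def select_sources(documents: list[str]) -> list[str]:
    """Filter the collected material down to the items judged most relevant."""
    ...


def combine_responses(selected: list[str]) -> str:
    """Query a model on each selected item and merge the partial judgements."""
    ...


def decide(summary: str) -> dict:
    """Produce the final verdict, e.g. a credibility score plus a short rationale."""
    ...


def assess(domain: str) -> dict:
    """Run the full pipeline end to end."""
    return decide(combine_responses(select_sources(collect_information(domain))))
```

Framing the task this way makes each stage separately inspectable, which is what allows the criteria used by models and by humans to be compared step by step.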

The authors, in collaboration with researchers from the Department of Psychology, constructed an experiment in which both models and humans operated as agents.

Here too, it emerged that while people use rhetorical, stylistic and emotional criteria, models rely on structural traces and lexical signals associated with reputation or ideology.

The results of this experimental study therefore highlight how AI platforms influence what millions of people consider to be “true”. Understanding how models judge reliability means going beyond the issue of disinformation (the infodemic) to address a new epistemic challenge: the illusion of knowledge created by the linguistic plausibility of AI systems.

References:
Di Marco, Niccolò, Edoardo Loru, Anita Bonetti, Alessandra Olga Grazia Serra, Matteo Cinelli, and Walter Quattrociocchi. "Patterns of linguistic simplification on social media platforms over time." Proceedings of the National Academy of Sciences 121, no. 50 (2024): e2412105121.
DOI: https://doi.org/10.1073/pnas.2412105121

Wednesday, 15 October 2025
