Millions of people ask a machine every day what is true. The machine always answers. Whether it knows or not.
We rely on systems whose answers sound plausible. Whether they are right or not, almost no one checks.
More than a billion people each week ask a handful of systems of the same kind. How do I fix this code, what helps against a migraine, is this dismissal lawful, what should I give my child for their 14th birthday, is my behaviour in this relationship normal. The answers come at once, in full sentences, in the tone of someone who knows. They are often useful. Sometimes they are wrong. Which answers fall into which category, you cannot tell from the outside.
Language models know nothing. They calculate which word is likely to come next, weighed against the billions of texts they saw in training. Out of that come sentences that sound as if they came from someone who understood the topic. Sometimes they match reality, sometimes they don’t. The model itself has no access to the difference. It is trained to come across as helpful, not to know things correctly. “I don’t know” rarely comes back as an answer, because “I don’t know” scores worse in the evaluation systems than a plausible guess. This property is built in.
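The next-word calculation can be pictured with a deliberately crude toy. This is a sketch under loud assumptions: the four-sentence corpus and the functions `next_word` and `complete` are invented for illustration, and the model only looks at the last word, whereas real language models weigh the whole context with billions of parameters. The principle it shows is the same: the most likely continuation comes out, whether or not it is true.

```python
from collections import Counter, defaultdict

# Toy "training data" (hypothetical): the model only ever sees text, never the world.
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid . "
    "the capital of france is paris ."
).split()

# Count which word follows which word in the training text.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1


def next_word(word):
    """Return the most frequent follower of `word` in the training text."""
    return follows[word].most_common(1)[0][0]


def complete(prompt):
    """Append the single most likely next word to the prompt."""
    words = prompt.split()
    words.append(next_word(words[-1]))
    return " ".join(words)


print(complete("the capital of france is"))   # the capital of france is paris
print(complete("the capital of austria is"))  # the capital of austria is paris
```

The second sentence is as fluent as the first, and it is wrong. The toy has never seen Austria, but nothing in the calculation registers that. It returns the most frequent continuation either way.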
In a single case it is often harmless. Anyone baking a cake who gets the wrong ratio of flour to sugar notices by the first bite at the latest. It is different when the question is the diagnosis of a symptom, the reading of a contract, the building of a political position. There the immediate feedback is missing. A plausible falsehood is left standing. And it doesn’t stay with one person. When millions get the same wrong answer from the same machine, what a society treats as consensus knowledge shifts. Not through deliberate manipulation. Through statistics.
Misinformation now scales faster than anyone can check it, and it arrives in the tone of full certainty. Against that we had a cultural skill, practiced in school and in the media: check your sources, hold claims against evidence, distrust a single authority. That skill is being replaced by a machine that never admits it doesn’t know. Most people don’t notice. Schools don’t teach it. Politics hasn’t even named the problem yet.
We need to talk about this
How often do the models really hallucinate?
The US company Vectara has run a live leaderboard since 2023 that measures how often language models, when summarising short documents, invent content that is not in the text. The testing tool is called HHEM-2.3, and it checks summaries of more than 7,700 articles from news, law, medicine, science, sport, business, education and tech. On longer, ambiguous texts that come closer to real use, larger models like Claude Sonnet 4.5, Grok-4 or o3-Pro land, depending on configuration, somewhere between ten and over twenty percent. The best models drop below two percent only on short, clearly structured texts.
These are hallucinations under ideal conditions: the source text is right there, and the instruction is explicitly to summarise only that text. Without a source, on open questions, the rates are higher. OpenAI itself confirmed in a September 2025 paper that hallucinations are not a bug but a consequence of how models are trained and evaluated.
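How evaluation can reward guessing is easy to put into numbers. The following is a minimal sketch, not a reconstruction of any specific benchmark: the grading scheme (one point for a correct answer, zero for "I don't know", an optional penalty for a wrong answer) and the names `expected_score` and `best_strategy` are assumptions made for illustration.

```python
def expected_score(p_correct, abstain, wrong_penalty=0.0):
    """Expected points on one question under an assumed grading scheme.

    Correct answer: 1 point. "I don't know": 0 points.
    Wrong answer: minus `wrong_penalty` points (0 means wrong answers cost nothing).
    """
    if abstain:
        return 0.0
    return p_correct - (1.0 - p_correct) * wrong_penalty


def best_strategy(p_correct, wrong_penalty=0.0):
    """Return which behaviour earns more points on average."""
    guess = expected_score(p_correct, abstain=False, wrong_penalty=wrong_penalty)
    idk = expected_score(p_correct, abstain=True, wrong_penalty=wrong_penalty)
    return "guess" if guess > idk else "say I don't know"


# p is the chance that a guess happens to be right.
for p in (0.1, 0.3, 0.5):
    print(p, "| no penalty:", best_strategy(p),
          "| penalty 1:", best_strategy(p, wrong_penalty=1.0))
```

As long as a wrong answer costs nothing, guessing comes out ahead even when the chance of being right is ten percent. Only when wrong answers are penalised does "I don't know" become the better-scoring reply in this toy setup.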
What this means day to day
A Kaiser Family Foundation survey from the summer of 2024, covering 2,428 US adults, found that 17 percent use chatbots at least once a month to get medical information. Among the under-30s it is 25 percent. At the same time, 56 percent of the population say they could not reliably tell whether the answers are correct. Among the users themselves it is still about half.
In law there are documented cases of lawyers putting ChatGPT-invented precedents into their filings, and they keep happening. The best-known was Mata v. Avianca in 2023 (a 5,000 dollar fine). In 2025 a California lawyer was fined 10,000 dollars; 21 of the 23 citations in his appellate brief were fabricated. In Johnson v. Dunn, a US district court in Alabama found fines insufficient and barred the lawyers from the case.
Why this won't fix itself
According to OpenAI, ChatGPT reached around 900 million weekly users in February 2026, up from 500 million in March 2025. In a single year usage has nearly doubled. Add to that Claude, Gemini, Llama, Mistral and dozens of smaller models, built into search engines, office applications, customer service, school platforms.
Schools and universities react helplessly. Some ban, some tolerate, some integrate. What children and students are rarely taught: that a language model is not a source of knowledge but a probability generator for sentences that sound like an answer.
Sources
- Vectara Hallucination Leaderboard (GitHub)
- Kalai et al.: Why Language Models Hallucinate (arXiv, September 2025)
- KFF Tracking Poll: Use of AI For Health Information and Advice (August 2024)
- TechCrunch: ChatGPT reaches 900M weekly active users (February 2026)
- Mata v. Avianca, Inc.: Court Order (S.D.N.Y. 2023, via CourtListener)
- CalMatters: California lawyer fined for ChatGPT-generated brief (September 2025)