Are bad incentives to blame for AI hallucinations?

Anthony Ha · September 7, 2025

A new research paper from OpenAI asks why large language models like GPT-5 and chatbots like ChatGPT still hallucinate, and whether anything can be done to reduce those hallucinations.

In a blog post summarizing the paper, OpenAI defines hallucinations as “plausible but false statements generated

To illustrate the point, researchers say that when they asked “a widely used chatbot” about the title of Adam Tauman Kalai’s Ph.D. dissertation, they got three different answers, all of them wrong. (Kalai is one of the paper’s

How can a chatbot be so wrong — and sound so confident in its wrongness? The researchers suggest that hallucinations arise, in part, because of a pretraining process that focuses on getting models to correctly predict the next word, without true or false labels attached to the training statements: “The model sees only positive examples of fluent language and must approximate the overall distribution.”

“Spelling and parentheses follow consistent patterns, so errors there disappear with scale,” they write. “But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations.”

The paper’s proposed solution, however, focuses less on the initial pretraining process and more on how large language models are evaluated. It argues that the current evaluation models don’t cause hallucinations themselves, but they “set the wrong incentives.”

The researchers compare these evaluations to the kind of multiple-choice tests where random guessing makes sense, because “you might get lucky and be right,” while leaving the answer blank “guarantees a zero.”

“In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say ‘I don’t know,’” they say.
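The incentive described here can be put in numbers. The following is a minimal Python sketch, not code from the paper: the function name and the sample probabilities are hypothetical, and it assumes a grader that awards 1 point for an exactly right answer and 0 for anything else, including “I don’t know.”

```python
def expected_score_accuracy_only(p_correct: float, abstain: bool) -> float:
    """Expected score on one question under accuracy-only grading.

    Assumption (illustrative): 1 point for a correct answer,
    0 points for a wrong answer, 0 points for abstaining.
    """
    if abstain:
        return 0.0  # "I don't know" never earns credit
    # A wrong answer costs nothing, so the expectation is just the
    # probability of being right.
    return p_correct


# Hypothetical confidence levels, from a near-blind guess to a sure thing.
for p in (0.01, 0.25, 0.9):
    guess = expected_score_accuracy_only(p, abstain=False)
    idk = expected_score_accuracy_only(p, abstain=True)
    print(f"p={p:.2f}  guess={guess:.2f}  abstain={idk:.2f}")
```

Under these assumptions, guessing scores at least as well as abstaining at every confidence level, so a model tuned against this kind of scoreboard has no reason to admit uncertainty.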

The proposed solution, then, is similar to tests (like the SAT) that include “negative [scoring] for wrong answers or partial

And the researchers argue that it’s not enough to introduce “a few new uncertainty-aware tests on the side.” Instead, “the widely used, accuracy-based evals need to be updated so that their scoring discourages guessing.”
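One way to picture scoring that discourages guessing is a penalty for wrong answers. The Python sketch below is an illustration under stated assumptions rather than the paper’s scoring rule: the function names and the penalty values are hypothetical, a correct answer earns 1 point, a wrong answer loses `wrong_penalty` points, and abstaining scores 0.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering when a wrong answer costs wrong_penalty.

    Assumption (illustrative): +1 for correct, -wrong_penalty for wrong.
    """
    return p_correct - (1.0 - p_correct) * wrong_penalty


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining have equal expected score.

    Solves p - (1 - p) * penalty = 0 for p.
    """
    return wrong_penalty / (1.0 + wrong_penalty)


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if the expected score beats the 0 earned by abstaining."""
    return expected_score(p_correct, wrong_penalty) > 0.0


# Hypothetical penalties: none, a quarter point, and a full point.
for penalty in (0.0, 0.25, 1.0):
    threshold = break_even_confidence(penalty)
    print(f"penalty={penalty:.2f}  answer only if confidence > {threshold:.2f}")
```

With no penalty the threshold is zero and any guess pays off, which is the accuracy-only case. As the penalty grows, the confidence needed to justify an answer rises, so a low-confidence guess becomes a worse choice than saying “I don’t know.”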

“If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,” the researchers say.

Anthony Ha is TechCrunch’s weekend editor. Previously, he worked as a tech
