AI boosted student scores 48%, then crashed them 17%

Learning & Mental Models · 4 min read

Students who practiced math with ChatGPT solved 48% more problems correctly than classmates working without it. Then the AI was taken away, and they scored 17% worse on the exam than students who never used AI at all.

That is the core finding from a study at the Wharton School of the University of Pennsylvania, led by Hamsa Bastani and involving nearly 1,000 high school students in Turkey. The pattern it reveals has a name now: cognitive debt. You borrow understanding from the machine, perform brilliantly while the loan is active, and go bankrupt the moment the credit line is cut.

What cognitive debt looks like inside your brain

MIT Media Lab researcher Nataliya Kosmyna wanted to see what happens neurologically when people write with ChatGPT. Her team placed EEG sensors on 54 participants, split them into three groups (ChatGPT, search engine, brain-only), and tracked neural connectivity across four months.

Students writing without tools showed the strongest brain networks, particularly in alpha and theta bands (the frequencies tied to creativity, memory encoding, and deep processing). Search engine users showed moderate engagement. ChatGPT users displayed the weakest connectivity across every measured band. Most striking: 83% of ChatGPT users could not recall a single passage from essays they had written minutes earlier. Their brains had outsourced the work so completely that no memory trace remained. As the MIT study documented, LLM users "consistently underperformed at neural, linguistic, and behavioral levels" over the entire four-month period.

The dependency trap: better scores, emptier understanding

Bastani's study makes the mechanism clear. Students in the unrestricted ChatGPT group could ask the AI anything. They breezed through practice sessions, solving problems faster and more accurately. But they were not learning the underlying concepts; they were blindly following the answers the AI provided.

When exam day arrived and the AI was gone, the gap reversed. The control group, which had struggled through problems on their own, retained the concepts. The AI-assisted group had built a scaffold on borrowed understanding and could not stand without it.

A separate experiment at Corvinus University of Budapest reinforced this. Researchers Benedek and Sziklai found that students demonstrated knowledge levels 20 to 40 percentage points lower than previous cohorts. By the final exam, AI detection tools flagged a median of 100% AI-generated content in submissions from both the AI-permitted and AI-prohibited groups. Even students told not to use AI apparently could not stop.

Why the "good" AI tutor barely helps either

The Wharton study also tested a second group: students given a custom AI tutor with pedagogical safeguards, offering hints instead of answers and guiding reasoning steps rather than handing out solutions.

Those students performed roughly the same as the control group on the exam. Not 17% worse, but not better either. Even a carefully designed AI tool produced zero measurable learning gains, because students still outsourced cognitive effort rather than building their own mental pathways.

This tracks with a study of 73 undergraduates in which prolonged AI exposure led to memory decline, even though pretesting initially improved engagement. The brain adapts to convenience by doing less, and doing less means encoding less.

The compounding cost nobody is measuring

Cognitive debt does not erase one test's worth of knowledge. It compounds. Each time a student uses AI to skip the effortful part of learning, their capacity for independent problem-solving weakens. Research involving 206 vocational students in Nigeria found that frequent AI use posed significant threats to critical thinking, with the effect worsening over time.

This mirrors what happens in workplaces, where cognitive overload from too many AI tools degrades performance rather than improving it. Students losing learning capacity and professionals losing decision-making capacity are the same mechanism in different contexts.

What actually builds lasting knowledge

The uncomfortable answer is friction. The struggle, the mistakes, the slow processing that AI eliminates are exactly what builds durable neural pathways. Study methods that build genuine retention, like interleaving and retrieval practice, feel harder in the moment but produce dramatically stronger long-term recall.

The MIT study confirmed this directly: the brain-only group reported the highest satisfaction, the strongest sense of ownership, and the most robust memory of what they produced. The struggle was not a bug; it was the entire mechanism of learning.

If you are a student, a parent, or anyone using AI to learn something new, the research points to one rule: use AI to check your work, not to do it. The 48% performance boost disappears the moment the tool does, and what remains is a brain that practiced not thinking.


Sources and References

  1. Wharton School, University of Pennsylvania: Nearly 1,000 Turkish high school students using unrestricted ChatGPT solved 48% more practice problems but scored 17% worse on exams when AI was removed.
  2. MIT Media Lab: EEG monitoring of 54 participants over four months showed ChatGPT users had the weakest neural connectivity; 83% could not recall essays they had written minutes earlier.
  3. PMC review: Meta-review confirming the cognitive paradox, with AI-assisted students scoring 17% lower on concept-understanding tests.
  4. Corvinus University of Budapest: Students demonstrated knowledge levels 20 to 40 percentage points lower than previous cohorts.
