Students benefit from tutoring, but training tutors is expensive. A study shows that large language models can boost tutors’ effectiveness in real time.
What’s new: Rose Wang and colleagues at Stanford built Tutor CoPilot, a tool for online tutors that uses GPT-4 to generate hints, explanations, questions, and other helpful responses to students.
Key insight: According to previous work by some of the same authors, when a student makes an error, effective teachers choose a strategy for addressing the mistake. The authors identified 11 strategies, such as asking a question, explaining a concept, providing a hint, or encouraging the student. Moreover, they found that an LLM that executed a strategy chosen by an expert teacher performed significantly better than an LLM prompted with a randomly chosen strategy or no specific strategy. Letting inexperienced tutors choose a strategy while an LLM generates the response helps them learn how to execute that strategy. Students, in turn, benefit from responses that mimic those of an experienced teacher.
How it works: The authors outfitted a remote tutoring application with GPT-4.
- The application included a tutor-student chat window, a problem display, and a whiteboard. The authors added a button that enabled the tutor to turn Tutor CoPilot on or off.
- When a tutor engaged Tutor CoPilot, the system prompted GPT-4 to behave as an experienced elementary math teacher and provided context in the form of the 10 most recent messages, the current lesson topic, and a default strategy from the list. GPT-4 responded with guidance. (To preserve the tutor’s and student’s privacy, the system redacted their names using the open source library Edu-ConvoKit.)
- The system prompted GPT-4 three times, each time changing the strategy, and presented the tutor with three potential responses.
- The tutor could regenerate or edit GPT-4’s responses, or select a strategy and generate a new response, before adding one to the chat window. (A sketch of this generation loop appears below.)
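The paper’s exact prompts aren’t reproduced here, but a minimal sketch of this loop, assuming OpenAI’s chat completions API, might look like the code below. The strategy names, prompt wording, and redact_names helper are illustrative stand-ins, not the authors’ implementation; the real system performs redaction with Edu-ConvoKit.

```python
# Illustrative sketch only: prompt text, strategy names, and helpers are
# assumptions, not the authors' implementation.
import re

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Three of the 11 pedagogical strategies (paraphrased).
STRATEGIES = ["ask a guiding question", "explain the concept", "provide a hint"]


def redact_names(text: str, names: list[str]) -> str:
    """Stand-in for Edu-ConvoKit's redaction: mask known participant names."""
    for name in names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


def suggest_responses(chat_log: list[str], topic: str, names: list[str]) -> list[str]:
    """Generate one candidate tutor response per strategy, mirroring how
    Tutor CoPilot presents three options for the tutor to pick, edit, or
    regenerate."""
    # Context: the 10 most recent messages, with names redacted for privacy.
    context = "\n".join(redact_names(msg, names) for msg in chat_log[-10:])
    candidates = []
    for strategy in STRATEGIES:
        completion = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {
                    "role": "system",
                    "content": "You are an experienced elementary school math "
                               "teacher guiding a student through a lesson.",
                },
                {
                    "role": "user",
                    "content": f"Lesson topic: {topic}\n"
                               f"Recent chat:\n{context}\n\n"
                               f"Respond to the student. Strategy: {strategy}.",
                },
            ],
        )
        candidates.append(completion.choices[0].message.content)
    return candidates
```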
Results: The authors partnered with a virtual tutoring company and a school district in the United States for a two-month study of 874 tutors and 1,787 students in grades 3 through 8. They divided the participants into two groups. In one group, tutors conducted sessions with students as usual. In the other, tutors had access to Tutor CoPilot. The authors measured success by the percentage of students who passed a test at the end of each lesson.
- In the group that didn’t use Tutor CoPilot, 62 percent of students passed the test.
- In the group with Tutor CoPilot, 66 percent passed.
- The effect was most pronounced among students of the one-third of tutors with the lowest ratings (9 percentage points higher pass rate) and the least experience (7 percentage points higher).
- The API cost was approximately $3.31 per tutor over the two-month study, or roughly $20 per tutor per year.
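The annual figure is consistent with a straight-line extrapolation of the study’s cost, assuming constant usage: $3.31 per tutor per two months × 6 ≈ $19.86 per tutor per year.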
Yes, but: The authors found statistically significant improvements in end-of-lesson test results but not in end-of-year exam scores. The study’s two-month duration may explain the lack of evidence for longer-term effects.
Why it matters: LLMs hold great promise for helping to educate students, but they also show potential in educating teachers. For inexperienced tutors who are learning how to interact with students, an LLM’s general knowledge and pedagogical insights gleaned from expert teachers make a powerful combination.
We’re thinking: Although it relies on sophisticated technology, the authors’ approach is simple: Prompt an LLM to apply proven teaching principles. Presumably such principles apply beyond elementary math, which would make this approach useful for teaching a variety of disciplines.