We designed an AI tutor that helps college students reason instead of handing them answers
- Written by Saharnaz Babaei-Balderlou, Teaching Assistant Professor of Economics, University of Wisconsin-La Crosse
Students using AI to cheat on homework[1] or tests is a source of much discussion[2]. But some scholars[3] argue the greater risk of students using AI is that they will simply not learn.
Approximately 90% of 1,100 U.S. students surveyed at two-year and four-year colleges in 2025 reported using generative AI[4] for everything from drafting assignments to clarifying complex concepts.
But when students use AI as a tutor or study partner, not as an immediate answer generator, does it make it easier or harder for them to learn?
We are economists[5] who tried to answer[6] this question by[7] designing an AI tool using ChatGPT’s custom GPT feature, with the chatbot’s web access disabled.
We named the tool Macro Buddy[8] and trained it to guide some students in one of our undergraduate macroeconomics classes at the University of Wisconsin-La Crosse through their reasoning rather than giving them direct answers.
We found in our research[9], conducted in spring 2025, that students who used Macro Buddy, alongside peer discussion, earned higher exam scores than students who worked alone, without this AI tutor.
Meet your new tutor
One of our macroeconomics courses enrolled 140 undergraduate students, mostly in their first or second year of college, divided across four sections.
Course materials, assignments and exams were identical across all four sections. Students took all exams in person, without notes or other reference materials, and they were generally not allowed to use AI tools or collaborate with classmates during tests.
As a result, exam scores reflected what students understood and could explain on their own – without the help of AI or any other outside source.
After all students took their first exam, we randomly assigned a different study format to each of the four class sections.
We prompted one group of students to work individually, without Macro Buddy; another group of students worked in groups, without Macro Buddy; a third group of students worked individually, with Macro Buddy; and a fourth group of students worked in groups, with Macro Buddy.
We wanted to compare how different study approaches – working alone, working with classmates, using Macro Buddy or combining both – altered how well students did on exams.
Macro Buddy’s skills
We trained Macro Buddy with the help of lecture transcripts, slides and homework questions specifically from this macroeconomics course.
Macro Buddy had internet access turned off, so it relied only on the instructor’s course materials.
Macro Buddy was designed to act like a tutor, not an answer machine. Instead of giving students complete solutions, Macro Buddy asked follow-up questions meant to guide students toward an answer.
For example, if a student asked why lower prices might increase consumers’ spending, Macro Buddy would not offer a quick, full explanation. It might instead ask what happens to people’s purchasing power when prices fall. The student would then have to connect the concepts and explain their reasoning, in their own words, step by step.
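The study built Macro Buddy with ChatGPT’s custom GPT feature and written instructions, not code. Still, the Socratic pattern it describes – return a guiding question instead of an answer – can be sketched in a few lines of Python. This is a toy illustration only; the topic keywords and questions below are invented for this example and are not part of the actual tool:

```python
# Toy sketch of a Socratic tutor: rather than answering directly,
# it replies with a follow-up question that pushes the student to
# supply the reasoning themselves.
# The keyword-to-question mapping is invented for illustration; the
# real Macro Buddy was a custom GPT steered by written instructions.

GUIDING_QUESTIONS = {
    "lower prices": "What happens to people's purchasing power when prices fall?",
    "interest rates": "How do borrowing costs change when interest rates rise?",
}

# Fallback when no topic matches: still a question, never an answer.
DEFAULT_PROMPT = "Which concept from the lecture do you think applies here?"

def socratic_reply(student_question: str) -> str:
    """Return a guiding follow-up question instead of a direct answer."""
    text = student_question.lower()
    for topic, question in GUIDING_QUESTIONS.items():
        if topic in text:
            return question
    return DEFAULT_PROMPT

print(socratic_reply("Why might lower prices increase consumer spending?"))
# → What happens to people's purchasing power when prices fall?
```

The design choice the sketch captures is that every branch ends in a question, so the student, not the tool, does the step-by-step connecting of concepts.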
This distinction between explaining an idea and receiving a finished answer matters.
An AI tool that simply delivers answers can allow students to skip thinking through a problem. One study found that when college students rely on a chatbot as a crutch, they perform worse when they no longer have access[11] to it. A tool that asks questions requires students to do the work themselves, even while receiving guidance. This is the very process that makes learning stick[12].
What happened to students’ learning
The one group of students that continued working individually, without AI, served as our control group.
The other three groups changed how they studied: One began working in groups without AI, one worked individually with Macro Buddy, and the last group combined group work with Macro Buddy.
Average scores declined on the second exam across all four study groups.
By the third exam, however, differences across sections became clearer.
Students who used both Macro Buddy and group discussion earned the highest average scores. Students who used Macro Buddy alone also scored higher than those who worked alone without it. Students who worked in groups without Macro Buddy showed smaller improvements than the other groups.
The third exam happened several weeks after we introduced the new study formats.
By that point, students in the combined group may have grown more comfortable using Macro Buddy to test their understanding, while also explaining ideas to classmates. Working with peers meant having to articulate reasoning clearly and respond to questions, which can deepen understanding over time.
Why this matters
Some critics of AI worry that students will rely on AI to do the hardest parts of learning for them[13] – that they will stop practicing the skills that build expertise. Students become experts in their fields by struggling with confusing material, revising explanations and testing whether they truly understand an idea.
Our experiment suggests that this erosion of learning is not inevitable when students use AI.
We found that when AI is designed as a tutor that asks questions instead of simply giving answers – and when students are also required to explain their reasoning to classmates – the technology can support learning rather than replace it.
Most students today use general-purpose chatbots that are not designed as tutors. They type in a question and receive a response. But our findings suggest that even small design choices, such as building an AI chatbot with guiding questions, can shape how students engage with the material.
Peer discussion also adds something to the learning process that AI cannot provide: social accountability and exposure to alternative reasoning.
Together, these practices encourage students to think through problems more actively.
The evidence[14] from our experiment highlights a practical distinction: AI can be used to replace thinking, or it can be used to support it. The impact may depend less on the technology itself and more on how it is structured and integrated into learning.
References
- ^ cheat on homework (www.forbes.com)
- ^ much discussion (www.nbcnews.com)
- ^ some scholars (theconversation.com)
- ^ reported using generative AI (www.forbes.com)
- ^ are economists (saharnaz.org)
- ^ tried to answer (papers.ssrn.com)
- ^ this question by (scholar.google.com)
- ^ Macro Buddy (chatgpt.com)
- ^ found in our research (dx.doi.org)
- ^ they perform worse when they no longer have access (dx.doi.org)
- ^ very process that makes learning stick (theconversation.com)
- ^ will rely on AI to do the hardest parts of learning for them (www.edweek.org)
- ^ The evidence (dx.doi.org)

