AI in PT: How One Professor Is Leading the Charge
Benjamin Stern, an assistant professor at Tufts University School of Medicine, started playing with the artificial intelligence (AI) tool ChatGPT several years ago, shortly after he arrived at Tufts. Stern helmed two or three classes each semester that registered up to 100 students, making it challenging to provide quality feedback and assessments for each person. Could AI, he wondered, streamline that process?
When he started at Tufts in 2022, generative large language models were not incredibly sophisticated. “It was really a matter of getting used to this cycle of trial and error,” says Stern, who’s part of the Doctor of Physical Therapy program in Phoenix.
But he saw the promise, and over time the models improved. In 2023, Stern secured two grants to support his work applying machine learning approaches to health care prediction and education. Working with collaborators, he built a group of AI-powered agents that could help him write multiple-choice exam questions: generating items from information in course lectures, modifying difficulty levels, and tailoring questions to fit within guidelines from professional organizations like the American Board of Physical Therapy Specialties (ABPTS).
“The time saving in the classroom is really incredible,” he says. “The agents make it much simpler to customize helpful and specific questions for exams.”
As advancements continued, so did Stern’s fluency with generative AI tools. The models he now uses are embedded across his work: constructing individualized feedback for students, identifying concepts where specific students excel and struggle, and guiding teaching. He also collaborated on a recent paper, published in Nature Communications, in which he and engineers at Arizona State University (ASU) trained a machine learning model to reconstruct a complete dataset from incomplete observations. It’s an example of how artificial intelligence can support professors in the classroom as well as in the laboratory.
Crafting Questions, Assessing Performance
Stern’s educational AI work builds on his experience exploring predictive models for non-contact injuries and changes in health state. Stern’s journey with technology began in 2013 with funding from the National Institutes of Health (NIH) to develop a wireless rehabilitation system for stroke patients. This led to collaborations beginning in 2015 with researchers at the University of California San Diego, UC Santa Cruz, and ASU, with whom he worked on applying machine learning techniques to predict non-contact injuries in athletes. His 2020 paper published in Physical Therapy in Sport discussed how novel computational methods may expose critical underlying patterns missed by traditional linear approaches.
In a physical therapy setting, students are often tasked with reading a clinical query and providing responses on a multiple-choice exam. Working with collaborators, including Peter Nadel, digital humanities and natural language processing specialist at Tufts Research Technology Services, Stern and the group integrated a program that screened each multiple-choice item for 19 different item-writing flaws.
They also created a team of AI agents, or specialized versions of chatbots that work together like experts in different fields collaborating on a project. Each AI agent has a distinct role: one creates clinical scenarios, another acts as a critic to identify flaws, a third adjusts question difficulty, and others make sure the items meet standards from organizations like the National Board of Medical Examiners and the ABPTS. The agents can even take existing exam questions and make them harder or easier, something that previously required hours of manual work.
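The article doesn’t publish the team’s actual rubric or prompts, but the critic-agent role can be sketched in miniature. This hypothetical example checks a multiple-choice item against three well-known item-writing flaws (the real system screens for 19, and the checks and data structures here are invented for illustration):

```python
# Illustrative sketch of a "critic" agent screening an exam item for
# item-writing flaws. The flaw list and item format are hypothetical,
# not the actual 19-flaw screen described in the article.

def check_all_of_the_above(item):
    """Flag the classic 'all of the above' option flaw."""
    return [opt for opt in item["options"] if "all of the above" in opt.lower()]

def check_negative_stem(item):
    """Flag negatively worded stems (e.g., 'Which is NOT ...')."""
    stem = item["stem"].lower()
    return [w for w in ("not ", "except") if w in stem]

def check_unbalanced_lengths(item):
    """Flag an option conspicuously longer than the average of the rest."""
    lengths = [len(opt) for opt in item["options"]]
    longest = max(lengths)
    avg_rest = (sum(lengths) - longest) / (len(lengths) - 1)
    return [item["options"][lengths.index(longest)]] if longest > 2 * avg_rest else []

CRITIC_CHECKS = {
    "all_of_the_above": check_all_of_the_above,
    "negative_stem": check_negative_stem,
    "unbalanced_lengths": check_unbalanced_lengths,
}

def critic_agent(item):
    """Run every check and report only the flaws that were found."""
    return {name: hits for name, check in CRITIC_CHECKS.items() if (hits := check(item))}

item = {
    "stem": "Which of the following is NOT a contraindication for exercise?",
    "options": ["Fever", "Unstable angina", "Mild soreness", "All of the above"],
}
print(critic_agent(item))  # flags 'all_of_the_above' and 'negative_stem'
```

In the multi-agent setup the article describes, a report like this would be handed back to a question-writing agent for revision rather than shown directly to the instructor.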
Along with colleagues at Research Technology, the School of Medicine, and the School of Arts and Sciences, Stern and Nadel secured a Tufts Data Intensive Studies Center award to develop ways to integrate AI models into Tufts curricula. Although each item is still verified by a human (the team values keeping a human in the loop), this approach has resulted in a noticeable improvement in educational efficiency. Instead of spending hours crafting individual questions, the AI team can generate sophisticated clinical scenarios that examine students’ reasoning.
Similar models can assess how students perform on different tests and on a variety of questions across a course, providing individual-level monitoring of a student’s learning and skills growth throughout a semester.
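At its simplest, that kind of monitoring amounts to aggregating a student’s responses by concept across a semester. This toy sketch (concept tags and data are invented) shows the underlying bookkeeping:

```python
# Hypothetical sketch: per-concept accuracy for one student across a
# semester of assessments. Concept names and responses are invented.
from collections import defaultdict

def concept_report(responses):
    """responses: list of (concept, correct) pairs from a semester's exams."""
    totals = defaultdict(lambda: [0, 0])  # concept -> [correct, attempted]
    for concept, correct in responses:
        totals[concept][0] += int(correct)
        totals[concept][1] += 1
    return {c: round(right / n, 2) for c, (right, n) in totals.items()}

semester = [
    ("gait analysis", True), ("gait analysis", True), ("gait analysis", False),
    ("neuro screening", False), ("neuro screening", False), ("neuro screening", True),
]
print(concept_report(semester))  # {'gait analysis': 0.67, 'neuro screening': 0.33}
```

A real system would layer richer models on top of this, but the per-concept view is what lets an instructor see where a specific student excels or struggles.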
Practicing on Digital Patients
Two years ago, Stern also began developing digitally simulated patients that students could interact with, each given personality traits drawn from published case studies. These virtual patients remember each conversation they have with a user, and they update their memories based on how those conversations go.
A student might meet a 20-year-old patient with anxiety and mild hip pain, and when they reconnect weeks later, the patient remembers their previous conversation—including personal details like the student’s empathy or treatment suggestions. As the semester progresses, the patient ages and their condition evolves from mild pain to osteoarthritis to rehabilitation after a total hip replacement, but the relationship with the student is maintained throughout this journey.
Beyond encapsulating the medical history and diagnoses of the patient case, the digital-patient approach lets users assign specific qualities, such as emotional state, socio-cultural background, or language fluency, or even introduce distractions into the simulation, like a caregiver asking questions or an active child running around.
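The core of the idea is state that persists between encounters. This toy class (names, conditions, and the memory format are invented; the real system uses this kind of state to drive a chatbot, which is omitted here) sketches a patient who remembers the last session and whose condition progresses between visits:

```python
# Toy sketch of a stateful "digital patient": it carries a memory of past
# sessions and its condition advances over the semester. All details are
# illustrative; the actual system wraps an LLM around state like this.

class DigitalPatient:
    STAGES = ["mild hip pain", "osteoarthritis", "post-hip-replacement rehab"]

    def __init__(self, name, age):
        self.name, self.age = name, age
        self.stage = 0
        self.memory = []  # notes carried between sessions

    def session(self, student_note):
        """Record one encounter; the patient 'remembers' it next time."""
        recap = (f"Last time we discussed: {self.memory[-1]}"
                 if self.memory else "First visit.")
        self.memory.append(student_note)
        return f"{self.name} ({self.STAGES[self.stage]}): {recap}"

    def advance(self, weeks=4):
        """Age the patient and progress the condition between encounters."""
        self.age += weeks / 52
        self.stage = min(self.stage + 1, len(self.STAGES) - 1)

pt = DigitalPatient("Alex", 20)
print(pt.session("recommended gentle hip mobility work"))
pt.advance()
print(pt.session("reviewed imaging, discussed load management"))
```

The second session opens with a recap of the first, while the diagnosis has moved one stage along, which is the continuity the article describes students experiencing across a semester.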
This gives Tufts students the ability to interact with a wide swath of patients in a digital landscape, providing practice before they step into a clinic. Stern recently published a simplified version of this approach in the Journal of Physical Therapy Education that requires only access to a chatbot, like ChatGPT, along with a set of instructions for implementation. Those techniques earned Stern a Physical Therapy Learning Institute Education Influencer Award after he presented at the Physical Therapy Education Leadership Conference in 2024.
Filling in the Blanks
Stern’s growing interest led to his recent collaboration with ASU engineers on a novel machine learning approach. The team developed a hybrid machine learning method that can reconstruct complete time series from sparse, incomplete data—even from systems it has never encountered before. The method trains on synthetic data from known systems—like the Lorenz system famous for the “butterfly effect”—then applies that knowledge to predict the behavior of entirely new systems from even very limited observations.
For Stern, the tool is especially useful for studying non-contact injuries with wearable sensors, such as GPS-embedded devices that measure a person’s movement for months or years. These studies aim to determine whether certain training responses or situations signal that an athlete is more likely to become injured in the coming weeks. That knowledge could allow coaches, trainers, or health care staff to modify the athlete’s training and keep them healthy.
But pulling meaning from the sensor data is often a challenge, in part because batteries wear out, devices break, GPS or Wi-Fi signals drop, or wearers forget to turn them on. A hybrid machine learning approach may be able to fill in the gaps left by the missing data.
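The team’s actual method is a hybrid machine learning model; as a much simpler stand-in, this sketch fills dropout gaps in a wearable-sensor series with linear interpolation, just to make the missing-data problem concrete (the readings are invented):

```python
# Stand-in for the gap-filling problem: replace sensor dropouts (None)
# with values interpolated between known neighbors. The real approach
# is a learned hybrid model, not simple interpolation.

def fill_gaps(series):
    """Fill interior None gaps by linear interpolation between known points."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

# e.g., daily training-load readings where the sensor dropped out twice
readings = [10.0, None, None, 16.0, 18.0, None, 22.0]
print(fill_gaps(readings))  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
```

Interpolation only works when the gaps are short and the signal is smooth; the appeal of the learned approach is that it can recover plausible dynamics even when, as the paper puts it, observations are random and sparse.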
“Even if we have a whole bunch of that data missing from a dataset,” he says, “machine learning can find and follow patterns within the system to fill it in.”
The model’s accuracy is incredibly high, the team found, even when “observations are random and sparse.”
Stern views his interest and experimentation with AI as positive not only for his teaching and research, but also for engaging with his students. Using and discussing these tools in the classroom, he says, allows his students to develop competency and comfort with AI-based tools—both their strengths and their weaknesses. His current students, he says, are keenly aware of how generative AI will play a role in their future careers, and they are eager to learn where it may be helpful and where it may fall short.
Each of these generative AI approaches—assessment writing, providing unique feedback to students, digitally simulated patients—will be integrated into the Accelerated Development of Excellence in Physical Therapy (ADEPT) program at Tufts University College.
“Honestly, one of the most useful things that happens is when we use these tools in class and they break,” he says. “It’s not some kind of infallible system.”