Dr. Steven Allon has spent the past two years advancing conversations about AI’s role in LGBTQ+ medical education within SGIM. He co-presented “Harnessing AI to combat, not reinforce, LGBTQ+ stereotypes in medical education” at SGIM25 and will co-lead a new workshop, “Building LLM-powered bots to acquire and critically appraise the medical literature,” at SGIM26.
His current work centers on developing an AI-driven standardized patient (SP) to support more authentic, accessible, and ethically grounded training in transgender and gender-diverse (TGD) healthcare. In partnership with TGD community members, his team is building a model designed to be realistic, bias-aware, privacy-protective, and reflective of lived experience.
What inspired you and your collaborators to pursue the development of a generative AI–based standardized patient focused specifically on transgender and gender-diverse healthcare training?
We began with two challenges: learners often feel underprepared to communicate with TGD patients, and access to TGD standardized patients is limited, due in part to a lack of best practices and the need to ensure SPs themselves are supported.
In a 2025 pilot study, we worked with TGD community members to clarify how TGD SPs should be portrayed. A key insight was that cisgender portrayals are acceptable only in narrow situations, which makes it harder to scale training opportunities. At the same time, constraints on diversity-oriented education were growing, prompting us to explore new teaching methods. Seeing community organizations use generative AI for blended training, we recognized an opportunity to supplement limited TGD SP availability and expand communication training for medical students and residents.
How are transgender and gender-diverse (TGD) community partners shaping the design, authenticity, and ethical safeguards of this AI SP?
TGD community leadership is central. Our team includes transgender scholars who guide design decisions and help define what authentic responses should look like. We train the model using de-identified transcripts of real encounters between students and TGD SPs to build realistic language patterns. Next, TGD reviewers with lived and healthcare-adjacent experience will assess blinded transcripts from both AI and real SP encounters. Their feedback on authenticity and acceptability will determine whether the model moves forward. No learner testing will proceed unless community reviewers approve the model’s performance.
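For illustration, here is a minimal sketch of how such a blinded review set might be assembled, assuming transcripts are stored as simple records with a hidden source label; the field names and structure are assumptions, not the team’s actual protocol.

```python
import random

# Hypothetical transcript records: each carries a text body and a
# source label ("ai" or "human_sp") that reviewers must not see.
transcripts = [
    {"id": "t01", "source": "ai", "text": "..."},
    {"id": "t02", "source": "human_sp", "text": "..."},
    {"id": "t03", "source": "ai", "text": "..."},
]

def build_blinded_review_set(records, seed=0):
    """Shuffle transcripts and strip source labels so reviewers rate
    authenticity without knowing which encounters involved the AI."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    blinded = [{"review_id": i, "text": r["text"]}
               for i, r in enumerate(shuffled)]
    # The unblinding key is kept separate until scoring is complete.
    key = {i: r["source"] for i, r in enumerate(shuffled)}
    return blinded, key

blinded_set, unblinding_key = build_blinded_review_set(transcripts)
```

Keeping the unblinding key apart from the review packet means authenticity and acceptability ratings can be compared across AI and human SP encounters only after reviewers have finished scoring.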
What have you learned so far from the feasibility and acceptability testing phase, particularly from community members with lived experience or healthcare-adjacent expertise?
We’ve learned how differently language models behave when asked to simulate TGD patients. Clear prompting, such as instructing the model to “act as a patient” rather than “act as an SP,” improves realism. We also developed a taxonomy of biases that models commonly produce, such as pathologizing gender identity or over-accommodating in ways that feel inauthentic. To counter these issues, we rely on real, de-identified training data from TGD SP encounters, which significantly improves the model’s ability to produce grounded, believable responses.
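To make the prompting and taxonomy points concrete, here is a minimal sketch of the two role framings and a simple keyword screen built on a bias taxonomy; the categories and trigger phrases below are illustrative stand-ins, not the team’s actual taxonomy.

```python
# Framing the role as "patient" rather than "SP" tended to yield more
# natural, in-character replies; the exact wording here is illustrative.
PATIENT_PROMPT = (
    "Act as a patient named Alex arriving for a primary care visit. "
    "Respond in the first person, as Alex would speak in the room."
)
SP_PROMPT = (
    "Act as a standardized patient portraying Alex in a primary care "
    "visit."  # this framing tended to produce stiffer, meta responses
)

# Hypothetical slice of a bias taxonomy: category -> phrases whose
# presence in a model reply should trigger human review.
BIAS_TAXONOMY = {
    "pathologizing": ["gender confusion", "suffers from being trans"],
    "over_accommodating": ["as a transgender person, I'm so glad you asked"],
}

def flag_reply(reply: str) -> list[str]:
    """Return the taxonomy categories a model reply triggers, if any."""
    lowered = reply.lower()
    return [category for category, phrases in BIAS_TAXONOMY.items()
            if any(phrase in lowered for phrase in phrases)]
```

A keyword screen like this can only be a first pass; the blinded community review described above remains the deciding check.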
How is your group addressing challenges like bias, privacy, and representation when using AI as you build an ethically grounded model for sensitive clinical encounters?
Bias and privacy guide every decision. We use de-identified transcripts from TGD SP encounters only with prior informed consent from the participating students. To safeguard this data, we avoid cloud-based systems entirely and run the model locally. Representation is addressed through direct TGD community involvement, both in model development and in the gating process that determines whether the model is acceptable for learner use. This co-development approach ensures the model meaningfully reflects lived experience and helps reduce trainee bias rather than reinforce it.
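As a sketch of what local-only inference can look like, the snippet below loads an open-weights chat model with Hugging Face transformers so that no transcript data leaves the machine. The model name, prompt text, and generation settings are assumptions for illustration, not the team’s actual stack.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative open-weights model; the team's actual model is not named.
MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [
    {"role": "system", "content": "Act as a patient named Alex ..."},
    {"role": "user", "content": "Hi Alex, what brings you in today?"},
]

# Format the conversation with the model's chat template and generate
# a reply entirely on local hardware.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the weights, prompts, and outputs all stay on institutional hardware, the de-identified transcripts never pass through a third-party cloud service.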
Looking ahead, how do you envision AI-driven standardized patients expanding or transforming LGBTQ+ health training within medical education, both at Vanderbilt and beyond?
If the model meets community standards, we envision AI-driven SPs being used alongside, not instead of, TGD SPs. This blended approach could expand practice opportunities, especially for communication skills that require repetition and individualized feedback. Integrating an AI SP into existing courses, clerkships, or modules could help standardize exposure to TGD healthcare topics and offer scalable practice that many institutions currently lack. Ultimately, we hope this work helps prepare trainees to care for TGD patients with greater confidence, competence, and respect.
Interested in learning more?
Contact Dr. Allon to explore collaboration opportunities or discuss how this approach might be implemented at your institution.
SGIM Presentations
Past (SGIM25):
Terndrup C, Allon S, Zucker S, Mayer G. Harnessing AI to combat, not reinforce, LGBTQ+ stereotypes in medical education. Workshop at: Society of General Internal Medicine, Annual National Conference; 2025 May 15; Hollywood, FL.
Future (SGIM26):
Burke H, Allon S, Goldstein J, Diemer M. Building LLM-powered bots to acquire and critically appraise the medical literature. Workshop at: Society of General Internal Medicine, Annual National Conference; 2026 May 7; Washington, D.C.