Evaluating human-in-the-loop strategies for artificial intelligence-enabled translation of patient discharge instructions: a multidisciplinary analysis.
Publication Title
NPJ Digit Med
Document Type
Article
Publication Date
10-24-2025
Keywords
california; santa monica; psjmc; artificial intelligence
Abstract
Machine translation supported by artificial intelligence (AI) may enhance linguistically concordant care for patients who speak languages other than English. This assessment of free-text inpatient discharge instructions in Arabic, Armenian, Bengali, simplified Chinese, Somali, and Spanish compared linguist, clinician, and family caregiver evaluations of translations generated by (1) ChatGPT-4o, (2) professional linguists, and (3) a human-in-the-loop approach (AI-generated, then post-edited by a professional linguist). Evaluators rated the linguistic and clinical characteristics of each translation on Likert scales (1-5; higher is better). ChatGPT-4o exhibited variable performance relative to professional translations, with the poorest ratings for digitally underrepresented languages (Armenian and Somali). Conversely, human-in-the-loop translations achieved outcomes comparable to, and often better than, professional translations for all languages (e.g., Armenian mean overall quality: 3.9 [95% CI 3.7-4.2] vs. professional 3.6 [3.4-3.9], p = 0.01), were most frequently preferred (46.5% vs. 28.4%), and had shorter mean translation times (7.1 [5.4-8.8] vs. 16.8 [13.7-19.9] min, p < 0.001). Human-in-the-loop strategies may enable safe, efficient, and equitable application of machine translation in clinical practice.
Specialty/Research Institute
Health Information Technology
Specialty/Research Institute
Hospital Medicine
DOI
10.1038/s41746-025-02055-6