R.A.I.S.E - A novel framework for evaluating foundational AI models in medical deployment: Moving beyond traditional metrics to real-world deployability

Authors: Adendorff, JAE; Lourens, Roger L; Delport, R; Marivate, V; Gichoya, JW
Dates: 2026-01-27; 2026-01-27; 2025-12
ISBN: 978-1-0370-5280-4
URI: http://hdl.handle.net/10204/14650

Abstract: The shift from “narrow” traditional deep learning models to more generalist foundation models represents a paradigm shift for AI in medicine, with the emergence of unimodal and multimodal systems such as MedGemma, BiomedCLIP, DINO models, and MedImageInsight. While these generalist models promise broad capabilities, they demand large datasets and high computational resources for training, and carry risks such as hallucinations, which can be hazardous in clinical use. In medicine, whether a model can be securely incorporated into actual clinical workflows is more important than whether it passes standardized tests. Current assessment techniques for foundation models are frequently based on multiple-choice questions and do not account for real-world deployment scenarios. At a two-day datathon (16-17 July 2025), we explored deploying MedGemma for chest X-ray reporting in South Africa. We proposed a gradual, radiologist-guided integration focused on controlled, automatable tasks rather than full diagnostic use. Our three-pronged evaluation framework creates a uniform readiness score and allows for continuous real-world monitoring by combining tailored deployment paths and hierarchical decision making with Go/No-Go thresholds.

Availability: Fulltext
Language: en
Keywords: Radiology; Implementation Framework; Healthcare; Artificial Intelligence; Foundation Model
Type: Conference Presentation