Medical Imaging
Monitoring and Evaluating LLMs in Clinical Contexts with Azure AI Foundry
👀 Missed Session 02? Don’t worry—you can still catch up. But first, here’s what AI HLS Ignited is all about:

What is AI HLS Ignited? AI HLS Ignited is a Microsoft-led technical series for healthcare innovators, solution architects, and AI engineers. Each session brings to life real-world AI solutions that are reshaping the Healthcare and Life Sciences (HLS) industry. Through live demos, architectural deep dives, and GitHub-hosted code, we equip you with the tools and knowledge to build with confidence.

Session 02 Recap: In this session, we introduced MedEvals, an end-to-end evaluation framework for medical AI applications built on Azure AI Foundry. Inspired by Stanford’s MedHELM benchmark, MedEvals enables providers and payers to systematically validate the performance, safety, and compliance of AI solutions across clinical decision support, documentation, patient communication, and more.

🧠 Why Scalable Evaluation Is Critical for Medical AI

"Large language models (LLMs) hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medical information." — Evaluating large language models in medical applications: a survey

As AI systems become deeply embedded in healthcare workflows, the need for rigorous evaluation frameworks intensifies. Although large language models (LLMs) can augment tasks ranging from clinical documentation to decision support, their deployment in patient-facing settings demands systematic validation to guarantee safety, fidelity, and robustness. Benchmarks such as MedHELM address this requirement by subjecting models to a comprehensive battery of clinically derived tasks built on ground-truth datasets, enabling fine-grained, multi-metric performance assessment across the full spectrum of clinical use cases. However, shipping a medical LLM is only step one.
Without a repeatable, metrics-driven evaluation loop, quality erodes, regulatory gaps widen, and patient safety is put at risk. This project accelerates your ability to operationalize trustworthy LLMs by delivering plug-and-play medical benchmarks, configurable evaluators, and CI/CD templates—so every model update triggers an automated, domain-specific “health check” that flags drift, surfaces bias, and validates clinical accuracy before it ever reaches production.

🚀 How to Get Started with MedEvals

Kick off your MedEvals journey by following our curated labs. Newcomers to Azure AI Foundry can start with the foundational workflow; seasoned practitioners can dive into advanced evaluation pipelines and CI/CD integration.

🧪 Labs

🧪 Foundry Basics & Custom Evaluations: 🧾 Notebook. Authenticate, initialize a Foundry project, run built-in metrics, and build custom evaluators with EvalAI and PromptEval.
🧪 Search & Retrieval Evaluations: 🧾 Notebook. Prepare datasets, execute search metrics (precision, recall, NDCG), visualize results, and register evaluators in Foundry.
🧪 Repeatable Evaluations & CI/CD: 🧾 Notebook. Define evaluation schemas, build deterministic pipelines with PyTest, and automate drift detection using GitHub Actions.

🏥 Use Cases

📝 Creating Your Clinical Evaluation with RevCycle Determinations

Select the model and metric that best support the determinations and supporting rationale behind AI-assisted prior authorizations, based on real payor policy. This notebook use case includes:

Selecting multiple candidate LLMs (e.g., gpt-4o, o1)
Breaking down determinations into deterministic results (approved vs. rejected) and the supporting rationale and logic
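Among the retrieval metrics named in the labs above, NDCG is the least self-explanatory. As a generic, library-independent sketch (not the MedEvals or Foundry evaluator itself, and using hypothetical relevance labels), NDCG@k can be computed from graded relevance judgments like this:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k: the ranking's DCG divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical graded relevance of retrieved clinical passages,
# in the order the search engine returned them (3 = highly relevant, 0 = irrelevant).
scores = [3, 2, 0, 1]
print(round(ndcg_at_k(scores, 4), 4))
```

A perfect ranking scores 1.0; swapping a relevant result below an irrelevant one lowers the score, which is why NDCG is useful for comparing retrieval configurations side by side.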
Running evaluations across multiple dimensions
Combining deterministic evaluators and LLM-as-a-Judge methods
Evaluating the differential impacts of evaluators on the rationale across scenarios

🧾 Get Started with the Notebook

Why it matters: Enables data-driven metric selection for clinical workflows, ensures transparent benchmarking, and accelerates safe AI adoption in healthcare.

📝 Evaluating AI Medical Notes Summarization Applications

Systematically assess how different foundation models and prompting strategies perform on clinical summarization tasks, following the MedHELM framework. This notebook use case includes:

Preparing real-world datasets of clinical notes and summaries
Benchmarking summarization quality using relevance, coherence, factuality, and harmfulness metrics
Testing prompting techniques (zero-shot, few-shot, chain-of-thought prompting)
Evaluating outputs using both automated metrics and human-in-the-loop scoring

🧾 Get Started with the Notebook

Why it matters: Ensures responsible deployment of AI applications for clinical summarization, guaranteeing high standards of quality, trustworthiness, and usability.

📣 Join Us for the Next Session

Help shape the future of healthcare by sharing AI HLS Ignited with your network—and don’t miss what’s coming next!

📅 Register for the upcoming session → AI HLS Ignited Event Page
💻 Explore the code, demos, and architecture → AI HLS Ignited GitHub Repository

Building AI-Powered Clinical Knowledge Stores with Azure AI Search
👀 Missed Session 01? Don’t worry—you can still catch up. But first, here’s what AI HLS Ignited is all about:

What is AI HLS Ignited? AI HLS Ignited is a Microsoft-led technical series for healthcare innovators, solution architects, and AI engineers. Each session brings to life real-world AI solutions that are reshaping the Healthcare and Life Sciences (HLS) industry. Through live demos, architectural deep dives, and GitHub-hosted code, we equip you with the tools and knowledge to build with confidence.

Session 01 Recap: In our first session, we introduced the accelerator MedIndexer, an indexing framework designed for the automated creation of structured knowledge bases from unstructured clinical sources. Whether you're dealing with X-rays, clinical notes, or scanned documents, MedIndexer converts these inputs into a schema-driven format optimized for Azure AI Search, allowing your applications to leverage state-of-the-art retrieval methodologies, including vector search and re-ranking. Moreover, by applying a well-defined schema and vectorizing the data into high-dimensional representations, MedIndexer empowers AI applications to retrieve more precise and context-aware information. The result? AI systems that surface more relevant, accurate, and context-aware insights—faster.

🔍 Turning Your Unstructured Data into Value

"About 80% of medical data remains unstructured and untapped after it is created (e.g., text, image, signal, etc.)" — Healthcare Informatics Research, Chungnam National University

In the era of AI, the rise of AI copilots and assistants has led to a shift in how we access knowledge. But retrieving clinical data that lives in disparate formats is no trivial task. Building retrieval systems takes effort—and how you structure your knowledge store matters. It’s a cyclic, iterative, and constantly evolving process.
That’s why we believe in leveraging enterprise-ready retrieval platforms like Azure AI Search—designed to power intelligent search experiences across structured and unstructured data. It serves as the foundation for building advanced retrieval systems in healthcare. However, implementing Azure AI Search alone is not enough. Mastering its capabilities and applying well-defined patterns can significantly enhance your ability to address repetitive tasks and complex retrieval scenarios. This project aims to accelerate your ability to transform raw clinical data into high-fidelity, high-value knowledge structures that can power your next-generation AI healthcare applications.

🚀 How to Get Started with MedIndexer

New to Azure AI Search? Begin with our guided labs to build a strong foundation and get hands-on with the core capabilities. Already familiar with the tech? Jump ahead to the real-world use cases—learn how to build Coded Policy Knowledge Stores and X-ray Knowledge Stores.

🧪 Labs

🧪 Building Your Azure AI Search Index: 🧾 Notebook - Building your first Index. Learn how to create and configure an Azure AI Search index to enable intelligent search capabilities for your applications.
🧪 Indexing Data into Azure AI Search: 🧾 Notebook - Ingest and Index Clinical Data. Understand how to ingest, preprocess, and index clinical data into Azure AI Search using schema-first principles.
🧪 Retrieval Methods for Azure AI Search: 🧾 Notebook - Exploring Vector Search and Hybrid Retrieval. Dive into retrieval techniques such as vector search, hybrid retrieval, and reranking to enhance the accuracy and relevance of search results.
🧪 Evaluation Methods for Azure AI Search: 🧾 Notebook - Evaluating Search Quality and Relevance. Learn how to evaluate the performance of your search index using relevance metrics and ground-truth datasets to ensure high-quality search results.
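The first lab walks through creating a search index. For orientation, the payload that Azure AI Search's Create Index REST API accepts looks roughly like the sketch below; the index and field names (clinical-notes-index, note_id, content, patient_mrn) are illustrative placeholders, not taken from the labs:

```python
import json

# Minimal index definition in the general shape of the Azure AI Search
# "Create Index" REST body. Names here are hypothetical examples.
index_definition = {
    "name": "clinical-notes-index",
    "fields": [
        # Every index needs exactly one key field.
        {"name": "note_id", "type": "Edm.String", "key": True, "filterable": True},
        # Full-text searchable body of the clinical note.
        {"name": "content", "type": "Edm.String", "searchable": True},
        # Filterable patient identifier for scoping queries.
        {"name": "patient_mrn", "type": "Edm.String", "filterable": True},
    ],
}

# Serialize the definition as it would be sent over HTTP.
print(json.dumps(index_definition, indent=2))
```

In a real deployment this JSON would be PUT to the service endpoint (or built with the azure-search-documents SDK's index models); vector fields and skillsets layer on top of the same schema-first definition.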
🏥 Use Cases

📝 Creating Coded Policy Knowledge Stores

In many healthcare systems, policy documents such as pre-authorization guidelines are still trapped in static, scanned PDFs. These documents are critical—they contain ICD codes, drug name coverage, and payer-specific logic—but are rarely structured or accessible in real time. To solve this, we built a pipeline that transforms these documents into intelligent, searchable knowledge stores. This diagram shows how pre-auth policy PDFs are ingested via blob storage, passed through an OCR and embedding skillset, and then indexed into Azure AI Search. The result: fast access to coded policy data for AI apps.

🧾 Notebook - Creating Coded Policies Knowledge Stores

Transform payer policies into machine-readable formats. This use case includes:

Preprocessing and cleaning PDF documents
Building custom OCR skills
Leveraging out-of-the-box Indexer capabilities and embedding skills
Enabling real-time AI-assisted querying for ICDs, payer names, drug names, and policy logic

Why it matters: This streamlines prior authorization and coding workflows for providers and payors, reducing manual effort and increasing transparency.

🩻 Creating X-ray Knowledge Stores

In radiology workflows, X-ray reports and image metadata contain valuable clinical insights—but these are often underutilized. Traditionally, they’re stored as static entries in PACS systems or loosely connected databases. The goal of this use case is to turn those X-ray reports into a searchable, intelligent asset that clinicians can explore and interact with in meaningful ways. This diagram illustrates a full retrieval pipeline where radiology reports are uploaded, enriched through foundational models, embedded, and indexed. The output powers an AI-driven web app for similarity search and decision support.

🧾 Notebook - Creating X-rays Knowledge Stores

Turn imaging reports and metadata into a searchable knowledge base.
This includes:

Leveraging push APIs with a custom event-driven indexing pipeline triggered on new X-ray uploads
Generating embeddings using Microsoft Healthcare foundation models
Providing an AI-powered front end for X-ray similarity search

Why it matters: Supports clinical decision-making by retrieving similar past cases, aiding diagnosis and treatment planning with contextual relevance.

📣 Join Us for the Next Session

Help shape the future of healthcare by sharing AI HLS Ignited with your network—and don’t miss what’s coming next!

📅 Register for the upcoming session → AI HLS Ignited Event Page
💻 Explore the code, demos, and architecture → AI HLS Ignited GitHub Repository

Cancer Survival with Radiology-Pathology Analysis and Healthcare AI Models in Azure AI Foundry
The integration of radiology and pathology is transforming predictive analytics in healthcare. By combining MRI imaging with histopathology (H&E slides), this multimodal approach leverages pre-trained models from Azure AI Foundry to seamlessly connect macro-level and micro-level insights.

General Availability - Medical imaging DICOM® in healthcare data solutions in Microsoft Fabric
As part of the healthcare data solutions in Microsoft Fabric, the DICOM® (Digital Imaging and Communications in Medicine) data transformation is now generally available. Our Healthcare and Life Sciences customers and partners can now ingest, store, transform, and analyze DICOM® imaging datasets from various modalities, such as X-rays, CT scans, and MRIs, directly within Microsoft Fabric. This was made possible by providing a purpose-built data pipeline built on top of the medallion Lakehouse architecture. The imaging data transformation capabilities enable seamless transformation of DICOM® (imaging) data into tabular formats that can persist in the lake in FHIR® (Fast Healthcare Interoperability Resources) (Silver) and OMOP (Observational Medical Outcomes Partnership) (Gold) formats, thus facilitating exploratory analysis and large-scale imaging analytics and radiomics.

Establishing a true multi-modal biomedical Lakehouse in Microsoft Fabric

Along with other capabilities in the healthcare data solutions in Microsoft Fabric, this DICOM® data transformation will empower clinicians and researchers to interpret imaging findings in the appropriate clinical context by making imaging pixels and metadata available alongside the clinical history and laboratory data. By integrating DICOM® pixels and metadata with clinical history and laboratory data, our customers and partners can achieve more with their multi-modal biomedical data estate, including:

Unify your medical imaging and clinical data estate for analytics
Establish a regulated hub to centralize and organize all your multi-modal healthcare data, creating a foundation for predictive and clinical analytics. Built natively on well-established industry data models, including DICOM®, FHIR®, and OMOP.

Build fit-for-purpose analytics models
Start constructing ML and AI models on a connected foundation of EHR and pixel data.
Enable researchers, data scientists, and health informaticians to perform analysis on large volumes of multi-modal datasets to achieve higher accuracy in diagnosis and prognosis and improved patient outcomes¹.

Advance research, collaboration and sharing of de-identified imaging
Build longitudinal views of patients’ clinical history and related imaging studies with the ability to apply complex queries to identify patient cohorts for research and collaboration. Apply text and imaging de-identification to enable in-place sharing of research datasets with role-based access control.

Reduce the cost of archival storage and recovery
Take advantage of the cost-effective, HIPAA-compliant, and reliable cloud-based storage to back up your medical imaging data from the redundant storage of on-prem PACS and VNA systems. Improve your security posture with a 100% off-site cloud archive of your imaging datasets in case of unplanned data loss.

Employ AI models to recognize pixel-level markers and patterns
Deploy existing precision AI models such as Microsoft’s Project InnerEye and NVIDIA’s MONAI to enable automated segmentation of 3D radiology imaging that can help expedite the planning of radiotherapy treatments and reduce waiting times for oncology patients.

Conceptual architecture

The DICOM® data transformation capabilities in Microsoft Fabric continue to offer our customers and partners the flexibility to choose the ingestion pattern that best meets their existing data volume and storage needs. At a high level, there are three patterns for ingesting DICOM® data into the healthcare data solutions in Microsoft Fabric. Depending on the chosen ingestion pattern, there are up to eight end-to-end execution steps to consider, from the ingestion of the raw DICOM® files to the transformation of the Gold Lakehouse into the OMOP CDM format, as depicted in the conceptual architecture diagram below.
To review the eight end-to-end execution steps, please refer to the Public Preview of the DICOM® data ingestion in Microsoft Fabric.

Conceptual architecture and ingestion patterns of the DICOM® data ingestion capability in Microsoft Fabric

You can find more details about each of those three ingestion patterns in our public documentation: Use DICOM® data ingestion - Microsoft Cloud for Healthcare | Microsoft Learn

Enhancements in the DICOM® data transformation in Microsoft Fabric

We received great feedback from our public preview customers and partners. This feedback provided an objective signal for our product group to iterate on features and the product roadmap and make the DICOM® data transformation capabilities more practical and intuitive. As a result, several new features and improvements in the DICOM® data transformation are now generally available, as described in the following sections.

All DICOM® Metadata (Tags) are now accessible in the Silver Lakehouse

We recognize the importance and practicality of making all DICOM® metadata, i.e. tags, available in the Silver Lakehouse, closer to the clinical and ImagingStudy FHIR® resources. This makes it easier to explore any existing DICOM® tags from within the Silver Lakehouse. It also helps position the DICOM® staging table in the Bronze Lakehouse (ImagingDICOM) as a transient store: after the DICOM® metadata is processed and transformed from the bronze Lakehouse to the Silver Lakehouse, the data in the bronze staging table can be considered ready to be purged. This ensures cost and storage efficiency and reduces data redundancy between source files and staging tables in the bronze Lakehouse.

Unified Folder Structure

OneLake in Microsoft Fabric offers a logical data lake for your organization. Healthcare data solutions in Microsoft Fabric provide a unified folder structure that helps organize data across various modalities and formats.
This structure streamlines data ingestion and processing while maintaining data lineage at the source-file and source-system levels in the bronze Lakehouse. A complete set of unified folders, including the Imaging modality and DICOM® format, is now deployed as part of the healthcare data foundation deployment experience in the healthcare data solutions in Microsoft Fabric.

Purpose-built DICOM® data transformation pipeline

Healthcare data foundations offer ready-to-run data pipelines that are designed to efficiently structure data for analytics and AI/machine learning modeling. We introduce an imaging data pipeline to streamline the end-to-end execution of all activities in the DICOM® data transformation capabilities. The DICOM® data transformation in the imaging data pipeline consists of the following stages:

1. The pipeline ingests and persists the raw DICOM® imaging files, present in the native DCM format, in the bronze Lakehouse.
2. It then extracts the DICOM® metadata (tags) from the imaging files and inserts them into the ImagingDICOM table in the bronze Lakehouse.
3. The data in the ImagingDICOM table is then converted to FHIR® ImagingStudy NDJSON files, stored in OneLake.
4. The data in the ImagingStudy NDJSON files is transformed to relational FHIR® format and ingested into the ImagingStudy delta table in the Silver Lakehouse.

Compression-by-design

Healthcare data solutions in Microsoft Fabric support compression-by-design across the medallion Lakehouse design. Data ingested into the delta tables across the medallion Lakehouse is stored in a compressed, columnar format using parquet files. In the ingest pattern, when the files move from the Ingest folder to the Process folder, they are compressed by default after successful processing. You can configure or disable the compression as needed. The imaging data transformation pipeline can also process the DICOM® files in a raw format, i.e. dcm files, and/or in a compressed format, i.e.
ZIP format of dcm files/folders.

Global configuration

The admin Lakehouse was introduced in this release to manage cross-Lakehouse configuration, global configuration, status reporting, and tracking for healthcare data solutions in Microsoft Fabric. The admin Lakehouse system-configurations folder centralizes the global configuration parameters. The three configuration files contain preconfigured values for the default deployment of all healthcare data solutions capabilities. You can use the global configuration to repoint the data ingestion pipeline to any source folder other than the unified folder configured by default. You can also configure any of the input parameters for each activity in the imaging data transformation pipeline.

Sample Data

In this release, more comprehensive sample data is provided to help you run the data pipelines in the DICOM® data transformation end-to-end and explore the data processing in each step through the medallion Lakehouse: Bronze, Silver, and Gold. The imaging sample data may not be clinically meaningful, but it is technically complete and comprehensive enough to demonstrate the full DICOM® data transformation capabilities². In total, the sample data for the DICOM® data transformation contains 340 DICOM® studies, 389 series, and 7739 instances. One of those studies (a dcm file) is an invalid DICOM® study, intentionally provided to showcase how the pipeline manages files that do not conform to the DICOM® format. Those sample DICOM® studies are related to 302 patients, and those patients are also included in the sample data for the clinical ingestion pipeline. Thus, when you ingest the sample data for the DICOM® data transformation and clinical data ingestion, you will have a complete view that depicts how the clinical and imaging data would appear in a real-world scenario.
Enhanced data lineage and traceability

All delta tables in the Healthcare Data Model in the Silver Lakehouse now have the following columns to ensure lineage and traceability at the record and file level:

msftCreatedDatetime: the datetime at which the record was first created in the respective delta table in the Silver Lakehouse
msftModifiedDatetime: the datetime at which the record was last modified in the respective delta table in the Silver Lakehouse
msftFilePath: the full path to the source file in the Bronze Lakehouse (including shortcut folders)
msftSourceSystem: the source system of this record. It corresponds to the [Namespace] that was specified in the unified folder structure.

As such, and to ensure lineage and traceability extend to the entire medallion Lakehouse, the following columns are added to the OMOP delta tables in the Gold Lakehouse:

msftSourceRecordId: the original record identifier from the respective source delta table in the Silver Lakehouse. This is important because OMOP records will have newly generated IDs. More details are provided here.
msftSourceTableName: the name of the source delta table in the Silver Lakehouse. Due to the specifics of FHIR-to-OMOP mappings, there are cases where many OMOP tables in the Gold Lakehouse may be sourced from the same single FHIR® table in the Silver Lakehouse, such as the OBSERVATION and MEASUREMENT OMOP delta tables in the Gold Lakehouse, which are both sourced from the Observation FHIR® delta table in the Silver Lakehouse. There is also the case where a single delta table in the Gold Lakehouse may be sourced from many delta tables in the Silver Lakehouse, such as the LOCATION OMOP table, which could be sourced from either the Patient or Organization FHIR® table.
msftModifiedDatetime: the datetime at which the record was last modified in the respective delta table in the Gold Lakehouse.
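To make the semantics of these lineage columns concrete, here is a small stdlib-only sketch that stamps a record the way the columns are described above. This is illustrative only, not Fabric's actual implementation; the record fields and paths are hypothetical:

```python
from datetime import datetime, timezone

def stamp_lineage(record, file_path, source_system, existing=None):
    """Attach the lineage columns described above to a record dict.

    msftCreatedDatetime is preserved from an existing record, if any;
    msftModifiedDatetime is refreshed on every write.
    """
    now = datetime.now(timezone.utc).isoformat()
    created = existing["msftCreatedDatetime"] if existing else now
    return {
        **record,
        "msftCreatedDatetime": created,
        "msftModifiedDatetime": now,
        "msftFilePath": file_path,       # full path to the source file
        "msftSourceSystem": source_system,  # the [Namespace] from the unified folder structure
    }

# Hypothetical ImagingStudy-like record and bronze file path.
row = stamp_lineage(
    {"id": "imgstudy-001", "modality": "CT"},
    file_path="Files/Imaging/DICOM/site-a/study001.dcm",
    source_system="site-a",
)
print(sorted(row))
```

The key behavior to note is the created/modified split: re-stamping a record on a later write keeps its original creation timestamp while refreshing the modification timestamp, which is what makes record-level drift traceable.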
In summary, this article provides comprehensive details on how the DICOM® data transformation capabilities in the healthcare data solutions in Microsoft Fabric offer a robust, all-encompassing solution for unifying and analyzing medical imaging data in a harmonized pattern with the clinical dataset. We also listed major enhancements to these capabilities that are now generally available for all our healthcare and life sciences customers and partners. For more details, please refer to our public documentation: Overview of DICOM® data ingestion - Microsoft Cloud for Healthcare | Microsoft Learn

1 S. Kevin Zhou, Hayit Greenspan, Christos Davatzikos, James S. Duncan, Bram van Ginneken, Anant Madabhushi, Jerry L. Prince, Daniel Rueckert, Ronald M. Summers. A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises. arXiv:2008.09104

2 Microsoft provides the Sample Data in the Healthcare data solutions in Microsoft Fabric on an "as is" basis. This data is provided to test and demonstrate the end-to-end execution of data pipelines provided within the Healthcare data solutions in Microsoft Fabric. This data is not intended or designed to train real-world or production-level AI/ML models, or to develop any clinical decision support systems. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental, or punitive, resulting from your use of this data.
The Sample Data in the Healthcare data solutions in Microsoft Fabric is provided under the Community Data License Agreement – Permissive – Version 2.0.

DICOM® is the registered trademark of the National Electrical Manufacturers Association (NEMA) for its Standards publications relating to digital communications of medical information. FHIR® is a registered trademark of Health Level Seven International, registered in the U.S. Trademark Office, and is used with their permission.

Empowering multi-modal analytics with the medical imaging capability in Microsoft Fabric
This blog is part of a series that explores the recent announcement of the public preview of healthcare data solutions in Microsoft Fabric. The DICOM® (Digital Imaging and Communications in Medicine) data ingestion capability within the healthcare data solutions in Microsoft Fabric enables the storage, management, and analysis of imaging metadata from various modalities, including X-rays, CT scans, and MRIs, directly within Microsoft Fabric. It fosters collaboration, R&D, and AI innovation for healthcare and life science use cases.

Our customers and partners can now integrate DICOM® imaging datasets with clinical data stored in FHIR® (Fast Healthcare Interoperability Resources) format. By making imaging pixels and metadata accessible alongside clinical history and laboratory data, it enables clinicians and researchers to interpret imaging findings in the appropriate clinical context. This leads to enhanced diagnostic accuracy, more informed clinical decision-making, and ultimately, improved patient outcomes.