Wu, An Ni, Kulbay, Merve, Cheng, Phillip M., Cadrin-Chênevert, Alexandre, Létourneau-Guillon, Laurent, Chartrand, Gabriel, Chong, Jaron, Montagnon, Emmanuel, Ben Ayed, Ismail, and Tang, An. 2025. "Deep learning models connecting images and text: A primer for radiologists." Radiographics, vol. 45, no. 9.
PDF: BenAyed-I-2025-31945.pdf (published version, 2 MB). License: Creative Commons CC BY.
Abstract
In radiology practice, medical images are described and interpreted by radiologists in text reports. Recent technical developments enabling deep learning models to connect images and text may facilitate the radiologic workflow. These developments include advances in data embedding, self-supervised learning, zero-shot learning, and transformer-based model architectures. Models connecting images and text can be divided into four categories: (a) Text-image alignment models associate text descriptions with corresponding images. (b) Image-to-text models create text descriptions from images. (c) Text-to-image models generate images from text descriptions. (d) Multimodal models integrate and interpret multiple types of data such as images, videos, text, and numbers simultaneously. Potential clinical applications of these models include automated captioning of medical images, generation of the preliminary radiology report, and creation of educational images. These advances may enable case prioritization, streamlining of clinical workflows, and improvements in diagnostic accuracy.
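To make the text-image alignment and zero-shot learning ideas concrete, here is a minimal sketch (not taken from the article) that scores one image against candidate text descriptions using a publicly available, general-domain CLIP checkpoint. The image path and candidate captions are hypothetical, and a general-domain model like this is illustrative only, not validated for clinical use.

```python
# Minimal sketch of zero-shot text-image alignment with a CLIP-style model.
# The checkpoint name is a real public general-domain model; the image path
# and candidate captions below are hypothetical, not from the article.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

image = Image.open("example_radiograph.png")  # hypothetical input image
candidate_texts = [
    "a chest radiograph showing pleural effusion",  # hypothetical prompts
    "a normal chest radiograph",
]

# The processor tokenizes the texts and preprocesses the image so both can
# be embedded into the same shared vector space.
inputs = processor(
    text=candidate_texts, images=image, return_tensors="pt", padding=True
)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into a probability over the candidate descriptions (zero-shot labeling).
probs = outputs.logits_per_image.softmax(dim=-1)
for text, p in zip(candidate_texts, probs[0].tolist()):
    print(f"{p:.3f}  {text}")
```

Because the image and text encoders were pretrained contrastively to embed matching pairs close together, the highest-scoring description can be read as the model's zero-shot label without any task-specific fine-tuning.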
Document type: Peer-reviewed journal article
Professor: Ben Ayed, Ismail
Affiliation: Systems Engineering
Deposited: 19 Sept. 2025 13:35
Last modified: 24 Sept. 2025 23:48
URI: https://espace2.etsmtl.ca/id/eprint/31945