Wu, An Ni; Kulbay, Merve; Cheng, Phillip M.; Cadrin-Chênevert, Alexandre; Létourneau-Guillon, Laurent; Chartrand, Gabriel; Chong, Jaron; Montagnon, Emmanuel; Ben Ayed, Ismail; Tang, An. 2025. "Deep learning models connecting images and text: A primer for radiologists." Radiographics, vol. 45, no. 9.
Full text: BenAyed-I-2025-31945.pdf (Published Version, 2 MB). Use licence: Creative Commons CC BY.
Abstract
In radiology practice, medical images are described and interpreted by radiologists in text reports. Recent technical developments enabling deep learning models to connect images and text may facilitate the radiologic workflow. These developments include advances in data embedding, self-supervised learning, zero-shot learning, and transformer-based model architectures. Models connecting images and text can be divided into four categories: (a) Text-image alignment models associate text descriptions with corresponding images. (b) Image-to-text models create text descriptions from images. (c) Text-to-image models generate images from text descriptions. (d) Multimodal models integrate and interpret multiple types of data such as images, videos, text, and numbers simultaneously. Potential clinical applications of these models include automated captioning of medical images, generation of the preliminary radiology report, and creation of educational images. These advances may enable case prioritization, streamlining of clinical workflows, and improvements in diagnostic accuracy.
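As a concrete illustration of the text-image alignment and zero-shot learning ideas summarized in the abstract, the sketch below scores an image against candidate text descriptions with a CLIP-style model. This is a minimal sketch, not the article's method: the checkpoint name, file name, and captions are illustrative assumptions.

```python
# Minimal sketch of CLIP-style text-image alignment used for zero-shot
# classification. Assumes the publicly available
# openai/clip-vit-base-patch32 checkpoint from Hugging Face; the image
# file and captions below are hypothetical examples.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("chest_xray.png")  # hypothetical input image
captions = [
    "a chest radiograph with no acute findings",
    "a chest radiograph showing pneumonia",
]

# The model embeds the image and each caption in a shared space;
# logits_per_image holds the scaled cosine similarities between the
# image embedding and each text embedding.
inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(captions, probs[0].tolist())))
```

Because classification reduces to comparing embeddings against arbitrary text prompts, new labels can be added at inference time without retraining, which is what makes the zero-shot setting possible.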
| Field | Value |
|---|---|
| Item Type | Peer reviewed article published in a journal |
| Professor | Ben Ayed, Ismail |
| Affiliation | Génie des systèmes |
| Date Deposited | 19 Sep 2025 13:35 |
| Last Modified | 24 Sep 2025 23:48 |
| URI | https://espace2.etsmtl.ca/id/eprint/31945 |