Masri, Sari, Hasasneh, Ahmad, Tami, Mohammad and Tadj, Chakib.
2024.
"Exploring the impact of image-based audio representations in classification tasks using vision transformers and explainable AI techniques".
Information, vol. 15, no. 12.
Tadj-C-2024-30460.pdf - Published version. License: Creative Commons CC BY. (8MB)
Abstract
An important hurdle in medical diagnostics is the high-quality and interpretable classification of audio signals. In this study, we present an image-based representation of infant crying audio files to predict abnormal infant cries using a vision transformer and also show significant improvements in the performance and interpretability of this computer-aided tool. The use of advanced feature extraction techniques such as Gammatone Frequency Cepstral Coefficients (GFCCs) resulted in a classification accuracy of 96.33%. For other features (spectrogram and mel-spectrogram), the performance was very similar, with an accuracy of 93.17% for the spectrogram and 94.83% for the mel-spectrogram. We used our vision transformer (ViT) model, which is less complex but more effective than the proposed audio spectrogram transformer (AST). We incorporated explainable AI (XAI) techniques such as Layer-wise Relevance Propagation (LRP), Local Interpretable Model-agnostic Explanations (LIME), and attention mechanisms to ensure transparency and reliability in decision-making, which helped us understand the why of model predictions. The accuracy of detection was higher than previously reported and the results were easy to interpret, demonstrating that this work can potentially serve as a new benchmark for audio classification tasks, especially in medical diagnostics, and providing better prospects for an imminent future of trustworthy AI-based healthcare solutions.
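To make the idea of an image-based audio representation concrete, here is a minimal sketch of how a waveform could be turned into a log-magnitude spectrogram and cut into fixed-size patches of the kind a vision transformer consumes. It is not the authors' pipeline: the function names `log_spectrogram` and `to_patches`, the FFT size, hop length, and 16x16 patch size are all assumptions chosen for illustration, a synthetic 440 Hz tone stands in for a cry recording, and the GFCC and mel-spectrogram features reported in the abstract are not reproduced.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128):
    """Return a log-magnitude STFT spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Real FFT per frame, then transpose so rows are frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)


def to_patches(image, patch=16):
    """Crop to a multiple of `patch` and flatten into (n_patches, patch * patch)."""
    h = (image.shape[0] // patch) * patch
    w = (image.shape[1] // patch) * patch
    cropped = image[:h, :w]
    # Split both axes into (block index, offset within block) and group by block.
    blocks = cropped.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return blocks.reshape(-1, patch * patch)


# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(tone)   # 2-D "image" of the audio
patches = to_patches(spec)     # sequence of flattened patches
print(spec.shape, patches.shape)
```

In a full model, each flattened patch would then be linearly projected and combined with a position embedding before entering the transformer encoder; that part is omitted here.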
Document type: | Peer-reviewed journal article |
---|---|
Professor: | Professor Tadj, Chakib |
Affiliation: | Electrical Engineering |
Deposit date: | 14 Jan. 2025 16:29 |
Last modified: | 29 Jan. 2025 20:13 |
URI: | https://espace2.etsmtl.ca/id/eprint/30460 |