Exploring the impact of image-based audio representations in classification tasks using vision transformers and explainable AI techniques

Masri, Sari, Hasasneh, Ahmad, Tami, Mohammad and Tadj, Chakib. 2024. "Exploring the impact of image-based audio representations in classification tasks using vision transformers and explainable AI techniques". Information, vol. 15, no. 12.

Tadj-C-2024-30460.pdf - Published version
License: Creative Commons CC BY.

Abstract

An important hurdle in medical diagnostics is the high-quality and interpretable classification of audio signals. In this study, we present an image-based representation of infant crying audio files to predict abnormal infant cries using a vision transformer, and we show significant improvements in the performance and interpretability of this computer-aided tool. The use of advanced feature extraction techniques such as Gammatone Frequency Cepstral Coefficients (GFCCs) resulted in a classification accuracy of 96.33%. For the other features, performance was similar, with an accuracy of 93.17% for the spectrogram and 94.83% for the mel-spectrogram. We used our vision transformer (ViT) model, which is less complex but more effective than the proposed audio spectrogram transformer (AST). We incorporated explainable AI (XAI) techniques such as Layer-wise Relevance Propagation (LRP), Local Interpretable Model-agnostic Explanations (LIME), and attention mechanisms to ensure transparency and reliability in decision-making, which helped us understand why the model made its predictions. The detection accuracy was higher than previously reported and the results were easy to interpret, demonstrating that this work can potentially serve as a new benchmark for audio classification tasks, especially in medical diagnostics, and improving the prospects for trustworthy AI-based healthcare solutions in the near future.
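
As a rough illustration of what an "image-based audio representation" means here, the sketch below turns a one-dimensional signal into a two-dimensional time-frequency array (a magnitude spectrogram) and converts a frequency to the mel scale. It is not the authors' pipeline: the function names, frame length, hop size, and toy signal are all assumptions made for this example, and the mel conversion uses the common 2595 * log10(1 + f/700) formula, which the abstract does not specify.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds frame_len // 2 + 1 magnitudes.

    Hypothetical helper: frame_len and hop are illustrative defaults,
    not values taken from the paper.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Direct DFT of the windowed frame, non-negative frequencies only.
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def hz_to_mel(freq_hz):
    """Convert Hz to mel using one common formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Toy input (assumed): a 1 kHz tone sampled at 8 kHz, standing in for a cry recording.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]

spec = magnitude_spectrogram(tone)  # rows = time frames, columns = frequency bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64  # bin index -> Hz for frame_len = 64
```

The resulting `spec` array can be treated as a grayscale image (time on one axis, frequency on the other), which is the kind of input an image classifier such as a vision transformer consumes. A real pipeline would normally rely on an FFT-based audio library rather than the direct DFT used here for self-containment.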

Document type: Peer-reviewed journal article
Professor: Tadj, Chakib
Affiliation: Electrical Engineering
Deposit date: 14 Jan. 2025 16:29
Last modified: 29 Jan. 2025 20:13
URI: https://espace2.etsmtl.ca/id/eprint/30460