Ben Cheikh, Elyes, Mrabet, Yassine, Laporte, Catherine and Bouserhal, Rachel E.
2026.
"A multimodal in-ear audio and physiological dataset for swallowing and non-verbal event classification".
Sensors, vol. 26, no. 7.
PDF
Laporte-C-2026-33662.pdf - Published version. License: Creative Commons CC BY. (54 MB)
Abstract
Swallowing is a critical marker of neurological and emotional health. The ability to monitor it continuously and non-invasively, especially through smart ear-worn devices, holds significant promise for clinical applications. Despite this potential, no public audio datasets currently support reliable swallowing sound detection. Existing datasets focus primarily on speech and breathing, offering limited coverage and lacking detailed annotations for swallowing events. To address this gap, we introduce an in-ear audio dataset specifically designed to capture a wide range of verbal and non-verbal sounds. It includes comprehensive labeling focused on swallowing. The dataset was collected from 34 healthy adults (14 females and 20 males) between the ages of 20 and 29. Each participant performed a series of predefined tasks involving both non-verbal and verbal events. Non-verbal tasks included swallowing, clicking, forceful blinking, touching the scalp, and physical movements such as squatting or walking in place. Verbal tasks consisted of speaking (e.g., describing an image). Recordings were conducted in both quiet and noisy environments to better reflect real-world conditions. Data were captured using a combination of in-/outer-ear microphones, a chest belt to record electrocardiogram (ECG), respiration and acceleration signals, and an ultrasound probe to track tongue movement, which served as a reference for swallowing annotation. All signals were precisely synchronized. To ensure high data quality, the recordings were reviewed using both algorithmic analysis and manual inspection. Swallowing events were identified based on ultrasound signals and validated by an expert to guarantee accurate labeling. As a proof of concept that in-ear audio supports swallow classification, we fine-tune a fully connected neural network on YAMNet embeddings plus zero-crossing rate (ZCR) features. Across the completed folds, the model reaches an F1 score of 0.875 ± 0.013.
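The abstract names zero-crossing rate (ZCR) as one of the features used alongside YAMNet embeddings. As an illustration of what that feature measures, the following is a minimal sketch of a frame-wise ZCR computation. The function names, the frame length, and the hop size are assumptions made for this example; the abstract does not state the authors' actual framing parameters or implementation.

```python
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    # Treat zero as non-negative so every sample has a defined sign.
    signs = [x >= 0.0 for x in frame]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return crossings / (len(frame) - 1)


def framewise_zcr(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """ZCR of each full frame of frame_len samples, advancing by hop samples.

    frame_len and hop are hypothetical parameters chosen by the caller;
    trailing samples that do not fill a whole frame are ignored.
    """
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop must be >= 1")
    return [
        zero_crossing_rate(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

For example, `framewise_zcr(samples, frame_len=400, hop=160)` would yield one ZCR value per frame of a mono recording held in `samples`; those particular values are illustrative only.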
| Document type: | Peer-reviewed journal article |
|---|---|
| Researcher: | Laporte, Catherine; Bouserhal, Rachel |
| Affiliation: | Electrical Engineering, Electrical Engineering |
| Deposit date: | 29 Apr. 2026 15:52 |
| Last modified: | 22 May 2026 21:40 |
| URI: | https://espace2.etsmtl.ca/id/eprint/33662 |