Robust watch-list screening using dynamic ensembles of SVMs based on multiple face representations

Bashbaghi, Saman, Granger, Eric, Sabourin, Robert and Bilodeau, Guillaume-Alexandre. 2017. "Robust watch-list screening using dynamic ensembles of SVMs based on multiple face representations". Machine Vision and Applications, vol. 28, no. 1/2. pp. 219-241.
Scopus citation count: 15.

Official URL: http://dx.doi.org/10.1007/s00138-016-0820-4

Abstract

Still-to-video face recognition (FR) is an important function in video surveillance, where faces captured over a network of video cameras are matched against reference stills of target individuals. Screening faces against a watch-list is a challenging video surveillance application because the appearance of faces varies with changing capture conditions and operational domains. The facial models used for matching may not be representative of faces captured with video cameras because they are typically designed a priori with only one reference still. In this paper, a multi-classifier framework is proposed for robust still-to-video FR based on multiple and diverse face representations of a single reference face still. During enrollment of a target individual, the single reference face still is modeled using an ensemble of SVM classifiers based on different patches and face descriptors. Multiple feature extraction techniques are applied to patches isolated in the reference still to generate a diverse SVM pool that provides robustness to common nuisance factors (e.g., variations in illumination and pose). Discriminant feature subsets, classifier parameters, decision thresholds, and ensemble fusion functions are estimated using the high-quality reference still and a large number of faces of non-target individuals captured in lower-quality video of the scene. During operations, the most competent subset of SVMs is dynamically selected according to capture conditions. Finally, a head-face tracker gradually regroups faces captured from different people appearing in a scene, while each individual-specific ensemble performs face matching. Matching scores are accumulated per face track, which yields robust spatio-temporal FR: a detection is raised when the accumulated ensemble scores surpass a detection threshold. Experimental results obtained with the Chokepoint and COX-S2V datasets show a significant improvement in performance over reference systems, especially when individual-specific ensembles (1) are designed using exemplar-SVMs rather than one-class SVMs, and (2) exploit score-level fusion of local SVMs (trained using features extracted from each patch), rather than using either decision-level or feature-level fusion with a global SVM (trained by concatenating features extracted from patches).
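The last two stages described in the abstract, score-level fusion of local SVMs and accumulation of scores along a face track, can be illustrated with a minimal sketch. It is not the authors' implementation. The averaging fusion rule, the sliding window, the function names and all numeric values are assumptions made for illustration; the abstract only states that fusion functions and thresholds are estimated during enrollment.

```python
from statistics import fmean


def fuse_scores(patch_scores):
    """Score-level fusion of the local (per-patch) classifier scores.

    Assumption: a plain average is used as the fusion function.
    """
    return fmean(patch_scores)


def first_detection(fused_scores, window, threshold):
    """Accumulate fused scores along one face track.

    Returns the index of the first frame at which the sum of the last
    `window` fused scores exceeds `threshold`, or None if it never does.
    The sliding window is an assumption of this sketch.
    """
    for i in range(len(fused_scores)):
        start = max(0, i - window + 1)
        accumulated = sum(fused_scores[start:i + 1])
        if accumulated > threshold:
            return i
    return None


# Hypothetical scores: one row per frame of a face track,
# one column per local SVM (one per patch).
track = [
    [0.1, 0.3, 0.2],
    [0.6, 0.8, 0.7],
    [0.9, 0.7, 0.8],
]

fused = [fuse_scores(frame) for frame in track]
detected_at = first_detection(fused, window=2, threshold=1.4)
```

Here a single low-scoring frame does not trigger a detection; the threshold is crossed only once consecutive frames of the same track agree, which is the intuition behind accumulating scores per track rather than deciding frame by frame.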

Document type:	Peer-reviewed journal article
Professor:	Granger, Éric; Sabourin, Robert
Affiliation:	Génie de la production automatisée, Génie de la production automatisée
Deposit date:	23 Jan. 2017 15:56
Last modified:	28 Jan. 2020 16:22
URI:	https://espace2.etsmtl.ca/id/eprint/14332
