Unsupervised exemplar-based learning for improved document image classification

Abuelwafa, Sherif, Pedersoli, Marco and Cheriet, Mohamed. 2019. "Unsupervised exemplar-based learning for improved document image classification". IEEE Access, vol. 7. pp. 133738-133748.
Scopus citation count: 10.

PDF
Cheriet M 2019 19647.pdf - Published version
License: Creative Commons CC BY.

Official URL: http://dx.doi.org/10.1109/ACCESS.2019.2940884

Abstract

Many recent state-of-the-art approaches for document image classification are based on supervised feature learning, which requires a large amount of labeled training data. In real-world document image classification problems, labeled data is scarce, while a large amount of unlabeled data is often available at almost no cost. In this paper, we present an approach for learning visual features for document analysis in an unsupervised way, which improves document image classification performance without increasing the amount of annotated data. The proposed approach trains a neural network model on an auxiliary task in which every training example is associated with a different label (exemplar) and expanded to multiple images through a data augmentation technique. The learned model, trained without any manual labels, is then used to boost document classification performance. This learned model has proved consistently effective in two different settings: i) as an unsupervised feature extractor to represent document images for an unsupervised classification task (i.e., clustering); and ii) in the parameter initialization of a supervised classification task trained with a small amount of annotated data. We perform experiments on the Tobacco-3482 dataset and demonstrate the capability of our approach to improve i) the unsupervised classification accuracy by up to 2.4%; and ii) the supervised classification accuracy by 1.5% without any extra data, or by 5% when using 3000 additional unannotated samples.
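
The auxiliary task described in the abstract can be illustrated with a minimal sketch of how a surrogate training set might be built: each unlabeled image becomes its own class (exemplar), and augmented copies of it share that class index. The function names, the random-shift augmentation, the white fill value, and the number of copies per exemplar are all assumptions made for illustration; the abstract does not specify which transformations or settings the authors used.

```python
import random


def augment(image, rng, max_shift=2):
    """Return a randomly shifted copy of a 2-D grayscale image (list of rows).

    Pixels shifted in from outside the image are filled with 255 (white).
    This shift is a hypothetical stand-in for the paper's augmentations.
    """
    height, width = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                out[y][x] = image[src_y][src_x]
    return out


def build_exemplar_dataset(unlabeled_images, copies_per_exemplar=4, seed=0):
    """Build (image, surrogate_label) pairs; each source image is its own class."""
    rng = random.Random(seed)
    dataset = []
    for exemplar_id, image in enumerate(unlabeled_images):
        for _ in range(copies_per_exemplar):
            # Every augmented copy inherits the index of its source image.
            dataset.append((augment(image, rng), exemplar_id))
    return dataset


# Three tiny 4x4 arrays standing in for unlabeled document scans.
images = [
    [[(i * 40 + r * 4 + c) % 256 for c in range(4)] for r in range(4)]
    for i in range(3)
]
surrogate = build_exemplar_dataset(images, copies_per_exemplar=4)
```

A classifier trained on such pairs never sees a human annotation. Under the two settings listed in the abstract, its learned features would then either be clustered directly or used to initialize a supervised model trained on the small labeled set.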

Document type:	Peer-reviewed journal article
Professor:	Pedersoli, Marco; Cheriet, Mohamed
Affiliation:	Systems Engineering, Systems Engineering
Deposit date:	25 Oct. 2019 21:04
Last modified:	19 Oct. 2020 14:57
URI:	https://espace2.etsmtl.ca/id/eprint/19647
