A showcase of ÉTS researchers’ publications and other contributions

Unsupervised exemplar-based learning for improved document image classification


Abuelwafa, Sherif, Pedersoli, Marco and Cheriet, Mohamed. 2019. "Unsupervised exemplar-based learning for improved document image classification". IEEE Access, vol. 7, pp. 133738-133748.
Scopus citation count: 7.

Cheriet M 2019 19647.pdf - Published Version
Use licence: Creative Commons CC BY.


Many recent state-of-the-art approaches to document image classification rely on supervised feature learning, which requires a large amount of labeled training data. In real-world document image classification, labeled data is scarce, while large amounts of unlabeled data are often available at almost no cost. In this paper, we present an approach for learning visual features for document analysis in an unsupervised way, which improves document image classification performance without increasing the amount of annotated data. The proposed approach trains a neural network model on an auxiliary task in which every training example is assigned its own label (an exemplar) and expanded into multiple images through data augmentation. The model learned in this unsupervised way is then used to boost document classification performance. It proves consistently effective in two settings: i) as an unsupervised feature extractor that represents document images for an unsupervised classification task (i.e., clustering); and ii) as the parameter initialization for a supervised classification task trained with a small amount of annotated data. We perform experiments on the Tobacco-3482 dataset and demonstrate that our approach improves i) the unsupervised classification accuracy by up to 2.4%; and ii) the supervised classification accuracy by 1.5% without any extra data, or by 5% when using 3000 additional unannotated samples.
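
The auxiliary task described in the abstract can be illustrated with a minimal sketch of how a surrogate training set might be built: each unlabeled image becomes its own class, and that class is filled with augmented copies. This is an illustration under stated assumptions, not the authors' implementation. The function names `augment` and `build_exemplar_dataset` are hypothetical, the images are toy 4x4 grids, and the transformations (a wrap-around horizontal shift and an intensity jitter) are placeholders rather than the augmentations used in the paper.

```python
import random


def augment(image, rng):
    """Return a randomly transformed copy of a 2-D image (list of rows).

    Assumption: the shift and intensity jitter below are placeholder
    transformations, not the augmentation set used in the paper.
    """
    width = len(image[0])
    shift = rng.randrange(width)          # horizontal shift with wrap-around
    jitter = rng.uniform(-0.1, 0.1)       # small additive intensity change
    return [
        [min(1.0, max(0.0, px + jitter)) for px in row[shift:] + row[:shift]]
        for row in image
    ]


def build_exemplar_dataset(unlabeled_images, copies_per_image, seed=0):
    """Build (image, surrogate_label) pairs for the auxiliary task.

    Every source image is treated as its own class: the surrogate label is
    the index of the image, and each class holds augmented copies of it.
    """
    rng = random.Random(seed)
    dataset = []
    for label, image in enumerate(unlabeled_images):
        for _ in range(copies_per_image):
            dataset.append((augment(image, rng), label))
    rng.shuffle(dataset)
    return dataset


# Toy stand-ins for unlabeled document images (4x4 grayscale values in [0, 1]).
toy_rng = random.Random(42)
toy_images = [
    [[toy_rng.random() for _ in range(4)] for _ in range(4)] for _ in range(3)
]
surrogate = build_exemplar_dataset(toy_images, copies_per_image=5)
```

A classifier trained on `surrogate` would have as many output classes as there are source images. Following the abstract, its learned weights could then serve either as a feature extractor for clustering or as the initialization of a supervised classifier.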

Item Type: Peer reviewed article published in a journal
Pedersoli, Marco
Cheriet, Mohamed
Affiliation: Génie des systèmes, Génie des systèmes
Date Deposited: 25 Oct 2019 21:04
Last Modified: 19 Oct 2020 14:57
