
Benchmarking pre-trained text embedding models in aligning built asset information

Shahinmoghadam, Mehrzad and Motamedi, Ali. 2025. "Benchmarking pre-trained text embedding models in aligning built asset information". Scientific Reports, vol. 15, no. 1.

Motamedi-A-2025-31230.pdf - Published version
License: Creative Commons CC BY-NC-ND.


Abstract

Accurate mapping of built asset information to data classification systems and taxonomies is crucial for effective asset management, whether for compliance at project handover or for ad hoc data integration. Because built asset data consists predominantly of technical text, this process remains largely manual and reliant on domain expert input. Recent breakthroughs in contextual text representation learning (text embedding), particularly through pre-trained large language models, offer promising approaches to automating the cross-mapping of built asset data. However, no comprehensive evaluation has yet assessed these models' ability to represent the complex semantics of built asset technical terminology. This study presents a comparative benchmark of state-of-the-art text embedding models to evaluate their effectiveness in aligning built asset information with domain-specific technical concepts. The proposed datasets are derived from two renowned built asset data classification dictionaries. Benchmarking across six proposed datasets, covering clustering, retrieval, and reranking tasks, showed performance variations among models that deviate from the common trend of larger models achieving higher scores. The results underscore the importance of domain-specific evaluations and of future research into domain adaptation techniques, with instruction-tuning as a promising direction. The benchmarking resources are published as an open-source library, which will be maintained and extended to support future evaluations in this field.
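
To make the retrieval task mentioned in the abstract concrete, the following is a minimal sketch of how an embedding-based alignment could be scored. It is not the authors' library or method. The three-dimensional vectors, the concept names, and the helpers `cosine`, `rank_concepts`, and `mean_reciprocal_rank` are all hypothetical and chosen only for illustration; a real benchmark would use vectors produced by a text embedding model and the datasets described in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_concepts(query_vec, concept_vecs):
    """Return (concept name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in concept_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def mean_reciprocal_rank(queries, concept_vecs):
    """Average of 1/rank of the expected concept over all queries."""
    total = 0.0
    for query_vec, expected in queries:
        ranking = [name for name, _ in rank_concepts(query_vec, concept_vecs)]
        total += 1.0 / (ranking.index(expected) + 1)
    return total / len(queries)


# Hypothetical toy "embeddings" of classification concepts (assumption:
# hand-written numbers, not the output of any real model).
CONCEPTS = {
    "pump": [0.9, 0.1, 0.0],
    "air handling unit": [0.1, 0.9, 0.1],
    "fire door": [0.0, 0.1, 0.9],
}

# Hypothetical asset descriptions, already embedded, with their expected concept.
QUERIES = [
    ([0.8, 0.2, 0.1], "pump"),
    ([0.2, 0.7, 0.2], "air handling unit"),
    ([0.1, 0.7, 0.6], "fire door"),  # deliberately ambiguous example
]

if __name__ == "__main__":
    for vec, expected in QUERIES:
        best, score = rank_concepts(vec, CONCEPTS)[0]
        print(f"expected={expected!r} top={best!r} score={score:.3f}")
    print(f"MRR={mean_reciprocal_rank(QUERIES, CONCEPTS):.3f}")
```

In this toy setup the third query ranks its expected concept second, so the mean reciprocal rank drops below 1. Comparing such scores across models on domain-specific data is the kind of evaluation the abstract describes, though the metrics and datasets used in the paper may differ.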

Document type: Peer-reviewed journal article
Professor: Motamedi, Ali
Affiliation: Construction Engineering
Deposit date: 30 Jul. 2025 13:31
Last modified: 12 Aug. 2025 23:15
URI: https://espace2.etsmtl.ca/id/eprint/31230
