
Intelligent proactive fault tolerance at the edge through resource usage prediction

Theodoropoulos, Theodoros, Violos, John, Tsanakas, Stylianos, Leivadeas, Aris, Tsepers, Konstantinos and Varvarigou, Theodora. 2022. "Intelligent proactive fault tolerance at the edge through resource usage prediction". ITU Journal on Future and Evolving Technologies, vol. 3, no. 3, pp. 761-778.

Leivadeas-A-2022-27727.pdf - Published version
License: Creative Commons CC BY-NC-ND.


Abstract

The proliferation of demanding applications and edge computing establishes the need for efficient management of the underlying computing infrastructures, urging providers to rethink their operational methods. In this paper, we propose an Intelligent Proactive Fault Tolerance (IPFT) method that leverages edge resource usage predictions made by Recurrent Neural Networks (RNNs). More specifically, we focus on process faults, which are related to the inability of the infrastructure to keep Quality of Service (QoS) within acceptable ranges due to a lack of processing power. To tackle this challenge, we propose a composite deep learning architecture that predicts the resource usage metrics of the edge nodes and triggers proactive node replication and task migration. Because the edge computing infrastructure is also highly dynamic and heterogeneous, we propose an innovative Hybrid Bayesian Evolution Strategy (HBES) algorithm for automated adaptation of the resource usage models. The proposed resource usage prediction mechanism has been experimentally evaluated and compared with other state-of-the-art methods, showing significant improvements in terms of Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Additionally, the IPFT mechanism that leverages the resource usage predictions has been evaluated in an extensive simulation in CloudSim Plus, and the results show significant improvement over the reactive fault tolerance method in terms of reliability and maintainability.
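
As a rough illustration of the two error metrics named in the abstract, and of how a usage forecast might drive a proactive action, here is a minimal Python sketch. The RMSE and MAE functions follow the standard definitions. The `should_replicate` function, its 0.8 threshold, and the sample values are hypothetical and are not taken from the paper, which uses an RNN-based predictor and its own triggering logic.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def should_replicate(predicted_usage, threshold=0.8):
    """Hypothetical proactive trigger (an assumption, not the paper's rule):
    act when any forecast step reaches the usage threshold."""
    return any(u >= threshold for u in predicted_usage)


# Made-up CPU utilisation fractions for one edge node.
actual = [0.50, 0.60, 0.70, 0.90]
forecast = [0.55, 0.58, 0.75, 0.85]

print("RMSE:", rmse(actual, forecast))
print("MAE:", mae(actual, forecast))
print("Replicate proactively:", should_replicate(forecast))
```

RMSE penalises large individual errors more heavily than MAE does, which is one reason the two are commonly reported together when comparing predictors.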

Document type: Peer-reviewed journal article
Professor: Leivadeas, Aris
Affiliation: Software and Information Technology Engineering
Date deposited: 18 Sep 2023 19:58
Last modified: 17 Oct 2023 18:30
URI: https://espace2.etsmtl.ca/id/eprint/27727
