Chahoud, Mario, Sami, Hani, Mizouni, Rabeb, Bentahar, Jamal, Mourad, Azzam, Otrok, Hadi and Talhi, Chamseddine. 2025. "Reward shaping in DRL: A novel framework for adaptive resource management in dynamic environments". Information Sciences, vol. 715.
Scopus citation count: 1.
Talhi-C-2025-30901.pdf - Published version. Usage license: Creative Commons CC BY-NC-ND.
Abstract
In edge computing environments, efficient computation resource management is crucial for optimizing service allocation to hosts in the form of containers. These environments experience dynamic user demands and high mobility, making traditional static and heuristic-based methods inadequate for handling such complexity and variability. Deep Reinforcement Learning (DRL) offers a more adaptable solution, capable of responding to these dynamic conditions. However, existing DRL methods face challenges such as high reward variability, slow convergence, and difficulties in incorporating user mobility and rapidly changing environmental configurations. To overcome these challenges, we propose a novel DRL framework for computation resource optimization at the edge layer. This framework leverages a customized Markov Decision Process (MDP) and Proximal Policy Optimization (PPO), integrating a Graph Convolutional Transformer (GCT). By combining Graph Convolutional Networks (GCN) with Transformer encoders, the GCT introduces a spatio-temporal reward-shaping mechanism that enhances the agent's ability to select hosts and assign services efficiently in real time while minimizing overload. Our approach significantly enhances the speed and accuracy of resource allocation, achieving, on average across two datasets, a 30% reduction in convergence time, a 25% increase in total accumulated rewards, and a 35% improvement in service allocation efficiency compared to standard DRL methods and existing reward-shaping techniques. Our method was validated using two real-world datasets, the Mobile Data Challenge (MDC) and Shanghai Telecom, and was compared against standard DRL models, reward-shaping baselines, and heuristic methods.
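The abstract describes a GCN-plus-Transformer head that produces a spatio-temporal reward-shaping signal for a PPO agent. As a rough illustration only, the plain-PyTorch sketch below shows one way such a shaping term could be computed from a window of edge-host graph snapshots; the `GCTShaping` name, all layer sizes, and the way the bonus is added to the raw reward are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn


class GCTShaping(nn.Module):
    """Illustrative Graph Convolutional Transformer reward-shaping head.

    A single GCN-style layer aggregates spatial structure over the edge-host
    graph, a Transformer encoder attends over a temporal window of graph
    snapshots, and a linear head emits a scalar shaping bonus.
    """

    def __init__(self, n_feats: int, hidden: int = 64, window: int = 8):
        super().__init__()
        self.gcn = nn.Linear(n_feats, hidden)            # weights of one GCN layer
        enc_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(hidden, 1)                 # scalar shaping term

    def forward(self, x_seq: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x_seq: (window, n_hosts, n_feats) host features over time
        # adj:   (n_hosts, n_hosts) symmetrically normalized adjacency matrix
        h = torch.relu(adj @ self.gcn(x_seq))            # GCN: aggregate neighbor features
        h = h.mean(dim=1)                                # (window, hidden) graph embedding per step
        h = self.temporal(h.unsqueeze(0)).squeeze(0)     # temporal self-attention over the window
        return self.head(h[-1]).squeeze(-1)              # shaping bonus for the latest step


# Hypothetical use inside a PPO loop (env step, raw_reward, beta are assumed):
# shaped_reward = raw_reward + beta * gct(x_history, norm_adj).item()
```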
Document type: Peer-reviewed journal article
Professor: Talhi, Chamseddine
Affiliation: Software and Information Technology Engineering
Deposited: 08 May 2025 15:04
Last modified: 12 May 2025 18:31
URI: https://espace2.etsmtl.ca/id/eprint/30901