
Reward shaping in DRL: A novel framework for adaptive resource management in dynamic environments

Chahoud, Mario, Sami, Hani, Mizouni, Rabeb, Bentahar, Jamal, Mourad, Azzam, Otrok, Hadi and Talhi, Chamseddine. 2025. "Reward shaping in DRL: A novel framework for adaptive resource management in dynamic environments". Information Sciences, vol. 715.
Scopus citation count: 2.

PDF: Talhi-C-2025-30901.pdf (Published Version, 2 MB)
Use licence: Creative Commons CC BY-NC-ND.

Abstract

In edge computing environments, efficient management of computation resources is crucial for optimizing the allocation of services to hosts in the form of containers. These environments experience dynamic user demands and high mobility, making traditional static and heuristic-based methods inadequate for handling such complexity and variability. Deep Reinforcement Learning (DRL) offers a more adaptable solution, capable of responding to these dynamic conditions. However, existing DRL methods face challenges such as high reward variability, slow convergence, and difficulty incorporating user mobility and rapidly changing environmental configurations. To overcome these challenges, we propose a novel DRL framework for computation resource optimization at the edge layer. This framework leverages a customized Markov Decision Process (MDP) and Proximal Policy Optimization (PPO), integrating a Graph Convolutional Transformer (GCT). By combining Graph Convolutional Networks (GCN) with Transformer encoders, the GCT introduces a spatio-temporal reward-shaping mechanism that enhances the agent's ability to select hosts and assign services efficiently in real time while minimizing overload. Our approach significantly improves the speed and accuracy of resource allocation, achieving, on average across two datasets, a 30% reduction in convergence time, a 25% increase in total accumulated rewards, and a 35% improvement in service allocation efficiency compared to standard DRL methods and existing reward-shaping techniques. Our method was validated on two real-world datasets, MOBILE DATA CHALLENGE (MDC) and Shanghai Telecom, and was compared against standard DRL models, reward-shaping baselines, and heuristic methods.
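To illustrate the general idea of reward shaping referenced in the abstract, the sketch below shows classic potential-based shaping, where a potential function over states is added to the environment reward. The `phi` function here is a hand-coded stand-in, not the paper's GCT (which scores host/service graphs with GCN and Transformer layers); the names `phi`, `shaped_reward`, and the toy load/capacity state are illustrative assumptions, not taken from the paper.

```python
GAMMA = 0.99  # discount factor of the underlying MDP

def phi(state):
    """Toy potential: penalize the gap between a host's assigned
    load and its capacity. In the paper's framework this role is
    played by a learned spatio-temporal model (the GCT); here it
    is a simple hand-coded stand-in for illustration."""
    load, capacity = state
    return -abs(capacity - load) / capacity

def shaped_reward(reward, state, next_state):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).
    This form is known to preserve the optimal policy while
    densifying the learning signal for the agent."""
    return reward + GAMMA * phi(next_state) - phi(state)

# Example: allocating a service moves a host's load from 4/10 to 7/10,
# so the shaping term rewards the move toward full utilization.
r = shaped_reward(1.0, (4, 10), (7, 10))
```

The paper's contribution replaces a fixed potential like `phi` with a learned graph-based model, so the shaping signal can track user mobility and changing host topology over time.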

Item Type: Peer reviewed article published in a journal
Professor: Talhi, Chamseddine
Affiliation: Génie logiciel et des technologies de l'information
Date Deposited: 08 May 2025 15:04
Last Modified: 12 May 2025 18:31
URI: https://espace2.etsmtl.ca/id/eprint/30901
