A search-based testing approach for deep reinforcement learning agents

Zolfagharian, Amirhossein, Abdellatif, Manel, Briand, Lionel C., Bagherzadeh, Mojtaba and Ramesh, S. 2023. "A search-based testing approach for deep reinforcement learning agents". IEEE Transactions on Software Engineering, vol. 49, no. 7, pp. 3715-3735.
Scopus citation count: 41.

Abdellatif-M-2023-26801.pdf - Published version
License: Creative Commons CC BY.

Official URL: https://doi.org/10.1109/TSE.2023.3269804

Abstract

Deep Reinforcement Learning (DRL) algorithms have been increasingly employed during the last decade to solve various decision-making problems such as autonomous driving, trading decisions, and robotics. However, these algorithms have faced great challenges when deployed in safety-critical environments since they often exhibit erroneous behaviors that can lead to potentially critical errors. One of the ways to assess the safety of DRL agents is to test them to detect possible faults leading to critical failures during their execution. This raises the question of how we can efficiently test DRL policies to ensure their correctness and adherence to safety requirements. Most existing works on testing DRL agents use adversarial attacks that perturb states or actions of the agent. However, such attacks often lead to unrealistic states of the environment. Furthermore, their main goal is to test the robustness of DRL agents rather than testing the compliance of the agents’ policies with respect to requirements. Due to the huge state space of DRL environments, the high cost of test execution, and the black-box nature of DRL algorithms, exhaustive testing of DRL agents is impossible. In this paper, we propose a Search-based Testing Approach of Reinforcement Learning Agents (STARLA) to test the policy of a DRL agent by effectively searching for failing executions of the agent within a limited testing budget. We rely on machine learning models and a dedicated genetic algorithm to narrow the search toward faulty episodes (i.e., sequences of states and actions produced by the DRL agent). We apply STARLA on Deep-Q-Learning agents trained on two different RL problems widely used as benchmarks and show that STARLA significantly outperforms Random Testing by detecting more faults related to the agent's policy. We also investigate how to extract rules that characterize faulty episodes of the DRL agent using our search results. 
Such rules can be used to understand the conditions under which the agent fails and thus assess the risks of deploying it.
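
To make the idea of searching for failing episodes under a limited budget more concrete, here is a minimal sketch of a generic genetic search over the initial states of a toy environment. It is not STARLA: the environment, the stand-in policy, the fitness function, and every name below (`toy_policy`, `run_episode`, `genetic_search`, `TRACK_LIMIT`, and so on) are assumptions made for illustration, and the machine learning models that the abstract mentions for narrowing the search are left out entirely.

```python
import random

# Hypothetical toy setup, not taken from the paper.
TRACK_LIMIT = 10   # assumed safety requirement: |position| must stay <= 10
MAX_THRUST = 4     # the toy agent cannot counter a wind stronger than this
MAX_WIND = 5
MAX_STEPS = 30

def toy_policy(position):
    """Stand-in for a trained policy: thrust toward the centre of the track."""
    thrust = min(abs(position), MAX_THRUST)
    return -thrust if position > 0 else thrust

def run_episode(initial_state):
    """Run the policy from (position, wind); return (episode, failed)."""
    position, wind = initial_state
    episode = []
    for _ in range(MAX_STEPS):
        action = toy_policy(position)
        episode.append((position, action))
        position += action + wind
        if abs(position) > TRACK_LIMIT:
            episode.append((position, None))  # terminal, unsafe state
            return episode, True
    episode.append((position, None))          # terminal, safe state
    return episode, False

def fitness(episode):
    """Higher means closer to failure: |final position| of the episode."""
    final_position, _ = episode[-1]
    return abs(final_position)

def random_state(rng):
    return (rng.randint(-TRACK_LIMIT, TRACK_LIMIT),
            rng.randint(-MAX_WIND, MAX_WIND))

def mutate(state, rng):
    """Nudge either the start position or the wind by one unit, within bounds."""
    position, wind = state
    if rng.random() < 0.5:
        position = max(-TRACK_LIMIT, min(TRACK_LIMIT, position + rng.choice([-1, 1])))
    else:
        wind = max(-MAX_WIND, min(MAX_WIND, wind + rng.choice([-1, 1])))
    return (position, wind)

def crossover(a, b):
    """Combine the start position of one parent with the wind of the other."""
    return (a[0], b[1])

def genetic_search(budget, population_size=6, seed=0):
    """Spend at most `budget` episode executions; return (failures, executions)."""
    rng = random.Random(seed)
    failures = set()
    executions = 0

    def evaluate(state):
        nonlocal executions
        executions += 1
        episode, failed = run_episode(state)
        if failed:
            failures.add(state)
        return fitness(episode)

    population = [random_state(rng) for _ in range(min(population_size, budget))]
    scored = [(evaluate(s), s) for s in population]
    while executions + population_size <= budget:
        scored.sort(reverse=True)                 # fittest (closest to failure) first
        elites = scored[: population_size // 2]
        parents = [s for _, s in elites]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
                    for _ in range(population_size)]
        scored = elites + [(evaluate(c), c) for c in children]
    return sorted(failures), executions
```

Calling `genetic_search(budget=60)` returns the failing initial states found and the number of episodes executed. In this toy setting an episode can only fail when the wind is stronger than `MAX_THRUST`, which loosely mirrors the kind of rule the abstract describes extracting from faulty episodes.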

Document type:	Peer-reviewed journal article
Researcher:	Abdellatif, Manel
Affiliation:	Software and Information Technology Engineering
Date deposited:	28 June 2023 19:23
Last modified:	13 Oct. 2023 15:52
URI:	https://espace2.etsmtl.ca/id/eprint/26801
