Masud, Md Shafakat, Liu, Xiaolong, Martinez, Kenneth, Workman, Riley, Lang, Haxiang and Ren, Jing.
2025.
"Development of AI-driven universal robot with vision perception for robotic manipulation".
In Proceedings of the CSME-CFDSC-CSR 2025 International Congress (Montreal, QC, Canada, May 25-28, 2025).
Series "Progress in Canadian Mechanical Engineering", vol. 8.
PDF: 223 - Development of AI-driven universal.pdf (published version, 539kB). License: all rights reserved to the copyright holders.
Abstract
This paper presents an AI-driven robotic manipulation framework integrating vision-based perception, natural language processing (NLP), and reinforcement learning for autonomous object sorting. The system employs YOLOv7 for real-time object detection, a state-machine-based execution framework, and ChatGPT-4o for task planning. The research follows a dual-phase evaluation, conducting simulations in Gazebo using KUKA iiwa14, ABB IRB120, and UR5, followed by real-world experiments with a UR5 robotic arm. Performance is assessed based on task completion time, success rate, and error analysis, including position and orientation accuracy. Simulation results indicate that KUKA iiwa14 consistently achieved the shortest execution times, averaging 56.34 s for Task 1, while UR5 exhibited higher variability, peaking at 139.69 s in some sessions. The average success rate across tasks exceeded 85%, though UR5 recorded increased RMSE in orientation (5.78 in Task 1), highlighting challenges in fine manipulation. Failure analysis identified motion planning errors as the dominant cause of failures in Task 1 (50%), while classification errors (75%) and sequencing issues (80%) were prevalent in Tasks 2 and 3, respectively. Experimental validation confirmed the system’s feasibility, though real-world trials revealed higher execution times and reduced success rates due to environmental uncertainties. Additionally, reliance on ChatGPT-4o introduces challenges related to internet dependency and API costs. Future work will focus on transitioning to an open-source LLM (LLAMA) for local execution and further optimizing reinforcement learning strategies for real-time adaptability and improved robotic precision.
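The evaluation metrics named in the abstract (task completion time, success rate, and RMSE of position and orientation error) could be computed roughly as in the sketch below. This is an illustration only, not the authors' code: the `Trial` record, its field names, and the `summarize` helper are hypothetical, and the error units are left unspecified because the abstract does not state them.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Trial:
    """One sorting attempt (hypothetical record layout, not from the paper)."""
    duration_s: float         # task completion time in seconds
    succeeded: bool           # whether the task was completed
    position_error: float     # final position error (unit assumed, not given)
    orientation_error: float  # final orientation error (unit assumed, not given)


def rmse(errors: Iterable[float]) -> float:
    """Root-mean-square error of a non-empty sequence of scalar errors."""
    values = list(errors)
    if not values:
        raise ValueError("rmse() needs at least one error value")
    return math.sqrt(sum(e * e for e in values) / len(values))


def summarize(trials: List[Trial]) -> dict:
    """Aggregate the per-trial records into the metrics listed in the abstract."""
    if not trials:
        raise ValueError("summarize() needs at least one trial")
    n = len(trials)
    return {
        "mean_time_s": sum(t.duration_s for t in trials) / n,
        "success_rate": sum(1 for t in trials if t.succeeded) / n,
        "position_rmse": rmse(t.position_error for t in trials),
        "orientation_rmse": rmse(t.orientation_error for t in trials),
    }
```

Calling `summarize` on a list of `Trial` records for one robot and one task would give one row of a results table; the values themselves would have to come from the simulation or experiment logs.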
| Document type: | Conference proceeding |
|---|---|
| Editors: | Hof, Lucas A.; Di Labbio, Giuseppe; Tahan, Antoine; Sanjosé, Marlène; Lalonde, Sébastien; Demarquette, Nicole R. (ORCID not specified for any editor) |
| Date deposited: | 18 Dec 2025 15:16 |
| Last modified: | 18 Dec 2025 15:16 |
| URI: | https://espace2.etsmtl.ca/id/eprint/32455 |

