HAND GESTURE SEGMENTATION AND SIGN LANGUAGE RECOGNITION USING DEEP LEARNING

HAND GESTURE SEGMENTATION AND SIGN LANGUAGE RECOGNITION USING DEEP LEARNING. Masters thesis, King Fahd University of Petroleum and Minerals.

MS_THESIS_G201706690_M_FAISAL.pdf - Submitted Version
Restricted to Repository staff only until 28 June 2022.

Arabic Abstract (English translation)

Communication through hand gestures is one of the active research directions in human-computer interaction, and it has had a significant impact on the development of applications that ease communication with deaf people through sign language recognition. Despite notable progress in this field, especially with recent technological advances, challenges remain in handling images with complex backgrounds. This study aims to discuss the different aspects of the problem and to develop and compare a set of deep learning-based solutions that integrate semantic segmentation of gestures with methods for classifying them, which helps improve recognition when the image contains a complex background or noise. The results of the various experiments showed the effectiveness of the model, which we named S3RNet: it reached 95.17% in semantic segmentation, with a gesture classification accuracy of 99.72%. We also conducted several sensitivity analysis experiments to observe the robustness of the proposed model and to compare it with other models.

English Abstract

Communication through hand gestures is an active research area in Human-Computer Interaction (HCI). The deaf and speech-impaired community may benefit from advances in gesture recognition, since gestures serve as an intermediary communication technique. Segmenting gestures from complex backgrounds is a crucial step toward better recognition. Semantic segmentation is gaining momentum in a range of fields; however, it has not yet been widely applied to sign gestures. In this thesis, we propose a CNN-based integrated model for the segmentation and recognition of hand sign gestures under conditions such as complex backgrounds, blur, and varying illumination. Our model, dubbed S3RNet, is composed of two components: a custom CNN-based segmenter with an encoder-decoder architecture and a custom CNN-based recognition module. The segmentation component is built from depth-wise separable convolutions, with fine-tuned layers and network parameters chosen to perform well even on a small dataset. In a second variation of our model, dubbed DLSR, we replaced the segmentation component with a transfer learning-based ASPP module. The recognition module remains the same for both models. In addition, to design a more suitable gesture recognition module, we proposed three architectures: two custom and one transfer learning-based. The proposed models were tested on three sign language datasets and one gesture dataset. A ground-truth gesture dataset was created for evaluating the models. The results demonstrate that S3RNet achieved an mIoU score of 95.17% for segmentation and a recognition accuracy of 99.72% on the test dataset. Several sensitivity analysis experiments were also performed to assess the robustness of our models, and we compared our results with other established models.
As a separate task, this research also explored and presented a new deep classification approach for cross sign language recognition.
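
For readers unfamiliar with the mIoU score reported in the abstract, the following is a minimal sketch of how mean intersection-over-union is commonly computed for a predicted mask against a ground-truth mask. It is not taken from the thesis: the function name mean_iou, the list-of-lists mask layout, and the choice to skip classes absent from both masks are assumptions made for illustration, and the thesis may use a different evaluation protocol.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union between two label masks.

    pred and truth are equally sized 2-D lists of integer class labels
    (an assumed layout; a real pipeline would likely use arrays).
    """
    ious = []
    for c in range(num_classes):
        inter = 0  # pixels labelled c in both masks
        union = 0  # pixels labelled c in at least one mask
        for p_row, t_row in zip(pred, truth):
            for p, t in zip(p_row, t_row):
                in_p, in_t = (p == c), (t == c)
                inter += int(in_p and in_t)
                union += int(in_p or in_t)
        if union:  # assumption: classes absent from both masks are skipped
            ious.append(inter / union)
    # With no class present at all, report 0.0 rather than divide by zero.
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 masks: 0 = background, 1 = hand (hypothetical labels).
pred = [[0, 0, 1],
        [0, 1, 1]]
truth = [[0, 0, 1],
         [0, 0, 1]]
score = mean_iou(pred, truth, num_classes=2)
```

In this toy case the background class has IoU 3/4 and the hand class 2/3, so the mean is 17/24, or about 0.708. A reported score such as 95.17% would be the same quantity averaged over a test set.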

Item Type: Thesis (Masters)
Subjects: Computer
Research > Information Technology
Department: College of Computing and Mathematics > Information and Computer Science
Committee Advisor: El-Alfy, E.-S.M.
Committee Members: Luqman, Hamzah and Al-Khatib, Wasfi G.
Depositing User: M FAISAL NURNOBY (g201706690)
Date Deposited: 29 Jun 2021 10:26
Last Modified: 29 Jun 2021 10:26
URI: http://eprints.kfupm.edu.sa/id/eprint/141918