HAND GESTURE SEGMENTATION AND SIGN LANGUAGE RECOGNITION USING DEEP LEARNING

HAND GESTURE SEGMENTATION AND SIGN LANGUAGE RECOGNITION USING DEEP LEARNING. Masters thesis, King Fahd University of Petroleum and Minerals.

PDF (MS THESIS): MS_THESIS_G201706690_M_FAISAL.pdf - Submitted Version (7MB)
Restricted to Repository staff only until 28 June 2022.

Arabic Abstract (translated)

Communication through hand gestures is one of the active research directions in human-computer interaction, and it has had a significant impact on the development of applications that facilitate communication with the deaf through sign language recognition. Despite notable progress in this field, especially with recent technological advances, challenges remain in better handling images with complex backgrounds. This study aims to discuss the various aspects of the problem and to develop and compare a set of deep-learning-based solutions that integrate semantic segmentation of gestures with gesture classification methods, thereby improving recognition performance in the presence of a complex background or image noise. The results of the various experiments demonstrated the effectiveness of the model we named S3RNet, which achieved 95.17% in semantic segmentation and a gesture classification accuracy of 99.72%. We also conducted several sensitivity analysis experiments to assess the robustness of the proposed model and to compare it with other models.

English Abstract

Communication through hand gestures is an active research area in Human-Computer Interaction (HCI). The deaf and speech-impaired community may benefit from advances in hand gesture recognition, which serves as an intermediary communication technique. Segmenting gestures against complex backgrounds is a crucial step toward better recognition. Semantic segmentation is gaining momentum in a range of fields; however, it has not yet been widely applied to sign gestures. In this thesis, we propose a CNN-based integrated model for the segmentation and recognition of hand sign gestures under conditions such as complex backgrounds, blur, and illumination variation. Our model, dubbed S3RNet, is composed of two components: a custom CNN-based segmenter with an encoder-decoder architecture and a custom CNN-based recognition module. The segmentation component is built from depthwise separable convolution networks with fine-tuned layers and carefully chosen network parameters, aiming for optimal performance even on a small dataset. In a second variant of our model, dubbed DLSR, we replaced the segmentation component with a transfer-learning-based ASPP module; the recognition module is the same in both models. In addition, to design a more suitable gesture recognition module, we proposed three architectures: two custom and one transfer-learning-based. The proposed models were tested on three sign language datasets and one gesture dataset, and a ground-truth gesture dataset was created to evaluate them. The results demonstrate that S3RNet achieved an mIoU score of 95.17% for segmentation and a recognition accuracy of 99.72% on the test dataset. Several sensitivity analysis experiments were also performed to assess the robustness of our models, and we compared our results with other established models.
In addition, as a separate task, this research explored and presented a new deep classification approach for cross sign language recognition.
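The 95.17% figure above is a mean Intersection-over-Union (mIoU), the standard metric for semantic segmentation quality. A minimal sketch of how mIoU is conventionally computed from predicted and ground-truth label maps is shown below; this illustrates the general metric only, not the thesis's actual evaluation code, and the toy masks are invented for the example.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union across classes.

    pred, target: integer label maps of identical shape.
    Classes absent from both prediction and ground truth are skipped
    so they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class appears in neither map
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 example with two classes (0 = background, 1 = hand):
pred = np.array([[0, 1],
                 [1, 1]])
target = np.array([[0, 1],
                   [0, 1]])
print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.5833
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, giving a mean of about 0.5833; a perfect segmenter would score 1.0 for every class present.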

Item Type: Thesis (Masters)
Subjects: Computer
Research > Information Technology
Department: College of Computing and Mathematics > Information and Computer Science
Committee Advisor: El-Alfy, E.-S.M.
Committee Members: Luqman, Hamzah and Al-Khatib, Wasfi G.
Depositing User: M FAISAL NURNOBY (g201706690)
Date Deposited: 29 Jun 2021 10:26
Last Modified: 29 Jun 2021 10:26
URI: https://eprints.kfupm.edu.sa/id/eprint/141918