Automatic Identification of Arabic Figurative Language

Abouhagar, Leina

Abouhagar, Leina (2024) Automatic Identification of Arabic Figurative Language. Masters thesis, King Fahd University of Petroleum and Minerals.

PDF: Automatic Identification Of Arabic Figurative Language.pdf (Accepted Version, 15MB)

Arabic Abstract (English translation)

With the growing use of natural language processing applications in our daily lives, from language translation to conversing with chatbots, there is increasing research that aims to use artificial intelligence algorithms to understand and imitate human language. One of the main challenges in Arabic natural language processing is understanding rhetorical figurative language, because of its non-literal nature. Figurative language is the creative use of language to convey a message that is not the literal meaning of the words used. It is used in both literature and poetry to create expressive discourse that reaches readers through their senses, giving the writing greater depth and creativity. The difficulty of figurative language lies in this non-literal nature: some figurative expressions carry meanings that are the opposite of their literal meaning, which makes them hard to understand. Arabic figurative language processing is a recently emerging field. Contemporary researchers have studied how figurative language affects the ability of intelligent systems to understand Arabic natural language, and have shown that metaphor, one type of figurative language, affects a machine's ability to analyze Arabic text. Because of this direct effect on system performance, researchers have built machine learning models that analyze some figurative language types in text, such as sarcasm, irony, hyperbole, and simile. So far, however, these models detect each figurative type separately (binary classification). This thesis contributes to Arabic natural language processing by building artificial intelligence models that understand Arabic figurative language in general, and metaphor as one of its types in particular. An Arabic metaphor dataset was collected from three different sources: the Quran, poetry, and prose. A model based on MARBERT was then built that detects metaphor in Arabic text with 81% accuracy on the collected dataset. In addition, a model was developed that classifies Arabic text as containing one of four figurative language types or as literal, for a total of five classes: sarcasm, hyperbole, simile, metaphor, and literal. This model achieves an F1-score of 0.78. Finally, a model was built that locates the metaphor within a metaphorical sentence, achieving an F1-score of 0.62.

English Abstract

With the increased use of natural language processing applications in everyday life, ranging from language translation to chatbots, there is growing research aimed at understanding and generating human-like natural language. One of the main challenges is understanding figurative language because of its non-literal nature. Figurative language is the creative use of language to deliver a message that differs from the literal, strict meaning of the words used. Previous research on Arabic figurative language detection covers a few figurative language types, such as irony, hyperbole, sarcasm, and simile. It mainly treats each of them as a binary classification problem, where a sentence either belongs to a specific figurative class or does not. This thesis contributes to the Arabic natural language processing field by studying Arabic metaphor identification at both the sentence level and the fine-grained word level. It also studies Arabic figurative language as a multi-class problem, where a given sentence belongs to one of four figurative classes (metaphor, sarcasm, hyperbole, and simile) or is literal. In this thesis, an Arabic metaphor dataset has been collected from three different sources: the Quran, poetry, and prose. Using this dataset, a binary Arabic metaphor classifier based on MARBERT has been developed, reaching 81% accuracy. A multi-class classifier has also been developed for Arabic figurative language classification with a total of five classes (sarcasm, hyperbole, simile, metaphor, and literal), achieving a best F1-score of 0.78. Finally, using fine-grained analysis of Arabic metaphors, a model has been developed that identifies the words forming the basis of the metaphor in a sentence, achieving a best F1-score of 0.62.
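The abstract reports results for three task framings. These are binary sentence-level metaphor detection (accuracy), five-class sentence-level classification (F1), and word-level identification of the words forming the metaphor (F1). The following is a minimal, illustrative sketch of how such framings could be scored. It is not the thesis's evaluation code. The label strings, the use of macro averaging for the five-class F1, and the 0/1 per-word tagging scheme are all assumptions, because the abstract does not state which averaging or tagging scheme was used. The MARBERT models themselves are not shown, and the example inputs are toy values, not thesis data.

```python
# Hypothetical sketch: scoring the three task framings described in the abstract.
# Assumptions (not from the thesis): label strings, macro-averaged F1 for the
# five-class task, and 0/1 per-word tags for the word-level task.

# Five-class sentence-level label set (names are illustrative).
MULTICLASS_LABELS = ["sarcasm", "hyperbole", "simile", "metaphor", "literal"]


def accuracy(gold, pred):
    """Fraction of sentences whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_per_class(gold, pred, label):
    """F1 for one label, treating it as positive and every other label as negative."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # also avoids division by zero when the label is never predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    return sum(f1_per_class(gold, pred, label) for label in labels) / len(labels)


def token_f1(gold_tags, pred_tags):
    """Word-level F1: tag 1 marks a word forming the basis of the metaphor, 0 otherwise.

    Tags from all sentences are pooled before scoring the positive (1) tag.
    """
    gold = [tag for sentence in gold_tags for tag in sentence]
    pred = [tag for sentence in pred_tags for tag in sentence]
    return f1_per_class(gold, pred, 1)


if __name__ == "__main__":
    # Binary task: "metaphor" vs "literal" (toy values).
    print(accuracy(["metaphor", "literal", "metaphor", "literal"],
                   ["metaphor", "literal", "literal", "literal"]))  # 0.75

    # Five-class task (toy values).
    gold = ["sarcasm", "metaphor", "literal", "simile", "hyperbole"]
    pred = ["sarcasm", "literal", "literal", "simile", "hyperbole"]
    print(round(macro_f1(gold, pred, MULTICLASS_LABELS), 4))  # 0.7333

    # Word-level task: per-word tags for two sentences (toy values).
    print(round(token_f1([[0, 1, 0], [1, 1, 0]], [[0, 1, 1], [1, 0, 0]]), 4))  # 0.6667
```

With macro averaging, a class the model never gets right (here `metaphor`) lowers the score as much as any other class, regardless of how often it occurs. Weighted or micro averaging would give different numbers for the same predictions, so reported F1 values are only comparable when the averaging scheme is known.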

Item Type:	Thesis (Masters)
Subjects:	Computer
Department:	College of Computing and Mathematics > Information and Computer Science
Thesis Advisor:	Irfan Ahmad
Thesis Committee Members:	Mohammad Alshayeb, Imane Boudellioua
Depositing User:	LEINA ABOUHAGAR
Date Deposited:	07 Jan 2024 08:26
Last Modified:	30 Jun 2026 09:16
URI:	https://eprints.kfupm.edu.sa/id/eprint/142742