Hierarchical Attention Network Architecture Improvement for Cloze-Style Question Answering

Alsahli, Fahad

Home

KFUPM ePrints

In this section

Hierarchical Attention Network Architecture Improvement for Cloze-Style Question Answering

Alsahli, Fahad (2021) Hierarchical Attention Network Architecture Improvement for Cloze-Style Question Answering. Masters thesis, King Fahd University of Petroleum and Minerals.

Preview

PDF
Thesis.pdf
Download (1MB) | Preview

Arabic Abstract

يوجد العديد من مجالات معالجة اللغات الطبيعية. أحد هذه المجالات هو إجابة الأسئلة المتعلقة بموضوع معين. يهتم هذا المجال ببناء برامج تقوم بقراءة نصوص في موضوع معين، و تجيب عن أسئلة المهتمين بالموضوع بناءً على ما تمت قراءته. لتحسين أداء هذه البرامج، نماذج التعلم العميق أصبحت تستخدم في عملية بناء برامج إجابة الأسئلة. أحد هذه النماذج هو Hierarchical Attention Network. عملنا في هذه الدراسة على تحسين أداء النموذج من خلال تقليص الوقت اللَازم لمعالجة المدخلات و دراسة رفع دقة النموذج. استخدمنا أربع مجموعات مختلفة من البيانات خلال هذه الدراسة. عملية تحسين النموذج تمت على مرحلتين. المرحلة الأولى كانت عبارة عن تقليص عدد وحدات معالجة النصوص (Text Encoding Layers). نتج عن هذا التعديل انخفاض 41.39% في وقت معالجة النصوص مع عدم التأثير على دقة النموذج. بالنسبة للمرحلة الثانية، فهي عبارة عن دراسة تأثير Bidirectional Encoder Representations from Transformers (BERT) embeddings على دقة النموذج مقارنة بالطريقة التقليدية، وهي Global Vectors embeddings. توصلت المرحلة الثانية إلى أن BERT embeddings لا تحسن أداء النموذج مقارنة بالطريقة التقليدية.

English Abstract

Recently, researchers have been addressing Question Answering (QA) by utilizing deep learning architectures, e.g., Recurrent Neural Networks (RNNs), Convolutional Neural Networks, and attention mechanism. QA has several variants, for example, document-based QA and cloze-style QA. In general, QA tasks could be addressed via similar approaches. This is due to the nature of QA which needs a context and a question to be analyzed so that an answer can be retrieved. In this thesis, we are tackling cloze-style QA. In such tasks, a context and a query are given. Query is a sentence that is missing a piece of information (e.g., a word). The missing information should be inferred based on the given context. Hierarchical Attention Network (HAN) based models employ hierarchical attention, so they are suitable for cloze-style QA task. HAN models have two layers of text encoding and use Global Vectors (Glove) embeddings. We propose a HAN model to address cloze-style QA. Our proposed HAN model has a single layer of text encoding. We conduct experiments to compare proposed model against a baseline model (HAN pointer sum attention) having two layers of text encoding. Comparison is based on inference times (i.e., time needed to process and answer a sample) and accuracy scores. We utilize two publicly available cloze-style datasets which are two instances of Children’s Book Test (CBT), namely, Named Entity (CBT-NE) and Common Nouns (CBT-CN). We also use simplified versions of Cable News Network (CNN) and Daily Mail data. Results show that proposed model has lower inference time compared to baseline while maintaining baseline's accuracy score. Proposed model achieves an average inference time reduction of 41.39%. Moreover, we investigate effects of different Bidirectional Encoder Representations from Transformers (BERT) embeddings on accuracy score of HAN. We find that embeddings extracted from BERT's first layer give the best accuracy score compared to embeddings extracted from other layers. Then, we design and run experiments to compare BERT's first layer embeddings against Glove embeddings. Experiments show that the two techniques of embeddings result in similar accuracy scores.

Item Type:	Thesis (Masters)
Subjects:	Computer Engineering
Department:	College of Computing and Mathematics > Information and Computer Science
Thesis Advisor:	Andri Mirzal,
Thesis Committee Members:	Irfan Ahmad, Hamzah Luqman,
Depositing User:	FAHAD ALSAHLI
Date Deposited:	04 Jan 2021 11:09
Last Modified:	30 Jun 2026 09:14
URI:	https://eprints.kfupm.edu.sa/id/eprint/141785