Node Feature Enrichment and Pathway-Focused Hypergraph Neural Networks for Cancer Driver Gene Prediction

Node Feature Enrichment and Pathway-Focused Hypergraph Neural Networks for Cancer Driver Gene Prediction. Masters thesis, King Fahd University of Petroleum and Minerals.

PDF (Master Thesis)
Etab_Alotaibi_Final_MS_Thesis_.pdf - Accepted Version
Restricted to Repository staff only until 12 November 2026.

Arabic Abstract (English translation)

Cancer is among the most common diseases of our time and has killed millions of people in the past few years. According to World Health Organization reports, these numbers may rise over the next ten years. Cancer is, at its core, a genetic disease: it arises from mutations (changes in the DNA sequence) that cause the cell to lose its normal functions and stimulate abnormal division, so that the affected cells spread and form what is known as a malignant tumor. This thesis builds on advances in artificial intelligence and the success of machine learning in bioinformatics and biological data analysis. In it we present a model based on deep learning techniques, called Hypergraph Neural Networks, which was chosen to study cancer genes and analyze their interactions within complex biological networks and pathways. The model rests on a central principle of cell biology: genes interact within specific biological and chemical pathways to perform functions that are important to the cell, so a mutation in one gene may affect the entire pathway. This may allow us to discover new genes that are candidates for causing cancer. Using biological features of each gene, such as gene mutations, gene expression, and mutation phenotypes, we trained the neural network to identify a number of candidate genes that may be associated with the disease in multiple cancer types, such as breast cancer, lung cancer, and others. These genes were validated statistically: a close association was found between their features and those of previously known cancer genes, which makes them strong candidates for being cancer-causing genes. The results showed that these genes have been studied previously and shown to play a non-negligible role in determining how long the disease is resisted, and that mutations in these genes may be harmful mutations that affect the core functions of the proteins they produce. The proposed model also achieved accurate results compared with similar models in previous studies. This demonstrates the effectiveness of using the biological pathways of cancer genes, together with the various biological features of each gene, to train the model, so that it learns accurately from these data and identifies the genes with the highest probability. In conclusion, this research highlights the importance of using machine learning techniques to study biological networks and disease-driving genes, given their ability to accelerate discovery and save time and effort when dealing with highly complex data.

English Abstract

Cancer development is driven by a small subset of somatic mutations, called driver mutations, which disrupt key regulatory processes and promote tumor initiation. These mutations typically occur in specific genes, known as driver genes. Identifying driver genes among the thousands of mutated genes in tumors remains a major challenge. Existing computational approaches, including graph-based deep learning models, often focus on pairwise gene interactions and fail to capture the higher-order relationships present in biological pathways, which limits their ability to uncover functionally related drivers. In this thesis, we introduce a Hypergraph Neural Network (HGNN) framework that models genes within shared pathways and integrates diverse molecular and phenotypic features, including somatic mutations, gene expression, and DNA methylation, into a pathway-informed hypergraph structure. Our method generates enriched gene representations that reflect higher-order interactions beyond simple pairwise connections. We evaluate the framework on both pan-cancer and cancer-type–specific datasets, and the results indicate improved performance over state-of-the-art driver gene prediction methods. Beyond predictive accuracy, our model prioritizes candidate genes that share functional characteristics with known drivers. To assess these predictions, we examined the mutation patterns of the candidate genes, their variants, and their associations with biological pathways, and compared them with those of established drivers. This analysis provides biological evidence that the prioritized genes may play important roles in cancer, demonstrating the utility of pathway-guided hypergraph modeling for advancing cancer driver gene discovery.
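The full text is restricted, so the thesis's actual architecture is not available here. As a rough illustration of what a pathway-informed hypergraph means in practice, the sketch below builds a gene-by-pathway incidence matrix H (each pathway is one hyperedge) and applies a single parameter-free propagation step of the form Dv^(-1/2) H De^(-1) H^T Dv^(-1/2) X, which is the commonly used hypergraph convolution with hyperedge weights and learnable weights set to identity. The gene names, pathway names, feature values, and function names are all hypothetical and are not taken from the thesis.

```python
from math import sqrt


def build_incidence(genes, pathways):
    """Return H as a |genes| x |pathways| 0/1 matrix (list of rows).

    `pathways` maps a pathway name to the set of genes it contains;
    each pathway becomes one hyperedge (one column of H).
    """
    names = list(pathways)
    return [[1.0 if g in pathways[p] else 0.0 for p in names] for g in genes]


def hypergraph_propagate(H, X):
    """One smoothing step: Dv^-1/2 H De^-1 H^T Dv^-1/2 X.

    Hyperedge weights and the learnable projection are assumed to be
    identity here; a trained model would include both.
    """
    n, m, d = len(H), len(H[0]), len(X[0])
    dv = [sum(row) for row in H]                              # pathways per gene
    de = [sum(H[i][e] for i in range(n)) for e in range(m)]   # genes per pathway
    inv_sqrt = [1.0 / sqrt(v) if v > 0 else 0.0 for v in dv]

    # Scale each gene's features by Dv^-1/2.
    Xs = [[inv_sqrt[i] * X[i][k] for k in range(d)] for i in range(n)]

    # Gene -> pathway: average the scaled features of each pathway's members.
    E = [
        [
            sum(H[i][e] * Xs[i][k] for i in range(n)) / de[e] if de[e] > 0 else 0.0
            for k in range(d)
        ]
        for e in range(m)
    ]

    # Pathway -> gene: sum over the pathways each gene belongs to, rescale.
    return [
        [inv_sqrt[i] * sum(H[i][e] * E[e][k] for e in range(m)) for k in range(d)]
        for i in range(n)
    ]


# Toy example with placeholder names (not real pathway memberships).
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
pathways = {
    "pathway_1": {"gene_a", "gene_d"},
    "pathway_2": {"gene_a", "gene_b", "gene_c"},
}
X = [[1.0], [0.0], [0.0], [0.0]]  # a single toy feature, nonzero only for gene_a

H = build_incidence(genes, pathways)
enriched = hypergraph_propagate(H, X)
for gene, row in zip(genes, enriched):
    print(gene, [round(v, 4) for v in row])
```

In this toy case the signal on gene_a reaches every gene that shares a pathway with it in one step, including gene_b and gene_c at once through pathway_2. A pairwise graph would need a separate edge for each of those relationships, which is the contrast with higher-order structure that the abstract draws.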

Item Type: Thesis (Masters)
Subjects: Computer, Research
Department: College of Computing and Mathematics > Information and Computer Science
Committee Advisor: Tran, Van Dinh
Committee Members: Al-Khatib, Wasfi G. and Niazi, Mahmood
Depositing User: ETAB ALOTAIBI (g202212960)
Date Deposited: 27 Nov 2025 05:43
Last Modified: 27 Nov 2025 05:43
URI: http://eprints.kfupm.edu.sa/id/eprint/143742