RMIT University

Graph embedding-based link prediction for literature-based discovery in Alzheimer's Disease

journal contribution
posted on 2024-11-03, 10:12 authored by Yiyuan Pu, Daniel Beck, Cornelia Verspoor
Objective: We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer’s Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology.

Methods: We propose a four-stage approach to explore literature-based discovery for Alzheimer’s Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer’s Disease is assessed.

Results: We constructed an AD corpus of over 16 k papers published in 1977–2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer’s Disease derived from this resource consisted of 11 k nodes and 394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation.

Conclusion: Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases.
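The time-sliced evaluation and the graph density figure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`undirected_density`, `time_slice`), the toy edge list, and the rule that test pairs must connect concepts already present in the training graph are all assumptions made for illustration. The density formula assumes a simple undirected graph, and the node and edge counts passed to it are the rounded figures from the abstract.

```python
from typing import FrozenSet, List, Set, Tuple

# (concept_a, concept_b, year the pair was first reported) -- hypothetical record layout
Edge = Tuple[str, str, int]
Pair = FrozenSet[str]


def undirected_density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: 2E / (N * (N - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


def time_slice(edges: List[Edge], cutoff_year: int, window: int) -> Tuple[Set[Pair], Set[Pair]]:
    """Split edges into a training graph and a set of future links to predict.

    Training pairs were reported in or before cutoff_year. Test pairs first
    appear within `window` years after the cutoff, were absent from training,
    and (an assumption of this sketch) join two concepts already in the
    training graph.
    """
    train = {frozenset((u, v)) for u, v, year in edges if year <= cutoff_year}
    known_nodes = set().union(*train) if train else set()
    test = {
        frozenset((u, v))
        for u, v, year in edges
        if cutoff_year < year <= cutoff_year + window
        and u in known_nodes
        and v in known_nodes
        and frozenset((u, v)) not in train
    }
    return train, test


# Toy data with placeholder concept names; not taken from the paper.
toy_edges: List[Edge] = [
    ("gene_A", "phenotype_X", 1998),
    ("phenotype_X", "gene_B", 2000),
    ("gene_B", "phenotype_Y", 2001),
    ("gene_A", "gene_B", 2003),
    ("gene_A", "phenotype_Y", 2010),
    ("gene_C", "phenotype_Y", 2012),  # gene_C is unseen at the cutoff, so this pair is skipped
]

train, test_short = time_slice(toy_edges, cutoff_year=2001, window=2)
_, test_long = time_slice(toy_edges, cutoff_year=2001, window=12)
print(len(train), len(test_short), len(test_long))

# Rounded counts from the abstract: about 11 k nodes and 394 k edges.
print(f"{undirected_density(11_000, 394_000):.4%}")
```

On the toy data, the longer window admits more future links as positives than the shorter one. This is one way to see why changing the prediction window can change evaluation results. With the rounded counts from the abstract, the computed density is below 1%.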

Related Materials

  1. DOI - Is published in 10.1016/j.jbi.2023.104464
  2. ISSN - Is published in 15320464

Journal

Journal of Biomedical Informatics

Volume

145

Number

104464

Start page

1

End page

13

Total pages

13

Publisher

Academic Press

Place published

United States

Language

English

Copyright

© 2023 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Former Identifier

2006125494

Esploro creation date

2023-09-21
