RMIT University

Automatic consistency assurance for literature-based gene ontology annotation

journal contribution
posted on 2024-11-02, 18:45 authored by Jiyu Chen, Nicholas Geard, Justin Zobel, Cornelia Verspoor
Background: Literature-based gene ontology (GO) annotation is a process in which expert curators use uniform expressions to describe gene functions reported in research papers, creating computable representations of information about biological systems. Manual assurance of consistency between GO annotations and the associated evidence texts identified by expert curators is reliable but time-consuming, and it is infeasible at the scale of the rapidly growing biological literature. A key challenge is maintaining the consistency of existing GO annotations as new studies are published and the GO vocabulary is updated.

Results: In this work, we introduce a formalisation of biological database annotation inconsistencies, identifying four distinct types of inconsistency. We propose a novel and efficient method that uses state-of-the-art text mining models to automatically distinguish between consistent GO annotations and each of the different types of inconsistent GO annotation. We evaluate this method using a synthetic dataset generated by directed manipulation of instances in an existing corpus, BC4GO. We provide a detailed error analysis demonstrating that the method achieves high precision on its more confident predictions.

Conclusions: Two models built using our method for distinct annotation consistency identification tasks achieved high precision and were robust to updates in the GO vocabulary. Our approach demonstrates clear value for human-in-the-loop curation scenarios.

Funding

Automated assessment of data quality in biological knowledge resources

Australian Research Council

History

Journal

BMC Bioinformatics

Volume

22

Number

565

Issue

1

Start page

1

End page

22

Total pages

22

Publisher

BioMed Central

Place published

United Kingdom

Language

English

Copyright

© The Author(s), 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License.

Former Identifier

2006113008

Esploro creation date

2022-04-08