RMIT University

Exploring automatic inconsistency detection for literature-based gene ontology annotation

journal contribution
posted on 2024-11-02, 20:39 authored by Jiyu Chen, Benjamin Goudey, Justin Zobel, Nicholas Geard, Cornelia Verspoor
MOTIVATION: Literature-based gene ontology annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assuring the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies between the literature used as evidence and the annotated GO terms can be identified; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and unable to keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection.

RESULTS: We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detecting inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches designed to address the challenges in current GOA quality assurance workflows.

The data underlying this article are available on GitHub at https://github.com/jiyuc/AutoGOAConsistency.

Funding

Automated assessment of data quality in biological knowledge resources

Australian Research Council

Related Materials

  1. DOI - Is published in 10.1093/bioinformatics/btac230
  2. ISSN - Is published in 1367-4803

Journal

Bioinformatics

Volume

38

Issue

1

Start page

273

End page

281

Total pages

9

Publisher

Oxford University Press

Place published

United Kingdom

Language

English

Copyright

© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/)

Former Identifier

2006116726

Esploro creation date

2022-10-26
