RMIT University

Annotating the biomedical literature for the human variome

journal contribution
posted on 2024-11-01, 14:59 authored by Karin Verspoor, Antonio Yepes, Lawrence Cavedon, Tara McIntosh, Asha Herten-Crabb, Zoe Thomas, John-Paul Plazzer
This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.
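
The agreement figures above are F-scores computed between annotators under two matching regimes. As a minimal sketch of how such a comparison can be computed, the Python below treats one annotator as the reference and the other as the response. The `Span` type, the entity labels, and the choice of "same label and any character overlap" as the relaxed criterion are assumptions made for illustration; the abstract does not spell out the exact matching rules used in the article.

```python
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """One entity annotation: character offsets plus a type label (hypothetical layout)."""
    start: int
    end: int  # exclusive
    label: str


def exact_match(a: Span, b: Span) -> bool:
    # Exact matching: identical boundaries and identical label.
    return a == b


def overlap_match(a: Span, b: Span) -> bool:
    # Relaxed boundary matching (assumed here): same label and any overlap.
    return a.label == b.label and a.start < b.end and b.start < a.end


def f_score(reference: List[Span], response: List[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """Balanced F-score of `response` against `reference` under `match`."""
    if not reference or not response:
        return 0.0
    # Precision: fraction of response spans that match some reference span.
    precision = sum(any(match(r, s) for r in reference) for s in response) / len(response)
    # Recall: fraction of reference spans that match some response span.
    recall = sum(any(match(r, s) for s in response) for r in reference) / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations from two annotators over the same text (invented values).
annotator_a = [Span(0, 5, "gene"), Span(10, 20, "mutation"), Span(30, 40, "disease")]
annotator_b = [Span(0, 5, "gene"), Span(12, 20, "mutation"), Span(50, 55, "cohort")]

exact = f_score(annotator_a, annotator_b, exact_match)
relaxed = f_score(annotator_a, annotator_b, overlap_match)
print(f"exact F-score:   {exact:.2f}")
print(f"relaxed F-score: {relaxed:.2f}")
```

In this toy case the second pair of spans differs only in its start offset, so it counts as a disagreement under exact matching but as an agreement under the relaxed criterion. This is the same mechanism by which relaxing boundaries raises the reported minimum F-score.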

Related Materials

  1. DOI - Is published in 10.1093/database/bat019
  2. ISSN - Is published in 17580463

Journal

Database

Volume

2013

Number

bat019

Start page

1

End page

13

Total pages

13

Publisher

Oxford University Press

Place published

United Kingdom

Language

English

Copyright

© The Author(s) 2013. Published by Oxford University Press.

Former Identifier

2006044720

Esploro creation date

2020-06-22

Fedora creation date

2015-01-19
