RMIT University

A comparative analysis of active learning for biomedical text mining

journal contribution
posted on 2024-11-02, 16:09 authored by Usman Naseem, Matloob Khushim, Shah Khalid Khan, Kamran Shaukat, Mohammed Moni
An enormous amount of clinical free-text information, such as pathology reports, progress reports, clinical notes and discharge summaries, has been collected at hospitals and medical care clinics. These data provide an opportunity to develop many useful machine learning (ML) applications, provided they can be transformed into a learnable structure with appropriate labels for supervised learning. The annotation has to be performed by qualified clinical experts, and its high cost limits the use of these data. Active Learning (AL), an underutilised machine learning technique that can label new data, is a promising candidate to address the high cost of labelling. AL has been successfully applied to speech recognition and text classification; however, there is a lack of literature investigating its use for clinical purposes. We performed a comparative investigation of various AL techniques using ML-based and deep learning (DL)-based strategies on three unique biomedical datasets. We investigated five AL query strategies: Random Sampling (RS), Least Confidence (LC), Informative Diversity and Density (IDD), Margin, and Maximum Representativeness-Diversity (MRD). Our experiments show that AL has the potential to significantly reduce the cost of manual labelling. Additionally, AL-assisted pre-annotation accelerates the de novo annotation process, with less annotation time required.
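To make the query strategies named in the abstract more concrete, here is a minimal Python sketch of three of them (RS, LC and Margin) using their common textbook definitions. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `pool_probs` toy data and the `batch_size` parameter are all hypothetical, and IDD and MRD are left out because this record does not define them.

```python
import random

def least_confidence(probs):
    """Uncertainty score: 1 minus the top class probability (higher = more uncertain)."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most probable classes (smaller = more uncertain)."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]

def select_batch(pool_probs, strategy, batch_size, seed=0):
    """Return indices of unlabelled pool items to send to the annotator next.

    pool_probs: one list of predicted class probabilities per unlabelled item
                (hypothetical model output, assumed to be available).
    strategy:   "RS", "LC" or "Margin".
    """
    indices = list(range(len(pool_probs)))
    if strategy == "RS":
        # Random Sampling baseline: ignores the model entirely.
        return random.Random(seed).sample(indices, batch_size)
    if strategy == "LC":
        # Most uncertain first: largest least-confidence score.
        ranked = sorted(indices, key=lambda i: least_confidence(pool_probs[i]), reverse=True)
    elif strategy == "Margin":
        # Most ambiguous first: smallest gap between the top two classes.
        ranked = sorted(indices, key=lambda i: margin(pool_probs[i]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:batch_size]

# Toy predicted probabilities for four unlabelled documents (made-up values).
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.48, 0.02],
    [0.70, 0.20, 0.10],
]

lc_batch = select_batch(pool_probs, "LC", batch_size=2)
margin_batch = select_batch(pool_probs, "Margin", batch_size=2)
rs_batch = select_batch(pool_probs, "RS", batch_size=2)
```

On this toy pool, LC and Margin pick the same two documents but rank them differently. LC looks only at the top probability, whereas Margin looks at how close the runner-up class is.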

Related Materials

  1. DOI - Is published in 10.3390/asi4010023
  2. ISSN - Is published in 25715577

Journal

Applied System Innovation

Volume

4

Number

23

Issue

1

Start page

1

End page

18

Total pages

18

Publisher

MDPI AG

Language

English

Copyright

© 2021 by the authors. Licensee MDPI, Basel, Switzerland.

Former Identifier

2006105817

Esploro creation date

2021-06-01
