RMIT University

The Secondary Use of Electronic Health Records for Data Mining: Data Characteristics and Challenges

journal contribution
posted on 2024-11-02, 19:24 authored by Tabinda Sarwar, Sattar Seifollahi, Jeffrey Chan, Xiuzhen Zhang, David Akman, Irene Hudson, Cornelia Verspoor, Lawrence Cavedon
The primary objective of implementing Electronic Health Records (EHRs) is to improve the management of patients’ health-related information. However, these records have also been extensively used for the secondary purpose of clinical research and to improve healthcare practice. EHRs provide a rich set of information that includes demographics, medical history, medications, laboratory test results, and diagnosis. Data mining and analytics techniques have extensively exploited EHR information to study patient cohorts for various clinical and research applications, such as phenotype extraction, precision medicine, intervention evaluation, disease prediction, detection, and progression. But the presence of diverse data types and associated characteristics poses many challenges to the use of EHR data. In this article, we provide an overview of information found in EHR systems and their characteristics that could be utilized for secondary applications. We first discuss the different types of data stored in EHRs, followed by the data transformations necessary for data analysis and mining. Later, we discuss the data quality issues and characteristics of the EHRs along with the relevant methods used to address them. Moreover, this survey also highlights the usage of various data types for different applications. Hence, this article can serve as a primer for researchers to understand the use of EHRs for data mining and analytics purposes.

History

Journal

ACM Computing Surveys

Volume

55

Issue

2

Start page

1

End page

36

Total pages

36

Publisher

Association for Computing Machinery

Place published

United States

Language

English

Copyright

© 2022 Association for Computing Machinery.

Former Identifier

2006110681

Esploro creation date

2022-02-16
