RMIT University

An ensemble approach for record matching in data linkage

chapter
posted on 2024-11-01, 02:10 authored by Simon Poon, Josiah Poon, Mary Lam, Daniel Sze
Objectives: To develop and test an optimal ensemble configuration of two complementary probabilistic data matching techniques, namely Fellegi-Sunter (FS) and Jaro-Winkler (JW), with the goal of improving record matching accuracy. Methods: Experiments and comparative analyses were carried out to compare the matching performance of ensemble configurations combining FS and JW against that of the two techniques applied independently. Results: Our results show that an improvement can be achieved when the FS technique is applied to the remaining unsure and unmatched records after the JW technique has been applied. Discussion: Whilst all data matching techniques rely on the quality of a diverse set of demographic data, the FS technique focuses on aggregating matching accuracy from a number of useful variables, whereas JW looks more closely at the data content (spelling in this case) of each field. Hence, these two techniques are shown to be complementary. In addition, the sequence in which the two techniques are applied is critical. Conclusion: We have demonstrated a useful ensemble approach that has the potential to improve data matching accuracy, particularly when the number of demographic variables is limited. This ensemble technique is particularly useful when there are multiple acceptable spellings in fields such as names and addresses.
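The sequencing described in the abstract (JW first, then FS on the records JW leaves unsure or unmatched) can be illustrated with a minimal sketch. This is not the authors' implementation: the field names, the thresholds `jw_match` and `fs_match`, the m/u probabilities, and the choice to take the minimum JW score across name fields are all assumptions made for illustration only.

```python
import math


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)
    matched1 = []
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    matched2 = [s2[j] for j in range(len(s2)) if used[j]]
    # Transpositions: matched characters that appear in a different order.
    t = sum(a != b for a, b in zip(matched1, matched2)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def fs_weight(rec_a: dict, rec_b: dict, params: dict) -> float:
    """Fellegi-Sunter composite weight: sum of per-field log2 likelihood ratios.

    params maps field -> (m, u), where m = P(agree | match) and
    u = P(agree | non-match). The values used below are hypothetical.
    """
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def ensemble_match(rec_a, rec_b, jw_fields, fs_params,
                   jw_match=0.95, fs_match=5.0) -> str:
    """JW first; FS is applied only to pairs JW does not accept."""
    jw_score = min(jaro_winkler(rec_a[f], rec_b[f]) for f in jw_fields)
    if jw_score >= jw_match:  # assumed threshold
        return "match (JW)"
    weight = fs_weight(rec_a, rec_b, fs_params)
    return "match (FS)" if weight >= fs_match else "non-match"


# Hypothetical m/u probabilities and toy records.
FS_PARAMS = {"first": (0.9, 0.02), "last": (0.9, 0.02),
             "dob": (0.95, 0.01), "postcode": (0.9, 0.05)}
a = {"first": "MARTHA", "last": "SMITH", "dob": "1980-01-01", "postcode": "3000"}
b = {"first": "MARHTA", "last": "SMITH", "dob": "1980-01-01", "postcode": "3000"}
c = {"first": "MARTY", "last": "SMITH", "dob": "1980-01-01", "postcode": "3000"}
d = {"first": "JOHN", "last": "BROWN", "dob": "1975-06-30", "postcode": "2000"}

for other in (b, c, d):
    print(ensemble_match(a, other, ["first", "last"], FS_PARAMS))
```

In this toy data, a spelling variant close enough for JW is accepted at the first stage, while a pair whose names differ more but whose other demographic fields agree is recovered by FS at the second stage. Real thresholds and m/u values would need to be estimated from the data being linked.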

History

Related Materials

  1. DOI - Is published in 10.3233/978-1-61499-666-8-113
  2. ISBN - Is published in 9781614996668 (urn:isbn:9781614996668)

Start page

113

End page

119

Total pages

7

Outlet

Digital Health Innovation for Consumers, Clinicians, Connectivity and Community

Editors

Andrew Georgiou, Louise K. Schaper, Sue Whetton

Publisher

IOS Press

Place published

Netherlands

Language

English

Copyright

© 2016 The authors and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).

Former Identifier

2006111615

Esploro creation date

2021-11-28
