Author name disambiguation for ranking and clustering PubMed data using NetClus
conference contribution
posted on 2024-10-31, 16:00 authored by Arvin Varadharajalu, Wei Liu, Wilson Wong
The ranking and clustering of publication databases are often used to discover useful information about research areas. NetClus is an iterative algorithm for clustering heterogeneous star-schema information networks that incorporates the ranking information of individual data types. The algorithm has previously been evaluated using the DBLP database. In this paper, we apply NetClus to PubMed, a free database of articles on life sciences and biomedical topics, to discover key aspects of cancer research. The absence of unique identifiers for authors in PubMed introduces additional challenges. To address this, we introduce an improved author disambiguation technique that combines affiliation string normalisation based on the vector space model with co-author networks. Our technique for disambiguating authors, which offers higher accuracy than existing techniques, significantly improves NetClus clustering results.
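The idea of combining affiliation similarity with co-author overlap can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the normalisation rules, the term weighting, or the decision rule, so the code below assumes raw term-frequency vectors, a simple regular-expression tokeniser, and a hypothetical threshold of 0.5. The function names (normalise, cosine, same_author) and the record layout are likewise invented for illustration.

```python
import math
import re
from collections import Counter


def normalise(affiliation):
    """Lowercase an affiliation string and split it into alphabetic tokens.

    Assumption: a real normaliser would also expand abbreviations and
    drop stop words; this sketch only removes case and punctuation.
    """
    return re.findall(r"[a-z]+", affiliation.lower())


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two raw term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_author(rec_a, rec_b, affil_threshold=0.5):
    """Hypothetical rule: two records with the same name are merged when
    they share a co-author or their affiliations are similar enough."""
    if rec_a["name"] != rec_b["name"]:
        return False
    shared = set(rec_a["coauthors"]) & set(rec_b["coauthors"])
    sim = cosine(normalise(rec_a["affiliation"]),
                 normalise(rec_b["affiliation"]))
    return bool(shared) or sim >= affil_threshold
```

Under these assumptions, two "Smith J" records with affiliations "Department of Oncology, Example University Hospital, Springfield" and "Dept. Oncology, Example University Hospital" would be merged on affiliation similarity alone, while a "Smith J" at an unrelated institution would be merged only if the two records shared a co-author.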
History
Start page
152
End page
161
Total pages
10
Outlet
24th Australasian Joint Conference on Artificial Intelligence (AI2011)
Editors
D. Wang and M. Reynolds
Name of conference
24th Australasian Joint Conference on Artificial Intelligence (AI2011)