RMIT University

Representing documents with named entities for story link detection (SLD)

conference contribution
posted on 2024-10-31, 10:41 authored by Chirag Shah, Bruce Croft, David Jensen
Several information organization, access, and filtering systems can benefit from kinds of document representation other than those used in traditional Information Retrieval (IR). Topic Detection and Tracking (TDT) is one such application. In this paper we demonstrate that named entities are a better choice of unit for document representation than all words. To test this hypothesis, we study the effect of words-based and entity-based representations on Story Link Detection (SLD), a core task in TDT research. Experiments on TDT corpora show that entity-based representations give significant improvements for SLD. We also propose a mechanism to expand the set of named entities used for document representation, which improves performance in some cases. We then go a step further and analyze the limitations of using only named entities for document representation. Our studies and experiments indicate that adding topical terms can help address these limitations.
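The abstract contrasts words-based and entity-based representations for SLD but does not state the similarity function used. A common SLD setup compares two stories with cosine similarity and declares a link when the score passes a threshold. The sketch below assumes that setup; it is not the authors' implementation. Entities are taken as pre-extracted lists (no particular named-entity tagger is assumed), and the function names, the optional topical-term list, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty representation cannot support a link
    return dot / (norm_a * norm_b)


def words_repr(text: str) -> Counter:
    """Words-based representation: every lower-cased whitespace token."""
    return Counter(text.lower().split())


def entity_repr(entities: list[str]) -> Counter:
    """Entity-based representation: only the story's named entities.
    Assumption: entities were extracted beforehand by some NER tool."""
    return Counter(e.lower() for e in entities)


def hybrid_repr(entities: list[str], topical_terms: list[str]) -> Counter:
    """Hypothetical entities-plus-topical-terms representation, in the
    spirit of the abstract's suggestion; how terms are chosen is not shown."""
    return entity_repr(entities) + Counter(t.lower() for t in topical_terms)


def linked(a: Counter, b: Counter, threshold: float = 0.2) -> bool:
    """Hypothetical SLD decision: link two stories whose cosine similarity
    meets the threshold (0.2 is an arbitrary illustrative value)."""
    return cosine(a, b) >= threshold


if __name__ == "__main__":
    s1 = "Hurricane Katrina struck New Orleans on Monday"
    s2 = "Rescue teams reached New Orleans after Katrina"
    print(cosine(words_repr(s1), words_repr(s2)))
    print(cosine(entity_repr(["Katrina", "New Orleans"]),
                 entity_repr(["New Orleans", "Katrina"])))
```

For these two toy stories, the words-based score is 3/7 (about 0.43), because only "katrina", "new", and "orleans" overlap among seven distinct tokens per story. The entity-based score is about 1.0 because both stories mention the same two entities. Treating "New Orleans" as one unit rather than two words is the kind of difference an entity-based representation introduces.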

Related Materials

  1. DOI - Is published in 10.1145/1183614.1183771
  2. ISBN - Is published in 9781595934338 (urn:isbn:9781595934338)

Start page

868

End page

869

Total pages

2

Outlet

Proceedings of the 15th ACM Conference on Information and Knowledge Management (CIKM 2006)

Editors

Philip S. Yu, Vassilis J. Tsotras, Edward A. Fox and Bing Liu

Name of conference

15th ACM Conference on Information and Knowledge Management (CIKM 2006)

Publisher

ACM

Place published

New York, USA

Start date

2006-11-06

End date

2006-11-11

Language

English

Copyright

© 2006 ACM

Former Identifier

2006024232

Esploro creation date

2020-06-22

Fedora creation date

2012-11-15