Part of speech based term weighting for information retrieval
conference contribution
posted on 2024-10-31, 21:12 authored by Christina Lioma, Roi Blanco Gonzalez

Automatic language processing tools typically assign so-called 'weights' to terms, corresponding to the contribution of each term to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the 'POS contexts' in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF and BM25 as baselines, show that integrating our POS-based term weights into retrieval always leads to gains (up to +33.7% over the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet prior smoothing) and our best-performing POS-based term weight show consistent retrieval gains across the whole smoothing range of the baseline. © Springer-Verlag Berlin Heidelberg 2009.
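The abstract does not specify how the POS-based weights enter the document-query matching model, nor how the five weight variants are computed. One simple possibility, shown here purely as a hypothetical sketch and not as the authors' method, is to multiply each query term's BM25 contribution by an externally supplied per-term weight. The function name `bm25_weighted`, the `term_weight` dictionary, and the parameter defaults `k1=1.2`, `b=0.75` are all illustrative assumptions; the weight values would come from some POS-based computation that is not reproduced here.

```python
import math
from collections import Counter


def bm25_weighted(query, doc, docs, term_weight, k1=1.2, b=0.75):
    """Hypothetical: BM25 score of `doc` for `query`, where each query term's
    contribution is scaled by term_weight[t] (missing terms default to 1.0,
    which reduces to plain BM25). Documents are lists of tokens."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in docs if t in d)  # document frequency of t
        if df == 0 or tf[t] == 0:
            continue  # term absent from collection or from this document
        # BM25 IDF variant with +1 inside the log so it stays positive
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        # Saturated, length-normalised term frequency component
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        # The assumed external (e.g. POS-based) weight scales the term's contribution
        score += term_weight.get(t, 1.0) * idf * norm
    return score


# Toy collection and query (illustrative only)
docs = [
    ["retrieval", "of", "documents"],
    ["part", "of", "speech", "tagging"],
    ["term", "weighting", "for", "retrieval"],
]
query = ["term", "retrieval"]

plain = bm25_weighted(query, docs[2], docs, {})               # plain BM25
boosted = bm25_weighted(query, docs[2], docs, {"term": 2.0})  # "term" up-weighted
print(plain, boosted)
```

A multiplicative weight was chosen only because it leaves BM25 unchanged when every weight is 1.0, which makes the effect of the extra weight easy to isolate; other integration schemes are equally plausible.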
Related Materials
- 1. ISBN - Is published in 9783642009570 (urn:isbn:9783642009570)
Volume: 5478 LNCS
Start page: 412
End page: 423
Total pages: 12
Outlet: Proceedings of the 31st European Conference on IR Research (ECIR 2009)
Editors: Mohand Boughanem, Catherine Berrut, Josiane Mothe, Chantal Soule-Dupuy
Name of conference: LNCS 5478: Advances in Information Retrieval
Publisher: Springer
Place published: Germany
Start date: 2009-04-06
End date: 2009-04-09
Language: English
Copyright: © 2009 Springer-Verlag Berlin Heidelberg
Former Identifier: 2006077312
Esploro creation date: 2020-06-22
Fedora creation date: 2017-09-13

