posted on 2024-10-31, 09:21, authored by David Urbansky, Marius Feldmann, James Thom, A Schill
This paper describes a system for entity extraction from the web. The system uses three different extraction techniques, which are tightly coupled with mechanisms for retrieving entity-rich web pages. The main contributions of this paper are a new entity retrieval approach, a comparison of different extraction techniques, and a more precise entity extraction algorithm. The presented approach allows domain-independent information to be extracted from the web with only minimal human effort.
Start page
209
End page
218
Total pages
10
Outlet
Proceedings of the AWIC'09 6th Atlantic Web Conference