RMIT University
Methods for collection and evaluation of comparable documents

chapter
posted on 2024-10-30, 20:45 authored by Monica Paramita, David Guthrie, Evangelos Kanoulas, Rob Gaizauskas, Paul Clough, Mark Sanderson
Considerable attention is being paid to methods for gathering and evaluating comparable corpora, not only to improve Statistical Machine Translation (SMT) but also for other applications, e.g. the extraction of paraphrases. Realising the potential value of such corpora requires efficient and effective methods for gathering and evaluating them. Most of these methods have been tested on retrieving document pairs for well-resourced languages; however, there is a lack of work on less popular (under-resourced) languages and domains. This chapter describes work on developing methods for automatically gathering comparable corpora from the Web, specifically for under-resourced languages. Different online sources are investigated, and an evaluation method is developed to assess the quality of the retrieved documents.

History

Related Materials

  1. DOI - Is published in 10.1007/978-3-642-20128-8_5
  2. ISBN - Is published in 9783642201271 (urn:isbn:9783642201271)

Start page

93

End page

112

Total pages

20

Outlet

Building and Using Comparable Corpora

Editors

S. Sharoff, R. Rapp, P. Zweigenbaum, P. Fung

Publisher

Springer

Place published

Berlin, Germany

Language

English

Copyright

© Springer-Verlag Berlin Heidelberg 2013

Former Identifier

2006044715

Esploro creation date

2020-06-22

Fedora creation date

2015-01-14
