RMIT University

Cross-corpus Native Language Identification via Statistical Embedding

conference contribution
posted on 2024-11-03, 12:26 authored by Francisco Rangel, Paolo Rosso, Sandra Uitdenbogerd, Julian Brooke
In this paper, we approach the task of native language identification in a realistic cross-corpus scenario, where a model is trained on the available data and has to predict the native language of texts from a different corpus. We propose a statistical embedding representation that yields a significant improvement over common single-layer state-of-the-art approaches in identifying Chinese, Arabic, and Indonesian as native languages in a cross-corpus scenario. The proposed approach was shown to be competitive even when the data are scarce and imbalanced.
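The cross-corpus setting described above can be illustrated with a minimal sketch. A classifier is fitted on one labelled corpus and then scored on a different one, so no test text is seen during training. The bag-of-words nearest-centroid classifier and the function names (`train_centroids`, `predict`, `macro_recall`) are hypothetical stand-ins chosen for brevity. They are not the statistical embedding proposed in the paper. Macro-averaged recall is also an assumption here: it weights each native language equally, which matters when the test corpus is imbalanced.

```python
from collections import Counter
import math


def _normalise(counts):
    """Turn raw word counts into relative frequencies."""
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}


def train_centroids(docs, labels):
    """Hypothetical stand-in model: one word-frequency centroid per native language."""
    sums = {}
    for doc, lab in zip(docs, labels):
        sums.setdefault(lab, Counter()).update(doc.lower().split())
    return {lab: _normalise(counts) for lab, counts in sums.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(centroids, doc):
    """Assign the native language whose centroid is most similar to the document."""
    vec = _normalise(Counter(doc.lower().split()))
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


def macro_recall(train, test):
    """Train on one corpus, evaluate on another, and average recall over languages."""
    docs, labels = zip(*train)
    centroids = train_centroids(docs, labels)
    hits, totals = Counter(), Counter()
    for doc, gold in test:
        totals[gold] += 1
        if predict(centroids, doc) == gold:
            hits[gold] += 1
    return sum(hits[lab] / totals[lab] for lab in totals) / len(totals)
```

In use, `train` and `test` would each be a list of `(text, native_language)` pairs drawn from two different corpora. Keeping the two corpora strictly separate is what distinguishes this setting from the usual in-corpus cross-validation.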


Related Materials

  1. ISBN - Is published in 9781948087247 (urn:isbn:9781948087247)

Start page

39

End page

43

Total pages

5

Outlet

Proceedings of the Second Workshop on Stylistic Variation

Name of conference

The Second Workshop on Stylistic Variation (at NAACL)

Publisher

Association for Computational Linguistics

Place published

United States

Start date

2018-06-05

End date

2018-06-05

Language

English

Copyright

© 2018 The Association for Computational Linguistics

Former Identifier

2006088962

Esploro creation date

2020-06-22

Fedora creation date

2019-02-21