In this paper, we approach the task of native language identification in a realistic cross-corpus scenario, where a model is trained on the available data and must predict the native language from data of a different corpus. We propose a statistical embedding representation that yields a significant improvement over common single-layer state-of-the-art approaches in identifying Chinese, Arabic, and Indonesian as native languages in a cross-corpus scenario. The proposed approach is shown to remain competitive even when the data are scarce and imbalanced.
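The cross-corpus protocol (train on one corpus, predict the native language of texts from another corpus) can be sketched as follows. This is a minimal illustration under stated assumptions, not the statistical embedding representation proposed in the paper. It swaps in a plain multinomial naive Bayes baseline over whitespace tokens, and every function name and the toy data are hypothetical. Macro-averaged recall is used as the metric because it weights each native language equally, which matters when the test classes are imbalanced; this metric choice is also an assumption rather than something taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokens(text):
    # Hypothetical feature extractor: lowercased whitespace tokens.
    return text.lower().split()


def train_nb(docs):
    """Train multinomial naive Bayes on (text, label) pairs from the source corpus."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        toks = tokens(text)
        word_counts[label].update(toks)
        vocab.update(toks)
    return word_counts, label_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior for one target-corpus text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    v = len(vocab)
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for tok in tokens(text):
            # Laplace smoothing; the extra +1 reserves probability mass for
            # words never seen in the training (source) corpus.
            score += math.log((word_counts[label][tok] + 1) / (total + v + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def macro_recall(gold, pred):
    """Average per-class recall, so rare native languages count as much as common ones."""
    recalls = []
    for lab in sorted(set(gold)):
        idx = [i for i, g in enumerate(gold) if g == lab]
        hits = sum(1 for i in idx if pred[i] == lab)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def cross_corpus_eval(train_corpus, test_corpus):
    """Fit on one corpus and score on a different one (no target-corpus training data)."""
    model = train_nb(train_corpus)
    gold = [label for _, label in test_corpus]
    pred = [predict_nb(model, text) for text, _ in test_corpus]
    return macro_recall(gold, pred)
```

In a real experiment, `train_corpus` and `test_corpus` would come from two distinct learner corpora; keeping them strictly separate is what distinguishes the cross-corpus setting from the usual in-corpus train/test split.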
Related materials: published in ISBN 9781948087247 (urn:isbn:9781948087247).