RMIT University
Capturing out-of-vocabulary words in Arabic text

conference contribution
posted on 2024-10-30, 16:41 authored by Abdusalam Nwesri, Seyed Tahaghoghi, Falk Scholer
The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms of one language appear transliterated in another. Dealing with such out-of-vocabulary words is essential for successful cross-lingual information retrieval. For example, techniques such as stemming should not be applied indiscriminately to all words in a collection, so foreign words need to be identified before any stemming takes place. In this paper, we investigate three approaches for the identification of foreign words in Arabic text: lexicons, language patterns, and n-grams. We present results showing that lexicon-based approaches outperform the other techniques.
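
To make the idea of identifying foreign words before stemming concrete, the following is a minimal sketch of a lexicon lookup. It is not the method evaluated in the paper: the three-word lexicon, the function names, and the simple removal of the definite article are all assumptions chosen for illustration, and a real system would rely on a large dictionary and proper morphological normalisation.

```python
# Hypothetical toy lexicon of native Arabic words (book, school, pen).
# A real lexicon-based approach would use a large dictionary instead.
NATIVE_LEXICON = {"كتاب", "مدرسة", "قلم"}

DEFINITE_ARTICLE = "ال"


def strip_definite_article(word):
    """Toy normalisation (an assumption): drop a leading definite article."""
    if word.startswith(DEFINITE_ARTICLE) and len(word) > len(DEFINITE_ARTICLE):
        return word[len(DEFINITE_ARTICLE):]
    return word


def is_foreign(word, lexicon=NATIVE_LEXICON):
    """Flag a word as foreign when its normalised form is not in the lexicon."""
    return strip_definite_article(word) not in lexicon


def split_tokens(tokens, lexicon=NATIVE_LEXICON):
    """Separate tokens into (native, foreign) so only native ones get stemmed."""
    native, foreign = [], []
    for token in tokens:
        if is_foreign(token, lexicon):
            foreign.append(token)
        else:
            native.append(token)
    return native, foreign


# "the book", the transliterated loan word "computer", "school"
native, foreign = split_tokens(["الكتاب", "كمبيوتر", "مدرسة"])
```

In this sketch only the tokens returned in `native` would be passed on to a stemmer, while those in `foreign` would be indexed unchanged.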

History

Start page

258

End page

266

Total pages

9

Outlet

Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006)

Editors

E. Ringger

Name of conference

Conference on Empirical Methods in Natural Language Processing

Publisher

Association for Computational Linguistics (ACL)

Place published

Australia

Start date

2006-07-22

End date

2006-07-23

Language

English

Copyright

© 2006 Association for Computational Linguistics

Former Identifier

2006001944

Esploro creation date

2020-06-22

Fedora creation date

2009-10-08
