RMIT University

Answering English queries in automatically transcribed Arabic speech

conference contribution
posted on 2024-11-23, 02:08 authored by Abdusalam Nwesri, Seyed Tahaghoghi, Falk Scholer
There are several well-known approaches to parsing Arabic text in preparation for indexing and retrieval. Techniques such as stemming and stopping have been shown to improve search results on written newswire dispatches, but few comparisons are available for other data sources. In this paper, we apply several alternative stemming and stopping approaches to Arabic text automatically transcribed from the audio soundtrack of news video footage, and compare these with approaches that rely on machine translation of the underlying text. Using the TRECVID video collection and queries, we show that normalisation, stopword removal, and light stemming increase retrieval precision, but that heavy stemming and trigrams have a negative effect. We also show that the choice of machine translation engine plays a major role in retrieval effectiveness.
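The abstract reports that normalisation, stopword removal and light stemming improved precision, but does not describe how each step works. The Python sketch below shows what such a preprocessing pipeline commonly looks like. The stopword list, the prefix and suffix lists, the length thresholds and the function names (`normalise`, `light_stem`, `index_terms`) are all hypothetical, loosely in the style of common Arabic light stemmers; they are not the configuration evaluated in the paper.

```python
import re

# Arabic short-vowel diacritics (U+064B..U+0652) plus tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda / hamza above / hamza below (U+0622, U+0623, U+0625).
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalise(word: str) -> str:
    """Remove diacritics and tatweel, then fold common orthographic variants."""
    word = _DIACRITICS.sub("", word)
    word = _ALEF_VARIANTS.sub("\u0627", word)  # -> bare alef
    word = word.replace("\u0649", "\u064A")  # alef maqsura -> yeh
    word = word.replace("\u0629", "\u0647")  # teh marbuta -> heh
    return word


# Hypothetical, deliberately tiny lists for illustration only.
STOPWORDS = {normalise(w) for w in ["في", "من", "على", "إلى", "عن"]}
PREFIXES = ["بال", "كال", "فال", "لل", "ال"]  # longest first
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ه", "ي"]


def light_stem(word: str) -> str:
    """Strip an optional leading wa, at most one prefix, then suffixes."""
    # Leading "wa" ("and") is dropped only if at least three letters remain.
    if word.startswith("و") and len(word) >= 4:
        word = word[1:]
    # Remove the first matching article-like prefix, keeping two letters.
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    # Each suffix is tried once, in list order, keeping at least two letters.
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[: -len(s)]
    return word


def index_terms(text: str) -> list[str]:
    """Normalise, drop stopwords, then light-stem each whitespace token."""
    terms = []
    for token in text.split():
        w = normalise(token)
        if w and w not in STOPWORDS:
            terms.append(light_stem(w))
    return terms
```

For example, `index_terms("والكتاب في المدرسة")` drops the stopword "في" and reduces the other two tokens to `["كتاب", "مدرس"]`. Light stemming of this kind only trims frequent affixes, whereas heavy (root-based) stemming maps words to their roots, which the abstract reports as harmful in this setting.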

Related Materials

  1. ISBN - Is published in 0769528414 (urn:isbn:0769528414)

Start page

11

End page

16

Total pages

6

Outlet

Proceedings of the 6th IEEE International Conference on Computer and Information Science

Editors

R. Lee, M. Chowdhury, S. Ray & T. Lee

Name of conference

6th IEEE International Conference on Computer and Information Science

Publisher

IEEE

Place published

USA

Start date

2007-07-11

End date

2007-07-13

Language

English

Copyright

© 2007 IEEE

Former Identifier

2006006573

Esploro creation date

2020-06-22

Fedora creation date

2009-04-08

Open access

  • Yes
