RMIT University

Melbourne at SemEval 2016 Task 11: Classifying Type-level word complexity using random forests with corpus and word list features

conference contribution
posted on 2024-10-31, 19:44 authored by Julian Brooke, Timothy Baldwin, Sandra Uitdenbogerd
SemEval 2016 Task 11 involved determining whether words in a sentence were complex or simple for a cohort of people with English as a second language. Training data consisted of 200 annotated sentences, representing the combined judgements of 20 human annotators, such that if any annotator of the group labelled a word as complex, then it was considered to be complex. Testing was based on single annotator judgements. Our system used a random forest classifier with a variety of features, the most important of which were term frequency statistics garnered from four large corpora, and style lexicons built on two large corpora. Minor features in the final system include the presence or absence of words in various readability word lists; many other features we tried were not successful. Our ranking amongst submitted systems did not reflect the strength of our system, due to submitting a far-from-optimal weighting between complex and simple, but we show that when a more appropriate weighting is used, our system ranks amongst the best submitted systems.
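
The labelling scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `aggregate_any`, the example words, and the votes are all hypothetical, and only three annotators are shown rather than the 20 used in the task. It shows how an "any annotator" rule for training labels tends to mark more words as complex than a single annotator's judgement does, which is one plausible reason the weighting between the two classes matters.

```python
from typing import Dict, List


def aggregate_any(judgements: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Label a word complex if at least one annotator marked it complex."""
    return {word: any(votes) for word, votes in judgements.items()}


# Hypothetical votes from three annotators (assumption: invented data).
votes = {
    "garnered": [False, True, False],
    "people": [False, False, False],
    "cohort": [True, True, False],
}

# Training-style labels: combined judgement of the whole group.
train_labels = aggregate_any(votes)

# Test-style labels: a single annotator's judgement (here, the first one).
single_labels = {word: v[0] for word, v in votes.items()}

# Proportion of words labelled complex under each scheme.
complex_rate_train = sum(train_labels.values()) / len(train_labels)
complex_rate_single = sum(single_labels.values()) / len(single_labels)

print(train_labels)
print(complex_rate_train, complex_rate_single)
```

In this toy data the combined labels mark two of the three words as complex, while the single annotator marks only one.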

Related Materials

  1. DOI - Is published in 10.18653/v1/s16-1150
  2. ISBN - Is published in 9781941643952 (urn:isbn:9781941643952)

Start page

885

End page

891

Total pages

7

Outlet

Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016)

Name of conference

10th International Workshop on Semantic Evaluation (SemEval-2016)

Publisher

Association for Computational Linguistics (ACL)

Place published

Stroudsburg, PA, United States

Start date

2016-06-16

End date

2016-06-17

Language

English

Copyright

© 2016 Association for Computational Linguistics

Former Identifier

2006063210

Esploro creation date

2020-06-22

Fedora creation date

2016-07-13
