RMIT University
Application of information retrieval techniques for source code authorship attribution

conference contribution
posted on 2024-10-31, 09:23 authored by Steven Burrows, Sandra Uitdenbogerd, Andrew Turpin
Authorship attribution assigns works of contentious authorship to their rightful owners, solving cases of theft, plagiarism and authorship disputes in academia and industry. In this paper we investigate the application of information retrieval techniques to attribution of authorship of C source code. In particular, we explore novel methods for converting C code into documents suitable for retrieval systems, experimenting with 1,597 student programming assignments. We investigate several possible program derivations, partition attribution results by original program length to measure effectiveness on modest and lengthy programs separately, and evaluate three different methods for interpreting document rankings as authorship attribution. The best of our methods achieves an average of 76.78% classification accuracy for a one-in-ten classification problem, which is competitive against six existing baselines. The techniques that we present can be the basis of practical software to support source code authorship investigations.

Related Materials

  1. DOI - Is published in 10.1007/978-3-642-00887-0_61
  2. ISSN - Is published in 03029743

Start page

699

End page

713

Total pages

15

Outlet

Proceedings 14th International Conference on Database Systems for Advanced Applications

Editors

Xiaofang Zhou, Haruo Yokota, Qing Liu

Name of conference

14th International Conference on Database Systems for Advanced Applications

Publisher

Springer

Place published

Germany

Start date

2009-04-21

End date

2009-04-23

Language

English

Copyright

© Springer-Verlag Berlin Heidelberg 2009

Notes

Lecture Notes in Computer Science ; volume 5463.

Former Identifier

2006016501

Esploro creation date

2020-06-22

Fedora creation date

2011-05-26
