RMIT University
Efficient plagiarism detection for large code repositories

journal contribution
posted on 2024-11-01, 04:24 authored by Steven Burrows, Seyed Tahaghoghi, Justin Zobel
Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections, this is not the case for code-based plagiarism detection. In this paper, we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment. Through detailed empirical evaluation on small and large collections of programs, we show that our approach is highly scalable while maintaining a level of effectiveness similar to that of the popular JPlag and MOSS systems.
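
The abstract names local alignment as one ingredient but gives no implementation details. As a rough illustration only, the sketch below scores two code fragments with a textbook Smith-Waterman local alignment over token sequences. The tokenizer, the function names, and the scoring values (match=2, mismatch=-1, gap=-1) are assumptions made for this example and are not taken from the paper.

```python
import re


def tokenize(source):
    """Split source text into identifiers, integer literals, and single
    non-space characters. A deliberately crude, hypothetical tokenizer."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best Smith-Waterman local alignment score of sequences a and b.

    The scoring parameters are illustrative assumptions, not values from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which is
            # what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


original = tokenize("int x = 1;")
renamed = tokenize("int y = 1;")
print(local_alignment_score(original, original))  # identical fragments
print(local_alignment_score(original, renamed))   # one identifier renamed
```

A higher score indicates a longer or cleaner shared region between the two token sequences. Filling the full table for every pair of programs is quadratic in sequence length, so a sketch like this would only be practical on a small set of candidate pairs.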

Journal: Software: Practice and Experience
Volume: 37
Start page: 151
End page: 175
Total pages: 25
Publisher: John Wiley and Sons
Place published: Chichester
Language: English
Copyright: © 2006 John Wiley & Sons, Ltd.
Former Identifier: 2006005765
Esploro creation date: 2020-06-22
Fedora creation date: 2011-01-07
