RMIT University

The significance of user-defined identifiers in Java source code authorship identification

journal contribution
posted on 2024-11-01, 15:04 authored by Georgia Frantzeskou, Stephen MacDonell, Efstathios Stamatatos, Stelios Georgiou, Stefanos Gritzalis
When writing source code, programmers have varying levels of freedom when it comes to the creation and use of identifiers. Do they habitually use the same identifiers, names that are different from those used by others? Is it then possible to tell who the author of a piece of code is by examining these identifiers? If so, can we use the presence or absence of identifiers to assist in correctly classifying programs to authors? Is it possible to hide the provenance of programs by identifier renaming? In this study, we assess the importance of three types of identifiers (class names, method names, and simple variable names) in source code author classification for two different Java program data sets. We do this through a sequence of experiments in which we disguise one type of identifier at a time. These experiments are performed using the Source Code Author Profiles (SCAP) method. The results show that, although identifiers examined as a whole do not seem to reflect program authorship for these data sets, when examined separately there is evidence that class names do signal the author of the program. In contrast, simple variables and method names used in Java programs do not appear to reflect program authorship. Indeed, our analysis suggests that such identifiers are so common as to mask authorship. We believe that these results have applicability in relation to the robustness of code plagiarism analysis and that the underlying methods could be valuable in cases of litigation arising from disputes over program authorship.
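The experimental idea described in the abstract (disguise one kind of identifier, then re-run author classification) can be illustrated with a minimal sketch. It assumes the commonly published form of SCAP, in which an author profile is the set of the most frequent byte-level n-grams and similarity is the size of the intersection of two profiles. The function names (`ngram_profile`, `spi`, `classify`, `disguise`), the placeholder token `ID`, the regex-based renaming, and the default parameter values are all assumptions made for illustration; they are not taken from the paper, and the authors' actual disguising procedure may differ.

```python
import re
from collections import Counter


def ngram_profile(source: str, n: int = 3, size: int = 1500) -> set:
    """Return the `size` most frequent byte-level n-grams of `source` as a set."""
    data = source.encode("utf-8")
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return {gram for gram, _ in counts.most_common(size)}


def spi(profile_a: set, profile_b: set) -> int:
    """Similarity as the number of n-grams shared by the two profiles."""
    return len(profile_a & profile_b)


def classify(unknown_source: str, author_sources: dict, n: int = 3, size: int = 1500) -> str:
    """Assign the unknown program to the author whose profile overlaps it most."""
    unknown = ngram_profile(unknown_source, n, size)
    return max(
        author_sources,
        key=lambda author: spi(unknown, ngram_profile(author_sources[author], n, size)),
    )


def disguise(source: str, identifiers: set, placeholder: str = "ID") -> str:
    """Replace every whole-word occurrence of the given identifiers with a neutral token.

    Hypothetical helper: a plain regex substitution, not a real Java parser, so it
    would also rewrite matches inside comments and string literals.
    """
    if not identifiers:
        return source
    # Longest names first so a shorter name never pre-empts a longer alternative.
    names = sorted(identifiers, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b")
    return pattern.sub(placeholder, source)
```

In an experiment of this shape, one would disguise only the class names (or only the method names, or only the simple variables) in every program, rebuild the profiles, and compare classification accuracy with the undisguised baseline. A drop in accuracy after disguising one identifier type would suggest that this type carries authorship information.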

History

Journal

International Journal of Computer Systems Science and Engineering

Volume

26

Issue

2

Start page

139

End page

148

Total pages

10

Publisher

CRL Publishing

Place published

United Kingdom

Language

English

Copyright

© 2011 CRL Publishing Ltd.

Former Identifier

2006042160

Esploro creation date

2020-06-22

Fedora creation date

2013-09-23
