Plagiarism Prevention and Detection

Source Code Similarity Detection Tools

Many source-code similarity detection tools exist. Below is a selection of the most popular ones, each with a brief description. These tools aim to detect and point out similarities between source-code files. The academic should investigate these similarities carefully before taking any action for plagiarism against students. The output provided by the tools can be used as evidence if the academic decides to take matters further.

Free tools

JPlag promises to find ‘similarities among multiple sets of source code files’. JPlag was developed by Guido Malpohl in 1996. It currently supports Java, C#, C, C++, Scheme, and natural language text. JPlag is free, but users are required to create an account. JPlag uses a variation of the Karp-Rabin comparison algorithm developed by Wise, with additional optimizations that improve its run-time efficiency.

MOSS (Measure Of Software Similarity) was developed by Alex Aiken in 1994. MOSS finds similarities in programs written in a number of different languages: C, C++, Java, Pascal, Ada, ML, Lisp, and Scheme. MOSS is a free service, but users must create an account.

Free open-source tools

Sherlock was developed at Warwick University’s Computer Science department. It is available as part of the BOSS Online Submission System or as a stand-alone application. It compares source code and natural language texts for similarity.

The PMD open source tool provides a Copy/Paste Detector (CPD) for finding duplicate code. Like JPlag, CPD uses a variation of the Karp-Rabin string matching algorithm developed by Wise. It works with Java, JSP, C, C++, Fortran and PHP code, and its documentation provides guidance on how to add other programming languages to the tool. Unlike JPlag, MOSS, and Sherlock, this tool is not specifically aimed at detecting similarities in students’ work, but it works well for that purpose. Because it is a duplicate code detector, it scans the files themselves for duplicate code, and so it also returns similar code found within the same file. It is equally successful in returning similar code across different files, so it can be used as a tool for detecting similarity between source-code files. The developers of PMD provide excellent support and documentation for this tool.
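To make the idea of Karp-Rabin matching more concrete, the following is a minimal sketch of how a rolling hash can find duplicated runs of tokens both within one file and across files. It is not the implementation used by CPD or JPlag: the whitespace tokenisation, the window length `k`, the constants `BASE` and `MOD`, and the function names `window_hashes` and `find_duplicates` are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

BASE = 1_000_003          # assumed multiplier for the polynomial hash
MOD = (1 << 61) - 1       # assumed modulus (a Mersenne prime)


def window_hashes(tokens, k):
    """Yield (start, hash) for every window of k consecutive tokens.

    Each hash is derived from the previous one in constant time,
    which is the core idea of the Karp-Rabin algorithm.
    """
    codes = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    if len(codes) < k:
        return
    high = pow(BASE, k - 1, MOD)  # weight of the token that leaves the window
    h = 0
    for c in codes[:k]:
        h = (h * BASE + c) % MOD
    yield 0, h
    for i in range(k, len(codes)):
        # Remove the oldest token, shift, and append the newest token.
        h = ((h - codes[i - k] * high) * BASE + codes[i]) % MOD
        yield i - k + 1, h


def find_duplicates(files, k=4):
    """Return pairs ((file_a, start_a), (file_b, start_b)) of equal k-token windows.

    `files` maps a file name to its list of tokens. Matches inside a
    single file are reported as well as matches across files.
    """
    seen = defaultdict(list)  # hash -> list of (file name, start index)
    matches = []
    for name in sorted(files):
        tokens = files[name]
        for start, h in window_hashes(tokens, k):
            window = tokens[start:start + k]
            for other_name, other_start in seen[h]:
                # Equal hashes do not guarantee equal windows, so verify.
                if files[other_name][other_start:other_start + k] == window:
                    matches.append(((other_name, other_start), (name, start)))
            seen[h].append((name, start))
    return matches


if __name__ == "__main__":
    example = {
        "a.py": "x = 1 ; y = 2 ; x = 1 ; z".split(),
        "b.py": "q ; x = 1 ; w".split(),
    }
    for match in find_duplicates(example, k=4):
        print(match)
```

In this sketch the repeated statement in `a.py` is reported as a match within the same file, and the same statement in `b.py` is reported as a match across files. A real tool would tokenise with a language-aware lexer and merge adjacent matching windows into longer duplicated regions.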

Commercial tools

CodeMatch is a commercial source-code plagiarism detector that claims to have an algorithm superior to those of the other tools listed here. CodeMatch currently supports the following programming languages: BASIC, C, C++, C#, Delphi, Flash ActionScript, Java, JavaScript, MASM, Pascal, Perl, PHP, PowerBuilder, Ruby, SQL, Verilog, and VHDL.

Further information

Paul Clough (2000). Plagiarism in Natural and Programming Languages: an Overview of Current Tools and Technologies. University of Sheffield.

This report discusses in detail techniques used to hide plagiarism, legal aspects of proving that suspected plagiarism has taken place, and some of the tools available for detecting plagiarism. Although the report concentrates on text analysis, there is a large section devoted to source code analysis.

Mike Joy (2006) “Detecting Source-Code Plagiarism”. Presentation delivered at 7th HEA-ICS Annual Conference.

This presentation describes the results of a survey of academics’ perspectives on what constitutes source-code plagiarism. It summarises the main features of JPlag, MOSS, and Sherlock and provides screenshots illustrating JPlag and Sherlock in action.

Lutz Prechelt, Guido Malpohl and Michael Philippsen (2000). JPlag: Finding Plagiarisms among a Set of Programs. Technical Report 2000-1, Fakultät für Informatik, Universität Karlsruhe.

This report gives a detailed look at the algorithm used by the JPlag software, along with thorough testing of the software's effectiveness under different settings. It also contains an extensive list of successful and unsuccessful ways in which students tried to hide plagiarism when they were asked to modify a program for JPlag.