Plagiarism Prevention and Detection

Feature Comparison of Source Code Tools

The table below compares the features of several tools that help detect source-code plagiarism.

| Feature | JPlag | Moss | Sherlock | CodeMatch | Copy/Paste Detector (CPD) |
| --- | --- | --- | --- | --- | --- |
| Year started | 1997 | 1994 | 1994 | 2005 | 2003 |
| Language(s) supported | Java, C#, C, C++, Scheme and natural language text | C, C++, Java, C#, Python, Visual Basic, JavaScript, FORTRAN, ML, Haskell, Lisp, Scheme, Pascal, Modula2, Ada, Perl, TCL, Matlab, VHDL, Verilog, Spice, MIPS assembly, a8086 assembly, HCL2 | Java, C, C++, natural language text | BASIC, C, C++, C#, Delphi, Flash ActionScript, Java, JavaScript, MASM, Pascal, Perl, PHP, PowerBuilder, Ruby, SQL, Verilog, VHDL | Java, JSP, C, C++, Fortran and PHP code |
| Cost | Free, but user must create an account | Free, but user must create an account | Free and open source | Commercial tool; free for any code where the total size of all files examined is less than 1 megabyte | Free and open source |
| Service | Web service | Internet service | Standalone application | Standalone application | Standalone application |
| Interface | GUI | Web interface | GUI | GUI | GUI |
| Requirements | Web browser, Java Runtime Environment (JRE), Java 1.5 or higher | A submission script for either UNIX or Windows | JDK 1.4 or later | | JDK 1.4 or later |
| Security | User id and e-mail needed | User id and e-mail needed | Runs locally | Runs locally | Runs locally |
| Submission methods | Standalone Java software application that can be deployed over the network | Command line | Standalone Java application | Standalone application | Standalone application |
| Source data | Files, or a directory with or without subdirectories | Files, or a directory with or without subdirectories | A directory containing subdirectories, each containing one or more zip archives or submissions | Files, or a directory with or without subdirectories | Files, or a directory with or without subdirectories |
| Option for excluding files | Yes | Yes | Yes | Yes | No |
| Speed | Fast | Fast | Large sets of files take a long time to analyse | Large sets of files take a long time to analyse | Fast |
| Result | | | | | |
| Result output | HTML output for exploring the similar code fragments found | Both text and HTML reports | Interactive dialogues for comparing detected similarities and printing reports | HTML report, and spreadsheet showing statistical information about the files that were analysed | Text report, XML file, CSV file |
| Result storage | Local | Remote | Local | Local | Local |
| Overview results display method? | Histogram, statistics about the files that were analysed | Ordered list | Matching pair tree | Ordered list | Ordered list |
| Display of results | Powerful graphical interface for presenting results | Powerful graphical interface for presenting results | Powerful graphical interface for presenting results | Graphical user interface presenting the results | Results displayed in a dialogue |
| Visualisation of results | Cross-linked listings, scroll bars | | 2/4 scroll bars, simultaneous scrolling | Table showing the similar lines of code detected among file pairs | List of results showing the detected code fragments between file pairs |
| Visual display of matched file pairs? | Yes | Yes | Yes | No | No |
| Display of suspicious code fragments? | Highlights the suspicious source-code fragments | Highlights the suspicious source-code fragments | Highlights the suspicious source-code fragments | Returns suspicious lines of code; no option to view entire suspicious code fragments or the entire suspicious files | Returns duplicate lines of code |
| Metrics produced | Percentage similarity, token matches | Percentage similarity, token matches, lines matched | Percentage similarity, token matches | Percentage similarity, lines matched | Token matches, lines matched |
| Other | | | | | |
| Other features? | | | Checkbox to mark the suspicious pairs | | |
| Miscellaneous | Several detection parameter settings, including sensitivity | Self-adjusting | Several detection parameter settings, including sensitivity | Some detection parameter settings | Some detection parameter settings |
| Algorithms | Greedy String Tiling | Winnowing algorithm | Token-based matching algorithm for source code, string matching for natural language texts | String matching | Greedy String Tiling |
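
The Algorithms row lists winnowing for Moss. As a rough illustration of the general idea only (not Moss's actual implementation), the Python sketch below hashes every k-character substring of a text, keeps the smallest hash in each sliding window of w hashes as a fingerprint, and compares two texts by the overlap of their fingerprints. The function names, the parameters `k` and `w`, the whitespace and case normalisation, the truncated SHA-1 hash and the Jaccard-style score are all assumptions made for this example.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-character substring of the normalised text."""
    # Illustrative normalisation: lowercase and drop all whitespace.
    s = "".join(text.lower().split())
    return [
        int.from_bytes(hashlib.sha1(s[i:i + k].encode("utf-8")).digest()[:4], "big")
        for i in range(len(s) - k + 1)
    ]


def winnow(hashes, w):
    """Return the set of (hash, position) fingerprints chosen by winnowing."""
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # On ties, pick the rightmost occurrence of the minimum in the window.
        offset = w - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


def similarity(a, b, k=5, w=4):
    """Jaccard overlap of fingerprint hashes (an assumed metric for this sketch)."""
    fa = {h for h, _ in winnow(kgram_hashes(a, k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(b, k), w)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Calling `similarity(file_a_text, file_b_text)` on the contents of two submissions gives a score between 0.0 and 1.0. Real tools typically tokenise the source first, so that renaming identifiers does not change the fingerprints; this sketch works on raw characters for brevity.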