Plagiarism Prevention and Detection

Comparison of Free-text Tools

This table compares the CopyCatch Gold, Sherlock and VAST/PRAISE free-text detection tools. The source code detection tools MOSS and JPlag can also be used to detect suspicious files written in natural language.

CopyCatch Gold, Sherlock and VAST/PRAISE were chosen for the comparison because all three were designed to detect collusion between documents in a set of submissions. None of them offers any significant ability to detect plagiarism from web-based sources.

Note that CopyCatch Gold is a commercial product, whereas Sherlock and VAST/PRAISE are the result of ongoing research and student project work at their respective universities and are therefore not as ‘polished’ as fully packaged software.

Criterion | CopyCatch Gold | Sherlock | VAST/PRAISE
Detection Modes | Natural language | Natural language; can also detect plagiarism in source code comments | Natural language
Supported File-Types Plain-text, HTML, Word and RTF files Plain-text, will work on markup languages (such as HTML) but effectiveness is lessened
Features | Installed software | Installed software | Installed software or web applet
Campus licensing costs 10p per student per annum
Free
GUI
Scales to large submissions
Detection Speed (on benchmark data set of 95 essays of 3000 words) | 25 secs | 1407 secs | 408 secs
Time to load results (same data set)
Instantaneous
87 secs per viewed pair
Parameters | Adjustable similarity score threshold | Many parameters for detection and parser | Metric to be used and chain length
 | | Adjustable results ‘filter’ |
Security
Runs locally
Requirements
A recent Java Runtime Environment
A recent JRE, but would not run on *nix systems
Source Data
A directory that may contain subdirectories, each of which represents a data item
Result | Side-by-side comparison of suspicious pairs, with similar text highlighted | Similar to CopyCatch, but suspicious text can be used as hyperlinks | Similarity matrix used to generate an image; dark areas, which represent a high degree of similarity, can be selected
 | Suspicious pairs ranked according to likelihood of collusion | Documents assigned a score, where a higher score indicates a higher likelihood of collusion | Documents displayed as a configurable graph; edges between documents indicate similarity
HTML output for exploring and understanding the similarities found in detail
Report contents | Submissions overview | Submissions overview for natural language |
 | Ordered list | Ordered list | Matching pair graph
Individual matched pair frame
 | | Un-informative similarities can be ‘ignored’ by results calculations |
Metrics produced | Unclear how metrics are produced | Relative score to other documents | Percentage similarity
 | | Statistical analysis of results |
FAQ and support | Minimal | In-program help file | Minimal
Algorithms | Analyses structure of sentences, looking for similarities in certain places | Looks for sentences that share an unusual number of words | Looks for chains of words, characters or sentences
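
The chain-based approach listed under Algorithms, combined with a percentage similarity metric and a ranked list of suspicious pairs, can be illustrated with a minimal sketch. This is not the implementation used by any of the three tools. It assumes that a chain is an overlapping run of `chain_length` words and that similarity is the share of chains common to both documents (Jaccard overlap); all function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def word_chains(text: str, chain_length: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word chains (word n-grams) in the text."""
    # Lower-case each word and strip surrounding punctuation (an assumed normalisation).
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    return {
        tuple(words[i:i + chain_length])
        for i in range(len(words) - chain_length + 1)
    }


def percentage_similarity(a: str, b: str, chain_length: int = 3) -> float:
    """Shared chains as a percentage of all distinct chains in either document."""
    chains_a = word_chains(a, chain_length)
    chains_b = word_chains(b, chain_length)
    union = chains_a | chains_b
    if not union:
        # Both documents are too short to contain a single chain.
        return 0.0
    return 100.0 * len(chains_a & chains_b) / len(union)


def rank_pairs(
    documents: dict[str, str], chain_length: int = 3
) -> list[tuple[str, str, float]]:
    """Score every pair of documents and sort pairs by descending similarity."""
    scored = [
        (
            name_a,
            name_b,
            percentage_similarity(documents[name_a], documents[name_b], chain_length),
        )
        for name_a, name_b in combinations(sorted(documents), 2)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

Calling `rank_pairs` on a dictionary that maps submission names to their text returns `(name, name, score)` tuples with the most similar pair first, which mirrors the ordered list of suspicious pairs described in the table. A larger `chain_length` makes matching stricter and tends to reduce coincidental matches.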