Comparison of Free-text Tools
This table compares the CopyCatch Gold, Sherlock and
VAST/PRAISE free-text detection tools.
The source code detection tools MOSS
and JPlag can also be used to
detect suspicious files written in natural language.
CopyCatch Gold, Sherlock and VAST/PRAISE were chosen for the
comparison as they were all designed with the intention of detecting
collusion between documents in a set of submissions. None of them
offers any significant ability to detect plagiarism from web-based sources.
Note that CopyCatch Gold is a
commercial product while Sherlock and VAST/PRAISE are the result of
ongoing research and student project work at their respective universities
and are therefore not as ‘polished’ as fully
packaged software.
| | CopyCatch Gold | Sherlock | VAST/PRAISE |
|---|---|---|---|
| Detection Modes | Natural language | Natural language; can also detect plagiarism in source code comments | Natural language |
| Supported File-Types | Plain-text, HTML, Word and RTF files | Plain-text; will work on markup languages (such as HTML) but effectiveness is lessened |
| Features | Installed software | Installed software | Installed software or web applet |
| Campus licencing costs 10p per student per annum | Free |
| GUI |
| Scales to large submissions |
| Detection Speed (on benchmark data set of 95 essays of 3000 words) | 25 secs | 1407 secs | 408 secs |
| Time to load results (same data set) | Instantaneous | 87 secs per viewed pair |
| Parameters | Adjustable similarity score threshold | Many parameters for detection and parser | Metric to be used and chain length |
| | | Adjustable results ‘filter’ | |
| Security | Runs locally |
| Requirements | A recent Java Runtime Environment | A recent JRE, but would not run on *nix systems |
| Source Data | A directory that may contain subdirectories, each of which represents a data item |
| Result | Side-by-side comparison of suspicious pairs, with similar text highlighted | Similar to CopyCatch but suspicious text can be used as hyperlinks | Similarity matrix used to generate an image; dark areas can be selected and represent a high degree of similarity |
| | Suspicious pairs ranked according to likelihood of collusion | Documents assigned a score where a higher score indicates a higher likelihood of collusion | Documents displayed as a configurable graph; edges between documents indicate similarity |
| HTML output for exploring and understanding the similarities found in detail |
| Report contents | Submissions overview | Submissions overview for natural language | |
| | Ordered list | Ordered list | Matching pair graph |
| Individual matched pair frame |
| | Un-informative similarities can be ‘ignored’ by results calculations | |
| Metrics produced | Unclear how metrics are produced | Relative score to other documents | Percentage similarity |
| | Statistical analysis of results | |
| FAQ and support | Minimal | In-program help file | Minimal |
| Algorithms | Analyses structure of sentences, looking for similarities in certain places | Looks for sentences that share an unusual number of words | Looks for chains of either words, characters or sentences |
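
The comparison mentions tools that look for chains of words, report a percentage similarity, and rank suspicious pairs within a set of submissions. The Python sketch below illustrates that general idea only; it is not the algorithm used by CopyCatch Gold, Sherlock or VAST/PRAISE, whose internals are not described here. The function names, the `chain_length` and `threshold` parameters, and the choice of shared chains over all distinct chains (the Jaccard index) as the percentage are all assumptions made for illustration.

```python
from itertools import combinations


def word_chains(text, chain_length=3):
    """Return the set of consecutive word chains (n-grams) found in text."""
    words = text.lower().split()
    return {
        tuple(words[i:i + chain_length])
        for i in range(len(words) - chain_length + 1)
    }


def percentage_similarity(text_a, text_b, chain_length=3):
    """Shared chains as a percentage of all distinct chains in both texts.

    This is the Jaccard index scaled to 0-100, an assumed metric chosen
    for simplicity rather than one taken from any of the compared tools.
    """
    chains_a = word_chains(text_a, chain_length)
    chains_b = word_chains(text_b, chain_length)
    union = chains_a | chains_b
    if not union:
        return 0.0
    return 100.0 * len(chains_a & chains_b) / len(union)


def rank_pairs(submissions, chain_length=3, threshold=0.0):
    """Compare every pair of submissions and list the most similar first.

    `submissions` maps a (hypothetical) submission name to its text.
    Pairs scoring below `threshold` are dropped from the result.
    """
    scored = [
        (percentage_similarity(submissions[a], submissions[b], chain_length), a, b)
        for a, b in combinations(sorted(submissions), 2)
    ]
    kept = [item for item in scored if item[0] >= threshold]
    return sorted(kept, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    essays = {
        "alice": "the quick brown fox jumps over the lazy dog",
        "bob": "the quick brown fox leaps over the lazy dog",
        "carol": "an entirely different essay about something else",
    }
    for score, first, second in rank_pairs(essays, threshold=10.0):
        print(f"{first} vs {second}: {score:.1f}%")
```

Because every pair in the set is compared, the work grows quadratically with the number of submissions, which is one reason detection speed on a fixed benchmark set is a useful point of comparison between tools.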