Plagiarism Prevention and Detection

Demonstration of PMD’s Copy/Paste Detector (CPD)

The PMD open source tool provides a Copy/Paste Detector (CPD) for finding duplicate code. CPD uses the Karp-Rabin string matching algorithm. It works with Java, JSP, C, C++, Fortran and PHP code, and its documentation gives guidance on how to add other programming languages to the tool. Unlike JPlag and Sherlock, this tool is not specifically aimed at detecting similarities in students’ work, but it may work well in doing so.

Like JPlag, CPD uses a variation of the Karp-Rabin string matching algorithm developed by Wise. The developers of PMD provide excellent support and documentation for this tool. Because it is a duplicate code detector, it scans each file for duplicate code and therefore also reports similar code found within the same file. However, it is also successful in returning similar code across different files, so it can be used as a tool for detecting similarity in source-code files.
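To illustrate the general idea behind Karp-Rabin matching, the following is a minimal sketch that hashes every window of a fixed number of tokens with a rolling hash and groups the windows that turn out to be identical. It is not CPD's actual implementation: the function name `find_duplicate_windows`, the constants `BASE` and `MOD`, and the use of whitespace-separated tokens are all assumptions made for this example.

```python
from collections import defaultdict

# Illustrative constants (assumptions, not taken from CPD).
BASE = 257
MOD = (1 << 61) - 1


def find_duplicate_windows(tokens, min_tokens):
    """Return groups of start positions whose windows of
    `min_tokens` consecutive tokens are identical."""
    n = len(tokens)
    if min_tokens <= 0 or n < min_tokens:
        return []

    # Map each distinct token to a small positive integer code.
    ids = {}
    codes = [ids.setdefault(t, len(ids) + 1) for t in tokens]

    # Weight of the leftmost token in a window.
    high = pow(BASE, min_tokens - 1, MOD)

    # Hash of the first window.
    h = 0
    for c in codes[:min_tokens]:
        h = (h * BASE + c) % MOD

    buckets = defaultdict(list)
    buckets[h].append(0)

    # Roll the hash: drop the outgoing token, add the incoming one.
    for start in range(1, n - min_tokens + 1):
        outgoing = codes[start - 1]
        incoming = codes[start + min_tokens - 1]
        h = ((h - outgoing * high) * BASE + incoming) % MOD
        buckets[h].append(start)

    groups = []
    for positions in buckets.values():
        if len(positions) < 2:
            continue
        # Compare the actual tokens to rule out hash collisions.
        by_content = defaultdict(list)
        for p in positions:
            by_content[tuple(tokens[p:p + min_tokens])].append(p)
        groups.extend(g for g in by_content.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    sample = "a = b + c ; x = b + c ;".split()
    print(find_duplicate_windows(sample, 5))
```

Because the hash of each window is derived from the previous one in constant time, the whole token sequence is scanned once; the explicit token comparison at the end guards against two different windows sharing a hash value.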

Initially, the user selects the directory in which the files to be compared reside. The user can also select the programming language and the minimum number of tokens that a duplicated fragment must contain in order to be reported (see Figure 1).

Figure 1: PMD’s CPD tool initial screen

All the results are displayed in a single window, as shown in Figure 2. Note that the list of results includes duplicate lines of code that CPD has detected within the same file. However, since this tool is open source, it could be modified so that duplicates found within a single file are not displayed. This is a very simple tool to use, and it appears to work quickly and well in detecting similar code fragments between two files. For each match, the results state the length of the duplicated code as a number of lines and tokens, and display the lines found.
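The filtering of same-file matches mentioned above could be sketched as a post-processing step over the results. The record layout below (`Occurrence`, `Duplication`) and the function `cross_file_only` are hypothetical and chosen only for illustration; they do not reflect CPD's internal data structures or its report formats.

```python
from typing import NamedTuple, Tuple


class Occurrence(NamedTuple):
    """One place where a duplicated fragment starts (hypothetical layout)."""
    path: str
    line: int


class Duplication(NamedTuple):
    """One reported match: its length and where it occurs (hypothetical layout)."""
    lines: int
    tokens: int
    occurrences: Tuple[Occurrence, ...]


def cross_file_only(duplications):
    """Keep only matches whose occurrences span at least two distinct files."""
    return [
        d for d in duplications
        if len({o.path for o in d.occurrences}) > 1
    ]


if __name__ == "__main__":
    results = [
        # Duplicate found twice inside the same file.
        Duplication(6, 40, (Occurrence("A.java", 10), Occurrence("A.java", 55))),
        # Duplicate shared between two different files.
        Duplication(12, 90, (Occurrence("A.java", 20), Occurrence("B.java", 31))),
    ]
    for d in cross_file_only(results):
        print(d.lines, d.tokens, [o.path for o in d.occurrences])
```

Filtering after detection, rather than changing the matching step itself, leaves the detector untouched, which is the less invasive of the two options when the aim is only to compare different students' submissions.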

Figure 2: CPD’s results window

The user can choose to save the results as .txt, .xml, or .csv, as shown in Figure 3.

Figure 3: CPD’s results saving options

Clearly, the user interface is not as advanced or interactive as those of the other tools we have discussed; however, this is a handy tool to have.