Plagiarism Prevention and Detection
Demonstration of Sherlock
The Sherlock plagiarism detection tool was developed at Warwick University’s Computer Science department. It is available as part of the BOSS Online Submission System or as a stand-alone application. It compares source code and natural-language texts for similarity.
Process of detecting plagiarism using Sherlock
The user starts the plagiarism detection process by selecting the directory in which the files to be compared reside (see Figure 1).
Figure 1: Sherlock initial screen
The user can choose to modify the pre-processing settings. Figure 2 shows some of the settings that can be modified. More details about the pre-processing settings can be found in the Sherlock documentation. We have used the default pre-processing settings.
Figure 2: Sherlock pre-processing settings
Once the desired settings have been chosen, the user starts the detection process by pressing the ‘OK’ button. Figure 3 below shows the screen displayed during comparison.
Figure 3: Detection process message window
Once the comparison process is over, the user can examine the matches (see Figure 4).
Figure 4: Process finished screen
We have selected the ‘Examine stored matches’ option. An extract of the results is shown in Figure 5.
Figure 5: Sherlock results
The user can click on each match shown in Figure 5 to view a list of the similar code fragments detected. For the detected file pair 39 and 27, Sherlock has found three suspicious code fragments, shown as the highlighted lines in Figure 6. Double-clicking on the row representing the code to be examined displays the comparison window (see Figure 7).
Figure 6: Suspicious fragments
Each suspicious section is clearly marked, and the user can view the original and pre-processed files simultaneously. The top half of the screen shows the original files and the bottom half shows just the code in an easy-to-compare display. In addition, the figure on the left highlights the detected suspicious code fragments in yellow.
Figure 7: Comparison window
Sherlock also allows the user to view the tokenised file. A very useful feature of Sherlock is the option for the academic to mark the viewed files as ‘suspicious’ or ‘innocent’. Neither of the other tools discussed (JPlag and CodeMatch) provides such a feature. This is very useful to the academic when gathering evidence.
Figure 8: Comparison window
Once the user has compared the suspicious files detected, they can mark the suspicious ones, and Sherlock provides a facility for printing the file pairs the user has marked (see Figure 9).
Figure 9: Marking the suspicious code fragments
The percentage column shows the length of the suspicious code as a percentage of the whole file size; for example, if the two files are exactly the same, the percentage should be 100%. The length of each match is indicated by its line numbers. The user can mark the suspicious pairs individually or as a group using the checkbox on the right of the screen.
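The percentage figure described above can be illustrated with a small sketch. This is not Sherlock's own code: it assumes that file size is measured in lines and that a match is given as an inclusive line range, and the function name `match_percentage` is hypothetical.

```python
def match_percentage(match_start: int, match_end: int, total_lines: int) -> float:
    """Length of a matched line range as a percentage of the file's line count.

    Assumption (not taken from Sherlock): file size is counted in lines and
    the match spans lines match_start..match_end inclusive, numbered from 1.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 1 <= match_start <= match_end <= total_lines:
        raise ValueError("match range must lie within the file")
    matched_lines = match_end - match_start + 1  # inclusive range
    return 100.0 * matched_lines / total_lines


# A match covering the whole 40-line file gives 100%.
whole_file = match_percentage(1, 40, 40)
# A 10-line match in the same file gives 25%.
partial = match_percentage(11, 20, 40)
```

Under these assumptions, two identical files produce a single match spanning every line, which is why the figure reaches 100%.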
Figure 10: Option to print out file pairs marked as suspicious
Sherlock also provides a visual display of the results. The user can choose to explore the results as a graph by selecting the ‘View matches graph’ option shown in Figure 4. Files can be selected for viewing based on a similarity threshold, and the user can navigate to the code by selecting the desired option as displayed below. Each node represents one submission and each edge represents at least one match between two submissions. The colour of each connecting edge is related to the summed percentage figure over all the matches found for the pair of files it connects.
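The graph structure described above can be sketched as follows. This is a minimal illustration, not Sherlock's implementation: the input format (tuples of two file names and a percentage), the function name `build_match_graph`, and the choice to drop nodes that have no edge above the threshold are all assumptions.

```python
from collections import defaultdict


def build_match_graph(matches, threshold=0.0):
    """Build an undirected graph of submissions from a list of matches.

    matches: iterable of (file_a, file_b, percentage) tuples (assumed format).
    Returns (nodes, edges), where edges maps a sorted file pair to the
    percentage summed over all matches found for that pair.
    """
    totals = defaultdict(float)
    for file_a, file_b, percentage in matches:
        pair = tuple(sorted((file_a, file_b)))  # treat A-B and B-A as one edge
        totals[pair] += percentage
    # Keep only pairs whose summed percentage reaches the threshold.
    edges = {pair: total for pair, total in totals.items() if total >= threshold}
    # Assumption: only submissions with at least one remaining edge are shown.
    nodes = sorted({name for pair in edges for name in pair})
    return nodes, edges


example_matches = [("39", "27", 20.0), ("27", "39", 15.0), ("12", "39", 5.0)]
nodes, edges = build_match_graph(example_matches, threshold=10.0)
```

In this sketch the summed value stored on each edge is the quantity that would drive the edge colour; the mapping from that value to an actual colour is left out because the source does not describe it.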
Figure 11: Sherlock’s Interactive Matches Graph
The user can select the desired code fragments to view and a window will pop up displaying the code in question (see Figure 12).
Figure 12: Comparison window
Sherlock was designed and developed with the academic in mind. It provides a straightforward interface while still allowing the user to select from various pre-processing settings. Importantly, the results are easy to interpret, and suspicious code fragments are clearly marked and displayed. Sherlock also provides a facility that makes evidence gathering easy for the academic. In addition, it is open source, which allows programming academics to modify the tool to their liking.