Plagiarism Prevention and Detection

On-line Resources on Source Code Plagiarism

The following is a selection of articles available online that discuss source-code plagiarism and tools for detecting it.

Aleksi Ahtiainen, Sami Surakka and Mikko Rahikainen (2007). “Plaggie: GNU-licensed Source Code Plagiarism Detection Engine for Java Exercises”. In Proceedings of the 6th Baltic Sea Conference on Computing Education Research, Koli Calling, pp. 141-142.

This short paper presents an open-source plagiarism detection engine called Plaggie. It is a stand-alone Java application that can be used to check Java programming exercises for plagiarism.

Christian Arwin and S.M.M. Tahaghoghi (2006). “Plagiarism detection across programming languages”. In Proceedings of the 29th Australasian Computer Science Conference, pp. 277-286.

This paper discusses various source-code plagiarism detection tools and proposes a plagiarism detection system called XPlag. The authors compare XPlag with JPlag.

B. Belkhouche, Anastasia Nix and Johnette Hassell (2004). “Plagiarism detection in software designs”. In Proceedings of the 42nd Annual Southeast Regional Conference, pp. 207-211.

This paper describes a framework for detecting plagiarism in software designs, which compares designs using multi-level abstractions of the design (instead of code).

Georgina Cosma and Mike Joy (2006a). “Source-code Plagiarism: a UK Academic Perspective”. In Proceedings of the 7th Annual Conference of the HEA Network for Information and Computer Sciences.

Georgina Cosma and Mike Joy (2006b). Source-code Plagiarism: A UK Academic Perspective. Research Report No. 422. Department of Computer Science, University of Warwick.

An on-line questionnaire was distributed to a list of academics supplied by the Higher Education Academy Subject Centre for Information and Computing Sciences (HEA-ICS). The 59 responses received from academics across 31 English universities and 3 Scottish universities are analysed. The purpose of the survey was to establish what is understood to constitute source-code plagiarism in an undergraduate context. The responses revealed wide agreement between academics on what can constitute source-code plagiarism, although some controversial views were expressed on issues surrounding source-code reuse and self-plagiarism. The first paper (2006a) suggests an adjustable definition of what can constitute source-code plagiarism from a UK academic perspective. The second document (2006b) contains a detailed (and lengthy) report of all the survey findings.

Fintan Culwin, Anna MacLeod and Thomas Lancaster (2001). Source Code Plagiarism in UK HE Computing Schools, Issues, Attitudes and Tools. London: South Bank University SCISM Technical Report SBU-CISM-01-01.

This report forms part of the JISC Committee for Integrated Environments for Learners (JCIEL) electronic plagiarism detection project. It contains details of a survey about attitudes in UK higher education computing departments towards source-code plagiarism in programming assignments as well as a basic analysis of the two most widely used source-code plagiarism detection systems (MOSS and JPlag). The survey indicates that in general computing departments are aware of plagiarism in assignments but only 17 of the 54 departments surveyed use an automated service while 18 of 54 make no effort to detect plagiarism.

Paul Clough (2000). Plagiarism in Natural and Programming Languages: an Overview of Current Tools and Technologies. University of Sheffield.

This report discusses in detail techniques used to hide plagiarism, legal aspects of proving that suspected plagiarism has taken place, and some of the tools available for detecting plagiarism. Although the report concentrates on text analysis, there is a large section devoted to source code analysis.

David Gitchell and Nicholas Tran (1999). “Sim: A Utility for Detecting Similarity in Computer Programs”. In the Proceedings of the Thirtieth SIGCSE Technical Symposium on Computer Science Education, pp. 266-270.

This paper describes the design and implementation of a program called sim to measure similarity between two C computer programs. It claims to be useful for detecting plagiarism among a large set of homework programs.

Kristina Verco and Michael J. Wise (1996). “Software for Detecting Suspected Plagiarism: Comparing Structure and Attribute-Counting Systems”. In Proceedings of the 1st Australasian conference on Computer Science Education, pp. 81-88.

This paper compares attribute-counting-metric and structure-metric systems for detecting source-code plagiarism. The authors’ YAP3 is compared to Sim and Plague.

Michael J. Wise (1996). “YAP3: Improved Detection of Similarities in Computer Program and Other Texts”. In Proceedings of the Twenty-Seventh SIGCSE Technical Symposium on Computer Science Education, pp. 130-134.

This paper describes YAP3, a tool for detecting similarity in computer programs and other texts submitted by students. The underlying algorithm, Running-Karp-Rabin Greedy-String-Tiling (RKR-GST), is described.
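As a rough illustration of the tiling idea behind Running-Karp-Rabin Greedy-String-Tiling, the sketch below implements only a plain greedy string tiling over two token lists. It is not YAP3's code: it omits the Karp-Rabin hashing that makes the real algorithm fast, and the function names, the min_match default and the coverage formula are assumptions made for this example.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) for shared runs of tokens.

    Illustrative sketch only: brute-force search, no Karp-Rabin hashing.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_len = min_match
        matches = []
        # Find the longest runs of equal, still-unmarked tokens.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    matches = [(i, j, k)]
                    max_len = k
                elif k == max_len:
                    matches.append((i, j, k))
        if not matches:
            break
        for i, j, k in matches:
            # Skip a match that overlaps a tile marked earlier in this pass.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for t in range(k):
                marked_a[i + t] = True
                marked_b[j + t] = True
            tiles.append((i, j, k))
    return tiles


def coverage(a, b, tiles):
    """Fraction of tokens covered by tiles (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return 2 * sum(length for _, _, length in tiles) / (len(a) + len(b))


original = "A B C D E F".split()
reordered = "D E F A B C".split()
tiles = greedy_string_tiling(original, reordered)
print(tiles, coverage(original, reordered, tiles))
```

Because tiles are matched independently of their position, swapping two blocks of code still yields full coverage here, which is the property that makes tiling attractive for detecting reordered submissions.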

Sebastian Niezgoda and Thomas P. Way (2006). “SNITCH: A Software Tool for Detecting Cut and Paste Plagiarism”. In Proceedings of the 37th SIGCSE Technical Symposium on Computer Science Education (SIGCSE ’06), pp. 51-55.

This paper describes the design of a software tool called SNITCH that implements a plagiarism detection algorithm using the Google Web API. It also discusses issues related to plagiarism detection software.

Samuel Mann and Zelda Frew (2006). “Similarity and Originality in Code: Plagiarism and Normal Variation in Student Assignments”. In Proceedings of the 8th Australasian Conference on Computing Education, pp. 143-150.

This paper examines the relationship between plagiarism and normal variation in student programming assignments. It outlines reasons why code might be similar. The authors also investigate similarity in students’ source code.

Lutz Prechelt, Guido Malpohl and Michael Philippsen (2000). JPlag: Finding Plagiarisms among a Set of Programs. Technical Report 2000-1, Fakultät für Informatik, Universität Karlsruhe.

This paper gives a detailed look at the algorithm used by the JPlag software as well as some thorough testing of the effectiveness of the software using different settings. The report also contains the results of testing on the software carried out by the authors, using sets of work where they know the amount of plagiarism as well as a set where students have deliberately tried to create a plagiarised piece of work that can ‘beat’ JPlag. The aim of these tests is to see how well JPlag detects plagiarism but also to establish useful default settings for the software. Section 4 of the report has an extensive list of successful and unsuccessful ways in which students had tried to hide plagiarism when they were asked to modify a program for JPlag.

Saul Schleimer, Daniel S. Wilkerson and Alex Aiken (2003). “Winnowing: Local Algorithms for Document Fingerprinting”. In Proceedings of the 22nd ACM SIGMOD International Conference on Management of Data, pp. 76-85.

This paper describes a local document fingerprinting algorithm for detecting similarities in source-code. A series of experiments are performed with the winnowing algorithm, and the authors also report on the experimental results gathered from the MOSS plagiarism detection service.
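A minimal sketch of the winnowing idea may help: hash every k-character substring, slide a window of w consecutive hashes over the result, and keep the minimum hash of each window as a fingerprint. The code below is an illustration under assumptions, not the MOSS implementation: the use of MD5, the parameter defaults, the function names and the Jaccard-style similarity are all choices made for this example, and the input normalisation a real system would apply is omitted.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-character substring of text (MD5 is an arbitrary choice)."""
    return [
        int(hashlib.md5(text[i:i + k].encode("utf-8")).hexdigest()[:8], 16)
        for i in range(len(text) - k + 1)
    ]


def winnow(text, k=5, w=4):
    """Return a set of (hash, position) fingerprints for text."""
    hashes = kgram_hashes(text, k)
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # On ties, keep the rightmost occurrence of the minimum.
        offset = w - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


def similarity(a, b, k=5, w=4):
    """Jaccard overlap of fingerprint hashes (an assumed measure)."""
    fa = {h for h, _ in winnow(a, k, w)}
    fb = {h for h, _ in winnow(b, k, w)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


left = "for (i = 0; i < n; i++) total += values[i];"
right = "while (x) { for (i = 0; i < n; i++) total += values[i]; }"
print(similarity(left, right))
```

With these settings, any substring of at least w + k - 1 characters shared by two documents contributes at least one common fingerprint, because the window of hashes covering that substring is identical in both.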

Peter Vamplew and Julian Dermoudy (2005). “An Anti-plagiarism Editor for Software Development Courses”. In Proceedings of the 7th Australasian Conference on Computing Education, pp. 83-90.

This paper describes an anti-plagiarism approach which considers the process of producing source-code, rather than just the source-code itself. It describes a text editor and related software implemented on the Eclipse development environment.

Paul D. Wiedemeier (2002). “Preventing Plagiarism in Computer Literacy Courses”. Journal of Computing Sciences in Colleges 17(4), pp. 154-163.

This paper presents a method for preventing plagiarism in computer literacy courses, describes an experiment and reviews the results of the experiment. The second part of the report compares the functionality of the MOSS (Measure of Software Similarity) and JPlag source-code plagiarism detection systems.