Anti-anti-plagiarism
Plagiarism in university assignment submissions is cheating. In most universities, it is a serious academic offence. If undetected, plagiarism devalues the assessment, is unfair to honest students and damages the reputation of the university. Steps should be taken to detect plagiarism then, right? Well, I know of one popular detection implementation that fills me with nothing but ire.
My university has introduced the TurnItIn system to its assessment program. Merely disappointed and saddened at first, I took little notice, knowing I would be out of the university soon and was unlikely to be affected by it in my Engineering/Maths degrees. Unfortunately, TurnItIn and I were forced into contact last semester, and since then I've cursed, discussed, complained about and researched the topic. This post is an attempt to set out the story and the arguments in a rational fashion, while being unashamedly provocative and argumentative.
To begin with, this news article was posted on Slashdot some time ago and is a decent introduction to the drama.
Read on for my ire-fuelled extrapolation.
TurnItIn claims to be the "standard in online plagiarism prevention". A university can purchase the services of TurnItIn, which accepts submissions of literary works from students on behalf of the university's lecturers and matches them against a database of existing works. The lecturer then retrieves the resulting plagiarism report from TurnItIn.
At the start of 2004 I enrolled in the Computer Networks subject. I've already blogged about my, mostly unrelated, frustrations with the subject and will attempt to keep those separate. Each assessment in the subject (two Java programming assignments and one essay) required a cover sheet to be submitted along with the assignment. The cover sheet was a Microsoft Word document which had to be filled in with a repeat of things like the assignment title, the subject name, my name, and other redundant information. The significant part, of course, is the declaration at the bottom, where I acknowledge that my work can be submitted to an online plagiarism detection service, and that the service may keep my work to improve their system. Submitting the cover sheet electronically (these were programming assignments after all!) constituted an electronic signature - proof enough that I agreed with the statement. Of course, these cover sheets were compulsory (the assignments would not be marked without them) and served only to cover the University legally. As the lecturer admitted, they were not even read (I submitted the same cover sheet for the first two assignments because I did not want to track down a copy of Word again, and it was never noticed), but simply had to be present for assessment to take place.
I first want to quickly point out the legalities surrounding the TurnItIn service and these cover sheets. TurnItIn actually provides just the background we need. On their legal page, they describe the various American laws they've complied with (this is an American company we are dealing with after all), and then provide a link to a legal opinion from their Australian law representatives. And that is where the fun begins. The law firm, Blake Dawson Waldron, has done a commendable job of satisfying iParadigms (the owners of the TurnItIn service), given the terribly insecure legal platform TurnItIn operates on. Unfortunately, they still admit that it is "not completely inconceivable that a Court would consider that the use of the Turnitin system by a subscriber to the service in Australia would infringe a student's copyright in their research papers." They go on to acknowledge that the Copyright Act in Australia grants the owner of copyright the exclusive right to "reproduce the work in a material form" and "communicate the work to the public". Read it all yourself (point 18 in the Summary), but basically the use of a student's work by TurnItIn to generate income "confers a commercial advantage on Turnitin without any compensation being made by Turnitin to the owners of copyright in those works". The legal advice goes on to say that there is doubt as to the legality of such a service, and that "any doubt could ... be removed by informing the students in advance" and requiring that they consent to the submission. Makes sense, doesn't it? If a company is going to make a business out of the copyright works of an individual, it would seem appropriate to request consent from the individual. The legal advice goes further:
it would be wise to leave students in this situation with an ability to withhold their consent, leaving open the question of how the subscriber should deal with the consequences of an individual student refusing to allow his or her paper to be submitted to the service.
I could not have said it better myself. The advice continues with Supplementary Advice responding to claims by Lachlan William (University of Melbourne Postgraduate Association president) that TurnItIn's treatment of copyright does not stack up. The advice repeats that legal action against TurnItIn is "highly unlikely" (a common phrase in the advice), leaving me wondering just what is stopping that action. One possibility raised by the advice is that "any activities undertaken by Turnitin which might constitute an 'adaptation' would take place in Oakland, California, and infringement under the Act only extends to acts which take place within Australia." Saved! I know of a few pornography and warez sites clinging to similar loopholes in international copyright law.
There are plenty more juicy elements in the legal advice, but I'll leave you to your own reading. One last tidbit is the legal firm's "understanding that the Turnitin system evaluates the originality of a student essay by applying mathematical algorithms to produce a digital 'code' or fingerprint of the written material in an electronically submitted research paper." Blake Dawson Waldron may have their heads around copyright law (which I never expect to achieve), but it seems iParadigms hasn't quite educated them on word matching systems. The only digital "code" TurnItIn would use is ASCII - it's a word matching system, and so retains the words of submitted documents. There is no escaping copyright by dressing up digital reproduction in terms like "mathematical algorithms" and "digital code".
That's enough legal background for now. There's plenty more analysis available on the legal issues, but the main purpose of this post is to present my arguments against the system. Legalese is important background and available elsewhere, so let's begin.
- No Choice. I'll start with the point most relevant to the legal extracts above. I had no say in the use of TurnItIn. Getting an academic result in the subject (after I had incurred my $500-odd HECS debt) required submission of the cover sheet. Having my work assessed required that I turn the copyright of my original work over to an American company with whom I had no trust relationship. I do not appreciate suddenly (by suddenly I mean for the first time since I began University) being forced to relinquish the expectation that my own work stays reasonably contained within the University assessment system. No longer can I be confident that my work is used only for academic assessment purposes - now I must give it up for use by an American company.
- Guilty until proven innocent. I do not condone plagiarism, and this is a point I really need to stress. Anti-plagiarism is not my enemy; bad implementation of anti-plagiarism is. I am honest in my university submissions, and have the maturity to understand the benefits of producing original work. In fact, I feel I am part of that dying breed who consider a University degree a journey of learning rather than simply a path to a qualification. Completing my own assignments educates me in some way at least, and even though plagiarism may mean less work and possibly better marks, it is an option I detest. The point is, I do not plagiarise in my assessments because I feel it is wrong. That has worked well for me. The plagiarists around me struggled continually when they could not copy - they were caught out as soon as they were presented with an assessment task (such as an exam or a well-formed assignment question) which required their own work. And now I'm part of that plagiarist group until I prove otherwise. It irks me terribly to have the burden of proof suddenly placed on me, to prove my innocence despite having done nothing wrong. In fact, I am required to give up my copyright in return for the possibility of being falsely accused of plagiarism.
- Just who is TurnItIn? This point was foreshadowed a little earlier - I have no trust relationship with TurnItIn. Having been forced to turn my work over to their system, I find my copyright completely in their hands. If they decide to sell my work to a competitor or a successor, I have no say in it - I already "agreed" to their use of it. If they decide to publish the work, there's nothing I can do. If I write a controversial essay and TurnItIn or a partner has control of it, why would I trust TurnItIn to keep it from those who should not see it? If I ever write a decent article, why would TurnItIn not sell it as their own? In fact, in a small way, iParadigms is already making money off my original work. They sell a system supported by a database of original works written by me and my colleagues. Good for them for creating their business, but why the hell am I forced to support it with my work? I am most uncomfortable assuming all liability for the misuse of my work by a third party simply because I am the one who has to sign the agreement. I would prefer a chance to review the company and exercise my free will in accepting their terms and acknowledging my risk.
- Why bother coming up with new assessments? This point is a little different in that it considers an outcome that is avoidable rather than inevitable. Nonetheless, these are real concerns which I feel are overlooked. Before automatic anti-plagiarism, assessors had an incentive to set original, relevant and specific assessments, because students would then be forced to produce original work. Rip-offs are a lot easier to spot when the assessment is original. Reproduce the same turn-key assignments year after year and you are going to tempt a lot of cheaters. Forcing anti-plagiarism on students is not the answer to a worthwhile education - setting creative, contemporary (if appropriate), considered assessments relevant to the specific course material is the way to assess learning and competence, and it also makes it very impractical to submit plagiarised work. Perhaps my own growing awareness contributes to the effect, but I have seen a small but palpable trend towards laziness in assessment over the course of my degree. I fear this sort of anti-plagiarism system will only contribute to that trend.
- I am inconvenienced to suit your new system. I work hard to produce an original work, and am ready to submit it for assessment. It is rude and most inconvenient to then find that I have to sign up to an American web site to submit my work to my Australian university. I have to learn a new interface and conform to an alien set of formats and submission criteria, and I need to consume international Internet bandwidth conforming to your new system. I need to put my personal details and email address into another company's database. I cannot get technical help when the American website breaks on my setup, except by contacting the American website. It is most inconvenient for me to liaise with this third party, communicating across the world with a company I have no previous or requested affiliation with, to satisfy your new system. I could not even revise the title of my assignment (which I had got wrong, not realising how the system worked) after I had submitted a copy, because the external system only accepts a one-time submission. I understand there is a considerable burden in detecting plagiarism by human assessment, but I feel a discussion was skipped when the decision was made to instead inconvenience the hundreds of students trying to submit their work.
- How do we conform to our new plagiarism robot? It is perhaps very relevant to mention that I am a student of technical subjects. I do not have the experience that a good majority of the university population has in conforming to referencing standards and obeying rules for attribution to existing sources. We tend to write technical documents - programming code, tables, graphs, charts, diagrams - the kind of things where quoting and referencing is sometimes less important or altogether inapplicable. Perhaps, then, I had more of a learning curve than most of the university when it comes to referencing and defending against plagiarism accusations. That aside, I still had no instruction on how we might safely use material from other sources while avoiding the plagiarism tag. Previously, I was happy to simply produce my own work, drawing on and quoting existing works where appropriate. Now my work is reviewed by a machine which will identify matching phrases as possible plagiarism. Is the system aware of my quoting methods, or my referencing methods? If part of my bibliography is the same as another paper on the same topic, is that going to be flagged as "75% likelihood of plagiarism"? How exactly do I escape suspicion when dealing with this plagiarism machine, if I can no longer rely on objective human analysis (that is, analysis not influenced by a coloured report and some statistics)?
- False positives are a real possibility. Occasionally I write about the things I am interested in. This blog is the perfect example, but others include posts I have made on newsgroups, papers I have submitted through work, software I have developed privately and made open source, essays I have submitted to competitions and creative art sites, and even university assignments. Sometimes I submit these works anonymously or under an alternative moniker. This is particularly the case with a lot of the technical works I have authored - lots of people participate in technical forums under alternate nicknames or handles. An anti-plagiarism system like TurnItIn works by collecting works from just these sorts of sources, and comparing them to the work I submit for a subject. If I have already written an essay or discussed a subject which is presented as an assessment item (see the prior point about unoriginal assessment tasks), and am happy with that work applying to the current subject, I would be stupid not to reuse that work. Of course, that reuse would then (if the anti-plagiarism system is working as it claims) flag my work as plagiarised. I then have the hassle of proving that my original work is my original work.
TurnItIn and similar services will undoubtedly attempt to improve their services by extending the reach of their reference databases. Unfortunately, there are only so many words in the English language, and a great number of potential authors around the world. Some things can only be said in so many ways. What guarantee do we have that TurnItIn's threshold for how many consecutive matching words indicate plagiarism is set correctly? This becomes particularly pertinent when you consider the likely lack of originality in the tasks set by assessors around the world who subscribe to the same anti-plagiarism service.
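To make the "consecutive matching words" idea concrete, here is a minimal sketch of a run-length matcher. It assumes, purely for illustration, that two texts match when they share a run of at least `min_run` consecutive words; TurnItIn's actual algorithm and threshold are not published, and the functions `words`, `runs` and `shared_runs`, the value 5 and the two sample sentences are all hypothetical. The sentences are the kind two students might write independently when describing the same networking task.

```python
import re

def words(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def runs(text, n):
    """Every sequence of n consecutive words in the text, as tuples."""
    w = words(text)
    return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}

def shared_runs(a, b, min_run=5):
    """Runs of min_run consecutive words found in both texts.

    min_run=5 is an arbitrary assumption, not TurnItIn's real setting.
    """
    return runs(a, min_run) & runs(b, min_run)

# Two hypothetical, independently written sentences about the same task.
essay_a = "The client opens a socket to the server and waits for a reply."
essay_b = "In my design the client opens a socket to the server before sending data."

matches = shared_runs(essay_a, essay_b)
print(len(matches), "shared 5-word runs")        # 4 shared 5-word runs
print(shared_runs(essay_a, essay_b, min_run=9))  # set()
```

The eight-word phrase "the client opens a socket to the server" trips the matcher at a threshold of 5 but passes cleanly at 9, so whether these two students look like plagiarists depends entirely on a number neither of them gets to see.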
- Reuse is not only discouraged - it is eliminated. Recall that this was a software engineering subject I was enrolled in. Recall, or realise if you are not a programmer, that code reuse is the holy grail of programming. Programmers should, in general, strive to write code that can be reused. That is how programming develops. Large programs are written from smaller programs. The idea is not to reinvent the wheel, but to write code good enough to serve the same task in other programs. Although it is not nearly as strong a principle as in programming, this idea of reuse carries over to much of engineering. Engineering is all about abstraction - complicated structures are built by making the fundamental pieces solid, and then copying them over and over again without having to think about their details. That is how all non-trivial engineering projects are built. By introducing some unknown word matching system to the assessment process, our lecturer is essentially requiring that we make very sure that no part of our work is similar to anyone else's work. The programming assignments in this subject turned out to be laughable - developing client-server Java applications using the java.net.* classes. Basically there is only one way to do this, and that is exactly how it is done in the accompanying tutorial on their use. But for fear of being branded a plagiarist because of word similarities, we as students must work to make our assignments original - essentially obfuscating the code - precisely what is discouraged in programming.
I think the point still holds in non-programming assignments, but understand that this is coming from the point of view of a student who has constantly been educated in the principles of abstraction and encapsulation - that building on and using existing works is how the field is advanced. I consider collaboration a primary learning and development tool. I have actually seen students in other courses twist themselves in knots and distort their work terribly to ensure that they reference exactly right and do not reproduce the words of another outside quotation marks. That behaviour has always seemed to me to be an unnecessary over-reaction to the very real plagiarism problem. It seems that with automatic anti-plagiarism systems we cannot rely on honesty - instead we must ensure that no one else has ever said the same thing as ourselves before. And that gets very tricky if you happen to be presented with unoriginal assessment tasks.
One could imagine that beating the anti-plagiarism system becomes a race to submit the best phrases first. As the system fills up with students' work, more and more of the best ways to describe a topic will become out of bounds, and students will work harder and harder to devise new ways to describe old ideas. Again, not an inevitable outcome, but an interesting thought experiment nonetheless.
- Frankly, it will be ineffective. I do quite a bit of programming. I am always in awe of the capabilities of modern computers, and love to harness their power to do wonderful things. However, I am also aware of their limitations, and see a growing ignorance of the state of computing. I do not think the ignorance is necessarily due to a loss of intelligence in the population - it is just that as more and more people are affected by computers, a smaller and smaller proportion will be motivated to stay aware of their capabilities. The ignorance leads to a certain amount of fear and awe, and ultimately to blind faith. In the end, we must realise that computer systems are still deterministic. Think of the anti-plagiarism system not as an all-seeing black box which means the end of plagiarism, but as a machine which matches words. If you present the machine with groups of words which it has in its database, then it will show matches - also known as "detecting plagiarism". That said, a deterministic machine is not intelligent, and is trivial to trick. I foresee that forcing this system on such a large group of people will only incite those determined to plagiarise to work around the system. In fact, they no longer have to defeat a reasoning human; they need only work around a deterministic machine. I would go further, and say that relying on an anti-plagiarism system opens the door to plagiarism for those who would not have considered it before. Traditional plagiarism required cunning, high risk and energetic consideration of each circumstance. Defeating an anti-plagiarism system means applying systematic rules (not necessarily developed by the plagiarist themselves) and submitting electronically. The same shift has happened with identity theft, which traditionally required cunning, research and constant attentiveness.
When people trust their identification to computers, all that is necessary is for someone to publish a procedure for defeating the deterministic machine and everyone so minded has access to an electronic identity theft device without the hard work.
So it is with anti-plagiarism. There are very accessible means of performing system identification tests on anti-plagiarism systems. One only needs an account with the service, and access to a large number of "trial" assessment runs. Building a model of how that system detects plagiarism then simply requires repeated trials of similar but carefully modified documents, looking for the limits of the service's similarity detection.
Here are some of the methods which are likely to defeat this system:
- Get your sibling or friend to do the work for you. It won't be in the database, and an assessor is no longer obliged to pay attention to your style of work, so why not get someone a little more experienced in the subject to do it for you?
- Translate a work from a different language. Plagiarise all you want if you have a translator - no machine translator will produce the same translation as a human.
- Submit your work as an image, or as an image-only PDF, for example. This is likely to be noticed as suspect early on by a human assessor, but is listed here as another example of how simple it is to defeat a word matching machine.
- Change the order of shorter sentences, change the word order of longer ones.
- Remove the work you are plagiarising from the web before submitting. For example, launch a DoS (Denial of Service) attack on the hosting web server, spoof the DNS record, or perform a little social engineering to get it taken down. When the report comes back that you had plagiarised, there will be no record of the original work (possibly foiled by Google and other caches, so start early!) and therefore no grounds for action.
- You can copy an entire essay if you just change enough of the words and phrases around. Or get (or pay) someone else to do the word exchanging.
- Or... use an automatic word exchanging system. I just so happen to have prepared one already. I have called it aaps - the Anti-Anti-Plagiarism System. It is written in Perl and demonstrates how simple it is to build the basis of an automated system which defeats plagiarism detection. I want to make this clear: aaps is not designed to be used to defeat any plagiarism system (and in fact, in its current state it probably won't defeat many), but instead to serve as a proof of concept, and tangibly demonstrate just how easy it is to build systems which defeat anti-plagiarism. Relying on automated, deterministic anti-plagiarism systems opens the door for automated anti-anti-plagiarism systems. No longer do you have to exercise cunning and skilful deception to avoid being caught for plagiarism - you simply follow the rules for bypassing the detection system.
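aaps itself is Perl and is not reproduced here. As a stand-in, the following is a toy word-exchange pass in Python, a minimal sketch assuming a detector that does nothing more than count shared 5-word runs. The five-entry synonym table and the functions `words`, `runs` and `exchange` are hypothetical and are not taken from aaps. It shows how little it takes to drive an exact-match score to zero.

```python
import re

# Hypothetical five-entry synonym table; a real tool would need far more.
SYNONYMS = {
    "client": "requester",
    "opens": "creates",
    "server": "host",
    "waits": "pauses",
    "reply": "response",
}

def words(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def runs(word_list, n=5):
    """Every sequence of n consecutive words, as a set of tuples."""
    return {tuple(word_list[i:i + n]) for i in range(len(word_list) - n + 1)}

def exchange(word_list):
    """Swap each word for its listed synonym, leaving the rest untouched."""
    return [SYNONYMS.get(w, w) for w in word_list]

source = words("The client opens a socket to the server and waits for a reply.")
swapped = exchange(source)

before = len(runs(source))                  # a text shares every run with itself
after = len(runs(source) & runs(swapped))   # runs surviving the word exchange

print(" ".join(swapped))  # the requester creates a socket to the host and pauses for a response
print(before, "->", after)  # 9 -> 0
```

Five substitutions are enough to break every 5-word run, yet any human reader would still recognise the sentence. That is the earlier point about deterministic machines in miniature.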
I have offered my reasons for disliking the current system, but on the other hand I understand why the university is working to stop plagiarism. Fundamentally, successful plagiarism in a degree usually reduces the competency of the graduates, diluting the value of the degree and ultimately reducing the commercial advantage of the university. There are plenty of arguments to support actions which reduce plagiarism, but I will not go through them here. I agree with them, and being personally against plagiarism for the reasons expressed earlier in this piece, I in fact support moves to reduce plagiarism. It is the current implementation I am revolting against. It seems only fair, then, to at least offer some indication of what I believe is a more favourable solution.
Basically, the important thing is to not force me to work to gain a less favourable position, or to enter a legally binding "agreement". In particular, I should not have to expend more labour than I do currently in return for a loss of copyright and a possibility of false accusation. Plagiarism checking should still be the task of the assessor. I am perfectly happy to submit electronically to aid the assessor in that task; however, it must be up to the assessor to perform the plagiarism tests. Putting the onus on the assessor would make the decision to use an external plagiarism detection service a whole lot more considered. Services like TurnItIn, at least as they currently stand, look considerably more dangerous legally when students are not forced to agree to their conditions. The task then relies on assessors choosing a system which is legitimate, reliable and trustworthy. If my original work is then abused in the copyright sense, at least there will be grounds for corrective action. One suggestion I read was to consider an academia-sponsored research service that would perform plagiarism detection for lecturers and academics on a not-for-profit basis. That would solve a lot of the concerns that arise from involving a commercial third-party company, and would make the solution a lot more relevant and adaptable to the particular university or discipline.
Offer the terms up front and open them for discussion! This is essential. Students need, at the very least, to be informed of the process and of how plagiarism detection cases will be handled. Publish a code of conduct to be reviewed by students and academics.
I cannot offer a concrete solution, then. I have thought plenty about what I would not like to see in a solution, but I fall short of providing a fix-all. That is where the assessor comes in, I think - assessment is their game. Ranting about poor implementations is my game.
Comments
Commenting on my own article here again.
A frequently asked question that has come up, which on reflection was a big omission on my part, is how student copyright actually works within the University. This is fairly clear to me now, but I am sure there are those who are not aware:
Any work you create is automatically protected by the Copyright Act, without you making reference to that Act. Including the copyright symbol, a date and your name is, by convention, reasonable practice and worthwhile when making things official. Nonetheless, an explicit copyright notice is not required. For undergraduates, this remains the case: the university makes no claim on students' copyright. This changes if you are an employee (of the university or of a corporation), and you will often be asked to sign a waiver upon employment, relinquishing your copyright to that employer. That is why researchers, and some postgraduates, have a different relationship to the copyright of their work, but undergraduates and others who have not specifically signed away their own copyright retain full copyright protection and ownership of their own work.
This information is hard to parse from the University's documents, but if you feel so motivated, this is the place to start:
http://www.newcastle.edu.au/research/dir2000/appD.html
Posted by: Heath | July 25, 2004 1:53 PM
In the United States there are federal laws that prevent schools from sharing information with companies, including graded work. You might want to check to see if there are any similar laws in your area.
Posted by: Anonymous | August 25, 2004 2:18 AM
I am a university student and I've heard a lot about this service, and if anything it is protecting the copyright of your paper from cheat sites and use by other students. Once you submit your paper, no one else can use that work without citing you properly; I hardly see how that affects you negatively. Just another crazy student. You should have to pay them for that type of service. If you are an honest student like you claim (bs) then you should be happy that your papers are being protected.
Posted by: Anonymous | March 23, 2005 12:56 AM
The copyright issue is easy to solve such that both parties remain happy. Log onto turnitin under a fake name, thereby signing nothing away. The paper is assessed as it should be and you sign over no copyright claims. QED.
James
Posted by: James | April 6, 2005 6:39 AM
Eh, it hardly seems worthwhile replying, but seeing how the previous two comments offer easy pickings, I'll make the effort.
Anonymous 1 suggests I should pay for the service. Perhaps they missed the part where it is the university that is using the service. They're the ones paying money, so eventually, yes, students do pay for it through their tuition fees. Perhaps they also missed the part where I explicitly said we were required to *give up* our copyright to our work so TurnItIn could do as they please. That ain't copyright protection. People can use my work as much as they please, whether I retain the copyright or TurnItIn inherits it. In the former case I could pursue copyright infringement paths if I wished, but in the latter I no longer have control - if that work is then submitted to a TurnItIn-enabled university, the student may be dealt with by the university. That means *very* little to me. Anonymous 1's bold claim about my academic honesty is a fitting conclusion to a petty comment.
Anonymous 2 thinks copyright will be retained by using a fake TurnItIn name. I'll counter with "ah, no". Remember - we are required to give up our copyright so TurnItIn may use our work. Whether they know my name or not is irrelevant.
Sheesh.
Posted by: Heath Raftery | June 7, 2007 5:41 PM
I agree with your points wholeheartedly. I stumbled upon your blog while researching turnit-in, because I am going back to college eight years after graduating the first time. My biggest problems with the system are giving up my copyright and not getting compensated for turnit-in's use of my work for matching purposes. Seems I am paying twice, once through tuition and once with my work, which enables turnit-in to make even more money off me. I would probably be ok with it if I were compensated for my work. Like you, though, I do not plagiarise, never have and never will. I just do not like the idea of a company making money off me twice when I have no choice in the matter unless I want to take a hit to my GPA.
Posted by: David | October 11, 2008 7:04 PM
I only found your article on the turnitin thing when listening to a lecture which mentioned it. I've had to use it before, and your problems with quoting code and whatnot seem not to be so serious - whenever a plagiarism error happened with my assignments last year, it was flagged, but then I simply told my lecturer every source I used (I kept records of everything I used, more detailed than what I put in the bibliography thingie at the end).
However, your concern about the company selling the assignments on - that is rather worrying, and I'm going to contact my university soonish for some clarification on that. I agree with you that there's no trust built between the students here in the UK and this anonymous American company we know next to nothing about, so yeah, perhaps we should treat it with a bit of scepticism before signing the compulsory agreement.
The stagnancy of assignments... Well, we already have that in our non-Turnitin exercises, such as the mid-semester tests worth 20% of the module, which are the same each year, and it is well known that people just bring in last year's answers to copy the working out... Theory questions tend to be identical across the entire year group, the only exception being the honest students (me when I initially took my course and inevitably failed).
This has made me think. :D
Posted by: Dylan Parry | November 5, 2010 4:21 AM
While I agree with the sentiment here, I do think you got a bit carried away: "issue a DOS (Denial Of Service) on the hosting web server." Really? Why not add "nuke the hosting facility" to your list of options? There are MUCH easier ways to defeat turnitin than those mentioned here.
Posted by: Comrad | December 22, 2010 2:39 AM