We’re continuing our Research Integrity Toolkit blog series with Research Square in honor of this year’s Peer Review Week theme, “Research Integrity: Creating and supporting trust in research.” In this post, Danielle Padula, Head of Marketing and Community Development at Scholastica, discusses plagiarism detection best practices for journals. Be sure to check out the corresponding tools for authors posted on the Research Square blog!

According to a recent editorial on “Detecting and handling suspected plagiarism in submitted manuscripts” published in the Journal of the Royal College of Physicians of Edinburgh (Misra & Ravindran, 2021), “literature suggests that up to a sixth of manuscripts submitted to journals might be affected by plagiarism.”

One-sixth may not seem like a large fraction at face value. However, the possibility that up to a sixth of all papers submitted to scholarly journals could be the product of stolen or unethically repurposed work is certainly food for thought, and a reminder that journals should take precautions to prevent plagiarism wherever possible, if they aren’t already.

Plagiarism has negative ramifications for all publishing stakeholders. If flagged too late in peer review, it can take away from limited reviewer time at a point when many are reporting referee shortages across disciplines. And, if left uncaught, plagiarism results in a loss of intellectual property (IP) for authors and a blemish on journal reputations, in addition to compromising the integrity of the scholarly record.

For all of the above reasons, verifying the originality of manuscript submissions to the fullest extent possible is foundational to upholding the research integrity of any scholarly journal, regardless of discipline. Journal publishers and especially editorial teams must be vigilant about potential plagiarism indicators and have protocols to screen submissions for red flags.

In this blog post, we break down how to establish journal plagiarism detection policies and processes, tips to proactively address clarifying questions authors may have about plagiarism, and manuscript similarity screening best practices.

Establish clear plagiarism definitions and policies for editors and authors

The first step in developing a plagiarism detection plan for the journals you work with, or in honing your current approach, is getting your editorial team aligned around the different types of plagiarism they should watch for, with clear definitions.

There are four main categories of potential plagiarism:

  • Direct plagiarism: As defined by the US Office of Research Integrity, plagiarism includes “both the theft or misappropriation of intellectual property and the substantial unattributed textual copying of another’s work. […] Substantial unattributed textual copying of another’s work means the unattributed verbatim or nearly verbatim copying of sentences and paragraphs which materially mislead the ordinary reader regarding the contributions of the author.”
  • Duplicate or redundant publication: In the case of “duplicate” or “redundant” publication, rather than copying another person’s work, an author is directly or nearly verbatim copying a piece of their own work without citing the original publication. Or they may submit a manuscript that displays significant overlap with another work they already published (e.g., using the same dataset as in a previous journal article).
  • “Text recycling” or self-plagiarism: Similar to duplicate publication, “text recycling” (a.k.a. “self-plagiarism,” sometimes also referred to as “text overlap”) is when an author copies parts of one or more of their published works into a new manuscript. For example, a scholar might reuse portions of their dissertation in a new submission without citation.
  • “Salami slicing” or minor overlap: Finally, “salami slicing” is when an author includes some elements in a new manuscript that are similar to one or more of their published works but not so much that they are clearly “recycling” parts of previous publications.

The definition of direct plagiarism is pretty steadfast across disciplines. However, depending on the needs and expectations of your journal and its subject area, how your editorial team defines and approaches the different instances of potential self-plagiarism above may vary.

For example, biomedical and life sciences journals, such as ASM Journals, may allow or even require authors to essentially “recycle” sections of previously published works in new submissions with proper citation, such as descriptions of clinical trials or materials and methods sections. As noted in this article on “The Dos and Don’ts of Text Recycling” from ASM, “groups like the Text Recycling Research Project (TRRP) have argued that limited text recycling, in the introduction and/or methods and materials sections of papers, is both acceptable, and unlikely to pose a legal risk […].”

Journals also often allow duplicate publication in certain instances, such as submissions based on conference posters and abstracts, data or methods previously published alone in a repository (i.e., not yet compiled into an article), preprints, and translations.

Once your editorial team nails down the forms of plagiarism you’ll check for and how you define them (including instances where you allow duplicate publication), you can head off potential points of confusion among authors by adding plagiarism policies to your website. For example, ASM has a Journals Publishing Ethics Checklist on its website with a section on how to avoid plagiarism, including what it deems to be unacceptable “text recycling.”

Bear in mind that the concept of self-plagiarism will likely be new to many authors, especially early career researchers (ECRs), who may not realize it’s possible to plagiarize oneself. Communicating to authors when and how to ensure they’re providing references to any of their own published works will help clarify expectations and ensure transparency for readers.

Develop processes for screening submissions and addressing potential plagiarism

As noted, ideally, plagiarism detection should occur before sending manuscripts out for external peer review to avoid wasting reviewers’ time, which could also adversely affect reviewer retention. That’s where doing initial similarity check screenings of all submissions comes into play.

While identifying potential instances of plagiarism was once purely up to the keen eyes and memories of journal editors and reviewers versed in the research literature of their discipline, today, technology is making it much less a matter of chance. With plagiarism detection software like iThenticate from Turnitin and Crossref’s sister service for its members, Similarity Check, journals can run scans to compare submissions against millions of full-text documents, from research articles to preprints to conference proceedings, in a matter of minutes.

Manuscripts should be checked for similarity to other works at the time of submission. Journals not using dedicated plagiarism detection software can still manually screen manuscripts to see if they raise any red flags. Nancy Gough, founder of BioSerendipity and former editor of Science Signaling, discussed how she did this in the past in a feature article on detecting and investigating plagiarism for the Council of Science Editors publication Science Editor.

Gough described what made her suspect plagiarism when assessing an invited review article. “What alerted me was the quality and style of the writing from a nonnative–English speaker with whom I had been corresponding,” said Gough. “The writing did not match the writing in the correspondence we had exchanged regarding contributing the review article.”

Gough did a more thorough manual similarity screening of the article by searching for published reviews on the same topic. She eventually found one by the same author that appeared suspiciously similar. Gough went on to explain how she cross-checked it with the submission. “I compared several aspects of the two documents: (i) the overall organization in terms of the sections; (ii) the beginnings and endings of the paragraphs; and (iii) the complete text of one entire section, including the references cited in that section.” When she found that the review text and references were copied virtually verbatim, she pursued next steps in the journal’s plagiarism investigation process.
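
For editors without similarity software, parts of a manual comparison like the one Gough describes can be roughed out in a few lines of Python. The sketch below is only an illustration, not Gough’s actual method or any tool’s API: the function names, the eight-word window, and the 0.9 threshold are all assumptions, and it presumes both documents are plain text with paragraphs separated by blank lines. It compares the beginnings and endings of same-position paragraphs and lists the ones that match closely so an editor can read them side by side.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 ratio of how closely two strings match (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def paragraph_edges(text: str, n_words: int = 8) -> list[tuple[str, str]]:
    """Return the first and last n_words of each blank-line-separated paragraph."""
    edges = []
    for para in text.split("\n\n"):
        words = para.split()
        if words:
            edges.append((" ".join(words[:n_words]), " ".join(words[-n_words:])))
    return edges


def matching_paragraphs(submission: str, published: str, threshold: float = 0.9) -> list[int]:
    """Return indexes of paragraphs whose opening AND ending both closely
    match the same-position paragraph in the published text.

    The threshold is an illustrative assumption, not a recommended cutoff.
    """
    flagged = []
    pairs = zip(paragraph_edges(submission), paragraph_edges(published))
    for i, ((s_start, s_end), (p_start, p_end)) in enumerate(pairs):
        if similarity(s_start, p_start) >= threshold and similarity(s_end, p_end) >= threshold:
            flagged.append(i)
    return flagged
```

A match here is a prompt for a closer read of the two documents, not a verdict. Paragraphs that have been reordered would be missed by this position-by-position comparison.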

Of course, using plagiarism detection software can significantly speed up processes like the ones described above and help ensure instances of plagiarism don’t fall through the cracks. With software like iThenticate and Similarity Check, editors can either run one-off similarity checks for incoming submissions or automate similarity report generation through their peer review software, such as Scholastica’s Peer Review System (i.e., similarity checks are automatically initiated for all submissions with reports displayed in the editor work area).

Quick tip for Scholastica users: Starting in October 2022, Scholastica’s Peer Review System will include the option to integrate with the latest version of Crossref’s Similarity Check plagiarism detection service (using iThenticate V2 software).

In addition to having processes in place for running similarity checks for all submissions, it’s imperative to have processes for reviewing similarity reports (more on this below) and investigating instances of suspected plagiarism. The Committee on Publication Ethics (COPE) has comprehensive process flowcharts you can use as a guide for plagiarism in a submitted manuscript and plagiarism in a published article.

Remember, plagiarism can occur at many stages of publishing (e.g., at submission, during revisions, following acceptance), so you should be ready to screen manuscripts and address concerns at any point in your editorial workflows. When contacting authors or their institutions about potential plagiarism, avoid accusatory communication and instead focus on explaining your journal policies and asking for clarification where needed. Also, be sure to conceal the identity of any outside party who raises plagiarism concerns.

For more information on how to respond to plagiarism, check out this past guide from COPE, which covers advice for contacting authors, issuing corrections, and issuing retractions (while it was published in 2011, the guidance is still relevant today).

Know how to read similarity reports if you’re using plagiarism detection software

For journals using plagiarism detection software, it’s also vital to train all editors on how to read the similarity reports it generates. iThenticate and Similarity Check both generate two types of reports: 1) a text overlap analysis that links to documents with detected overlaps and 2) an “Overall Similarity Score,” which is a cumulative percentage of the amount of text in the manuscript that overlaps with one or more published works.

At the highest level, it’s essential for all editors to understand that a high Similarity Score does not automatically mean a manuscript contains plagiarism. High scores can result from direct plagiarism, which is unquestionably a clear red flag. But they can also result from overlaps with multiple texts (whether a handful or many) that may or may not be of concern.

As discussed above, there are many valid reasons for duplicate publication and “text recycling” that your journal may make allowances for and therefore want to disregard (e.g., allowing authors to submit manuscripts they previously preprinted). High similarity scores can also result from various minor text overlaps that aren’t problematic (e.g., legitimately cited references). To help editors save time evaluating text overlap that your journal allows, you can configure plagiarism detection software to have specific exclusion criteria. For example, you might exclude all references/quotes and preprints (i.e., if your journal allows or encourages authors to preprint submissions). Editors will still be able to see excluded overlaps, but they’ll be filtered from report views, so only the most pressing potential issues are immediately flagged.
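
To make the effect of exclusions concrete, here is a minimal sketch of the idea in Python. It does not use the iThenticate or Similarity Check configuration or API: the `Match` record, the category labels, and the simplified score (overlapping words as a share of manuscript length, assuming matches don’t overlap each other) are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    source: str  # hypothetical label for the matched document
    kind: str    # hypothetical category, e.g. "article", "preprint", "references"
    words: int   # manuscript words that overlap this source


def split_matches(matches, excluded_kinds):
    """Separate matches into those shown by default and those filtered from view."""
    shown = [m for m in matches if m.kind not in excluded_kinds]
    hidden = [m for m in matches if m.kind in excluded_kinds]
    return shown, hidden


def overlap_percent(matches, total_words):
    """Simplified score: overlapping words as a percentage of manuscript length."""
    return round(100 * sum(m.words for m in matches) / total_words, 1)


matches = [
    Match("Author's own preprint", "preprint", 2400),
    Match("Reference list", "references", 600),
    Match("Journal article A", "article", 150),
]
shown, hidden = split_matches(matches, excluded_kinds={"preprint", "references"})

print(overlap_percent(matches, total_words=6000))  # 52.5 before exclusions
print(overlap_percent(shown, total_words=6000))    # 2.5 after exclusions
print([m.source for m in hidden])                  # excluded overlaps remain available
```

In this made-up example, most of the raw score comes from overlaps the journal already allows, and the remaining match is the one an editor would want to look at first.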

Editors should also be aware that a low similarity score doesn’t necessarily mean a manuscript is good to go, since any amount of direct plagiarism of another work is a problem (e.g., a 10% similarity score with a single published article). For these reasons, every similarity report should be reviewed by an editor who has the necessary subject-matter expertise and context to determine whether findings require investigation.
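
One way to see why the overall score alone can mislead is to look at how the overlap is distributed across sources. The sketch below uses two made-up reports and a hypothetical `largest_sources` helper (neither comes from any plagiarism detection tool) to show that a lower overall score can hide a larger single-source overlap. It is meant to help order an editor’s reading, not to replace it.

```python
def largest_sources(per_source: dict[str, float], top: int = 3) -> list[tuple[str, float]]:
    """Return the sources contributing the most overlap, biggest first."""
    return sorted(per_source.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical report A: 10% overall, all of it from one published article.
report_a = {"Article X": 10.0}

# Hypothetical report B: 30% overall, spread thinly across 30 sources.
report_b = {f"Source {i}": 1.0 for i in range(1, 31)}

for name, report in [("A", report_a), ("B", report_b)]:
    overall = sum(report.values())
    top_source, top_percent = largest_sources(report, top=1)[0]
    print(f"Report {name}: overall {overall}%, largest single source {top_percent}% ({top_source})")
```

Report A has the lower overall score but the larger single-source overlap, which is the pattern described above.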

For more information on how to assess similarity scores, check out this guide from iThenticate. If you’re using Similarity Check, you can also learn more about setting up exclusions here.

Putting it all together: Plagiarism in context

At the end of the day, plagiarism detection is all about the context. As noted by Nancy Gough, “there are no absolute rules or a similarity threshold that will allow this process to be completely automated.” However, technology can make plagiarism detection a lot faster and easier. And with clear plagiarism definitions and policies, as well as processes for screening submissions and investigating concerns, journals can judiciously and efficiently review potential instances of plagiarism.

As noted, this blog is part of a Research Integrity Toolkit series in partnership with Research Square for Peer Review Week 2022. Click here to read the series recap, which includes links to all the resource blogs and a corresponding infographic.

This post was written by Danielle Padula, Head of Marketing and Community Development