Wed Jan 29

Problematic Paper Screener: Trawling for fraud in the scientific literature

Written by Guillaume Cabanac, Professor of Computer Science, Institut de Recherche en Informatique de Toulouse


Have you ever heard of the Joined Together States ^[1]? Or bosom peril ^[2]? Kidney disappointment ^[3]? Fake neural organizations ^[4]? Lactose bigotry ^[5]? These nonsensical, and sometimes amusing, word sequences are among thousands ^[6] of “tortured phrases ^[7]” that sleuths have found littered throughout reputable scientific journals.

They typically result from using paraphrasing tools to evade plagiarism-detection software when stealing someone else’s text. The phrases above are real examples of bungled synonyms for the United States, breast cancer, kidney failure, artificial neural networks, and lactose intolerance, respectively.

We are a pair of computer scientists at Université de Toulouse ^[8] and Université Grenoble Alpes ^[9], both in France, who specialize in detecting bogus publications. One of us, Guillaume Cabanac, has built an automated tool that combs through 130 million scientific publications every week and flags those containing tortured phrases.
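At its core, this kind of screening is phrase matching: scan each paper's text for entries from a list of known tortured phrases and flag papers that accumulate too many. The Python sketch below illustrates that idea only. It is not the Problematic Paper Screener's code. The tiny dictionary is assumed, built solely from the five examples in this article. The function names `find_tortured_phrases` and `is_flagged` are hypothetical. The threshold of five mirrors the "at least five tortured phrases" figure reported for flagged papers.

```python
import re

# Hypothetical mini-dictionary: tortured phrase -> the established term it replaced.
# Built only from the examples in this article; a real detector relies on a much
# larger, curated list.
TORTURED_PHRASES = {
    "joined together states": "United States",
    "bosom peril": "breast cancer",
    "kidney disappointment": "kidney failure",
    "fake neural organizations": "artificial neural networks",
    "lactose bigotry": "lactose intolerance",
}

# Assumed threshold, mirroring the "at least five tortured phrases" figure.
FLAG_THRESHOLD = 5


def find_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return one (tortured phrase, expected phrase) pair per occurrence in text."""
    # Lowercase and collapse whitespace so phrases split across line breaks still match.
    normalized = " ".join(text.lower().split())
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        # Word boundaries keep a phrase from matching inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for _ in re.finditer(pattern, normalized):
            hits.append((phrase, expected))
    return hits


def is_flagged(text: str, threshold: int = FLAG_THRESHOLD) -> bool:
    """Flag a text that contains at least `threshold` tortured-phrase occurrences."""
    return len(find_tortured_phrases(text)) >= threshold


if __name__ == "__main__":
    sample = (
        "Fake neural organizations were trained to predict bosom peril\n"
        "and kidney disappointment in patients across the Joined Together States."
    )
    for phrase, expected in find_tortured_phrases(sample):
        print(f"{phrase!r} -> probably {expected!r}")
    print("flagged:", is_flagged(sample))
```

The sample finds four phrases, one short of the assumed threshold, so it is reported but not flagged. Counting occurrences against a threshold, rather than flagging on a single hit, is one plausible way to tolerate the occasional odd but legitimate word choice.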

The Problematic Paper Screener ^[10] also includes eight other detectors ^[11], each of which looks for a specific type of problematic content.

In addition to tortured phrases, the Problematic Paper Screener flags ChatGPT fingerprints: snippets of telltale text left behind by the AI agent. Screenshot by The Conversation, CC BY-ND ^[12]^[13]

Several publishers use our paper screener, which has been instrumental in more than 1,000 retractions. Some have integrated the technology into the editorial workflow to spot suspect papers upfront. Analytics companies have used the screener for things like picking out suspect authors from lists of highly cited researchers. It was named one of 10 key developments in science ^[14] by the journal Nature in 2021.

So far, we have found:

Nearly 19,000 papers containing at least five tortured phrases each.
More than 280 gibberish papers – some still in circulation – written entirely by the spoof SCIgen program ^[15] that Massachusetts Institute of Technology students came up with nearly 20 years ago.
More than 764,000 articles that cite retracted works and could therefore be unreliable. About 5,000 of these articles have at least five retracted references listed in their bibliographies. We called the software that finds these the “Feet of Clay” detector ^[16], after the biblical dream story in which a hidden flaw is found in what seems to be a strong and magnificent statue. These articles need to be reassessed ^[17] and potentially retracted.
More than 70 papers containing ChatGPT “fingerprints” with obvious signs such as “Regenerate Response ^[18]” or “As an AI language model, I cannot …”^[19] in the text. These articles represent the tip of the tip of the iceberg: They are cases where ChatGPT output has been copy-pasted wholesale into papers without any editing (or even reading) and has also slipped past peer reviewers and journal editors alike. Some publishers allow the use of AI to write papers, provided the authors disclose it. The challenge is to identify cases where chatbots are used not just for language-editing purposes but to generate content – essentially fabricating data.
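The “Feet of Clay” criterion above comes down to counting how many entries in a paper's bibliography appear in a list of retracted works. The Python sketch below is an illustration under stated assumptions, not the detector's implementation. It assumes references have already been resolved to DOIs and that a set of retracted DOIs is available, for example exported from a retraction database. The `Paper` record, the function names, and all DOIs are hypothetical placeholders. The default threshold of five mirrors the figure reported above.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical minimal paper record: its own DOI plus the DOIs it cites."""
    doi: str
    reference_dois: list[str] = field(default_factory=list)


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI (DOIs are case-insensitive) and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def retracted_references(paper: Paper, retracted: set[str]) -> list[str]:
    """Return the paper's distinct reference DOIs that appear in the retracted set."""
    retracted_norm = {normalize_doi(d) for d in retracted}
    # dict.fromkeys drops duplicates while keeping order, so a reference listed twice counts once.
    refs = dict.fromkeys(normalize_doi(d) for d in paper.reference_dois)
    return [d for d in refs if d in retracted_norm]


def feet_of_clay(papers: list[Paper], retracted: set[str], threshold: int = 5) -> list[tuple[str, int]]:
    """Return (paper DOI, retracted-reference count) for papers at or above threshold."""
    flagged = []
    for paper in papers:
        count = len(retracted_references(paper, retracted))
        if count >= threshold:
            flagged.append((paper.doi, count))
    return flagged


if __name__ == "__main__":
    # Placeholder DOIs for illustration only.
    retracted = {f"10.5555/retracted.{i}" for i in range(6)}
    suspect = Paper(
        "10.5555/suspect",
        [f"https://doi.org/10.5555/RETRACTED.{i}" for i in range(6)] + ["10.5555/sound.1"],
    )
    sound = Paper("10.5555/sound", ["10.5555/retracted.0", "10.5555/sound.1"])
    print(feet_of_clay([suspect, sound], retracted))
```

Only the paper citing six retracted works is reported. Calling `feet_of_clay(papers, retracted, threshold=1)` instead corresponds to the broader count of articles that cite at least one retracted work.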

There’s more detail about our paper screener and the problems it addresses in this presentation ^[20] for the Science Studies Colloquium ^[21].

Read The Conversation’s investigation into paper mills here: Fake papers are contaminating the world’s scientific literature, fueling a corrupt industry and slowing legitimate lifesaving medical research ^[22]

References

[1] Joined Together States (pubpeer.com)
[2] bosom peril (pubpeer.com)
[3] Kidney disappointment (pubpeer.com)
[4] Fake neural organizations (pubpeer.com)
[5] Lactose bigotry (pubpeer.com)
[6] thousands (www.irit.fr)
[7] tortured phrases (doi.org)
[8] Université de Toulouse (www.irit.fr)
[9] Université Grenoble Alpes (membres-lig.imag.fr)
[10] Problematic Paper Screener (www.irit.fr)
[11] eight other detectors (doi.org)
[12] Screenshot by The Conversation (pubpeer.com)
[13] CC BY-ND (creativecommons.org)
[14] key developments in science (doi.org)
[15] SCIgen program (pdos.csail.mit.edu)
[16] “Feet of Clay” detector (doi.org)
[17] reassessed (doi.org)
[18] Regenerate Response (pubpeer.com)
[19] “As an AI language model, I cannot …” (pubpeer.com)
[20] presentation (youtu.be)
[21] Science Studies Colloquium (scientificelites.org)
[22] Fake papers are contaminating the world’s scientific literature, fueling a corrupt industry and slowing legitimate lifesaving medical research (theconversation.com)
