Fri Aug 2

CAPTCHAs: The struggle to tell real humans from fake

Written by Tam Nguyen, Associate Professor of Computer Science, University of Dayton

CAPTCHAs are those now-ubiquitous challenges you encounter when logging in to many websites, asking you to prove that you’re a human and not a bot.

Websites and mobile apps have long been attacked by bots on a massive scale ^[1]. Those malicious bots ^[2] are programmed to automatically consume large amounts of computing resources, post spam messages, collect data from websites and even register accounts and authenticate as users. This state of affairs led to the introduction of CAPTCHA ^[3], which stands for Completely Automated Public Turing test to tell Computers and Humans Apart.

As a computer scientist ^[4], I see CAPTCHAs as an effective shield ^[5] for websites to prevent automated attacks, enhance cybersecurity and improve user experience – at least in the short term. For example, denial-of-service attacks create a bottleneck and cause a web server to become overloaded and unresponsive. CAPTCHAs help stop automated bots from executing such attacks and from carrying out fraudulent activities such as sending spam messages and creating fake accounts.

Meanwhile, financial institutions rely on CAPTCHAs to protect against bots trying to steal clients’ data ^[6]. Additionally, CAPTCHAs improve the integrity of online voting and polls ^[7] by preventing automated bots from manipulating results.

How CAPTCHAs work

CAPTCHAs are designed to show questions or challenges that are easy for humans but difficult for computer bots to answer. In practice, there are several types of CAPTCHAs: text-based, image-based, audio-based and behavior-based.

Text-based CAPTCHAs ^[8] have been very popular since the early days of the internet. This CAPTCHA type requires users to read a distorted, cluttered image of text and enter the answer into a text field. A variant of text-based CAPTCHA asks users to solve simple math problems like “18+5” or “23-7.” However, text-based CAPTCHAs have recently been defeated by advanced optical character recognition algorithms ^[9], thanks to the proliferation of deep-learning AIs.

three rectangular graphics, the left and center contain text and colors, the right a photo

CAPTCHAs come in text, audio and image forms. Screencaptures by Tam Nguyen

Ironically, when the text is made more distorted and more complicated to keep bots out, actual humans fail to provide a correct answer ^[10].
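The math-problem variant of a text-based CAPTCHA can be sketched in a few lines. This is only an illustration of the challenge-and-check idea, not how any particular site implements it: the function names `make_math_challenge` and `check_answer` and the number ranges are assumptions made for the example, and a real system would also render the question as a distorted image and limit how long an answer stays valid.

```python
import random

def make_math_challenge(rng=None):
    """Return (question, answer) for a simple addition or subtraction problem.

    Hypothetical helper: the ranges below are arbitrary choices for this sketch.
    """
    rng = rng or random.SystemRandom()
    a = rng.randint(10, 30)
    b = rng.randint(1, 9)
    op = rng.choice(["+", "-"])
    answer = a + b if op == "+" else a - b
    return f"{a}{op}{b}", str(answer)

def check_answer(expected, submitted):
    """Compare the stored answer with the user's input, ignoring surrounding spaces."""
    return submitted.strip() == expected

# The server keeps `answer` (for example in the user's session) and shows only `question`.
question, answer = make_math_challenge()
```

The server never sends the answer to the browser; it only compares what comes back. That simplicity is also the weakness the article describes: once software can read the question, computing the answer is trivial.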

Audio CAPTCHA ^[11] plays a short audio clip containing a series of numbers or letters spoken by a human or synthetic voice, which the user listens to and then types into a provided text field. The input is verified against the correct answer to determine whether the user is human. Like text-based CAPTCHAs, audio CAPTCHA can be difficult for humans to interpret ^[12] due to factors such as background noise, poor audio quality, heavy distortion and unfamiliar accents.

Image-based CAPTCHAs ^[13] were introduced to make it more challenging for bots. Users must identify specific objects from images – for example, selecting all image blocks containing traffic lights. This task leverages human visual perception, which is still superior to most computer vision-based bots. However, this type of CAPTCHA also confuses people in many cases ^[14].

Photo of a person riding a bicycle segmented into 16 squares

Image CAPTCHAs often confuse people. Is the rider considered part of the bicycle? Annotated screencapture by Tam Nguyen
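One way to picture how a tile-selection challenge might be graded, and how the bicycle-and-rider ambiguity could be tolerated, is to split the tiles into ones that must be selected and ones that may be selected. The article does not say how real image CAPTCHAs score answers, so the function `check_tile_selection`, the tile numbering and the required/optional split below are all assumptions for illustration.

```python
def check_tile_selection(selected, required, optional=frozenset()):
    """Accept a selection that includes every required tile and nothing
    outside the required and optional tiles.

    Tiles are identified by index, e.g. 0-15 for a 4x4 grid (hypothetical scheme).
    """
    selected = set(selected)
    required = set(required)
    allowed = required | set(optional)
    return required <= selected <= allowed

# Hypothetical 4x4 grid: the bicycle occupies tiles 5, 6, 9 and 10;
# the rider's upper body (tiles 1 and 2) is treated as ambiguous.
bicycle_tiles = {5, 6, 9, 10}
rider_tiles = {1, 2}
```

Under this sketch, a user who clicks only the bicycle and a user who also clicks the rider both pass, while someone who misses part of the bicycle or clicks an unrelated tile does not.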

Behavior-based CAPTCHAs ^[15] analyze user behaviors such as mouse movements and typing patterns. reCAPTCHA ^[16], a popular behavior-based CAPTCHA, requires users to check the “I am not a robot” box. During this process, reCAPTCHA analyzes mouse movements and clicks to differentiate between humans and bots. Humans typically have more varied and less predictable behaviors, while bots often show precise and consistent actions.
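The contrast between varied human movement and precise, consistent bot movement can be made concrete with a toy heuristic. The sketch below is not reCAPTCHA's method, which is not public; the two features (how straight the pointer path is, and how much its speed varies) and the thresholds `straight_min` and `cv_max` are assumptions chosen purely to illustrate the idea.

```python
import math
import statistics

def path_features(points):
    """points: list of (x, y, t_seconds) pointer samples in time order.

    Returns (straightness, speed_cv): straightness is the direct distance
    divided by the distance actually travelled (1.0 = perfectly straight);
    speed_cv is the coefficient of variation of the per-step speeds.
    """
    steps, speeds = [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        steps.append(d)
        if t1 > t0:
            speeds.append(d / (t1 - t0))
    path_len = sum(steps)
    direct = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    straightness = direct / path_len if path_len else 1.0
    mean_speed = statistics.fmean(speeds) if speeds else 0.0
    speed_cv = statistics.pstdev(speeds) / mean_speed if mean_speed else 0.0
    return straightness, speed_cv

def looks_automated(points, straight_min=0.99, cv_max=0.05):
    """Flag paths that are almost perfectly straight AND almost constant in speed.

    Thresholds are hypothetical; paths with too few samples are treated as suspicious.
    """
    if len(points) < 3:
        return True
    straightness, speed_cv = path_features(points)
    return straightness >= straight_min and speed_cv <= cv_max
```

A scripted pointer that glides in a straight line at constant speed trips both conditions, while a hand-driven path that wobbles and speeds up or slows down does not. A bot that adds random jitter would slip past a check this simple, which is why production systems combine many more signals.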

AI vs. human

CAPTCHA is one more battleground in the seemingly endless contest between AI and humans. AI has become more advanced, using modern techniques such as deep learning and computer vision to solve CAPTCHA challenges.

For instance, optical character recognition algorithms have improved ^[17], making text-based CAPTCHAs less effective. Audio CAPTCHA can be bypassed by advanced speech-to-text technology ^[18]. Similarly, AI models trained on vast image datasets can solve many image-based CAPTCHAs with high accuracy rates ^[19].

On the other side of the battlefield, CAPTCHA researchers have created more complex CAPTCHA technologies. For example, reCAPTCHA assesses user interactions and computes their likelihood of being human.
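A likelihood-of-human score is only useful once a website decides what to do with it. Score-based versions of reCAPTCHA report a value between 0.0 and 1.0, with higher values meaning the interaction looks more human; how a site acts on that number is its own choice. The sketch below shows one plausible policy. The function name `decide` and the cut-offs 0.7 and 0.3 are assumptions for this example, not recommended settings.

```python
def decide(score, allow_at=0.7, block_below=0.3):
    """Map a likelihood-of-human score in [0.0, 1.0] to an action.

    Hypothetical policy: high scores pass silently, low scores are blocked,
    and the uncertain middle band gets an extra challenge.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= allow_at:
        return "allow"
    if score < block_below:
        return "block"
    return "challenge"
```

Keeping a middle band means most people never see a puzzle, and only borderline visitors are asked to solve one.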

Ironically, humans are helping AI solve complicated CAPTCHAs. For instance, click farms hire a large pool of low-paid workers to click on ads, like social media posts, follow accounts, write fake reviews and even solve CAPTCHA questions. Their work helps AI systems behave like humans ^[20] in order to defeat CAPTCHAs and other fraud-prevention techniques.

The future of CAPTCHAs

The future of CAPTCHAs will be shaped by ongoing advances in AI. Traditional CAPTCHA methods are becoming less effective, so future CAPTCHA systems are likely to focus more on analyzing user behavior ^[21], such as how people interact with websites, making it harder for bots to mimic that behavior.

Websites might turn to biometric CAPTCHAs, such as facial recognition or fingerprint scanning, though these raise privacy concerns ^[22]. CAPTCHAs could also be replaced by blockchain-based systems that use verifiable credentials ^[23] to authenticate users. These credentials, issued by trusted entities and stored in digital wallets, ensure interactions are performed by verified humans rather than bots.

Future CAPTCHAs might work alongside AI systems in real time, constantly adapting and evolving to stay ahead of automated attacks.

References

^[1] attacked by bots on a massive scale (securitytoday.com)
^[2] malicious bots (gcore.com)
^[3] CAPTCHA (www.britannica.com)
^[4] computer scientist (scholar.google.com.sg)
^[5] CAPTCHAs as an effective shield (www.red-button.net)
^[6] steal clients’ data (www.payway.com.au)
^[7] integrity of online voting and polls (fraudblocker.com)
^[8] Text-based CAPTCHAs (habr.com)
^[9] advanced optical character recognition algorithms (medium.com)
^[10] fail to provide a correct answer (www.wired.com)
^[11] Audio CAPTCHA (doi.org)
^[12] difficult for humans to interpret (nymag.com)
^[13] Image-based CAPTCHAs (www.grovestreettt.com)
^[14] confuses people in many cases (www.boredpanda.com)
^[15] Behavior-based CAPTCHAs (cloud.google.com)
^[16] reCAPTCHA (www.google.com)
^[17] optical character recognition algorithms have improved (doi.org)
^[18] advanced speech-to-text technology (securityaffairs.com)
^[19] high accuracy rates (medium.com)
^[20] help AI systems behave like humans (www.arkoselabs.com)
^[21] focus more on analyzing user behavior (www.linkedin.com)
^[22] raise privacy concerns (www.linkedin.com)
^[23] verifiable credentials (www.dock.io)
