Wed Mar 10

It's not just a social media problem – how search engines spread misinformation

Written by Chirag Shah, Associate Professor of Information Science, University of Washington

Search engines are one of society’s primary gateways to information and people, but they are also conduits for misinformation. Similar to problematic social media algorithms ^[1], search engines learn to serve you what you and others have clicked on before. Because people are drawn to the sensational, this dance between algorithms and human nature can foster the spread of misinformation.

Search engine companies, like most online services, make money not only by selling ads, but also by tracking users and selling their data through real-time bidding ^[2]. People are often led to misinformation by their desire for sensational and entertaining news, as well as for information that is either controversial or confirms their views. One study found, for instance, that more popular YouTube videos about diabetes are less likely to have medically valid information ^[3] than less popular videos on the subject.

Ad-driven search engines, like social media platforms, are designed to reward clicking on enticing links because it helps the search companies boost their business metrics. As a researcher who studies search and recommendation systems ^[4], my colleagues and I have shown that this dangerous combination of corporate profit motive and individual susceptibility makes the problem difficult to fix ^[5].

How search results go wrong

When you click on a search result, the search algorithm learns that the link you clicked is relevant for your search query. This is called relevance feedback ^[6]. This feedback helps the search engine give higher weight to that link for that query in the future. If enough people click on that link enough times, thus giving strong relevance feedback, that website starts coming up higher in search results for that and related queries.
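The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not how any real search engine is implemented: the click store, the `rerank` function, the `click_weight` value and the example site names are all assumptions made up for this sketch.

```python
from collections import defaultdict

# Hypothetical click store: (query, url) -> number of clicks observed so far.
click_counts = defaultdict(int)


def record_click(query, url):
    """Treat one click as one unit of implicit relevance feedback."""
    click_counts[(query, url)] += 1


def rerank(query, base_scores, click_weight=0.1):
    """Order URLs by base score plus a bonus for past clicks on this query.

    base_scores: dict of url -> score from some underlying ranker (assumed).
    click_weight: illustrative bonus per click, not a real-world value.
    """
    def boosted(url):
        return base_scores[url] + click_weight * click_counts[(query, url)]

    return sorted(base_scores, key=boosted, reverse=True)


base = {"site-a.example": 1.0, "site-b.example": 0.8}
before = rerank("local news", base)

# Five users click the lower-ranked link for this query.
for _ in range(5):
    record_click("local news", "site-b.example")

after = rerank("local news", base)
print(before[0], "->", after[0])
```

In this toy setup, five clicks are enough for the lower-scored site to overtake the one the underlying ranker preferred.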

People are more likely to click on links shown up higher ^[7] on the search results list. This creates a positive feedback loop: the higher a website shows up, the more clicks it gets, and those clicks in turn move the website higher or keep it where it is. Search engine optimization techniques use this knowledge to increase the visibility of websites.
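A small simulation can make this loop concrete. The sketch below assumes, purely for illustration, that clicks depend only on rank position and that pages are re-ranked by accumulated clicks; the per-position click numbers and page names are invented, not measured.

```python
# Assumed number of clicks each rank position receives per round
# (illustrative numbers only; position 0 is the top result).
CLICKS_PER_ROUND_BY_POSITION = [50, 30, 20]


def simulate(initial_order, rounds):
    """Clicks depend only on position, never on the quality of the page."""
    clicks = {page: 0 for page in initial_order}
    order = list(initial_order)
    for _ in range(rounds):
        for position, page in enumerate(order):
            clicks[page] += CLICKS_PER_ROUND_BY_POSITION[position]
        # Re-rank by accumulated clicks, most-clicked first.
        order = sorted(order, key=lambda page: clicks[page], reverse=True)
    return order, clicks


order, clicks = simulate(["page_a", "page_b", "page_c"], rounds=10)
print(order)
print(clicks)
```

Whichever page starts on top collects the most clicks in every round, so its lead only grows, regardless of whether it deserved the top spot in the first place.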

There are two aspects to this misinformation problem: how a search algorithm is evaluated and how humans react to headlines, titles and snippets. Search engines, like most online services, are judged using an array of metrics, one of which is user engagement. It is in the search engine companies’ best interest to give you things that you want to read, watch or simply click. Therefore, as a search engine or any recommendation system creates a list of items to present, it calculates the likelihood that you’ll click on the items.

Traditionally, this was meant to bring out the information that would be most relevant. However, the notion of relevance has gotten fuzzy because people have been using search to find entertaining search results as well as truly relevant information ^[8].

Imagine you are looking for a piano tuner. If someone shows you a video of a cat playing a piano, would you click on it? Many would, even if that has nothing to do with piano tuning. The search service feels validated with positive relevance feedback and learns that it is OK to show a cat playing a piano when people search for piano tuners.

In fact, it is even better than showing the relevant results in many cases. People like watching funny cat videos, and the search system gets more clicks and user engagement.
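The piano example can be written out as a toy comparison between ranking by relevance and ranking by expected clicks. The relevance and click-probability values below are made up for illustration, and the `Result` class is a hypothetical stand-in for whatever a real system stores.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    relevance: float          # how well the result answers the query (made-up value)
    click_probability: float  # estimated chance of a click (made-up value)


results = [
    Result("Certified piano tuners near you", relevance=0.9, click_probability=0.2),
    Result("Cat plays piano (video)", relevance=0.1, click_probability=0.6),
]

# Ranking by relevance puts the piano tuner listing first.
by_relevance = sorted(results, key=lambda r: r.relevance, reverse=True)

# Ranking by expected engagement puts the cat video first.
by_engagement = sorted(results, key=lambda r: r.click_probability, reverse=True)

print(by_relevance[0].title)
print(by_engagement[0].title)
```

A system evaluated on engagement has every incentive to prefer the second ordering.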

This might seem harmless. So what if people get distracted from time to time and click on results that aren’t relevant to the search query? The problem is that people are drawn to exciting images and sensational headlines. They tend to click on conspiracy theories and sensationalized news ^[9], not just cats playing piano, and do so more than clicking on real news ^[10] or relevant information.

Famous but fake spiders

In 2018, searches for “new deadly spider” spiked on Google ^[11] following a Facebook post that claimed a new deadly spider killed several people in multiple states. My colleagues and I analyzed the top 100 results from Google search for “new deadly spider” during the first week of this trending query.

Distribution of search results for 'new deadly spider' on Google

The first two pages of Google search results for ‘new deadly spider’ in August 2018 (shaded area) were related to the original fake news post about that subject, not debunking or otherwise factual information. Chirag Shah, CC BY-ND ^[12]

It turned out this story was fake ^[13], but people searching for it were largely exposed to misinformation related to the original fake post. As people continued clicking and sharing that misinformation, Google continued serving those pages at the top of the search results.

This pattern continues: thrilling, unverified stories emerge and people click on them, apparently either unconcerned with the truth or believing that a story must be true if a trusted service such as Google Search shows it to them. More recently, a disproven report ^[14] claiming China let the coronavirus leak from a lab gained traction on search engines because of this vicious cycle.

Spot the misinformation

To test how well people discriminate between accurate information and misinformation, we designed a simple game called “Google Or Not ^[15].” This online game shows two sets of results for the same query. The objective is simple – pick the set that is reliable, trustworthy or most relevant.

A screenshot showing two sets of Google search results side-by-side

In tests, about half the time people can’t tell the difference between Google search results containing misinformation and those with only trustworthy results. Chirag Shah, CC BY-ND ^[16]

One of these two sets contains one or two results that have either been verified and labeled as misinformation or come from a debunked story. We made the game publicly available and advertised it through various social media channels. Overall, we collected 2,100 responses from over 30 countries.

When we analyzed the results, we found that about half the time people mistakenly picked as trustworthy the set with one or two misinformation results ^[17]. Our experiments with hundreds of other users over many iterations have resulted in similar findings. In other words, about half the time people are picking results that contain conspiracy theories and fake news. As more people pick these inaccurate and misleading results, the search engines learn that that’s what people want.

Questions of Big Tech regulation and self-regulation aside, it’s important for people to understand how these systems work and how they make money. Otherwise market economies and people’s natural inclination to be attracted to eye-catching links will keep the vicious cycle going.


References

^[1] problematic social media algorithms (theconversation.com)
^[2] through real-time bidding (www.eff.org)
^[3] less likely to have medically valid information (theconversation.com)
^[4] studies the search and recommendation systems (scholar.google.com)
^[5] makes the problem difficult to fix (chiragshah.org)
^[6] relevance feedback (link.springer.com)
^[7] more likely to click on links shown up higher (www.smartinsights.com)
^[8] entertaining search results as well as truly relevant information (www.dummies.com)
^[9] tend to click on conspiracy theories and sensationalized news (www.aaai.org)
^[10] more than clicking on real news (www.buzzfeednews.com)
^[11] spiked on Google (trends.google.com)
^[12] CC BY-ND (creativecommons.org)
^[13] was fake (www.snopes.com)
^[14] disproven report (www.cnn.com)
^[15] Google Or Not (infoseeking.org)
^[16] CC BY-ND (creativecommons.org)
^[17] about half the time people mistakenly picked as trustworthy the set with one or two misinformation results (infoseeking.org)
^[18] Subscribe to The Conversation’s science newsletter (theconversation.com)
