Wed Jan 5

When researchers don't have the proteins they need, they can get AI to 'hallucinate' new structures

Written by Ivan Anishchenko, Acting instructor in Computational Biology, University of Washington

All living organisms use proteins, a vast class of complex molecules that perform a wide array of functions: from allowing plants to use solar energy for oxygen production ^[1] to helping your immune system ^[2] fight pathogens to letting your muscles ^[3] perform physical work. Many drugs ^[4] are also based on proteins.

For many areas of biomedical research and drug development, however, there are no natural proteins that can serve as suitable starting points to build new proteins. Researchers designing new drugs to prevent COVID-19 infection ^[5], or developing proteins that can turn genes on or off ^[6] or turn cells into computers ^[7], have had to create new proteins from scratch.

This process of de novo protein design ^[8] can be difficult to get right. Protein engineers like me ^[9] have been trying to figure out ways to more efficiently and accurately design new proteins with the properties we need.

Luckily, a form of artificial intelligence called deep learning ^[10] may provide an elegant way to create proteins that did not exist previously – hallucination ^[11].

New proteins created from scratch can be deployed to tackle a wide range of environmental and medical challenges.

Designing proteins from scratch

Proteins are made up of hundreds to thousands of smaller building blocks called amino acids ^[12]. These amino acids are connected to one another in long chains that fold up to form a protein ^[13]. The order in which these amino acids are connected to one another determines each protein’s unique structure and function.
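In computational work, a protein's amino acid sequence is usually written as a string of one-letter codes, one letter per amino acid in the chain. The short Python sketch below illustrates that representation. It assumes only the 20 standard amino acids, and the helper names (`is_valid_sequence`, `random_sequence`) are hypothetical, not taken from any particular tool.

```python
import random

# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def is_valid_sequence(seq: str) -> bool:
    """Return True if the chain is non-empty and uses only standard one-letter codes."""
    return len(seq) > 0 and all(aa in AMINO_ACIDS for aa in seq)

def random_sequence(length: int, rng: random.Random) -> str:
    """Build a random chain of `length` amino acids (an illustrative helper)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

rng = random.Random(0)
seq = random_sequence(100, rng)
print(len(seq), is_valid_sequence(seq))  # 100 True
print(is_valid_sequence("MKTAYX"))       # False: 'X' is not a standard code
```

The order of letters in such a string is exactly the "order in which these amino acids are connected" that determines the protein's structure.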

Illustration of the four levels of protein structure.

Proteins are composed of amino acid chains that fold into a protein. LadyofHats/Wikimedia Commons ^[14]

The biggest challenge protein engineers face when designing new proteins is coming up with a protein structure that will perform a desired function. To get around this problem, researchers typically create design templates based on naturally occurring proteins with a similar function. These templates have instructions on how to create the unique folds of each particular protein. However, because a template must be created for each individual fold, this strategy is time-consuming, labor-intensive and limited by what proteins are available in nature.

Over the past few years, various research groups, including ^[15] the lab I work in ^[16], have developed a number of dedicated deep neural networks ^[17] – computer programs that use multiple processing layers to “learn” from input data to make predictions about a desired output.

When the goal is to predict a protein's structure, the network, which has millions of parameters describing different facets of proteins, takes a sequence of amino acids as input and predicts the most probable 3D structure that sequence would adopt.
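Before a sequence can be passed to a neural network, it has to be turned into numbers. One common encoding, assumed here rather than described in the article, is one-hot encoding: a sequence of length L becomes an L × 20 table in which each row marks which of the 20 standard amino acids sits at that position. A minimal sketch:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq: str) -> list[list[int]]:
    """Encode a sequence as an L x 20 matrix; row i has a single 1 marking residue i."""
    matrix = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1
        matrix.append(row)
    return matrix

x = one_hot("MKV")
print(len(x), len(x[0]))  # 3 20
print(x[0].index(1))      # 10, the column assigned to 'M'
```

Real structure-prediction networks may add further input features; this shows only the basic idea of turning letters into numbers.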

When given a random amino acid sequence, the network's predictions are blurry, meaning the predicted structure is not clear-cut. In contrast, for the sequences of both naturally occurring proteins and proteins designed from scratch, the network predicts much more well-defined structures.
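One way to put a number on "blurry" versus "well-defined" is sketched below. As an illustrative assumption, the network is taken to predict, for every pair of amino acids in the chain, a probability distribution over a few distance ranges. Sharpness is then scored as the average Kullback–Leibler (KL) divergence of those distributions from a background distribution (uniform here, for simplicity): a flat, uninformative prediction scores zero, while confident predictions score higher. The distributions are invented toy values, not real network output.

```python
import math

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats; bins where p is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sharpness(pair_distributions: list[list[float]], background: list[float]) -> float:
    """Average KL divergence of each residue pair's predicted distance
    distribution from a featureless background distribution."""
    total = sum(kl_divergence(p, background) for p in pair_distributions)
    return total / len(pair_distributions)

# Hypothetical 4-bin distance distributions for two residue pairs.
background = [0.25, 0.25, 0.25, 0.25]
blurry = [[0.30, 0.25, 0.25, 0.20], [0.25, 0.25, 0.25, 0.25]]
sharp = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.01, 0.01, 0.97]]

print(sharpness(blurry, background) < sharpness(sharp, background))  # True
```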

Hallucinating new proteins

These observations hint at one way that new proteins can be generated from scratch – by tweaking random inputs to the network until predictions yield a well-defined structure.

The protein generation method ^[18] my colleagues ^[19] and I developed is conceptually similar to computer vision ^[20] methods such as Google’s DeepDream ^[21], which finds and enhances patterns in images.

These methods work by taking networks trained to recognize human faces or other patterns in images, like the shape of an animal or an object, and running them in reverse: instead of adjusting the network to fit the images, they keep the network fixed and adjust the image until the network recognizes these patterns where they don't exist. In DeepDream, for example, the network is given arbitrary input images that are adjusted until the network can recognize a face or some other shape in the image. While the final image doesn't look much like a face to a person, it does to the neural network.
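The core trick behind DeepDream-style methods is that the network's weights stay frozen and the input is what gets optimized. The toy Python sketch below shows that idea with gradient ascent on a four-number "image". The `detector_score` function is a hypothetical stand-in for a trained network (it simply rewards resemblance to a fixed pattern); real DeepDream instead backpropagates through a trained convolutional network.

```python
import random

# A frozen, HYPOTHETICAL "detector": its weights are never updated.
PATTERN = [0.9, -0.4, 0.2, 0.7]

def detector_score(image: list[float]) -> float:
    """Higher when the input looks more like PATTERN (negative squared distance)."""
    return -sum((x - p) ** 2 for x, p in zip(image, PATTERN))

def score_gradient(image: list[float]) -> list[float]:
    """Gradient of detector_score with respect to the input, not the weights."""
    return [-2.0 * (x - p) for x, p in zip(image, PATTERN)]

def dream(image: list[float], steps: int = 200, lr: float = 0.05) -> list[float]:
    """Gradient ascent on the input: nudge the image so the detector responds more strongly."""
    image = list(image)
    for _ in range(steps):
        grad = score_gradient(image)
        image = [x + lr * g for x, g in zip(image, grad)]
    return image

rng = random.Random(1)
start = [rng.uniform(-1, 1) for _ in PATTERN]
end = dream(start)
print(detector_score(start), "->", detector_score(end))
```

Hallucination applies the same inversion to proteins: the structure-prediction network stays fixed, and the input amino acid sequence is what changes.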

The products of this technique are often referred to as hallucinations ^[22], and this is what we call our designed proteins, too.

Deep neural networks can also learn how to hallucinate images from words.

Our method ^[23] starts by passing a random amino acid sequence through a deep neural network. As expected for a random sequence, the resulting predictions are initially blurry, with unclear structures. Next, we introduce a mutation that changes one amino acid in the chain into a different one and pass this new sequence through the network again. If this change makes the predicted structure more defined, we keep the new amino acid and introduce another mutation into the sequence.
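The mutate-and-keep loop described above can be sketched as a simple greedy search. In this minimal Python version, `toy_confidence` is a made-up placeholder for the structure-prediction network: it has no biological meaning and just rewards matches to an arbitrary hidden string so that the loop has something to improve. A mutation is kept only if the score goes up, following the article's description; a common variant, not shown here, is a Metropolis-style rule that occasionally accepts a slightly worse mutation to avoid getting stuck.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_confidence(seq: str) -> float:
    """HYPOTHETICAL stand-in for a network's 'how well-defined is the structure' score.
    It has no biological meaning: it rewards matches to an arbitrary hidden string."""
    hidden = "MKVLAAGIEELRKRLEEAGY"
    return sum(a == b for a, b in zip(seq, hidden)) / len(hidden)

def hallucinate(length: int, steps: int, rng: random.Random) -> tuple[str, float]:
    """Greedy mutate-and-keep search starting from a random sequence."""
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    best = toy_confidence("".join(seq))
    for _ in range(steps):
        pos = rng.randrange(length)                          # choose one position in the chain
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS.replace(old, ""))  # swap in a different amino acid
        score = toy_confidence("".join(seq))
        if score > best:
            best = score        # clearer prediction: keep the mutation
        else:
            seq[pos] = old      # otherwise undo it
    return "".join(seq), best

rng = random.Random(42)
designed, score = hallucinate(length=20, steps=5000, rng=rng)
print(designed, score)
```

Swapping in a real predictor would change only the scoring function; the accept-or-undo loop stays the same.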

With each repetition of this process, the predicted structure becomes more sharply defined, approaching the kind of well-defined shape seen in real proteins. Thousands of repetitions are required to create a brand-new protein.

Using this process, we generated 2,000 new protein sequences predicted to fold into well-defined structures. Of these, we selected over 100 that were the most distinct in shape to produce in the lab. Finally, we chose three of the top candidates for detailed analysis and confirmed that their structures closely matched the shapes predicted by our hallucinated models.
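The article does not say how the most distinct designs were chosen. One plausible approach, sketched here purely as an assumption, is greedy max-min ("farthest point") selection: repeatedly pick the design that is farthest from everything already picked. The 2D points below are hypothetical stand-ins for structures; in practice, a structural distance between predicted models would take the place of `math.dist`.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def pick_most_distinct(items: Sequence[T], k: int,
                       distance: Callable[[T, T], float]) -> list[int]:
    """Greedy max-min selection. Returns the indices of up to k chosen items."""
    if not items or k <= 0:
        return []
    chosen = [0]  # seed with the first item (an arbitrary choice)
    # nearest[i] = distance from item i to its closest already-chosen item
    nearest = [distance(items[0], x) for x in items]
    while len(chosen) < min(k, len(items)):
        candidates = [i for i in range(len(items)) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, distance(items[nxt], x)) for d, x in zip(nearest, items)]
    return chosen

# Hypothetical shape descriptors for five designs: two near-duplicate pairs and one outlier.
designs = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (10.0, 0.0)]
print(pick_most_distinct(designs, 3, math.dist))  # [0, 4, 2]
```

Near-duplicates such as the designs at 0.0 and 0.1 are skipped in favor of ones that add new shapes to the set.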

Why hallucinate new proteins?

Our hallucination approach greatly simplifies the protein design pipeline. Because it eliminates the need for templates, researchers can focus directly on creating a protein with the desired function and let the network take care of figuring out the structure.

Our work opens up multiple avenues for researchers to explore. Our lab is currently investigating ^[24] how best to use this hallucination approach to design proteins with more specific functions. Our approach can also be readily extended to design new proteins using other ^[25] recently developed ^[26] deep neural networks.

The potential applications of de novo proteins are vast. With deep neural networks, researchers will be able to create even more proteins that can break down plastics ^[27] to reduce environmental pollution, identify and respond ^[28] to unhealthy cells and improve vaccines ^[29] against existing and new pathogens – just to name a few.


References

[1] use solar energy for oxygen production (www.energy.gov)
[2] immune system (www.livescience.com)
[3] muscles (www.britannica.com)
[4] Many drugs (doi.org)
[5] prevent COVID-19 infection (www.doi.org)
[6] turn genes on or off (doi.org)
[7] turn cells into computers (www.doi.org)
[8] de novo protein design (doi.org)
[9] Protein engineers like me (scholar.google.com)
[10] deep learning (www.techtarget.com)
[11] hallucination (doi.org)
[12] amino acids (www.britannica.com)
[13] protein (www.britannica.com)
[14] LadyofHats/Wikimedia Commons (commons.wikimedia.org)
[15] including (doi.org)
[16] lab I work in (www.bakerlab.org)
[17] deep neural networks (towardsdatascience.com)
[18] protein generation method (doi.org)
[19] my colleagues (www.ipd.uw.edu)
[20] computer vision (towardsdatascience.com)
[21] Google’s DeepDream (ai.googleblog.com)
[22] hallucinations (www.americanscientist.org)
[23] Our method (doi.org)
[24] currently investigating (doi.org)
[25] other (www.ipd.uw.edu)
[26] recently developed (deepmind.com)
[27] break down plastics (doi.org)
[28] identify and respond (doi.org)
[29] improve vaccines (doi.org)
[30] Sign up for The Conversation’s daily newsletter (theconversation.com)
