Engineering Impact: Summer 2021

Novel algorithms help gene engineers battle novel coronavirus

by College of Engineering

Detection and Diagnosis

The mapping of the human genome, the complete code of instructions that enables us to develop and function, is vital in the fight against the coronavirus pandemic. Genome engineering, or genome editing, essentially alters an organism’s genetic code; the field was recognized with the 2020 Nobel Prize in Chemistry. Recently, labs have turned to gene-based technologies to develop vaccines in record time compared with the traditional approach, in which weakened viruses are grown in mammalian or insect cells and the desired pieces are extracted to inject into humans.

According to Somali Chaterji, assistant professor of agricultural and biological engineering, a staggering 94 vaccines to combat the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are undergoing clinical evaluation. The initial three vaccines being administered in the U.S. are examples of genetic engineering being used to safeguard the world.

The first two vaccines, from Pfizer/BioNTech and Moderna, use messenger RNA (mRNA) encapsulated in lipid nanoparticles so that it can be taken up by our cells, which then follow the vaccines’ instructions to make a harmless piece of the spike protein found on the virus that causes COVID-19, triggering an immune response. The third vaccine (from Johnson & Johnson) uses a common cold virus, adenovirus, which has been inactivated by removing the replication gene and harmful genes and splicing in the spike protein gene instead, in an example of recombinant gene technology.

Along with the mRNA coronavirus vaccines, Chaterji says, RNA-based therapeutics that mitigate the full force of the novel coronavirus once someone has been infected have risen to the forefront. “My lab, the Innovatory for Cells and Neural Machines (ICAN), has worked on developing algorithms to reduce the side effects of RNA-powered therapies that can occur through ‘off-targeting’ to unwanted regions of the genome,” Chaterji says. “We are developing simple, LEGO™-style building blocks of these algorithms, mixing and matching these ‘kernels,’ and stitching them together with programming language constructs and the associated compiler to accelerate the development cycle of new computational genomics algorithms.” Her lab has also developed a Natural Language Processing (NLP)-inspired technique to decode the language of cells probabilistically, to detect and correct errors in sequenced reads. Underlying this approach is a perplexity metric: a measure of how well a probabilistic model, given the sequence observed so far, predicts what comes next, with probability scores assigned to the different possible outcomes. “For an analogy, think smartphone software autocorrecting your typing,” she says. “A lower perplexity metric score for a word means that the software is more likely to suggest that word.”
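To make the perplexity idea concrete, here is a minimal sketch in Python. It is not the ICAN lab's method, which the article does not describe in detail. It assumes a toy model that counts which base follows each run of k bases (a k-mer) in a set of trusted reads, with add-one smoothing so that unseen continuations keep a small probability. The function names, the choice of k = 3, and the example reads are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # assumption: reads contain only these four bases


def train_kmer_model(reads, k=3):
    """Count which base follows each k-base context in the training reads."""
    counts = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            counts[read[i:i + k]][read[i + k]] += 1
    return counts


def next_base_prob(counts, context, base, alpha=1.0):
    """Smoothed probability that `base` follows `context` (add-alpha smoothing)."""
    seen = counts.get(context, Counter())
    total = sum(seen.values()) + alpha * len(ALPHABET)
    return (seen[base] + alpha) / total


def perplexity(counts, read, k=3, alpha=1.0):
    """Exponential of the average negative log-probability per predicted base."""
    log_sum = 0.0
    n = 0
    for i in range(len(read) - k):
        p = next_base_prob(counts, read[i:i + k], read[i + k], alpha)
        log_sum += math.log(p)
        n += 1
    if n == 0:
        raise ValueError("read must be longer than k")
    return math.exp(-log_sum / n)


if __name__ == "__main__":
    # Hypothetical data: one trusted read, one matching read, one with a
    # single substituted base (G replaced by C at position 6).
    model = train_kmer_model(["ACGTACGTACGTACGT"], k=3)
    clean = "ACGTACGTACGT"
    noisy = "ACGTACCTACGT"
    print("clean read perplexity:", perplexity(model, clean))
    print("noisy read perplexity:", perplexity(model, noisy))
```

In this toy setting the read with the substituted base scores a higher perplexity than the matching read, because the model finds its continuations less predictable. An error-correction tool could, in principle, use that kind of signal to flag suspicious positions and prefer the candidate correction that lowers the score, much as autocorrect prefers the more expected word.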

Such advances, she believes, will lead to greater adoption and effectiveness of precision medicine. “Consider the current coronavirus pandemic, in which some people are asymptomatic, while others have varying degrees of resilience to the disease, possibly due to variations in their genetic makeup,” she says. “I have used machine learning (ML) techniques, specifically neural networks and support vector machines, to identify patterns in the epigenome (a set of chemical compounds that tell the genome what to do) that result in different phenotypes in different humans. My work has enabled identifying regions of the genome that enhance gene regulation called enhancers.

“I am leveraging the power of neural networks to extract patterns in the genomic code in order to decipher the computation of cells for precision mRNA therapeutics and correct errors in sequenced genomic codes. With precision-centric RNA technologies, I aim to make the translation of RNA therapeutics — RNA-based drugs or CRISPR-Cas9-based genome editing — more specific, by decreasing the incidence of off-targeting and increasing the robustness of on-target activity. This interplay between ML and data engineering, combined with genomics and cell engineering, will speed the translation of lab research into clinical treatments.”

It all comes down to tuning the “music of the cell,” she says. “Think of the epigenome and genome as affecting the ‘computation’ of life — in that the alphabets of the genome are like the strings of the note, and if the alphabets are modified or edited by natural mutations or genome editing, the music can become discordant. In disease, the music gets distorted and needs to be fixed. That is why we named one of our deep neural network (DNN)-based algorithms ‘Aikyatan,’ from the Sanskrit, meaning ‘one harmony.’”