Many medical drugs work by modulating the body’s production of specific proteins. Recent research aimed to predict this activity, enabling researchers to identify drugs that might counteract the effects of Covid-19.
What’s new: Thai-Hoang Pham and colleagues at The Ohio State University and The City University of New York developed DeepCE, a system designed to predict how particular drugs will influence the amounts of RNA, and therefore the amounts of various proteins, produced by a cell.
Key insight: In machine learning, attention layers learn to represent how the various parts of two input sequences interact with one another. In biology, genes mediate the production of RNA, while drugs can affect the action of genes. Given separate embeddings that represent genes and chemical structures of drugs, attention can capture how a drug affects RNA production.
How it works: Given a drug, a dose, and a line of cells cloned from a particular patient, DeepCE predicts the amount of RNA produced by each of roughly 1,000 genes. (Collectively, this information constitutes a gene expression profile). The training and test data included more than 600 drugs for a total of over 4,000 gene expression profiles from seven human cell lines in the L1000 database.
- The authors used the node2vec method to generate embeddings of proteins in a database of relationships among genes and proteins. From these embeddings, they extracted representation of the genes in L1000.
- A chemical can be represented as a graph in which each node stands for an element in the periodic table. The authors used a convolutional graph neural network to generate embeddings of drugs in L1000. The network represented each node of a given compound based on its surrounding nodes.
- Given the gene and drug embeddings, a multi-headed attention network generated a matrix that represented gene-drug and gene-gene interactions. Given information about drug doses and cell lines in L1000, separate feed-forward networks generated embeddings of these factors.
- A fully connected network accepted all of these representations and learned how to predict RNA production.
Results: The authors compared DeepCE’s predictions with those of several baseline methods using the Pearson correlation coefficient, a measure of the correlation between predictions and ground truth. DeepCE outperformed all of them with a score of 0.4907. The next-best method, a two-layer feed-forward network, scored 0.4270. They also used DeepCE to look for existing drugs that might treat Covid-19. They compared the predictions for more than 11,000 drugs with corresponding profiles of Covid-19 patients, looking for the greatest negative correlations — an indicator that the drug would fight the illness. Of 25 drugs surfaced by DeepCE, at least five already had shown potential as Covid-19 treatments; others had been used for different viruses with similar symptoms.
Why it matters: Complex datasets may have features that aren’t processed easily by a single network. By using a different network for each type of input and combining their outputs, machine learning engineers can extract useful information that otherwise might be inaccessible.
We’re thinking: The next blockbuster antiviral (or antidepressant, anti-inflammatory, or heart medicine) may already be on pharmacy shelves. Wouldn’t it be wonderful if deep learning found it?