Researchers create a neural network for genomics that explains how it achieves accurate predictions
A team of New York University computer scientists has created a neural network that can explain how it reaches its predictions. The work sheds light on how neural networks, the engines that drive artificial intelligence and machine learning, arrive at their outputs, illuminating a process that has largely been concealed from users.
The breakthrough centers on a specific use of neural networks that has become popular in recent years: tackling challenging biological questions. Among these are examinations of the intricacies of RNA splicing, the focal point of the study, which plays a role in transferring information from DNA to functional RNA and protein products.
“Many neural networks are black boxes—these algorithms cannot explain how they work, raising concerns about their trustworthiness and stifling progress into understanding the underlying biological processes of genome encoding,” says Oded Regev, a computer science professor at NYU’s Courant Institute of Mathematical Sciences and the senior author of the paper, which was published in the Proceedings of the National Academy of Sciences.
“By harnessing a new approach that improves both the quantity and the quality of the data for machine-learning training, we designed an interpretable neural network that can accurately predict complex outcomes and explain how it arrives at its predictions.”
Regev and the paper’s other authors, Susan Liao, a faculty fellow at the Courant Institute, and Mukund Sudarshan, a Courant doctoral student at the time of the study, created a neural network based on what is already known about RNA splicing.
Specifically, they developed a model—the data-driven equivalent of a high-powered microscope—that allows scientists to trace and quantify the RNA splicing process, from input sequence to output splicing prediction.
“Using an ‘interpretable-by-design’ approach, we’ve developed a neural network model that provides insights into RNA splicing—a fundamental process in the transfer of genomic information,” notes Regev. “Our model revealed that a small, hairpin-like structure in RNA can decrease splicing.”
The researchers confirmed the model’s insights through a series of experiments. The results matched the model’s discovery: whenever the RNA molecule folded into a hairpin configuration, splicing was halted, and the moment the researchers disrupted this hairpin structure, splicing was restored.