Mathematics Ph.D. Dissertations

Title

Inferring RNA 3D Motifs from Sequence

Date of Award

2019

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Statistics

First Advisor

Craig Zirbel (Advisor)

Second Advisor

Paul Morris (Other)

Third Advisor

James Albert (Committee Member)

Fourth Advisor

Maria Rizzo (Committee Member)

Fifth Advisor

Junfeng Shang (Committee Member)

Abstract

An outstanding problem in molecular biology is predicting the 3D structure of an RNA molecule from its sequence. An important step toward predicting full RNA 3D structures from sequence is predicting the 3D structures of the non-helical regions, which are often referred to as loop regions. We have developed a methodology for modeling the sequence variability of known RNA 3D loop structures, using data from the RNA 3D Motif Atlas. Our models are stochastic context-free grammars (SCFGs) that incorporate Markov random fields (MRFs) where necessary. The models are parameterized from the geometry of the pairwise interactions in the 3D loop structure and from the sequences that have been observed forming that structure; as a result, a reasonable model can be generated from only one sequence variant observed forming the 3D loop structure. We have also measured and compared how these sequence variability models overlap in sequence space.
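To make the SCFG idea concrete, the Python sketch below scores a loop sequence under a toy grammar with a single nonterminal, S -> x S y | x S | (empty), using the standard inside algorithm to sum the probability over all derivations. The grammar, its rule probabilities, and the emission tables are hypothetical illustrations: the sketch has no MRF component and is not the JAR3D model, whose parameters come from RNA 3D Motif Atlas geometry and sequences.

```python
import math

BASES = "ACGU"

# Hypothetical rule probabilities for S -> x S y | x S | (empty); they sum to 1.
# Illustration only, not JAR3D's parameters.
P_PAIR, P_LEFT, P_END = 0.3, 0.6, 0.1

# Hypothetical single-base emission probabilities (sum to 1).
E_SINGLE = {"A": 0.3, "C": 0.2, "G": 0.2, "U": 0.3}

# Hypothetical pair emission probabilities: Watson-Crick pairs weighted 10,
# GU wobble pairs weighted 4, all other pairs weighted 1, then normalized.
_raw = {x + y: 1.0 for x in BASES for y in BASES}
for bp in ("AU", "UA", "GC", "CG"):
    _raw[bp] = 10.0
for bp in ("GU", "UG"):
    _raw[bp] = 4.0
_total = sum(_raw.values())
E_PAIR = {k: v / _total for k, v in _raw.items()}


def inside_probability(seq: str) -> float:
    """Probability that S derives `seq`, summed over all derivations."""
    n = len(seq)
    # inside[i][j] = P(S derives seq[i:j]); only i <= j is filled in.
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = P_END  # S -> (empty)
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> x S: emit seq[i] unpaired, then derive seq[i+1:j].
            total = P_LEFT * E_SINGLE[seq[i]] * inside[i + 1][j]
            if length >= 2:
                # S -> x S y: emit seq[i] and seq[j-1] as a pair.
                total += P_PAIR * E_PAIR[seq[i] + seq[j - 1]] * inside[i + 1][j - 1]
            inside[i][j] = total
    return inside[0][n]


def log_score(seq: str) -> float:
    """Natural-log probability of `seq` under the toy grammar."""
    return math.log(inside_probability(seq))
```

Because the pair table favors Watson-Crick pairs, `inside_probability("GAAAC")` is larger than `inside_probability("GAAAG")`: the two sequences differ only in derivations that emit the first and last bases as a pair.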

We have developed a software package, called JAR3D, in which these models of the sequence variability of RNA 3D loop structures can be generated quickly and automatically. JAR3D is available for download on GitHub, and a web server and a command line tool of the same name are publicly available. JAR3D has a variety of applications. It can align loop sequences to a particular known 3D loop geometry, and it can accept or reject a loop sequence as a viable candidate for forming a particular geometry. JAR3D can also address a matching problem: given a novel loop sequence, which known 3D geometry, if any, is the sequence likely to form? Current tools for RNA 3D structure prediction do not address this matching problem, so this use case is a new addition to the field.
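The matching problem can be framed as a decision rule over a library of models. The Python sketch below assumes a hypothetical rule (score the sequence against every model as a log-odds ratio versus a uniform background, keep the best, and accept it only if it clears a threshold) and, for brevity, swaps the SCFG/MRF models for simple fixed-length position-specific profiles. The model names `toy_gnra` and `toy_uncg` and all probabilities are invented for illustration and do not come from the RNA 3D Motif Atlas or JAR3D.

```python
import math
from typing import Dict, List, Optional

# A toy stand-in for a loop model: one base distribution per loop position.
Profile = List[Dict[str, float]]

BACKGROUND = 0.25  # assumed uniform null model over A, C, G, U


def peaked(base: str, p: float = 0.85) -> Dict[str, float]:
    """Distribution putting probability p on `base`, the rest spread evenly."""
    return {b: (p if b == base else (1.0 - p) / 3) for b in "ACGU"}


UNIFORM = {b: 0.25 for b in "ACGU"}
PURINE = {"A": 0.45, "G": 0.45, "C": 0.05, "U": 0.05}

# Hypothetical toy models loosely shaped like GNRA and UNCG hairpin loops.
MODELS: Dict[str, Profile] = {
    "toy_gnra": [peaked("G"), UNIFORM, PURINE, peaked("A")],
    "toy_uncg": [peaked("U"), UNIFORM, peaked("C"), peaked("G")],
}


def log_odds(seq: str, profile: Profile) -> float:
    """Log-odds of `seq` under `profile` vs the background; -inf if lengths differ."""
    if len(seq) != len(profile):
        return float("-inf")
    score = 0.0
    for base, dist in zip(seq, profile):
        p = dist.get(base, 0.0)
        if p == 0.0:
            return float("-inf")
        score += math.log(p / BACKGROUND)
    return score


def match_loop(seq: str, models: Dict[str, Profile], threshold: float) -> Optional[str]:
    """Name of the best-scoring model if it reaches `threshold`, otherwise None."""
    best_name, best_score = None, float("-inf")
    for name, profile in models.items():
        score = log_odds(seq, profile)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

With a threshold of 1.0, `match_loop("GAAA", MODELS, 1.0)` returns `"toy_gnra"`, `match_loop("UUCG", MODELS, 1.0)` returns `"toy_uncg"`, and `match_loop("CCCC", MODELS, 1.0)` returns `None`, which corresponds to the "if any" case of the matching problem.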
