Mathematics Ph.D. Dissertations

Predicting base conservation scores in RNA 3D structures

Date of Award

2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Statistics

First Advisor

Craig L. Zirbel (Committee Chair)

Second Advisor

Andrew C. Layden (Other)

Third Advisor

Umar Islambekov (Committee Member)

Fourth Advisor

Junfeng Shang (Committee Member)

Abstract

This dissertation explores the relationship between the local context of a nucleotide in an RNA 3D structure and the extent to which the nucleotide base (A, C, G, U) is conserved across different species. Two datasets are studied: a small dataset from E. coli, in which the local context is described in terms of human-annotated interactions such as base pairs and base stacking, and a large dataset, in which the local context is divided into 675 grid cells and the number of atoms in each cell forms a high-dimensional predictor. The response variable in both cases is the proportion of species having the most common base at that location. For both datasets, random forest, neural network, and other models are fit and evaluated. This makes it possible to identify which predictor variables are most informative, which in turn indicates which features of the local context most constrain base conservation. These features are interpreted from both a statistical and a biological point of view. Poorly predicted nucleotides are labeled and explained.
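As a rough illustration of the response variable described in the abstract, the sketch below computes the proportion of sequences that carry the most common base at a single aligned position. It is not taken from the dissertation: the function name `conservation_score`, the list-of-characters input, and the decision to ignore gaps and non-ACGU symbols are all assumptions made for this example.

```python
from collections import Counter

# Assumption: only these four symbols count as bases; gaps ("-") and
# ambiguity codes are ignored. The dissertation may treat them differently.
VALID_BASES = {"A", "C", "G", "U"}


def conservation_score(column):
    """Return the fraction of sequences with the most common base.

    `column` is a hypothetical input format: one character per species,
    taken from the same position of a multiple sequence alignment.
    """
    bases = [b.upper() for b in column if b.upper() in VALID_BASES]
    if not bases:
        raise ValueError("column contains no A, C, G, or U")
    # most_common(1) returns a list holding one (base, count) pair.
    _, top_count = Counter(bases).most_common(1)[0]
    return top_count / len(bases)


# Example: eight species, six of which have G at this position.
column = ["G", "G", "G", "A", "G", "U", "G", "G"]
score = conservation_score(column)
print(score)  # 0.75
```

A score of 1.0 would mean every species shares the same base at that position, while the lowest possible value with four bases is 0.25.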
