Mathematics Ph.D. Dissertations
Predicting base conservation scores in RNA 3D structures
Date of Award
2023
Document Type
Dissertation
Degree Name
Doctor of Philosophy (Ph.D.)
Department
Statistics
First Advisor
Craig L. Zirbel (Committee Chair)
Second Advisor
Andrew C. Layden (Other)
Third Advisor
Umar Islambekov (Committee Member)
Fourth Advisor
Junfeng Shang (Committee Member)
Abstract
This dissertation explores the relationship between the local context of a nucleotide in an RNA 3D structure and the extent to which the nucleotide base (A, C, G, U) is conserved across different species. Two datasets are studied: a small dataset from E. coli, where the local context is described in terms of human-annotated interactions such as base pairs and base stacking, and a large dataset in which the local context is divided into 675 grid cells and the number of atoms in each cell forms a high-dimensional predictor. The response variable in both cases is the proportion of species having the most common base at that location. For both datasets, random forest, neural network, and other models are fit and evaluated. This makes it possible to identify which predictor variables are most informative, which in turn indicates which features of the local context most constrain base conservation; these features are interpreted from both a statistical and a biological point of view. Poorly predicted nucleotides are labeled and explained.
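The response variable described above, the proportion of species having the most common base at a location, can be illustrated with a minimal sketch. This is not the dissertation's code: the function name `conservation_score`, the toy alignment, and the choice to skip gap characters when counting are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ACGU"

def conservation_score(column):
    """Return the fraction of sequences carrying the most common base.

    `column` holds one alignment column, one character per species.
    Characters other than A, C, G, U (such as gaps) are skipped; that
    handling is an assumption of this sketch, not taken from the source.
    """
    counts = Counter(b for b in column.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        return None  # no usable bases at this position
    return max(counts.values()) / total

# Hypothetical toy alignment: one row per species, one column per position.
alignment = ["GACU", "GACU", "GGCU", "GAC-", "GUCA"]

# Transpose rows into columns, then score each position.
columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
scores = [conservation_score(col) for col in columns]
print(scores)
```

A score of 1.0 marks a position where every species shares the same base; lower values mark more variable positions.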
Recommended Citation
Bulbul, Gul Bahar, "Predicting base conservation scores in RNA 3D structures" (2023). Mathematics Ph.D. Dissertations. 91.
https://scholarworks.bgsu.edu/math_diss/91