Title

Base Triples in RNA 3D Structures: Identifying, Clustering and Classifying

Date of Award

2011

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Biological Sciences

First Advisor

Neocles Leontis, Professor

Second Advisor

Scott Rogers, Professor (Committee Chair)

Third Advisor

Alexander Tarnovsky, Assistant Professor (Committee Member)

Fourth Advisor

Paul Morris, Associate Professor (Committee Member)

Fifth Advisor

Weidong Yang, Assistant Professor (Committee Member)

Abstract

Base triples are recurrent sets of three interacting RNA nucleotides that hydrogen bond to each other along their base edges. Base triples occur frequently in structured RNAs, usually as parts of recurrent structural motifs or of tertiary interactions between parts of the RNA that are distant in the secondary structure.

In almost all base triples, the central base interacts with each of the other bases of the triple to form two base pairs. As described in previous work by Leontis and Westhof, RNA nucleotides pair along their Watson-Crick (W), Hoogsteen (H), or Sugar (S) base edges and the resulting base pairs can be classified by identifying the interacting edges and the mutual orientations of the glycosidic bonds of the interacting nucleotides. The main goal of the present work is to test the hypothesis that the geometric base pair classification of Leontis and Westhof can be extended in a natural way to classify base triples geometrically.
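The edge-and-orientation scheme described above can be enumerated directly. The following is a minimal sketch, not code from the dissertation: it assumes the usual shorthand of `c`/`t` for cis/trans glycosidic bond orientation and treats the two interacting edges as an unordered pair, which yields the 12 base pair families of the Leontis-Westhof classification. The names `EDGES`, `ORIENTATIONS`, and `base_pair_families` are illustrative.

```python
from itertools import combinations_with_replacement

EDGES = ("W", "H", "S")      # Watson-Crick, Hoogsteen, Sugar edges
ORIENTATIONS = ("c", "t")    # cis / trans orientation of the glycosidic bonds (assumed shorthand)


def base_pair_families():
    """Return one label per (orientation, unordered edge pair), e.g. 'cWW' or 'tHS'."""
    return [
        f"{orientation}{edge_1}{edge_2}"
        for orientation in ORIENTATIONS
        for edge_1, edge_2 in combinations_with_replacement(EDGES, 2)
    ]


families = base_pair_families()
print(len(families))   # 2 orientations x 6 edge pairs
print(families)
```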

To test this hypothesis, the research leveraged the large number of RNA molecules that have been solved to atomic resolution by X-ray crystallography and deposited in the Protein Data Bank (PDB). Using a non-redundant (NR) data set of atomic-resolution, RNA-containing X-ray structures, a comprehensive survey of base triples was carried out in parallel with a combinatorial enumeration of base triple families predicted on the basis of the 12 geometric base pair families defined by the Leontis-Westhof nomenclature. This enumeration predicts 108 potential geometric base triple families. Comprehensive searching of atomic-resolution RNA 3D structures found instances of 68 of the 108 predicted base triple families. Model building was carried out to determine which of the remaining 40 families are sterically feasible. Significantly, the proposed classification of base triples accounts for all but a handful of base triples observed in the structure database. The exceptions are intermediate cases in which two bases form a Watson-Crick pair and the third base interacts with both bases along their Hoogsteen or Sugar edges, without making a bona fide base pair with either base.
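One counting scheme that reproduces the figure of 108 is sketched below. It is an assumption made for illustration, not a procedure stated in the abstract: the central base uses two distinct edges (3 choices), and on each of those edges it forms one of 6 pairings, given by 2 orientations times 3 possible edges on the partner base, so 3 x 6 x 6 = 108. The label format and the names `base_triple_families` and `OBSERVED_FAMILIES` are hypothetical; the count of 68 observed families is taken from the text above.

```python
from itertools import combinations, product

EDGES = ("W", "H", "S")      # Watson-Crick, Hoogsteen, Sugar edges
ORIENTATIONS = ("c", "t")    # cis / trans (assumed shorthand)
OBSERVED_FAMILIES = 68       # reported in the abstract


def base_triple_families():
    """Enumerate triples as (pair with base 1, pair with base 3) around a central base.

    In each label the central base's edge is written adjacent to the other pair,
    so 'cHW' followed by 'tSS' means: base 1 uses H against the central W edge,
    and the central S edge pairs with the S edge of base 3.
    """
    partner_choices = list(product(ORIENTATIONS, EDGES))  # 6 pairings per central edge
    families = []
    for central_a, central_b in combinations(EDGES, 2):   # 3 choices of two distinct edges
        for (o1, e1), (o3, e3) in product(partner_choices, repeat=2):
            families.append((f"{o1}{e1}{central_a}", f"{o3}{central_b}{e3}"))
    return families


triples = base_triple_families()
print(len(triples))                       # predicted families under this scheme
print(len(triples) - OBSERVED_FAMILIES)   # families left for model building
```

Under this scheme the labels only distinguish geometric families; they say nothing about which of the unobserved families are sterically feasible, which the abstract addresses by model building.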

An online resource based on this classification was developed. It provides exemplars of all the base triples observed in the structure database and models for unobserved, predicted triples, grouped by base triple family as well as by three-base combination (see http://rna.bgsu.edu/Triples). This classification helps to identify recurrent triple motifs that can substitute for each other while preserving RNA 3D structure. It can also assist in producing RNA sequence phylogenies. Moreover, the classification is useful for improving methods for aligning evolutionarily related RNA sequences and for modeling RNA 3D structures based on sequence alignments, including applications of RNA 3D structure prediction such as predicting neutral and lethal mutations in structured RNA molecules.