Title

DEVELOPING TOOLS FOR RNA STRUCTURAL ALIGNMENT

Date of Award

2006

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Biological Sciences

First Advisor

Neocles Leontis

Abstract

This work addresses current problems of RNA sequence alignment and describes different tools for solving them. RNA molecules form basepairs that fold the molecule into its secondary and tertiary structures. These structures are more conserved in evolution than primary sequence because they directly affect the function of the molecule. Thus, sequence alignment of RNA molecules, unlike that of other biological molecules, must proceed by aligning homologous pairs of positions to each other. The state-of-the-art methods used today for aligning RNA are based on Stochastic Context-Free Grammars (SCFGs). These methods are able to characterize and thus align nested RNA basepairs, but are incapable of dealing with crossing basepair patterns. In addition, the current application of these methods ignores 3D structure and thus deals well with only one type of basepair. Although this type (the cis Watson-Crick/Watson-Crick or cWW type) is the most common in RNA, there are at least eleven other families of basepairs that account for about one third of RNA interactions. Each of these families has its own structural dimensions, and therefore its own patterns of accepted isosteric substitutions in sequence alignments. Here, an RNA alignment analysis and evaluation tool that takes into consideration 3D structure with all types of basepairs is described. This tool is used to structurally evaluate alignments and locate errors in them. A discussion and classification of tertiary interactions of the G/U wobble basepair is then presented. Novel conserved interactions are discovered, and their sequence signatures are used to further enhance sequence alignments. Finally, an improved SCFG approach for automatic RNA alignment is suggested and tested. This approach takes into consideration the 3D structure of all families of basepairs. It is also coupled with another framework, Markov Random Fields (MRFs), to align areas where crossing basepairs occur.
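
The distinction between nested and crossing basepairs drawn in the abstract can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the function names `pairs_cross` and `crossing_pairs` and the example position lists are hypothetical, and basepairs are assumed to be given as pairs of 1-based sequence positions. Two basepairs (i, j) and (k, l) cross when i < k < j < l; a structure containing any such pair of basepairs cannot be described by a context-free grammar alone.

```python
from itertools import combinations


def pairs_cross(p, q):
    """Return True if basepairs p and q cross, i.e. i < k < j < l.

    Each basepair is a pair of sequence positions (hypothetical input format).
    """
    # Order positions within each pair, then order the two pairs by 5' position.
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l


def crossing_pairs(basepairs):
    """List every combination of two basepairs that cross each other."""
    return [(p, q) for p, q in combinations(basepairs, 2) if pairs_cross(p, q)]


# Illustrative position lists only, not data from the dissertation.
nested = [(1, 20), (2, 19), (5, 12)]       # fully nested helix plus hairpin
pseudoknot = [(1, 10), (2, 9), (6, 15)]    # (6, 15) crosses both other pairs

print(crossing_pairs(nested))
print(crossing_pairs(pseudoknot))
```

An empty result means the structure is fully nested and falls within what the abstract says SCFG-based methods can handle; a non-empty result marks the regions for which the abstract proposes a complementary MRF treatment.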