Towards Automating Structural Analysis of Complex RNA Molecules and Some Applications In Nanotechnology
RNA has emerged as a versatile and multi-faceted player in gene expression and the informational metabolism of living cells. RNA molecules can function by virtue of their sequences, storing or transmitting genetic information, as well as by forming complex three-dimensional (3D) structures that can bind specifically to proteins, small molecules or other RNA or DNA molecules to carry out diverse recognition functions, including chemical catalysis. In fact, large RNA-based molecular machines, like the ribosome and spliceosome, are now known to be RNA enzymes or “ribozymes” that rely on complex RNA 3D structure to sequentially bind and release their macromolecular substrates and co-factors. As a result of revolutions in RNA 3D structure determination and high-throughput DNA and RNA sequencing, online databases are brimming with new structure and sequence data. The large amounts of new data are creating new challenges in data management, curation, search, visualization, and access. For example, ribosomes have been solved in many different functional states, with tRNAs variously bound to the A-, P-, or E-sites, or associated with different translation factors (i.e., initiation, elongation, termination or recycling factors) or antibiotics. Bound tRNAs may be cognate, near-cognate or non-cognate to the mRNA codon sequences present at the A-, P- or E-sites of the ribosome. Detailed and accurate functional annotations are needed to enable focused database searches for specific states and bound ligands and to uncover new relationships regarding structure, function, and evolution of RNA molecules and their complexes. Because new structures are accumulating in databases faster than they can be manually annotated, automated annotation procedures need to be developed and deployed by databases such as the Nucleic Acid Database (NDB).
In addition to annotation of individual structures, related structures must be identified, compared and clustered, and representative structures chosen for detailed analysis. To facilitate research, clustering should be dynamic and user driven. The unifying theme of this dissertation is to develop new conceptual frameworks and analytical approaches to assess and improve the automated annotation and analysis of RNA 3D structures, and to connect these data to structural changes in functional state and evolutionary changes. More specifically, the dissertation focuses on 3D structures of ribosomal RNA (rRNA) from the large (LSU) and small (SSU) subunits of ribosomes, and the hairpin and internal loop motifs extracted from them.
As a first step toward automated annotation of the functional states of ribosome structures, a manual analysis was carried out of all atomic-resolution 3D structures of the SSU of the bacterium Thermus thermophilus found in the NDB as of April 2014. The data set used corresponds to the structure equivalence class NR_4.0_81883.24 of release 1.56 of ribosome structures posted in the NDB (see http://rna.bgsu.edu/rna3dhub/nrlist). Each structure was manually examined to annotate the functional state of the ribosome, including all bound tRNAs, mRNAs and other factors. These data were combined with metadata downloaded from the NDB in a single spreadsheet available at http://tinyurl.com/16S-T-Thermophilus-summary.
The NDB maintains a non-redundant (NR) set of structures that is updated weekly and used to construct the 3D Motif Atlas of RNA hairpin and internal loop motifs (http://rna.bgsu.edu/rna3dhub/motifs), which is updated monthly. To assess the quality of motif clustering in the Motif Atlas, links and metadata were downloaded for all motif instances in release 1.14 (http://rna.bgsu.edu/rna3dhub/motifs/release/IL/1.14). Motifs were grouped by molecule and by location within each homologous molecule to determine whether homologous motifs were clustered in the Motif Atlas in the same or similar structural groups (identified by motif ID). The analysis focused on rRNA, for which multiple instances from different organisms are available. Overall, motifs found in conserved regions of the rRNAs were placed in the same motif groups by the automated clustering.
Large RNA structures like the ribosome are organized hierarchically into subunits, domains, stacked helical units, and individual helices with their component hairpin and internal loop motifs. Different parts of the structure are responsible for binding mRNA, tRNA and translation factors, but all must work together to correctly translate the mRNA. For example, the aminoacyl end of a tRNA should not be allowed into the A-site of the 23S rRNA unless the 16S rRNA detects cognate binding between the anticodon end of the tRNA and the mRNA codon present in the decoding site, a detection that takes place in helix 44. Thus, information must flow between different parts of the ribosome to report on the state of one part that is relevant to another part. How this happens in complex RNA machines is an area of active research. One hypothesis is that networks of RNA tertiary and quaternary interactions play a central role in transmitting this information within the ribosome. As a first step toward automatically detecting and comparing networks of RNA-RNA interactions in the ribosome, a new module was created for the FR3D (“Find RNA 3D”) suite of RNA analysis tools.
The module takes as input CSV-formatted data listing the manually constructed, hierarchical assignment of each nucleotide (nt) in an RNA 3D structure, together with the atomic-resolution PDB structure of the molecule. The program outputs a table with the number of interactions between elements, individual PDB files for each pair of interacting elements for visual inspection, and a text file listing the interactions between elements. Thus, once the hierarchical assignments are provided, large structures can be analyzed automatically.
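The counting step of such a module can be illustrated with a minimal Python sketch. It is not the FR3D implementation. It assumes a hypothetical CSV layout with columns `nt_id` and `element`, and it assumes that pairwise interactions have already been annotated and are supplied as a list, rather than computed from the PDB coordinates. The nucleotide identifiers, element labels and interactions are toy data, and the function names are illustrative only. The sketch does not write PDB files.

```python
import csv
import io
from collections import Counter


def load_assignments(csv_text):
    """Map each nucleotide ID to its structural element (hypothetical columns)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["nt_id"]: row["element"] for row in reader}


def count_element_interactions(assignments, interactions):
    """Count interactions between distinct elements.

    `interactions` is an iterable of (nt_id_1, nt_id_2, interaction_type)
    tuples, assumed to come from a separate annotation step.
    """
    counts = Counter()
    listing = []
    for nt1, nt2, kind in interactions:
        e1 = assignments.get(nt1)
        e2 = assignments.get(nt2)
        # Skip unassigned nucleotides and interactions within one element.
        if e1 is None or e2 is None or e1 == e2:
            continue
        pair = tuple(sorted((e1, e2)))  # treat element pairs as unordered
        counts[pair] += 1
        listing.append((pair, nt1, nt2, kind))
    return counts, listing


# Toy data for illustration only; not taken from any real structure.
EXAMPLE_CSV = """nt_id,element
A|G|10,h1
A|C|25,h1
A|A|100,h2
A|U|101,h2
A|G|300,h3
"""

EXAMPLE_INTERACTIONS = [
    ("A|G|10", "A|C|25", "cWW"),    # within h1, ignored
    ("A|G|10", "A|A|100", "tSH"),   # h1-h2
    ("A|U|101", "A|C|25", "cWW"),   # h2-h1, same unordered pair as above
    ("A|A|100", "A|G|300", "tHS"),  # h2-h3
    ("A|G|300", "A|X|999", "cWW"),  # unassigned nucleotide, ignored
]

assignments = load_assignments(EXAMPLE_CSV)
counts, listing = count_element_interactions(assignments, EXAMPLE_INTERACTIONS)

# Tab-separated table: element, element, number of interactions.
for (e1, e2), n in sorted(counts.items()):
    print(f"{e1}\t{e2}\t{n}")
```

Treating element pairs as unordered keeps the table symmetric, so an interaction recorded as h2-h1 is counted together with those recorded as h1-h2. The `listing` value plays the role of the text file of interactions between elements.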