RNA secondary structure design

Bernd Burghardt; Alexander K Hartmann

doi:10.1103/PhysRevE.75.021920

Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Feb;75(2 Pt 1):021920. doi: 10.1103/PhysRevE.75.021920. Epub 2007 Feb 28.

Authors

Bernd Burghardt¹, Alexander K Hartmann

Affiliation

¹ Institut für Theoretische Physik, Universität Göttingen, Friedrich-Hund-Platz 1, D-37077 Göttingen, Germany. burghard@physik.uni-goe.de

PMID: 17358380

Abstract

We consider the inverse-folding problem for RNA secondary structures: for a given (pseudo-knot-free) secondary structure, we want to find a sequence that has this structure as its ground state. If such a sequence exists, the structure is called designable. We have implemented a branch-and-bound algorithm that performs an exhaustive search of the sequence space, i.e., it gives an exact answer as to whether such a sequence exists. The bounds required by the branch-and-bound algorithm are calculated by a dynamic programming algorithm. We consider different alphabet sizes and an ensemble of random structures, which we want to design. We find that for two letters almost none of these structures are designable. The designability improves for the three-letter case, but a significant fraction of structures is still undesignable. This changes when we look at the natural four-letter case with two pairs of complementary bases: undesignable structures are the exception, although they still exist. Finally, we also study the relation between designability and the algorithmic complexity of the branch-and-bound algorithm. Within the ensemble of structures, a high average degree of undesignability is correlated with a long time to prove that a given structure is (un-)designable. In the four-letter case, where the designability is high everywhere, the algorithmic complexity is highest in the region of naturally occurring RNA.
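The designability question can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy energy model in which every complementary pair contributes -1, a minimum hairpin loop of one unpaired base, and a definition of "ground state" that requires the target to be the unique optimum. Plain enumeration stands in for the branch-and-bound search, and all names (PAIRS, MIN_LOOP, parse_pairs, ground_state, find_design) are hypothetical.

```python
from itertools import product

# Assumed toy model: each complementary pair lowers the energy by one unit.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
MIN_LOOP = 1  # assumed minimum number of unpaired bases inside a hairpin


def parse_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def ground_state(seq):
    """Return (maximum number of pairs, number of structures reaching it)."""
    n = len(seq)
    # best[i][j] and count[i][j] describe the subsequence seq[i:j].
    best = [[0] * (n + 1) for _ in range(n + 1)]
    count = [[1] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            # Case 1: the last base of the subsequence stays unpaired.
            b, c = best[i][last], count[i][last]
            # Case 2: the last base pairs with base k (loop size >= MIN_LOOP).
            for k in range(i, last - MIN_LOOP):
                if (seq[k], seq[last]) in PAIRS:
                    cand = best[i][k] + 1 + best[k + 1][last]
                    ways = count[i][k] * count[k + 1][last]
                    if cand > b:
                        b, c = cand, ways
                    elif cand == b:
                        c += ways
            best[i][j], count[i][j] = b, c
    return best[0][n], count[0][n]


def find_design(structure, alphabet="ACGU"):
    """Return a sequence whose unique optimum is `structure`, or None."""
    target = parse_pairs(structure)
    if any(j - i - 1 < MIN_LOOP for i, j in target):
        return None  # the target itself is not allowed in this toy model
    for letters in product(alphabet, repeat=len(structure)):
        # Skip sequences that cannot form the target pairs at all.
        if any((letters[i], letters[j]) not in PAIRS for i, j in target):
            continue
        pairs, degeneracy = ground_state(letters)
        if pairs == len(target) and degeneracy == 1:
            return "".join(letters)
    return None
```

In this toy model, find_design("(.)..", "AU") finds no sequence, while the same structure is designable with the alphabet "ACGU". This only illustrates the kind of question studied; it does not reproduce the paper's energy model or its results.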

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Base Sequence
Computer Simulation
Models, Chemical*
Models, Molecular*
Models, Statistical
Molecular Sequence Data
Nucleic Acid Conformation
RNA / chemistry*
RNA / ultrastructure*
Sequence Analysis, RNA / methods*

Substances

RNA