Luck R; Steger G; Riesner D. Institut fur Physikalische Biologie, Heinrich-Heine-Universitat, Dusseldorf, FRG. J Mol Biol 258: 813-26 (1996)
An algorithm for prediction of conserved secondary structure of single-stranded RNA is presented. For each RNA of a set of homologous RNAs, optimal and suboptimal secondary structures are calculated and stored in a base-pair probability matrix. A multiple sequence alignment is performed for the set of RNAs. The resulting gaps are introduced into the individual probability matrices. These homologous probability matrices are summed to give a consensus probability matrix emphasizing the conserved secondary structure elements of the RNA set. Thus the algorithm combines the advantages of thermodynamic structure prediction by energy minimization with the information obtained from phylogenetic alignment of sequences.
The algorithm is applied to three examples. The REV-responsive element of HIV, the structure of which is well known from the literature, was chosen to test the algorithm. The second example is the 3' terminal segment of genomic single-stranded RNAs of cucumber mosaic viruses; a structure similar to that of the related brome mosaic virus was expected and was confirmed. The third example is the prion-protein mRNA from different organisms; the structure of this mRNA is not known. By application of the algorithm, highly conserved hairpins were found in the prion-protein mRNA.
Introduction
Abbreviations used: bp, base-pair; nt, nucleotides; ssRNA, single-stranded RNA; ORF, open reading frame; UTR, untranslated region
PrP is the major component of prions (for review see Prusiner, 1994), which cause several neurodegenerative diseases in humans and animals, including scrapie in sheep, bovine spongiform encephalopathy (BSE) in cattle, and Creutzfeldt-Jakob disease (CJD) in man. During prion infection, an abnormal isoform of PrP, in the case of scrapie designated PrP Sc , is produced from the cellular isoform of PrP C , which is encoded by a chromosomal gene, by an unknown process. In spite of intensive studies, differences between the chemical compositions of PrP C and PrP Sc could not be found (Stahl et al., 1993). Thus, a mere conformational change as the origin of the transformation of PrP C into PrP Sc is an attractive model (for review see Prusiner, 1994). It might be that putative structural elements in the PrP mRNA influence the kinetics of sequential folding of the protein during translation, and that the process of infection acts via those structural elements. A potential stemloop structure in one PrP mRNA, i.e. that of man, has been discussed (Wills & Hughes, 1990).
If, however, structural elements of the PrP mRNA are considered as functionally relevant for the development of all prion diseases, this feature must be found in every species that is susceptible to this class of disease. Therefore the new algorithm will be applied to check PrP mRNA for evolutionarily conserved secondary structures.
Prediction of structural elements in mRNA of prion protein
From the EMBL data bank, 23 PrP mRNA sequences of the following species were available: three rodents (mouse, hamster, rat), two ruminants (cattle, sheep), the human and 17 non-human primates consisting of four apes (gorilla, chimpanzee, gibbon, orangutan), seven old-world monkeys (colobus, presbytis, baboon, mandrill, rhesus macaque, Macaca arctoides, African green monkey), and six new-world monkeys (spider, squirrel, capuchin, aotes, marmoset, titi). Calculations of individual secondary structures of these molecules with the programs RNAfold and LinAll revealed that thermodynamically stable elements of secondary structure are located mainly in the 5' region including the ORF. In contrast, the 3' UTR with a lower G + C content (41% G + C versus 53% G + C in the ORF of the hamster sequence) is mainly single stranded. As examples, the calculated secondary structures of human and hamster PrP mRNAs are shown in Figure 5.
Alignment of the 23 sequences with CLUSTAL V (Higgins & Sharp, 1988; Higgins, 1994) showed considerable homology in an 1100 nt fragment (individual sequences starting between nt -42 and -10 and ending at nt 788 to 1039; numbering relative to the start of the ORF). These fragments that contain the ORF were chosen for further studies. They have a length of 735 to 795 nt and showed a high degree of homology (74 to 97%). The secondary structure distributions of the individual fragments were calculated with RNAfold at 50 °C. With the alignment mentioned above, the consensus base-pair probability matrix was calculated and is presented as a dot plot in Figure 6.
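The consensus step described above (introduce the alignment gaps into each individual base-pair probability matrix, then sum the matrices) can be sketched as follows. This is only a minimal illustration and not the authors' implementation: the function names `gapped_matrix` and `consensus_matrix`, the use of nested lists, and the choice of `-` as gap character are assumptions made for the example.

```python
def gapped_matrix(prob, aligned_seq, gap="-"):
    """Expand an ungapped base-pair probability matrix to alignment coordinates.

    prob        -- square matrix (list of lists) for the ungapped sequence
    aligned_seq -- the same sequence as it appears in the alignment
    Gap columns and gap rows are filled with zeros (an assumption of this sketch).
    """
    # Alignment columns that actually hold a nucleotide
    cols = [i for i, ch in enumerate(aligned_seq) if ch != gap]
    if len(prob) != len(cols) or any(len(row) != len(cols) for row in prob):
        raise ValueError("matrix size must equal the ungapped sequence length")
    n = len(aligned_seq)
    out = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(cols):
        for j, cj in enumerate(cols):
            out[ci][cj] = prob[i][j]
    return out


def consensus_matrix(probs, aligned_seqs):
    """Sum the gapped matrices of all sequences in the alignment."""
    n = len(aligned_seqs[0])
    if any(len(seq) != n for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    total = [[0.0] * n for _ in range(n)]
    for prob, seq in zip(probs, aligned_seqs):
        expanded = gapped_matrix(prob, seq)
        for i in range(n):
            for j in range(n):
                total[i][j] += expanded[i][j]
    return total
```

A base-pair that occurs at the same alignment columns in many sequences accumulates a large sum, whereas a pair found in only one sequence stays small. Dividing the sum by the number of sequences would give a mean probability; that normalization is an optional addition and is not stated in the text.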
Surprisingly, the dot plot shows that PrP mRNAs contain conserved structural elements at only very few positions despite both the high degree of sequence conservation and the high degree of base-pairing as predicted for the individual PrP mRNAs (Figure 5). From this dot plot we identified as candidates of conserved structural elements five hairpins (HP A to E); their occurrence in the different organisms is listed in Table 1. Only HP C is present in all sequences and shows a high probability of base-pairing in most sequences. Owing to nucleotide exchanges the detailed structure of the HP C stem differs between the groups of species (Figure 7); a tendency exists, however, to preserve or to restore all base-pairs. In particular the loop of HP C is highly conserved and has the sequence 5'-ACCHCA-3' with H being a pyrimidine, except in the capuchin and the marmoset sequences. HP C is located at nt 140 to 165 (nt 149 to 174 for the ruminants) relative to the start of the ORF. On the amino acid level the triplet 5'-CCH-3' of the HP C loop represents the first amino acid of the octarepeat region, which consists of five nearly perfect tandem repeats of eight amino acids.
Figure 5. Schematic drawings of the optimal secondary structure of human (left) and hamster (right) PrP mRNA as predicted by RNAfold at 50 °C.
Figure 6. Consensus dot plot of 23 PrP mRNA fragments including the ORF. The individual base-pair dot plots were calculated with RNAfold at 50 °C and overlaid according to the alignment produced by CLUSTAL. Size of dots is proportional to the cubic power of the base-pair probability (a = 3, b = 1); thus helices of low probability are suppressed. Hatched areas show the gaps in the alignment; with increasing darkness gaps are present in an increasing number of sequences. Hairpins marked HP A to E have a high probability of base-pairing and are conserved.
Table 1. Comparison of individual base-pairing probabilities of five hairpins (HP A to E) conserved in PrP mRNA

HP   Rodents     Cattle, sheep   Man, apes    OW monkeys   NW monkeys
A    +++         -               -            +(-)a        +(-)b
B    ++          +++             +(++)a       ++           -(+)a
C    +++(++)a    +               +++(++)a     +++          +++(++)b
D    -           ++              +++          +++          +++(++)a
E    ++(+)a      -               +++          +++          +++(++)a

Columns are species groups. The hairpins are present with a base-pairing probability of at least 0.8 (+++), between 0.03 and 0.8 (++), between 0.001 and 0.03 (+), and less than 0.001 (-).
a A second value in parentheses represents one species that differs from the rest of the group.
b A second value in parentheses represents two species that differ from the rest of the group.
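The symbols of Table 1 correspond to probability classes with the thresholds given in the table legend. A minimal sketch of that mapping is shown here; the function name `hairpin_symbol` is hypothetical, and treating each lower bound as inclusive is an assumption, because the legend does not say how values exactly on a boundary are classified.

```python
def hairpin_symbol(p):
    """Map a base-pairing probability to the symbols used in Table 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if p >= 0.8:       # at least 0.8
        return "+++"
    if p >= 0.03:      # between 0.03 and 0.8 (lower bound inclusive: assumption)
        return "++"
    if p >= 0.001:     # between 0.001 and 0.03 (lower bound inclusive: assumption)
        return "+"
    return "-"         # less than 0.001
```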
Figure 7. Schematic drawings of hairpin C of PrP mRNAs:
According to the widely accepted prion hypothesis (Prusiner, 1982), formation of the isoform PrP Sc from the cellular form PrP C is the basic event for prion infectivity and pathology. The molecular mechanism for the formation of PrP Sc is not known. There is no evidence for the existence of a specific nucleic acid essential for prion infectivity (for review see Riesner, 1991; Kellings et al., 1992), and no chemical difference between the two isoforms could be found (Stahl et al., 1993). Thus, it was suggested that the two PrP isoforms differ only in protein conformation (for review see Prusiner, 1994). Although the measurable differences in conformation between PrP C and PrP Sc are generated in a slow post-translational process, the possibility was never excluded that the nucleus for such a transition, which could comprise only a few PrP molecules, had already been formed during the translation process and that the PrP mRNA structure might have an influence on protein structure formation.
In particular, a conservation of structural motifs among the PrP mRNAs of susceptible species may point to a potential biological function for these motifs. Even if PrP mRNA structure were not involved in the mechanism of infection or pathogenesis of prions, it could play a part in the regulation of PrP C expression. In either case, this problem seemed attractive to us for the application of the new algorithm. The minimal energy foldings of the individual PrP mRNAs revealed a high degree of structure formation inside the ORF and a much less structured 3' UTR as a consequence of the higher A + U content. In contrast to both the high degree of secondary structure in the individual mRNA sequences and the high degree of sequence homology, only a very few structural elements were found to be conserved in the ensemble of PrP mRNAs. This feature appears remarkable if the properties of PrP mRNA are compared with those of CMV RNA.
Although the sequence homology of all CMV RNAs was only 72(±13)%, a highly conserved overall structure could be derived; nothing comparable was found in PrP mRNA in spite of 86(±6)% homology in the ORF. Therefore, we have to conclude that a detailed PrP mRNA structure like those shown in Figure 5 is not of biological importance, and only the evolutionarily conserved hairpin structures (HP A to E) should be considered. Three of them (HP A to C) are located in the ORF in a cluster 5' of the octarepeat region, and two (HP D and E) are further downstream close to the stop codon.
In earlier studies of the human PrP mRNA sequence (Wills & Hughes, 1990; Wills, 1992), it was proposed that the octarepeat region contains structural elements with possible relevance for development of prion diseases, and binding of cellular proteins to this region was reported (Schröder et al., 1994). In the context of evolutionary conservation these structures could not be confirmed. Therefore it is unlikely that those elements act as structural targets at the RNA level. While investigating RNA secondary structure conservation inside the coding region of mRNA, one has to take into account the influence of amino acid conservation. Is a conserved mRNA structural motif a mere coincidence of codon usage for the amino acid sequence? This could be tested either by looking for compensatory base exchanges in stem regions or by comparing the general codon usage of an mRNA with the codons used in the structural element.
While conserving the amino acid sequence, compensatory base exchanges in a structural element are only possible if certain base-pairs are formed between wobble positions of codons. This is not the case with HP C (see symbols in Figure 7). An investigation of codon usage in HP C revealed that in the 5' part of the stem common codons are mostly used. In contrast, the 3' part of the stem, which contains three glycine codons, uses GGU in 70% of all codons although this codon is generally used below 30%. In addition, there is an obvious selection pressure for the wobble base in the first proline codon of the loop. It is coded by CCA in all 23 sequences despite approximately 20% codon usage, whereas in the adjacent proline CCU, CCC and CCA are used. Therefore, we conclude that the conserved nucleic acid structures in the ORF are not a coincidence of codon usage but are a product of selection for a nucleic acid structure.
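The codon-usage comparison described above amounts to computing, for one amino acid, the fraction contributed by each synonymous codon, once for the structural element and once for the whole mRNA. The sketch below shows that calculation under stated assumptions: the function name `codon_fractions` is hypothetical, the input is assumed to be an in-frame coding sequence, and no real PrP sequence data are used here.

```python
from collections import Counter

# Synonymous codons of the standard genetic code (RNA alphabet)
GLYCINE_CODONS = ("GGU", "GGC", "GGA", "GGG")
PROLINE_CODONS = ("CCU", "CCC", "CCA", "CCG")


def codon_fractions(coding_seq, codons):
    """Return the fraction of each listed codon among all listed codons.

    coding_seq -- in-frame coding sequence (DNA or RNA letters, any case)
    codons     -- the synonymous codons to compare, e.g. GLYCINE_CODONS
    """
    seq = coding_seq.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq), 3))
    total = sum(counts[c] for c in codons)
    if total == 0:
        # None of the listed codons occurs in this sequence
        return {c: 0.0 for c in codons}
    return {c: counts[c] / total for c in codons}
```

Calling the function once on the stem region and once on the complete ORF, and comparing the two values for GGU, would reproduce the kind of contrast reported in the text (70% in the stem versus below 30% in general).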
Among the five conserved hairpins, HP C is present with a high base-pairing probability in most of the studied sequences. In contrast to the slight variability of the HP C stem, its loop region is formed by the highly conserved sequence 5'-ACCHCA-3'. Thus the stem of HP C might be responsible only for exposing the conserved loop region from the global molecular structure, as was suggested for stem 1 of RRE (see above). The loop may serve as the essential target element for binding to cellular proteins or even to the infectious isoform of PrP. Another potential function of HP C (and of the other conserved hairpins) during the development of prion diseases and the formation of infectious PrP Sc might be correlated to the general influence of mRNA secondary structure on protein conformation.
The in vivo folding of a polypeptide chain into its native three-dimensional structure is a sequential process (for review see Jaenicke, 1993). It was argued that the conformation of some proteins depends on the rate and kinetics of translation, which is influenced by the presence and stability of structural elements of the translated mRNA (Purvis et al., 1987). Translation initiation is most sensitive to structures close to the entry site of ribosomes (Kozak, 1989), but also structures located further 3' in the coding region still reduce translation efficiency (Liebhaber et al., 1992). A modulation of protein folding by translational pauses resulting from mRNA secondary structure and codon usage has been proposed for the MS2 coat protein of Escherichia coli (Guisez et al., 1993). The examples mentioned from the literature suggest that the conserved secondary structure elements in PrP mRNA might well affect translation and thereby the conformation of PrP.
Edited by R. Huber
(Received 5 September 1995; received in revised form 26 February 1996; accepted 4 March 1996)
Supplementary material for this paper, comprising one Figure, is available from JMB Online.