Prion gene good for something after all?
Knockouts vs knockouts: is the prion gene essential after all?
Phenotypic GSS and new mutations D202N and Q212P
Hydrogen bond donors and acceptors
Nematode genome 85% complete: no prion yet
Triplet repeat diseases: innocent inclusions?
CpG codon depletion
22 Oct 98 webmaster
The prion gene may be good for something after all -- helping resolve a vexatious issue in mammalian evolution.
Eutherian mammals are thought to have experienced a very rapid radiation roughly 100 million years ago (mya) into the various taxonomic orders of today such as rodents, primates, carnivores, and ruminants. Events this distant are difficult to resolve by aligning sequence data because fixed mutational changes are rare during the brief window critical to tree topology. If the radiation considered here took place over 1 million years, the 1:99 ratio of branch lengths in the phylogenetic tree implies observed change will be overwhelmingly post-radiation and irrelevant.
For many genes, eg cytochrome oxidase or DNA polymerase, there is no reason whatsoever to expect a favorable acceleration of rate of change during the divergence window -- these genes have the same function whatever the skeletal morphology or habitat niche so do not experience a slackening of selective pressure (though founder drift effects could be enhanced). Recent work in hox genes further illustrates that remarkable changes in morphology can arise from a few modest point mutations or duplications in genes that direct development.
In the prion gene and many others, the rate of evolutionary change varies markedly by codon, by as much as a factor of 50. Stretches such as AGAAAAGA see no change accepted for more than 310my in any lineage; at the other extreme, serine-asparagine toggle codons have fixed changes in many disparate lineages with a characteristic time scale of 10my and are often seen in extant species as alleles. Other codons, such as the ancestral tyrosine in DYEDRY, exhibit synapomorphic changes, here to tryptophan (DWEDRY, a 2bp change with DCEDRY as probable intermediate, seen only in post-guinea pig rodents).
Thus the situation is really worse than the first two factors [branch fraction, steady rates] suggest: the mutations most likely to occur during the critical 1my window are in codons with highest rates of evolution -- exactly those which are likely to be over-written numerous times subsequently. In the main prion gene, the codon most likely to change in 1my is GGHNQW. Following any particular lineage forward in time results in several changes. Since the only data are from extant species, nothing reliable can then be inferred about the period of interest 100mya.
Codons with phylogenetic signal relative to the mammalian radiation are thus rare because of conflicting requirements -- a slowly evolving codon needs to have changed during a particular narrow window. If the rate of change by codon position is plotted, codons can be weighted for relevance by convolution of the Fourier transform with a time scale semi-gaussian for the polytomous node in question, ie, by low band-pass Fourier filtration.
Codons that change too fast are effectively discarded; codons that change too slowly are moot. It is sometimes hypothesized that rate of change is regionally smooth, yet adjacent amino acids in an alpha helix do not have their side chains in proximity -- one may be a structurally critical interior residue, the next a weakly constrained polar surface residue. In the prion gene, 256 codons are quickly reduced to a handful of potentially synapomorphic positions because there are many invariant residues and additionally rapidly changing positions. The key idea here is to not treat good information (codons with an appropriate rate of change) on an equal footing with mediocre information (codons experiencing multiple hits).
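The weighting argument can be made concrete with a toy calculation. This is only a sketch and it swaps the Fourier filter described above for a simpler model: it assumes each codon accepts changes as a Poisson process with a constant rate, and scores a codon by the chance that it changed at least once in the 1my window and was never hit again in the following 100my. The function names and the three example rates are illustrative, not measured values.

```python
import math

def signal_weight(rate, window=1.0, elapsed=100.0):
    """Chance that a codon changed during the window and was never hit again.

    rate    : accepted changes per million years (assumed constant, Poisson)
    window  : length of the radiation window in my
    elapsed : time from the end of the window to the present in my
    """
    changed_in_window = 1.0 - math.exp(-rate * window)
    untouched_since = math.exp(-rate * elapsed)
    return changed_in_window * untouched_since

def best_rate(window=1.0, elapsed=100.0):
    """Rate that maximizes signal_weight (derivative set to zero)."""
    return math.log(1.0 + window / elapsed) / window

# Illustrative rates only: one accepted change per 310, 100 and 10 my.
example_rates = {"slow": 1 / 310, "matched": 1 / 100, "toggle": 1 / 10}
weights = {label: signal_weight(r) for label, r in example_rates.items()}

if __name__ == "__main__":
    for label, w in weights.items():
        print(f"{label:8s} weight = {w:.2e}")
    print(f"best rate = {best_rate():.5f} changes per my")
```

Under these assumptions the weight peaks for codons whose characteristic time is close to the age of the node, and even there it stays well below 1%, which is one way of seeing why informative codons are rare.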
This approach is equally applicable to mutational sites involving insertions and deletions (called indels when the event cannot be resolved) in genes that are not evolving chaotically. However, those indels involving the tandem repeats and oligo-glycines in the prion gene are the analogue of ser-asn toggle codons: they change too fast to have applicability to 100my time scales. Specific indels are rare to begin with; to occur twice in separate lineages or to revert has the effect of squaring an already minuscule probability. (Tandem repeats have special structural features that enhance these rates through replication slippage; retrotransposons raise other special issues.)
Fortuitously, the prion gene contains the perfect indel relative to the mammalian radiation in its signal region. The event, most parsimoniously explained as a 6 bp deletion (two codons so no frameshift), cleanly separates rodents, primates, lagomorphs from the ancestral lineage (marsupial sequence) and ferungulates (ruminants, cetaceans, carnivores, perissodactyls). In other words, the deletion establishes the existence of a common ancestor to the mouse-human-rabbit lineage not shared by the cow-mink-horse lineage.
MA--NLGYWLLALFVATWTDVGLCKK rodents (8 species)
MA--NLGCWMLVLFVATWSDLGLCKK great apes (6 species)
MA--NLGCWMLVLFVATWSDLGLCKK old world monkeys (15 species)
MA--NLGCWMLVLFVATWSDLGLCKK new world monkeys (8 species)
MA--HLGYWMLLLFVATWSDVGLCKK lagomorphs (1 species)
MVKSHIGSWILVLFVAMWSDVGLCKK artiodactyl (20 species)
MVKSHIGSWLLVLFVATWSDIGFCKK carnivores (6 species)
MVKSHIGSWILVLFVAMWSDVGLCKK perissodactyls (3 species)
MGKIQLGYWILVLFIVTWSDLGLCKK marsupials (1 species)
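As a small illustration of how the indel partitions the taxa, the sketch below takes the hand-gapped signal sequences listed above and sorts them by whether alignment columns 3 and 4 are gapped. The dictionary and function names are made up for this example; the gapping itself is taken as given rather than computed.

```python
from collections import defaultdict

# Hand-gapped signal regions copied from the alignment above.
ALIGNED_SIGNALS = {
    "rodents": "MA--NLGYWLLALFVATWTDVGLCKK",
    "great apes": "MA--NLGCWMLVLFVATWSDLGLCKK",
    "old world monkeys": "MA--NLGCWMLVLFVATWSDLGLCKK",
    "new world monkeys": "MA--NLGCWMLVLFVATWSDLGLCKK",
    "lagomorphs": "MA--HLGYWMLLLFVATWSDVGLCKK",
    "artiodactyls": "MVKSHIGSWILVLFVAMWSDVGLCKK",
    "carnivores": "MVKSHIGSWLLVLFVATWSDIGFCKK",
    "perissodactyls": "MVKSHIGSWILVLFVAMWSDVGLCKK",
    "marsupials": "MGKIQLGYWILVLFIVTWSDLGLCKK",
}

def split_by_gap(aligned, columns=(2, 3)):
    """Group taxa by whether the given 0-based alignment columns are gapped."""
    groups = defaultdict(list)
    for taxon, seq in aligned.items():
        state = "deleted" if all(seq[c] == "-" for c in columns) else "intact"
        groups[state].append(taxon)
    return dict(groups)

def ungapped_length(seq):
    """Number of residues once gap characters are removed."""
    return len(seq.replace("-", ""))

groups = split_by_gap(ALIGNED_SIGNALS)

if __name__ == "__main__":
    for state, taxa in groups.items():
        lengths = sorted({ungapped_length(ALIGNED_SIGNALS[t]) for t in taxa})
        print(state, lengths, taxa)
```

The "deleted" group has 24-residue signals and the "intact" group 26, matching the split between the mouse-human-rabbit lineage and the ferungulates plus marsupial outgroup.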
Second, the 26-residue signal region has been quite stable to point mutation, almost comparable to mature protein. All genetic change accepted in the signal region in the last 100my in 80 species can be explained by a dozen or so point mutational events and the indel under consideration here; another few point mutations accommodate the marsupial divergence at 178my. There is little singlet change (where sequencing experimental error is concentrated in any case), a few conservative toggle codons, and a limited number of deeper synapomorphic changes all consistent with seldom-disputed aspects of the phylogenetic tree (assumed here throughout). For example, ancestral tyrosine at position 8 has gone to cysteine in old world monkeys-great apes and to 4-codon serine in ferungulates.
(Serine is unique in having 6 codons not in the same column of the standard genetic code; direct change requires two base changes and seldom occurs; threonine and cysteine, at the intersections of serine rows and columns, usually mediate change. The effect results in relatively frozen 4-codon and 2-codon serine, giving constraints on toggle opportunities; 2-codon serine can be safely inferred from ser-asn toggles, 4-codon serine from ser-ala-thr.)
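The serine claim can be checked directly against the standard genetic code. The sketch below builds the code table from the usual TCAG ordering, measures the minimum number of base changes between the 4-codon (TCN) and 2-codon (AGY) serine blocks, and lists which amino acids sit one step from both. Variable and function names are illustrative.

```python
from itertools import product

# Standard genetic code in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def distance(a, b):
    """Number of base positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))

ser4 = [c for c, aa in CODE.items() if aa == "S" and c.startswith("TC")]
ser2 = [c for c, aa in CODE.items() if aa == "S" and c.startswith("AG")]

# Fewest base changes needed to move between the two serine blocks.
min_steps = min(distance(a, b) for a in ser4 for b in ser2)

# Amino acids encoded by codons one step from both a TCN and an AGY serine.
bridges = {
    CODE[mid]
    for mid in CODE
    if any(distance(mid, a) == 1 for a in ser4)
    and any(distance(mid, b) == 1 for b in ser2)
}

if __name__ == "__main__":
    print("minimum base changes:", min_steps)
    print("bridging residues:", sorted(bridges))
```

The bridging set comes out as threonine and cysteine only, in line with the statement that these two residues usually mediate any switch between serine blocks.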
Next, note that the indel event can be simply explained by an insertion or deletion involving codons 3 and 4. Two similar scenarios also work, with the indel beginning at position 2 or 3 of codon 2. This region has seen very little change at silent codon positions.
MA--NLG primates
MA--NLG rodents
MA--HLG lagomorph
MVKSHIG ferungulates
MGKIQLG marsupial
MARLLTT chicken
 m   a          NH   l   g
ATG GCG --- --- CAC CTC GGC rabbit
ATG GCG --- --- AAC CTT GGC primates
ATG GCG --- --- AAC CTT GGC rodents
ATG GTG AAA AGC CAC ATA GGC ferung
ATG GGA AAA ATC CAA TTG GGA marsup
ATG GCT AGG CTC CTC ACC ACC chicken
 m   gv  k   is  hq  il  g
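A quick consistency check on the codon-level alignment: translating each DNA row with the standard genetic code should reproduce the protein rows, with the six-base gap read as two missing codons. This is a sketch; the dictionary of rows is copied from the alignment above and the helper name is invented for the example.

```python
from itertools import product

# Standard genetic code in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First seven aligned codons of the signal region, from the alignment above.
ALIGNED_DNA = {
    "rabbit": "ATG GCG --- --- CAC CTC GGC",
    "primates": "ATG GCG --- --- AAC CTT GGC",
    "rodents": "ATG GCG --- --- AAC CTT GGC",
    "ferungulates": "ATG GTG AAA AGC CAC ATA GGC",
    "marsupial": "ATG GGA AAA ATC CAA TTG GGA",
    "chicken": "ATG GCT AGG CTC CTC ACC ACC",
}

def translate_aligned(dna):
    """Translate space-separated codons, keeping '---' as a gap residue."""
    return "".join("-" if codon == "---" else CODE[codon] for codon in dna.split())

proteins = {taxon: translate_aligned(dna) for taxon, dna in ALIGNED_DNA.items()}

if __name__ == "__main__":
    for taxon, protein in proteins.items():
        print(f"{protein} {taxon}")
```

Because the gap spans whole codons, the reading frame downstream is untouched, which is what makes a 6 bp deletion tolerable in a coding region.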
Alignment programs such as ClustalW or Blast often do not gap correctly in this region. This error then trickles down through research papers on the rate of change in the prion gene (or of nuclear genes in general) when alignments are not hand-gapped. The effect is not trivial when compounded with gross errors regarding the octarepeat region, because illusory changes can then quantitatively dominate the picture of prion gene evolution. Note that it is most unclear at both the DNA and protein level which residues are still homologous (or even what homologous means) because of the split between function and descendancy.
Chicken and marsupial, safe outgroups, have 26 residues in the signal region, as do all ferungulates. Despite the great span of time, they align quite well with conservative single base changes needed for concordance. This argues for the indel being a deletion within the rabbit-primate-rodent lineage, rather than separate insertions within birds, marsupials, and ferungulates. Further support could be obtained by sequencing mammalian orders not represented in the data, such as basal sloths: all are predicted to have 26 residues in the signal region.
MA--NLGYWLLALFVATWTDVGLC-KK rodent
MA--HLGYWMLLLFVATWSDVGLC-KK rabbit
MA--NLGCWMLVLFVATWSDLGLC-KK sq monkey
MVKSHIGSWLLVLFVATWSDIGFC-KK mink
MVKSHIGSWILVLFVAMWSDVGLC-KK deer
MGKIQLGYWILVLFIVTWSDLGLC-KK marsup
MARLLTTCCLLALLLAACTDVALS-KK bird
MVKSHLGYWILVLFVATWSDVGLC-KK ancestral placental mammal
Another region where an indel synapomorphy seems to cleanly separate rodent-primate-lagomorph from ferungulates is in the terminal octapeptide repeat, which is always a nonapeptide in ferungulates but never in the other group. A tetra-glycine becomes a tri-glycine. The marsupial also has a nonapeptide but is not overwhelming in its similarity otherwise. The first and final repeats are not subject to erasure and over-writing like middle repeats by the nature of the slippage mechanism.
In conclusion, [((rodent-primate-lagomorph), ferungulate), marsupial] is the only tree with topology consistent with the signal and repeat region deletions. There are further synapomorphic codons but they simply confirm agreed-upon divisions such as rodent-primate or ferungulate-others. Ruminants cannot be separated from carnivores with protein sequences from this region. The rabbit node cannot be placed -- there is little value in long branch sequencing unless taken in 3's to suppress singlets, eg [(rabbit, hare), pika]. Two similarly chosen marsupials would be of even more value; [(opossum, kangaroo), monotreme] works, again because the topology is certain and the branches are not too short. Prion researchers have squandered immense resources sequencing too closely related taxa.
Here is a very curious recent paper on this same topic concerning a mitochondrial gene that also goes against the grain:
Cao Y, Janke A, Waddell PJ, Westerman M, Takenaka O, Murata S, Okada N, Paabo S, Hasegawa M J Mol Evol 1998 Sep;47(3):307-22
"The phylogenetic relationship among primates, ferungulates (artiodactyls + cetaceans + perissodactyls + carnivores), and rodents was examined using proteins encoded by the H strand of mtDNA, with marsupials and monotremes as the outgroup. Trees estimated from individual proteins were compared in detail with the tree estimated from all 12 proteins (either concatenated or summing up log-likelihood scores for each gene). Although the overall evidence strongly suggests ((primates, ferungulates), rodents), the ND1 data clearly support another tree, ((primates, rodents), ferungulates).
To clarify whether this contradiction is due to (1) a stochastic (sampling) error; (2) minor model-based errors (e.g., ignoring site rate variability), or (3) convergent and parallel evolution (specifically between either primates and rodents or ferungulates and the outgroup), the ND1 genes from many additional species of primates, rodents, other eutherian orders, and the outgroup (marsupials + monotremes) were sequenced. The phylogenetic analyses were extensive and aimed to eliminate the following artifacts as possible causes of the aberrant result: base composition biases, unequal site substitution rates, or the cumulative effects of both.
Neither more sophisticated evolutionary analyses nor the addition of species changed the previous conclusion. That is, the statistical support for grouping rodents and primates to the exclusion of all other taxa fluctuates upward or downward in quite a tight range centered near 95% confidence. These results and a site-by-site examination of the sequences clearly suggest that convergent or parallel evolution has occurred in ND1 between primates and rodents and/or between ferungulates and the outgroup. While the primate/rodent grouping is strange, ND1 also throws some interesting light on the relationships of some eutherian orders, marsupials, and monotremes. In these parts of the tree, ND1 shows no apparent tendency for unexplained convergences."
This sounds like a prescription that would tolerate rapid rates of change yet it does not. It also sounds like a prescription for convergent evolution or for building exportable proteins by swapping in a universal signal domain (analogue of the Rossmann fold for nucleotides). So why, on a Blast search against an 850,000,000 bp data set, do prion queries only return other prion signal regions?
The answer may be that signal peptides are very ancient, dating back to the divergence with eubacteria -- there has been a great span of time in which to diverge. Convergent evolution does not act in this instance to drive the domain to a universal linear sequence, rather to a common generic property pattern within an immense sequence space (20 to the 26th power). There may not be many proteins with 'new' signal peptides; the source may be existing signal proteins through gene duplication and divergence.
Note: why use INDEL for INsertion or DELetion? Because when aligning sequences, one often sees that a gap has to be introduced. That doesn't necessarily mean the shorter sequence had a deletion; the longer sequence might equally have had an insertion. In many situations it is not possible to resolve the issue. So rather than call it 'insertion or deletion' which is too cumbersome for constant use, or call it 'deletion' which is biased, people went for a neutral term, indel.
The indel in the signal region of the prion protein happened to be resolvable and was a deletion. Resolution is only probabilistic: 1 rare event is a whole lot better than 2 rare events, eg the ancestral signal could be 24 aa and the marsupial and ferungulate lineages could each have had the same 2 aa insert in the same spot while mouse-human stayed at ancestral length. Resolution is also predictive: 3-toed sloths, elephants, platypus prions etc. will have 26 aa. Guinea pig is likely 24 aa but could go either way, depending on exactly when it branched off relative to the deletion event in the common ancestor of rodent-primate-rabbit not shared by artiodactyl-carnivore.
Synapomorphy is one of many learned-sounding terms in newer taxonomic theory that do not convey any meaning per se; however, these terms end up being convenient. It refers to a character value [here an aligned amino acid, elsewhere a bump on a tooth] that occurs only and everywhere on a topological subtree. Example: DWEDRY in all rodents is DYEDRY in every other species. The tryptophan at this position is a good synapomorphic character for rodents. But the tyrosine is not, because it does not identify a monophyletic subtree. This is a 2bp change that presumably passed through cysteine (which may still be present in some pre-Murinid rodents). One could also speak of local synapomorphies, eg at codon 4, serine is diagnostic of hamsters within rodents but not within mammals.
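The "only and everywhere on a subtree" definition is easy to state in code. The sketch below uses a deliberately tiny tree written as nested tuples, with five representative taxa chosen for illustration and the topology argued for in this article; the character states follow the DWEDRY/DYEDRY example. A state counts as a synapomorphy when the set of taxa carrying it is exactly the leaf set of some subtree.

```python
# Hypothetical five-taxon tree for illustration, as nested tuples.
TREE = (((("mouse", "hamster"), "human"), "cow"), "marsupial")

# Residue at the second position of the D-EDRY motif (W in rodents, Y elsewhere).
STATES = {"mouse": "W", "hamster": "W", "human": "Y", "cow": "Y", "marsupial": "Y"}

def leaves(node):
    """Set of taxon names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def clades(node):
    """Yield the leaf set of every subtree, including single leaves and the root."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_synapomorphy(state, states, tree):
    """True if the carriers of `state` are exactly the leaves of one subtree."""
    carriers = frozenset(taxon for taxon, s in states.items() if s == state)
    return carriers in set(clades(tree))

if __name__ == "__main__":
    for state in ("W", "Y"):
        print(state, is_synapomorphy(state, STATES, TREE))
```

Tryptophan passes because its carriers form the rodent subtree; tyrosine fails because human, cow and marsupial do not form a subtree once the rodents are removed.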
26 Oct 98 webmaster opinion
Recent papers about knockout mice are split between several finding no ill effects and a few finding minor abnormalities. In either case, there is no support for essentiality of the prion gene, no disease phenotype, much less lethality, associated with loss-of-function, and thus no explanation for the conservative evolution of the gene. Note transgenic mice expressing PrP with specific amino-proximal deletions develop a neurologic syndrome with ataxia and cerebellar lesions.
How good are the controls for knockout mice? Terrible. No one has ever determined the prion sequence of a real Mus musculus domesticus. A highly inbred lab mouse bears less of a relationship to a wild housemouse than a toy poodle does to a Canadian wolf.
Could 'wildtype' mice already be knockouts of normal prion function? Suppose the prion gene were essential but inbreeding fixed a bad allele, with a compensatory change in another gene strongly selected by the inbreeding process: all the experiments then compare a point mutation knockout to a deletion knockout.
No one in their right mind would ever use linc [long incubation] mice as controls -- this allele is doubly defective relative to sinc [short incubation]: L108F - T189V. (Linc is thus 3 base changes from sinc: C428T and ACc671-673GTc; the 'missing link' would have ile or ala as transition. Sinc can be shown to be closer to wildtype -- see below.) While linc doesn't cause TSE during mouse lifetimes, TSE is not a disease of normal function. The loci in humans that cause familial CJD are speculated to thermodynamically destabilize native protein, yet these are mainly mild conservative changes compared to linc. (It should not be thought that long incubation times (to scrapie passage) mean this protein is 'better' than wildtype; on the contrary, it probably means that it is less like a real prion in structure, hence harder to recruit under the like-like principle of the species barrier.) Linc mice are analogous to the many bizarre alleles in sheep prion -- artefacts of animal husbandry.
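The 'missing link' remark can be verified mechanically. Taking the T189V codon change as ACC to GTC (reading the lower-case c in the notation above as the unchanged third base), the sketch below lists the two possible one-step intermediates and translates them with the standard genetic code. The helper name is invented for this example.

```python
from itertools import product

# Standard genetic code in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def one_step_intermediates(start, end):
    """Map each single-change intermediate codon to its amino acid.

    Only defined here for codons that differ at exactly two positions.
    """
    diffs = [i for i in range(3) if start[i] != end[i]]
    if len(diffs) != 2:
        raise ValueError("codons must differ at exactly two positions")
    result = {}
    for i in diffs:
        mid = start[:i] + end[i] + start[i + 1:]
        result[mid] = CODE[mid]
    return result

sinc_codon, linc_codon = "ACC", "GTC"
path = one_step_intermediates(sinc_codon, linc_codon)

if __name__ == "__main__":
    print(CODE[sinc_codon], "->", sorted(path.values()), "->", CODE[linc_codon])
```

The intermediates are GCC (alanine) and ATC (isoleucine), the two residues named in the text as the possible transition.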
Is linc a knockout (or severe setback) of some aspect of normal function? Yes, the severity of the mutations implies this. The argument is threefold and quantitative. First, a residue's functional importance is inversely related to its characteristic rate of evolutionary change; codons 108 and 189 (and surrounding domains) are experiencing exceedingly slow rates of change. (The baseline is set in pseudogenes, introns, or intra-gene loop regions of similar base composition not experiencing selection.) Second, statistical measures of the substitutability of one residue for another reflect general design criteria of proteins (roughly PAM or Blosum matrices): if there is to be an accepted change at some codon, then it is far more likely to be certain residues than others. Third, multiple mutations are generally worse than additive in effect.
Assuming for the moment that sinc has full wildtype function, to replace both a leucine with a phenylalanine and a threonine with valine at codon positions with 100 million year scales of invariance is a bit like winning the lottery the same day you shoot a hole-in-one at golf blindfolded. This is the measure of neutrality of linc relative to sinc for retaining the full gamut of normal prion function. [A third strain of mouse was in wide use in the 1980's but has not shown up in sequences since, M133V relative to sinc; similar arguments apply to it. I call it kinc in view of probable induced structural changes.]
Naturally very few papers in the prion literature actually state up front which of the 3x3=9 genotypes of mice were used. A person familiar with the myriad strain names and their histories might be able to work this out in some cases. The key issue may be whether linc mice were derived from sinc or vice versa. The latter scenario means that even though sinc might be closer to wildtype, if it got there through a compensated linc mouse, its knockouts are no better than linc knockouts.
Alternatively, it could be argued that loss of prion function in 'wildtype' is simply not detected under conditions of cage life (or swimming or maze tests). That is, a mouse could be deaf, dumb, lack night vision and olfaction, and roll over for predators -- what does it matter when you are never more than 6 inches from your food bowl? A gene may be essential in the wild but not in the animal room. (E. coli can dispense with hundreds of genes when grown in rich broth.)
Forgetting now the differences between sinc, linc, and kinc mice, let us concentrate on dubious variations mice have relative to what is known about this protein from its evolution. One sees immediately mice have too many changes:

Key:
Rodent sequences are globally colored by tree topology. Internal magenta highlights recent synapomorphies and plesiomorphies; internal yellow suppresses singlets. Uncertainty in consensus sequences indicated by lower case.
*Ferungulate, marsupial and bird lines show only unambiguous residues relevant to rodent issues; indels are suppressed to hold the alignment flat.
| Note | Clade affected | Mutant | Conservation | Comment |
|---|---|---|---|---|
| 1 | mouse+rat+gerbil | A14T | placental | stable codon |
| 2 | mouse-rat+gerbil | T15M | MT toggle | M occurs sporadically |
| 3 | mouse only | G55del | mammal | repeat region |
| 4 | mouse only | D72S | mammal | repeat region |
| 5 | mouse only | G80S | mammal | repeat region |
| 6 | mouse+rat | M109L | placental | L in marsupial too |
| 7 | rodents-(g.pig) | Y145W | non-rodents | 2bp synapomorphy |
| 8 | rodents | -127D | non-rodents | pre-GPI anchor |
| 9 | mouse only | -A234ST | mammal | post GPI-anchor |
This is difficult to assess because sophisticated identification algorithms [eg Psort] don't like any rodent signal peptides. However, using a virtual chimera of human signal shows that mouse GPI attachment is still expected. The hypervariable region surrounding the GPI join has no good explanation; the 3' terminus itself shows extraordinary conservation. Asp preceding the join, DGRRS-ss, appears to be a very old insertion in rodents. On top of this, mouse has a slippage insert with terminal point mutation immediately after the GPI. One might suppose that the GPI splice signature would be critical and conserved but instead it is one of the most variable regions in the whole protein.
While serines are tolerated sporadically at the first and final repeats, it is precisely these serine substitutions at the second and third repeat that are unprecedented and quite possibly enough to knock out or disrupt the structure/function of the repeat region. Mouse prion is a very poor place to study copper and zinc binding to this region because of these unique serine substitutions; most studies fortunately have been done with PHGGGWQG repeats (general mammal).
Non-redundant rodent prion protein resource in fasta format:
8 species, + 2 alleles + ancestral sequence (lower case indicates uncertainty). >ancestral rodent MANLGYWLLALFVAtWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGTWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHNQWNKPSKPKTNMKHvAGAAAAGAVVGGLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYsNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSAVLFSSPPVILLISFLIFLIVG >musmus sinc MANLGYWLLALFVTMWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGTWGQPHGGGWGQPHGGSWGQPHGGSWGQPHGGGWGQGGGTHNQWNKPSKPKTNLKHVAGAAAAGAVVGGLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSSTVLFSSPPVILLISFLIFLIVG >musmus linc MANLGYWLLALFVTMWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGTWGQPHGGGWGQPHGGSWGQPHGGSWGQPHGGGWGQGGGTHNQWNKPSKPKTNFKHVAGAAAAGAVVGGLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNITIKQHTVVTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSSTVLFSSPPVILLISFLIFLIVG >musmus kinc MANLGYWLLALFVTMWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGTWGQPHGGGWGQPHGGSWGQPHGGSWGQPHGGGWGQGGGTHNQWNKPSKPKTNLKHVAGAAAAGAVVGGLGGYMLGSAVSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSSTVLFSSPPVILLISFLIFLIVG >ratrat MANLGYWLLALFVTTCTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQSGGTWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWSQGGGTHNQWNKPSKPKTNLKHVAGAAAAGAVVGGLGGYMLGSAMSRPMLHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSAVLFSSPPVILLISFLIFLIVG >gerbil MANLGYWLLALFVTMWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHVAGAAAAGAVVGGLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSAVLFSSPPVILLISFLLFLIVG >sighis MANLGYWLLALFVATWTDVGLCKKRPKPGGWNTGGSRYPGQGNPGGNRYPPQGGGTWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHVAGAAAAGAVVGGLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYNNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSAVLFSSPPMILLISFLIFLIVG >sigful 
MANLGYWLLALFVATWTDVGLCKKRPKPGGWNTGGSRYPGQGNPGGNRYPPQGGGTWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGSWGQGGGTHSQWNKPSKPKTNMKHVAGAAAAGAVVGGLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYNNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSAVLFSSPPMILLISFLIFLIVG >cricmigr MANLSYWLLALFVATWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGTWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHNQWNKPNKPKTSMKHMAGAAAAGAVVGGLGGYMLGSAMSRPMLHFGNDWEDRYYRENMNRYPNQVYYRPVDQYNNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSAVLFSSPPVILLISFLIFLIVG >cricgriseus MANLSYWLLALFVATWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGTWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHNQWNKPSKPKTNMKHVAGAAAAGAVVGGLGGYMLGSAMSRPMLHFGNDWEDRYYRENMNRYPNQVYYRPVDQYNNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYYDGRRSSAVLFSSPPVILLISFLIFLIVG >mesocrisauratus MANLSYWLLALFVAMWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGTWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHNQWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPMMHFGNDWEDRYYRENMNRYPNQVYYRPVDQYNNQNNFVHDCVNITIKQHTVTTTTKGENFTETDIKIMERVVEQMCTTQYQKESQAYYDGRRSSAVLFSSPPVILLISFLIFLMVG
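For readers who want to work with the resource programmatically, here is a minimal sketch of a FASTA reader plus a position-by-position comparison against the ancestral sequence. To keep it short, the sample text holds only the 24-residue signal region of three of the records above; full-length records differ in length and would need aligning first, which this sketch does not attempt. Function names are illustrative.

```python
# Signal regions (first 24 residues) copied from three records above.
SAMPLE_FASTA = """\
>ancestral rodent
MANLGYWLLALFVAtWTDVGLCKK
>musmus sinc
MANLGYWLLALFVTMWTDVGLCKK
>ratrat
MANLGYWLLALFVTTCTDVGLCKK
"""

def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def substitutions(reference, query):
    """List differences as e.g. 'A14T'; case (uncertainty marking) is ignored."""
    if len(reference) != len(query):
        raise ValueError("sequences must be the same length (align them first)")
    return [
        f"{r.upper()}{pos}{q.upper()}"
        for pos, (r, q) in enumerate(zip(reference, query), start=1)
        if r.upper() != q.upper()
    ]

records = parse_fasta(SAMPLE_FASTA)
ancestral = records["ancestral rodent"]

if __name__ == "__main__":
    for name, seq in records.items():
        if name != "ancestral rodent":
            print(name, substitutions(ancestral, seq))
```

For mouse sinc this reports A14T and T15M, the first two entries in the table of mouse departures given earlier.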
J Neuropathol Exp Neurol 1998 Oct;57(10):979-88 Piccardo P, Dlouhy SR, Lievens PM, Young K, Bird TD, Nochlin D, Dickson DW, Vinters HV, Zimmerman TR, Mackenzie IR, Kish SJ, Ang LC, De Carli C, Pocchiari M, Brown P, Gibbs CJ Jr, Gajdusek DC, Bugiani O, Ironside J, Tagliavini F, Ghetti B
Gerstmann-Straussler-Scheinker disease (GSS), a cerebello-pyramidal syndrome associated with dementia and caused by mutations in the prion protein gene (PRNP), is phenotypically heterogeneous. The molecular mechanisms responsible for such heterogeneity are unknown. Since we hypothesize that prion protein (PrP) heterogeneity may be associated with clinico-pathologic heterogeneity, the aim of this study was to analyze PrP in several GSS variants. Among the pathologic phenotypes of GSS, we recognize those without and with marked spongiform degeneration. In the latter (i.e. a subset of GSS P102L patients) we observed 3 major proteinase-K resistant PrP (PrPres) isoforms of ca. 21-30 kDa, similar to those seen in Creutzfeldt-Jakob disease. In contrast, the 21-30 kDa isoforms were not prominent in GSS variants without spongiform changes, including GSS A117V, GSS D202N, GSS Q212P, GSS Q217R, and 2 cases of GSS P102L.
This suggests that spongiform changes in GSS are related to the presence of high levels of these distinct 21-30 kDa isoforms. Variable amounts of smaller, distinct PrPres isoforms of ca. 7-15 kDa were seen in all GSS variants. This suggests that GSS is characterized by the presence of PrP isoforms that can be partially cleaved to low molecular weight PrPres peptides.
Comment (webmaster): Two of these mutations are apparently new and not on Medline. One presumes they turned up during screening of GSS patients. People should stop calling GSS a disease or just pick one genotype for it. I favor getting rid of both FFI and GSS and just sticking with 'CJD D202N M129M' or whatever. GSS is a subset of CJD with no deep underlying definition or common ground -- no wonder there is paper after paper wrestling with 'phenotypic variability.'
Both D202N and Q212P are found in alpha helix 3 in the mouse and hamster nmr structures. D202 is an invariant residue in mammals (but glutamate in birds) just past the 2nd glycosylation site, hydrogen bonded to Y149, Y157, T199, and T199 amide. Q212 is also strongly invariant (but deleted in birds) just prior to the second cysteine of the disulphide and hydrogen bonded to T216.
Better 3D blow-ups will be posted shortly. There is also a whole long story to tell about how hydrogen bond acceptors cannot be replaced by donors or non-acceptors, or donors by acceptors or non-donors, even though under other circumstances these mutations might be conservative. There are applications to sheep allele hazards and to whether any of the lab mice strains have normal functioning prion protein. E200K, R208H, V210I, Q217R, and M232R are the other known mutations in this vicinity.
Context:
manlgcwmlvlfvatwsdlglc KKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQ GGGTHSQWNKPSKPKTNMKHMAGAAAAGAVVGGLGG YMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHD CVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGS smvlfssppvillisfliflivg
Adapted from 27 July 98 PNAS Riek et al
| Fig.code | Donor | Acceptor | #Conformers |
|---|---|---|---|
| a | Tyr-128 H | sidechain O Asp-178 | 11 |
| b | Asn-143 H | sidechain O Glu-146 | 16 |
| c | Tyr-149 H | sidechain O Asp-202 | 16 |
| d | Tyr-150 H | backbone O Pro-137 | 17 |
| e | Arg-151 H | sidechain O Glu-152 | 8 |
| f | Asn-153 H | backbone O Tyr-149 | 18 |
| g | Arg-156 H | sidechain O Glu-196 | 13 |
| h | Tyr-157 H | backbone O Asp-202 | 7 |
| i | Gln-160 H | backbone O Gly-131 | 19 |
| j | Tyr-162 NH | sidechain O Thr-183 | 12 |
| k | Arg-164 H | sidechain O Asp-178 | 7 |
| l | Asn-174 H | backbone O Asn-171 | 12 |
| m? | Thr-183 OH | backbone O Cys-179 | 6 |
| n | Thr-188 OH | backbone O Ile-184 | 12 |
| o | Thr-191 OH | backbone O His-187 | 8 |
| p | Thr-192 OH | backbone O Glu-196 | 7 |
| q | Lys-194 H | sidechain O Glu-196 | 7 |
| r | Thr-199 NH | sidechain O Asp-202 | 17 |
| s | Thr-199 OH | sidechain O Asp-202 | 14 |
| t | Asp-202 NH | sidechain O Thr-199 | 15 |
| u | Thr-216 OH | backbone O Gln-212 | 10 |
| v | Gln-217 H | backbone O Ala-133 | 8 |
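The table lends itself to a simple lookup: given a mutated residue, which hydrogen bond partners are at stake? The sketch below encodes each row as (figure code, donor residue, acceptor residue, number of conformers), dropping the atom-level detail, and returns the partners of a chosen residue. The data structure and function name are choices made for this example.

```python
# Rows of the table above: (figure code, donor, acceptor, conformers).
H_BONDS = [
    ("a", "Tyr-128", "Asp-178", 11), ("b", "Asn-143", "Glu-146", 16),
    ("c", "Tyr-149", "Asp-202", 16), ("d", "Tyr-150", "Pro-137", 17),
    ("e", "Arg-151", "Glu-152", 8), ("f", "Asn-153", "Tyr-149", 18),
    ("g", "Arg-156", "Glu-196", 13), ("h", "Tyr-157", "Asp-202", 7),
    ("i", "Gln-160", "Gly-131", 19), ("j", "Tyr-162", "Thr-183", 12),
    ("k", "Arg-164", "Asp-178", 7), ("l", "Asn-174", "Asn-171", 12),
    ("m", "Thr-183", "Cys-179", 6), ("n", "Thr-188", "Ile-184", 12),
    ("o", "Thr-191", "His-187", 8), ("p", "Thr-192", "Glu-196", 7),
    ("q", "Lys-194", "Glu-196", 7), ("r", "Thr-199", "Asp-202", 17),
    ("s", "Thr-199", "Asp-202", 14), ("t", "Asp-202", "Thr-199", 15),
    ("u", "Thr-216", "Gln-212", 10), ("v", "Gln-217", "Ala-133", 8),
]

def partners(residue, bonds=H_BONDS):
    """Residues that share at least one listed hydrogen bond with `residue`."""
    found = set()
    for _code, donor, acceptor, _n in bonds:
        if donor == residue:
            found.add(acceptor)
        elif acceptor == residue:
            found.add(donor)
    return found

def bond_codes(residue, bonds=H_BONDS):
    """Figure codes of every listed bond that involves `residue`."""
    return [code for code, donor, acceptor, _n in bonds if residue in (donor, acceptor)]

if __name__ == "__main__":
    for res in ("Asp-202", "Gln-212", "Gln-217"):
        print(res, sorted(partners(res)), bond_codes(res))
```

Asp-202 turns up in five of the listed bonds (c, h, r, s, t), which fits the description of D202 as hydrogen bonded to Y149, Y157, T199, and the T199 amide.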
10/16/1998 WUSTL and the Sanger Centre have finished sequencing 85,341,695 bases of the 100 Mb Caenorhabditis elegans genome (make that 86,572,592 as of 25 Nov 98)
Comment (webmaster):
The prion gene has been tracked back 410 million years to the fish-mammal divergence using antibody 3F4 to the core invariant epitope. Yet there is no sign of the gene earlier in yeast, fruit fly, or nematode.
Here is the hnRNP gene product, still the best Blast hit to prion protein, found long ago by hybridization. Note that its terminal repeat does bear an uncanny resemblance, in composition and residue order, to the prion protein octarepeat (and also to the yeast sup35 prionlike protein repeat).
MTDVEIKAENGSGDASLEPENLRKIFVGGLTSNTTDDLMREFYS
QFGEITDIIVMRDPTTKRSRGFGFVTFSGKTEVDAAMKQRPHIIDGKTVDPKRAVPRD
DKNRSESNVSTKRLYVSGVREDHTEDMLTEYFTKYGTVTKSEIILDKATQKPRGFGFV
TFDDHDSVDQCVLQKSHMVNGHRCDVRKGLSKDEMSKAQMNRDRETRGGRSR
...DGQRGGYN
.GGGGGGGGWGG
PAQRGGPGAYGG
.PGGGGQGGYGG
....DYGGGWGQ
.QGGGGQGGWGG
PQQQQGGGGWGQ
.QGGGGQGGWGG
.PQQQQQGGWGG
PQQGGGGGGWGGQ
...GQQQGGWGGQ
....SGAQQWAHA
...QGGNRNY
Yeast Sup35:
MSDSNQGNNQQN
YQQ
YSQNGNQQQGNNR
YQGYQA
YNAQAQPAGGY
YQN
YQGYSG
YQQGG
YQQYNPDAG
YQQQYNPQGG
YQQ.YNPQGG
YQQQFNPQGGRGN
YKNFNYNNNLQG
YQAGFQPQ...
Also, RNA polymerase II has a very similar repeat to bird prion, YSPTSPS in later eucaryotes, YSPASPA in Mastigamoeba.
Iwasaki M, Okumura K, Kondo Y, Tanaka T, Igarashi H Nucleic Acids Res 1992 Aug 11;20(15):4001-7
...The evolutionary conservation of the PrP gene has been reported in the genomes of many vertebrates as well as certain invertebrates. In the genome of nematode Caenorhabditis elegans, the sequence capable of hybridizing with the mammalian PrP cDNA probe has been demonstrated, predicting the presence of the PrP gene homologue in C.elegans. In this study, Southern analysis with the hamster PrP cDNA (HaPrP) probe confirmed the previous observation. Moreover, Northern analysis revealed that the sequence is actively transcribed in adult worms.
Thus, we screened C.elegans cDNA libraries with the HaPrP probe and isolated a cDNA that hybridizes to the same sequence in C.elegans that hybridized with the HaPrP probe in the Southern and Northern analyses. The deduced amino acid sequence of this cDNA, however, is substantially homologous with heterogeneous nuclear ribonucleoprotein (hnRNP) core proteins rather than mammalian PrPc. The hnRNPs contain the glycine-rich domain in the C-terminal half of the molecule, which also seemed to be in PrPc at the N-terminal half of the molecule. Both of the glycine-rich domains are composed of tracts with high G + C content, indicating that these tracts may [cause] the hybridizing signals. These results suggest that this cDNA clone is derived from a novel hnRNP gene homologue in C.elegans but not from a predicted PrP gene homologue.
Katrina L. Kelner, opinion piece, Science 22 Oct 98
"In a curious set of neurodegenerative diseases, a long string of the nucleotide triplet CAG lodges within genes, causing the death of subsets of neurons and ultimately disease. Exactly how these strings of repeats cause cell death is not known, but they do not simply disrupt the function of their target gene. Rather, the long CAG string has a deadly--but undefined--effect of its own.
One popular idea is that the CAG repeats cause the protein to form a toxic aggregate in the nucleus of cells. These so-called nuclear inclusions are common in the brains of patients with these disorders. But in two recent papers in Cell, this explanation is called into question. One group shows, in a cultured cell model system for Huntington's disease (F. Saudou et al., Cell 95, 55 (1998)), that cells may die even without the presence of nuclear inclusions. In the most dramatic experiment, expression of a fragment of the mutant huntingtin protein containing a 68-repeat insertion, together with an inhibitory form of the ubiquitin-conjugating enzyme, resulted in far fewer intranuclear inclusions. The mutant huntingtin actually triggered more cell death in this situation than it would have in the presence of inclusions, leading the authors to the bold suggestion that the inclusions may actually be protective.
A second group made transgenic mice that mimicked the disorder spinocerebellar ataxia type 1 (A. Klement et al., ibid., p. 41), in which the repeat-containing protein ataxin-1 lacked a self-aggregating region. These mice had no nuclear inclusions, but still showed the characteristic degeneration of cerebellar Purkinje cells. The field may now have to look elsewhere for the mechanism by which these repeats do their damage to the cell."
Comment (webmaster): While these two Cell papers should be taken seriously (not forgetting the large literature on these diseases pointing in the other direction), we have seen similar errors of interpretation many times in CJD. It is impossible to show the absence of aggregates, only non-detectability up to the sensitivity of whatever methods are used. Many other effects also come into play in transgenic mutants when proteins of unknown function are studied by neuropathological phenotyping.
Speaking of ontogeny recapitulating phylogeny, here is the phylogenetic version of repeat disease anticipation:
Choong CS, Kemppainen JA, Wilson EM J Mol Evol 1998 Sep;47(3):334-42
Comparison of androgen receptor from five primate species (human, chimpanzee, baboon, macaque and collared brown lemur) supports their phylogeny with complete conservation of the DNA and steroid binding domain protein sequence. A linear increase in trinucleotide repeat expansion of homologous CAG and GGC sequences occurs in the NH2-terminal transcriptional activation region and is proportional to the time of species divergence.
A serine phosphate/glutamine repeat interaction is observed where increasing CAG repeat length is associated with an increased rate of serine 94 phosphorylation. Disparity in the calculated and apparent molecular weight with CAG repeat expansion of an AR NH2-terminal fragment suggests self-aggregation with increasing glutamine repeat length into the pathological range. These results suggest that a CAG/glutamine repeat expanded during divergence of the higher primate species, which may have a direct effect on AR structure and support a common pathway in CAG trigenic diseases in the pathophysiology of neurodegeneration observed in X-linked spinal bulbar and muscular atrophy.
21 Oct 98 webmaster
Human codon use per ten thousand codons, based on 7,168,914 codons from 14,529 proteins.

TTT F 164    TCT S 143    TAT Y 122    TGT C  97
TTC F 209    TCC S 177    TAC Y 167    TGC C 127
TTA L  67    TCA S 112    TAA *  06    TGA *  12
TTG L 118    TCG S  44    TAG *  05    TGG W 132
CTT L 121    CCT P 172    CAT H  99    CGT R  47
CTC L 194    CCC P 203    CAC H 148    CGC R 110
CTA L  65    CCA P 165    CAA Q 117    CGA R  61
CTG L 399    CCG P  70    CAG Q 342    CGG R 115
ATT I 157    ACT T 127    AAT N 168    AGT S 115
ATC I 228    ACC T 204    AAC N 207    AGC S 193
ATA I  69    ACA T 147    AAA K 234    AGA R 111
ATG M 224    ACG T  65    AAG K 334    AGG R 110
GTT V 106    GCT A 184    GAT D 222    GGT G 110
GTC V 151    GCC A 288    GAC D 267    GGC G 235
GTA V  67    GCA A 155    GAA E 283    GGA G 167
GTG V 294    GCG A  75    GAG E 406    GGG G 167
Mouse codon use per ten thousand codons, based on 3,403,144 codons from 7,272 proteins.

TTT 158    TCT 154    TAT 120    TGT 109
TTC 214    TCC 180    TAC 172    TGC 129
TTA  59    TCA 111    TAA  06    TGA  12
TTG 122    TCG  45    TAG  05    TGG 130
CTT 121    CCT 187    CAT  98    CGT  48
CTC 193    CCC 191    CAC 151    CGC 100
CTA  74    CCA 174    CAA 118    CGA  65
CTG 387    CCG  69    CAG 342    CGG 102
ATT 146    ACT 133    AAT 157    AGT 120
ATC 228    ACC 199    AAC 217    AGC 198
ATA  66    ACA 157    AAA 217    AGA 115
ATG 223    ACG  61    AAG 348    AGG 115
GTT 101    GCT 198    GAT 216    GGT 120
GTC 157    GCC 265    GAC 276    GGC 231
GTA  69    GCA 153    GAA 269    GGA 179
GTG 290    GCG  71    GAG 398    GGG 161
Cow codon use per ten thousand codons, based on 528,7904 codons from 1,277 proteins.

TTT 162    TCT 126    TAT 116    TGT  96
TTC 243    TCC 175    TAC 195    TGC 138
TTA  52    TCA  94    TAA  07    TGA  12
TTG 111    TCG  46    TAG  05    TGG 138
CTT 109    CCT 149    CAT  81    CGT  42
CTC 206    CCC 206    CAC 146    CGC 111
CTA  54    CCA 143    CAA 100    CGA  56
CTG 426    CCG  76    CAG 325    CGG 111
ATT 150    ACT 116    AAT 151    AGT  97
ATC 259    ACC 217    AAC 230    AGC 185
ATA  66    ACA 136    AAA 222    AGA 103
ATG 226    ACG  76    AAG 352    AGG 111
GTT 101    GCT 173    GAT 212    GGT 113
GTC 168    GCC 309    GAC 298    GGC 253
GTA  61    GCA 133    GAA 269    GGA 164
GTG 320    GCG  82    GAG 416    GGG 173
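
One way to read CpG depletion off these tables is to look at the four codon boxes whose G-ending member contains a CpG dinucleotide (TCG, CCG, ACG, GCG) and ask what share of each box that codon takes. The Python sketch below does this using only counts copied from the human, mouse and cow tables above. The names CODON_USE, cpg_share and SHARES are hypothetical, and the 25% yardstick (equal use of the four third-position bases) is an assumption made for illustration, not a model of expected codon usage.

```python
# Per-10,000 codon counts for the four-codon boxes TCN, CCN, ACN and GCN,
# copied from the human, mouse and cow tables above. In each box the
# G-ending codon (TCG, CCG, ACG, GCG) is the one containing a CpG.
CODON_USE = {
    "human": {"TCT": 143, "TCC": 177, "TCA": 112, "TCG": 44,
              "CCT": 172, "CCC": 203, "CCA": 165, "CCG": 70,
              "ACT": 127, "ACC": 204, "ACA": 147, "ACG": 65,
              "GCT": 184, "GCC": 288, "GCA": 155, "GCG": 75},
    "mouse": {"TCT": 154, "TCC": 180, "TCA": 111, "TCG": 45,
              "CCT": 187, "CCC": 191, "CCA": 174, "CCG": 69,
              "ACT": 133, "ACC": 199, "ACA": 157, "ACG": 61,
              "GCT": 198, "GCC": 265, "GCA": 153, "GCG": 71},
    "cow":   {"TCT": 126, "TCC": 175, "TCA": 94,  "TCG": 46,
              "CCT": 149, "CCC": 206, "CCA": 143, "CCG": 76,
              "ACT": 116, "ACC": 217, "ACA": 136, "ACG": 76,
              "GCT": 173, "GCC": 309, "GCA": 133, "GCG": 82},
}

PREFIXES = ("TC", "CC", "AC", "GC")  # first two bases of each box


def cpg_share(table, prefix):
    """Share of a four-codon box (prefix + T/C/A/G) taken by prefix + 'G'."""
    box_total = sum(table[prefix + base] for base in "TCAG")
    return table[prefix + "G"] / box_total


# Share of each box used by its CpG-containing codon, per species.
SHARES = {
    species: {prefix: cpg_share(table, prefix) for prefix in PREFIXES}
    for species, table in CODON_USE.items()
}

for species, shares in SHARES.items():
    row = "  ".join(f"{prefix}G {share:.1%}" for prefix, share in shares.items())
    print(f"{species:5s}  {row}")
```

With these counts every NCG codon takes roughly 9% to 14% of its box in all three species, well under the 25% that equal use of the four third-position bases would give.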