A new intein in Cyanobacteria and its significance for the spread of inteins

Trends In Genet. 12 (8), 287-288 (1996).

Inteins are protein "introns" encoded within the polypeptide sequences of other proteins. They are spliced out post-translationally by a process of proteolytic cleavage and ligation. Inteins appear to autocatalyze their own excision, and some are site-specific endonucleases1. Inteins are mobile genetic elements, and at least one can home, that is, insert a copy of its DNA into its integration site in an intein-less allele2. Fifteen inteins have been found in various organisms, including mycobacteria3-6, thermophilic archaebacteria7,8, yeast9,10 and the chloroplast of a red alga11. Inteins are not very similar to one another5, but homologous sites in archaebacterial DNA polymerases and in mycobacterial gyrase-A proteins6 contain homologous inteins. However, the mycobacterial RecA proteins4 and DNA polymerases also contain different inteins in different integration sites.

Searching for inteins in sequence databases 5, I have identified a 429 amino acid region of an open reading frame (ORF) from the thermophilic cyanobacterium Synechocystis sp. (Ssp)12 (GenBank accession D64003, ORF slr0833 positions 381-809). The region contains all of the known intein motifs5 and can be confidently identified as an intein (expected value 13 of 7.8 × 10^-13 for finding the multiple conserved regions). This is the first intein reported in cyanobacteria and the first eubacterial intein outside mycobacteria. The two regions flanking this intein have significant sequence similarity to prokaryotic replicative DnaB helicases (BLAST search 14 P values of 2.2 × 10^-22 to 6.5 × 10^-130).

The Ssp DnaB intein has weak sequence similarity to known inteins. The sequence can be aligned end-to-end with the mycobacterial GyrA inteins and the Mycobacterium tuberculosis RecA intein with 22-25% amino acid identity, but it resembles other inteins only in their conserved regions. The Ssp DnaB intein and the DnaB intein from the chloroplast of the red alga Porphyra purpurea (Ppu) share only local sequence similarities and have distinctly different lengths. Nevertheless, both are found at the same point in the dnaB gene. The integration points are in the center of a region of 15 identical amino acids. However, the codons for nine of these carry silent substitutions, including those for the two amino acids flanking the integration point (Fig. 1). The two DnaB inteins could be the result of a single integration event into a common ancestor of the cyanobacterial and chloroplast dnaB genes. Such an event was suggested to explain the presence of similar group I introns at an identical position in the leucine tRNA gene of diverse chloroplasts and cyanobacteria 15,16. Red alga chloroplasts probably originated from cyanobacteria 17 about 1.25 to 2.1 billion years ago 18,19. This scenario implies that (1) inteins are extremely ancient and (2) the DnaB inteins survived in their hosts for a remarkably long time. The different degrees of conservation of the DnaB hosts and the DnaB inteins (Fig. 1) could be the result of different selection pressures. DnaB replicative helicase is a multifunctional protein that interacts closely with DNA and other proteins 20. Selection on the DnaB helicases is probably linked to changes in the proteins with which they interact and to the evolution of the whole DNA replication process.
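The count of silent substitutions around the integration point can be checked directly from the nucleotide rows of Fig. 1(a). The following is a minimal sketch, not the method used for the figure: it assumes the standard genetic code applies to both genes, copies the two sequences with the labels of the figure's nucleotide rows, and uses illustrative helper names (junction_window, compare_codons) that do not come from any published tool.

```python
from itertools import product

# Standard genetic code in TCAG order (assumption: the figure legend does
# not state which translation table applies to either gene).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotide rows of Fig. 1(a), labelled as in the figure.
# '*' marks the intein integration point.
SSP_DNAB = ("GAATCTAGAACTAACAAACGGCCAATGATGTCAGATTTAAGAGAGAGTGGC"
            "*AGTATCGAACAAGACGCAGATTTAATTATGATGATTTATCGAGATGAATATTATAAT")
PPU_DNAB = ("GAAAGTCGTCATAATAAAAGACCCTTATTATCAGATTTGAGAGAAAGTGGA"
            "*TCTATAGAACAGGATGCTGATTTGGTTATCATGCTATATAGAGAAAGCTATTATAAT")


def codons(flank):
    """Split an in-frame nucleotide string into codons."""
    return [flank[i:i + 3] for i in range(0, len(flank), 3)]


def junction_window(seq, before=7, after=8):
    """Return the codons immediately surrounding the integration point."""
    n_flank, c_flank = seq.split("*")
    return codons(n_flank)[-before:] + codons(c_flank)[:after]


def compare_codons(seq_a, seq_b):
    """Pair up codons around the junction as (aa_a, aa_b, codon_a, codon_b)."""
    return [(CODON_TABLE[ca], CODON_TABLE[cb], ca, cb)
            for ca, cb in zip(junction_window(seq_a), junction_window(seq_b))]


pairs = compare_codons(SSP_DNAB, PPU_DNAB)
identical = [p for p in pairs if p[0] == p[1]]      # same amino acid
silent = [p for p in identical if p[2] != p[3]]     # same residue, different codon

print("".join(p[0] for p in pairs))
print(len(identical), "identical residues,", len(silent), "encoded by different codons")
```

With the sequences as printed in Fig. 1(a), the window of seven codons before and eight codons after the integration point covers the 15 identical amino acids, and nine of them are encoded by different codons in the two genes, including the residues on both sides of the integration point.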

Conversely, inteins might be molecular parasites that merely need to ensure the transmission of their genes 21. Rapid and efficient protein splicing is essential for minimal interference with the function of the host protein. Inteins can also increase their chances of survival by homing into intein-less alleles, thereby spreading in the population. The Ppu DnaB intein is the shortest intein (150 compared with 360-538 amino acids) and is missing the endonuclease-like motifs found in all other inteins5, while the Ssp DnaB intein has a typical length and all the characteristic motifs. It seems that over their long separate evolution the two inteins have opted for different survival strategies.

Alternatively, separate integration events might have led to the presence of different inteins in homologous hosts. Such events occurred in the RecA proteins and DNA polymerases but at different integration sites. Separate integration at the same sites in the dnaB genes implies a particular susceptibility of the sites (such as cleavage by the same endonuclease). The difference in nucleotide sequences around the integration sites does not completely rule out this idea. Restriction endonucleases can cleave ambiguous targets and this was shown for a homing intein endonuclease 2. It might also be that it is difficult to "dislodge" the inteins from their particular integration site or that being in this position somehow assists the protein splicing.

Separate integration events seem to account for the presence of distinct group I introns at the same location in nuclear rDNA genes of different phyla 22. It is not known whether inteins confer any advantage on their hosts, but the separate integrations of inteins into RecA proteins were proposed as an indication of a selective advantage 4. Such an advantage might help explain the very long persistence of the DnaB inteins or the incentive for separate integrations. The hedgehog developmental regulatory proteins might be relevant to this issue. These proteins undergo autoproteolytic cleavage at a specific site 23 that is significantly similar to the N-terminal splice junction motif of inteins 24. Experimental evidence suggests that the auto-processing of the Drosophila hedgehog protein regulates its range of action 23. Similarly, some inteins might regulate the activity of their host proteins.

Shmuel Pietrokovski
pietro@sparky.fhcrc.org
Fred Hutchinson Cancer Research Center, 1124 Columbia St. Seattle, WA.

Acknowledgment
The author is a Howard Hughes Medical Institute Fellow of the Life Sciences Research Foundation.

  1. Cooper, A.A. and Stevens, T.H. (1995) Trends Biochem Sci 20, 351-356
  2. Gimble, F.S. and Thorner, J. (1992) Nature 357, 301-306
  3. Davis, E.O. et al. (1992) Cell 71, 201-210
  4. Davis, E.O., Thangaraj, H.S., Brooks, P.C. and Colston, M.J. (1994) EMBO J 13, 699-703
  5. Pietrokovski, S. (1994) Protein Sci 3, 2340-2350
  6. Fsihi, H., Vincent, V. and Cole, S.T. (1996) Proc. Natl. Acad. Sci. USA 93, 3410-3415
  7. Perler, F.B. et al. (1992) Proc. Natl. Acad. Sci. USA 89, 5577-5581
  8. Hodges, R.A., Perler, F.B., Noren, J.N. and Jack, W.E. (1992) Nucleic Acids Res. 20, 6153-6157
  9. Kane, P.M. et al. (1990) Science 250, 651-657
  10. Gu, H.H., Xu, J., Gallagher, M. and Dean, G.E. (1993) J. Biol. Chem. 268, 7372-7381
  11. Reith, M.E. and Munholland, J. (1995) Plant Mol. Biol. Rep. 13, 333-335
  12. Kaneko, T. et al. (1995) DNA Res. 2, 153-166
  13. Henikoff, S. and Henikoff, J.G. (1994) Genomics 19, 97-107
  14. Altschul, S.F. et al. (1990) J. Mol. Biol. 215, 403-410
  15. Kuhsel, M.G., Strickland, R. and Palmer, J.D. (1990) Science 250, 1570-1573
  16. Xu, M.Q. et al. (1990) Science 250, 1566-1570
  17. Gray, M.W. (1993) Curr Opin Genet Dev 3, 884-890
  18. Han, T.-M. and Runnegar, B. (1992) Science 257, 232-235
  19. Doolittle, R.F. et al. (1996) Science 271, 470-477
  20. Kornberg, A. and Baker, T.A. (1992) DNA replication, W.H. Freeman and Company
  21. Hickey, D.A. (1994) Trends Genet 10, 147-149
  22. Vader, A. et al. (1994) Nucleic Acids Res 22, 4553-4559
  23. Porter, J.A. et al. (1995) Nature 374, 363-366
  24. Koonin, E.V. (1995) Trends Biochem Sci 20, 141-142
  25. Schuler, G.D., Altschul, S.F. and Lipman, D.J. (1991) Proteins: Structure, Function, and Genetics 9, 180-190
  26. Smith, T.F. and Waterman, M.S. (1981) J. Mol. Biol. 147, 195-197


    Figure 1
    (a)
    
       Ssp dnaB GAATCTAGAACTAACAAACGGCCAATGATGTCAGATTTAAGAGAGAGTGGC*AGTATCGAACAAGACGCAGATTTAATTATGATGATTTATCGAGATGAATATTATAAT
                 |||  | |   ||| ||| | ||  |  | |||||||| ||||| |||||    ||| ||||| || || |||||  |||| ||| | ||| ||||    |||||||||
    Ppu cp dnaB GAAAGTCGTCATAATAAAAGACCCTTATTATCAGATTTGAGAGAAAGTGGA*TCTATAGAACAGGATGCTGATTTGGTTATCATGCTATATAGAGAAAGCTATTATAAT
       Ssp DnaB  E  S  R  H  N  K  R  P  L  L  S  D  L  R  E  S  G  *S  I  E  Q  D  A  D  L  V  I  M  L  Y  R  E  S  Y  Y  N
                 |  |  |     |  |  |  |  :  :  |  |  |  |  |  |  |   |  |  |  |  |  |  |  |  :  :  |  :  |  |  :  :  |  |  | 
    Ppu cp DnaB  E  S  R  T  N  K  R  P  M  M  S  D  L  R  E  S  G  *S  I  E  Q  D  A  D  L  I  M  M  I  Y  R  D  E  Y  Y  N
    
    
    (b)
    
       Ssp DnaB
               
    Ppu cp DnaB
    
       Ssp DnaB  CISGDSLISLASTGK  KKLVYILKTRLGRTIKATANHRFLTIDGWKRLDELSLKEHI  WDSIVSITETGVEEVFDLTVPGPHNFVANDIIVHN
                 |||  | |  : ::|  || :| ::|: :: :: |:||::||: ||:| | |  :: |  ::::::|: :: ::|||:::   :||:||:|||||
    Ppu cp DnaB  CISKFSHIMWSHVSK  KKTTYKIRTNSEKYLELTSNHKILTLRGWQRCDQLLCNDMI  FETLANINISNFQNVFDFAANPIPNFIANNIIVHN
    

    Figure legend
    Figure 1
    Comparison of the two DnaB inteins.
    (a) The integration sites of the Synechocystis (Ssp) and Porphyra (Ppu) inteins in the DnaB proteins. Bars mark identities, colons mark conserved amino acid substitutions, and asterisks mark the integration sites. The two DnaB proteins can be aligned end to end (415 amino acids) with only 3 gaps, 37% amino acid identity, and 79% amino acid identity plus conserved substitutions. (b) Sequence similarity of the two inteins. The similar blocks in the two DnaB intein sequences are hatched and shown below the diagram. No other significantly similar regions were found in the two sequences using the Macaw multiple alignment program 25 and the Smith-Waterman optimal alignment algorithm 26. The P values for the block alignments are 9.5 × 10^-1, 1.1 × 10^-8 and 6.8 × 10^-7 15. The GenBank sequence accession codes are D64003 for Ssp and U38804 for Ppu.

