Inteins are protein "introns": sequences encoded inside the polypeptide chains of other (host) proteins. They are excised post-translationally by a proteolytic cleavage and ligation process that joins the flanking host sequences. Inteins appear to catalyze their own excision, and some are also site-specific endonucleases1. Inteins are mobile genetic elements, and at least one can home, that is, insert a copy of its DNA into its integration site in an intein-less allele2. Fifteen inteins have been found in various organisms, including mycobacteria3-6, thermophilic archaebacteria7,8, yeast9,10 and the chloroplast of a red alga11. Inteins are generally not very similar to one another5, but homologous inteins occur at homologous sites in archaebacterial DNA polymerases and in mycobacterial gyrase A (GyrA) proteins6. However, the mycobacterial RecA proteins4 and the DNA polymerases also contain different inteins at different integration sites.
While searching sequence databases for inteins5, I identified a 429-amino-acid region of an open reading frame (ORF) from the thermophilic cyanobacterium Synechocystis sp. (Ssp)12 (GenBank accession D64003, ORF slr0833, positions 381-809). The region contains all of the known intein motifs5 and can be confidently identified as an intein (expected value13 of 7.8 × 10^-13 for finding the multiple conserved regions). This is the first intein reported in cyanobacteria and the first eubacterial intein found outside the mycobacteria. The two regions flanking this intein are significantly similar to prokaryotic replicative DnaB helicases (BLAST14 P values of 2.2 × 10^-22 to 6.5 × 10^-130).
The Ssp DnaB intein has only weak sequence similarity to known inteins. It can be aligned end-to-end with the mycobacterial GyrA inteins and the Mycobacterium tuberculosis RecA intein with 22-25% amino acid identity, but it resembles the other inteins only in their conserved regions. The Ssp DnaB intein and the DnaB intein from the chloroplast of the red alga Porphyra purpurea (Ppu) share only local sequence similarities and differ markedly in length. Nevertheless, both are inserted at the same point in the dnaB gene. The integration points lie in the center of a region of 15 amino acids that are identical in the two hosts. However, the codons for nine of these residues carry silent substitutions, including the two residues flanking the integration point (Fig. 1). The two DnaB inteins could be the result of a single integration event into a common ancestor of the cyanobacterial and chloroplast dnaB genes. Such an event was suggested to explain the presence of similar group I introns at an identical position in the leucine tRNA gene of diverse chloroplasts and cyanobacteria15,16. Red algal chloroplasts probably originated from cyanobacteria17 about 1.25 to 2.1 billion years ago18,19. This scenario implies that (1) inteins are extremely ancient and (2) the DnaB inteins have survived in their hosts for a remarkably long time. The different degrees of conservation of the DnaB hosts and the DnaB inteins (Fig. 1) could result from different selection pressures. The DnaB replicative helicase is a multifunctional protein that interacts closely with DNA and with other proteins20. Selection on the DnaB helicases is probably linked to changes in the proteins with which they interact and to the evolution of the DNA replication process as a whole.
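The silent-substitution count can be checked directly from the nucleotide rows of Fig. 1(a). The Python sketch below is an illustration, not the analysis used for this letter: it assumes both rows are in frame from their first base, translates them with the standard genetic code, walks outward from the integration point ("*") while the encoded residues agree, and counts the codons in that identical run that differ at the DNA level. The helper names (`codons`, `identical_flank`, `conserved_run`) are hypothetical.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... over bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Nucleotide rows of Fig. 1(a); "*" marks the intein integration point.
SSP_DNAB = ("GAATCTAGAACTAACAAACGGCCAATGATGTCAGATTTAAGAGAGAGTGGC*"
            "AGTATCGAACAAGACGCAGATTTAATTATGATGATTTATCGAGATGAATATTATAAT")
PPU_DNAB = ("GAAAGTCGTCATAATAAAAGACCCTTATTATCAGATTTGAGAGAAAGTGGA*"
            "TCTATAGAACAGGATGCTGATTTGGTTATCATGCTATATAGAGAAAGCTATTATAAT")


def codons(seq):
    """Split an in-frame nucleotide string into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def identical_flank(a, b):
    """Collect codon pairs, starting at the junction, until the encoded residues differ."""
    pairs = []
    for ca, cb in zip(a, b):
        if CODON_TABLE[ca] != CODON_TABLE[cb]:
            break
        pairs.append((ca, cb))
    return pairs


def conserved_run(seq_a, seq_b):
    """Return (upstream, downstream) codon pairs encoding identical residues around '*'."""
    a_n, a_c = (codons(part) for part in seq_a.split("*"))
    b_n, b_c = (codons(part) for part in seq_b.split("*"))
    # N-terminal side: walk backwards from the junction, then restore 5'->3' order.
    upstream = identical_flank(a_n[::-1], b_n[::-1])[::-1]
    # C-terminal side: walk forwards from the junction.
    downstream = identical_flank(a_c, b_c)
    return upstream, downstream


upstream, downstream = conserved_run(SSP_DNAB, PPU_DNAB)
run = upstream + downstream
silent = [(ca, cb) for ca, cb in run if ca != cb]
residues = ("".join(CODON_TABLE[ca] for ca, _ in upstream) + "*"
            + "".join(CODON_TABLE[ca] for ca, _ in downstream))

print(residues)               # SDLRESG*SIEQDADL
print(len(run), len(silent))  # 15 9
print(upstream[-1], downstream[0])  # ('GGC', 'GGA') ('AGT', 'TCT')
```

Walking outward from the junction, rather than comparing the whole rows, mirrors how the text defines the 15-residue region: the run stops at the first differing residue on each side (M/L upstream, I/V downstream).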
Conversely, inteins might be molecular parasites that need only ensure the transmission of their genes21. For such parasites, rapid and efficient protein splicing is essential to minimize interference with the function of the host protein. Inteins can also increase their chances of survival by homing into intein-less alleles and thereby spreading through the population. The Ppu DnaB intein is the shortest known intein (150 amino acids, compared with 360-538 for the others) and lacks the endonuclease-like motifs found in all other inteins5, whereas the Ssp DnaB intein has a typical length and all the characteristic motifs. It seems that, over their long separate evolution, the two inteins have adopted different survival strategies.
Alternatively, separate integration events might have placed different inteins in homologous hosts. Such events occurred in the RecA proteins and DNA polymerases, but at different integration sites. Separate integrations at the same site in the dnaB genes would imply that this site is particularly susceptible to integration (for example, by being cleaved by the same endonuclease). The differences in nucleotide sequence around the two integration sites do not completely rule out this idea: restriction endonucleases can cleave ambiguous targets, and this was also shown for a homing intein endonuclease2. It might also be that inteins are difficult to "dislodge" from this particular integration site, or that being in this position somehow assists protein splicing.
Separate integration events seem to account for the presence of distinct group I introns at the same location in nuclear rDNA genes of different phyla22. It is not known whether inteins confer any advantage on their hosts, but the separate integrations of inteins into RecA proteins have been proposed as an indication of a selective advantage4. Such an advantage might help explain the very long persistence of the DnaB inteins, or provide the incentive for separate integrations. The hedgehog developmental regulatory proteins might be relevant to this issue. These proteins undergo autoproteolytic cleavage at a specific site23 that is significantly similar to the N-terminal splice-junction motif of inteins24. Experimental evidence suggests that autoprocessing of the Drosophila hedgehog protein regulates its range of action23. Similarly, some inteins might regulate the activity of their host proteins.
Shmuel Pietrokovski
Acknowledgment
The author is a Howard Hughes Medical Institute Fellow of the Life Sciences Research Foundation.
(a)
Ssp dnaB    GAATCTAGAACTAACAAACGGCCAATGATGTCAGATTTAAGAGAGAGTGGC*AGTATCGAACAAGACGCAGATTTAATTATGATGATTTATCGAGATGAATATTATAAT
            |||  | |   ||| ||| | ||  |  | |||||||| ||||| |||||    ||| ||||| || || |||||  |||| ||| | ||| ||||    |||||||||
Ppu cp dnaB GAAAGTCGTCATAATAAAAGACCCTTATTATCAGATTTGAGAGAAAGTGGA*TCTATAGAACAGGATGCTGATTTGGTTATCATGCTATATAGAGAAAGCTATTATAAT
Ssp DnaB    E S R H N K R P L L S D L R E S G * S I E Q D A D L V I M L Y R E S Y Y N
            | | |   | | | | : : | | | | | | |   | | | | | | | | : : | : | | : : | | |
Ppu cp DnaB E S R T N K R P M M S D L R E S G * S I E Q D A D L I M M I Y R D E Y Y N
(b)
Ssp DnaB CISGDSLISLASTGK KKLVYILKTRLGRTIKATANHRFLTIDGWKRLDELSLKEHI WDSIVSITETGVEEVFDLTVPGPHNFVANDIIVHN
||| | | : ::| || :| ::|: :: :: |:||::||: ||:| | | :: | ::::::|: :: ::|||::: :||:||:|||||
Ppu cp DnaB CISKFSHIMWSHVSK KKTTYKIRTNSEKYLELTSNHKILTLRGWQRCDQLLCNDMI FETLANINISNFQNVFDFAANPIPNFIANNIIVHN
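To put a number on the "local sequence similarities" between the two DnaB inteins, the aligned blocks of Fig. 1(b) can be scored for identity. This Python sketch assumes the blocks are ungapped (each Ssp block has the same length as its Ppu partner, with the spaces in the figure separating blocks) and counts only identical residues, ignoring the conservative substitutions marked ":" in the figure. The names `identical_positions`, `SSP_BLOCKS` and `PPU_BLOCKS` are illustrative.

```python
# Ungapped aligned blocks copied from Fig. 1(b).
SSP_BLOCKS = ["CISGDSLISLASTGK",
              "KKLVYILKTRLGRTIKATANHRFLTIDGWKRLDELSLKEHI",
              "WDSIVSITETGVEEVFDLTVPGPHNFVANDIIVHN"]
PPU_BLOCKS = ["CISKFSHIMWSHVSK",
              "KKTTYKIRTNSEKYLELTSNHKILTLRGWQRCDQLLCNDMI",
              "FETLANINISNFQNVFDFAANPIPNFIANNIIVHN"]


def identical_positions(a, b):
    """Count identical residues in two ungapped aligned blocks of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned blocks must have equal length")
    return sum(x == y for x, y in zip(a, b))


per_block = [identical_positions(a, b) for a, b in zip(SSP_BLOCKS, PPU_BLOCKS)]
aligned_length = sum(len(a) for a in SSP_BLOCKS)
identity = 100 * sum(per_block) / aligned_length

print(per_block, aligned_length)  # [6, 15, 13] 91
print(f"{identity:.1f}%")         # 37.4%
```

Because the unaligned regions between the blocks (which account for most of the length difference between the 429-residue Ssp intein and the 150-residue Ppu intein) are excluded, this block-wise identity is not comparable to the end-to-end 22-25% identity quoted for the GyrA and RecA inteins.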