Intein integration points

Figure: sequence logo of intein integration points.
Sequence conservation of intein integration points. Ninety-eight intein integration points in proteins were aligned at the integration point. The height of each amino acid at each position corresponds to its conservation. Sequence weighting was used to correct for the uneven representation of different allelic integration points.
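The logo heights described in the caption can be sketched in code. The page does not say which logo formula or weighting scheme was used, so this is only an illustration under stated assumptions: it uses the common information-content measure (log2(20) minus the column entropy, without a small-sample correction) and takes the per-sequence weights as given. The function names and the toy alignment are hypothetical.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column, weights):
    """Return (information in bits, weighted residue frequencies) for one column.

    Assumption: information = log2(20) - Shannon entropy of the weighted
    frequencies. Gaps and unknown residues are skipped.
    """
    totals = defaultdict(float)
    for residue, weight in zip(column, weights):
        if residue in AMINO_ACIDS:
            totals[residue] += weight
    total_weight = sum(totals.values())
    if total_weight == 0:
        return 0.0, {}
    freqs = {aa: w / total_weight for aa, w in totals.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return math.log2(len(AMINO_ACIDS)) - entropy, freqs


def logo_heights(alignment, weights):
    """Letter height per position: weighted frequency times column information."""
    heights = []
    for position in range(len(alignment[0])):
        column = [seq[position] for seq in alignment]
        info, freqs = column_information(column, weights)
        heights.append({aa: p * info for aa, p in freqs.items()})
    return heights


# Toy alignment (made up): the first two sequences stand for near-identical
# alleles, so each gets half the weight of the third sequence.
toy_heights = logo_heights(["CG", "CA", "SG"], [0.5, 0.5, 1.0])
```

With these weights the two down-weighted alleles together count as much as the third sequence, so Cys and Ser get equal heights at the first position instead of Cys dominating two to one.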
Inteins are integrated at various points in diverse proteins. Some of these points are homologous, but the non-homologous integration points share no common sequence features except the amino acid immediately C-terminal (C') to the intein. This residue is Cys, Ser or Thr, and its thiol or hydroxyl side chain plays a crucial part in the protein splicing reaction.
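As a small illustration of the one shared feature, the following sketch looks up the residue immediately C-terminal to an intein and reports its nucleophilic side-chain group. The function name, the 0-based coordinate convention and the toy sequence are assumptions made for this example, not part of any intein database format.

```python
# Side-chain group of the residues allowed immediately C-terminal to an intein.
NUCLEOPHILES = {"C": "thiol", "S": "hydroxyl", "T": "hydroxyl"}


def residue_after_intein(precursor, intein_start, intein_end):
    """Return (residue, side-chain group) for the residue C' to the intein.

    Assumption: the intein occupies precursor[intein_start:intein_end]
    (0-based, end exclusive). The group is None if the residue is not
    Cys, Ser or Thr.
    """
    if not 0 <= intein_start < intein_end < len(precursor):
        raise ValueError("intein must lie inside the precursor and be followed by a residue")
    residue = precursor[intein_end]
    return residue, NUCLEOPHILES.get(residue)


# Toy precursor (made up): N-terminal flank "MKV", intein "CLAGDTHN", C-terminal flank "SGG".
toy_result = residue_after_intein("MKVCLAGDTHNSGG", 3, 11)
```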

There is also no apparent common function to these proteins or points. The only published structure of an intein in its integration point (i.e. before it is spliced out) is by Poland et al., JBC 275:16408-16413, 2000 (1EF0). However, the crystallized protein has a mutated intein N-terminal residue that blocks splicing and a non-native N-terminal flank for the intein, and the two independent molecules in the asymmetric unit of the crystal have different structures at the integration site. Thus, the relevance of these structures to the native structures of intein integration points is uncertain.

Structures of several proteins that are very similar to ones with inteins are known. None of these proteins has an intein, but they can be used to model the conformations of some intein integration points. Of course, these would be the conformations after the inteins have spliced out. In some proteins the intein integration points are inside the protein fold or in deep clefts. Hence, the intein-containing homologs of these proteins probably adopt their final fold only after their intein(s) splice out.

What is common to many, and probably all, intein integration points is their presence in highly conserved protein sequence regions. This is probably because these are the points where inteins can "survive" longest at the DNA level. The "survival" is evolutionary and counteracts deletion and recombination events that might excise the intein gene. Inteins apparently benefit only themselves and are probably selfish genetic elements. Thus their host organism does not need them to survive. However, intein integration in conserved protein regions forces the genetic events that can remove them to be precise. Only the intein coding DNA must be excised. If more is lost, the conserved region where the intein was integrated will be disrupted. If part of the intein remains, the remainder will probably not be able to splice and the conserved region will carry an insertion. This favors the survival of inteins found in conserved sequence regions.
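The precision argument above can be made concrete with a toy model. The sketch below deletes a stretch of DNA around an intein whose boundaries may be shifted, then reports how many host bases were lost and how many intein bases remain. The sequences, function name and shift parameters are made up for illustration; real deletion and recombination events are not modelled.

```python
def classify_excision(left_flank, intein, right_flank, start_shift=0, end_shift=0):
    """Delete the intein DNA with optionally shifted boundaries.

    A negative start_shift or positive end_shift removes host bases;
    a positive start_shift or negative end_shift leaves intein bases behind.
    Returns the counts and whether the intein-less host gene was restored.
    """
    dna = left_flank + intein + right_flank
    intein_lo = len(left_flank)
    intein_hi = intein_lo + len(intein)
    start = intein_lo + start_shift
    end = intein_hi + end_shift
    if not 0 <= start <= end <= len(dna):
        raise ValueError("deletion boundaries fall outside the sequence")
    product = dna[:start] + dna[end:]
    # Number of deleted bases that belonged to the intein.
    deleted_intein = max(0, min(end, intein_hi) - max(start, intein_lo))
    return {
        "host_bases_lost": (end - start) - deleted_intein,
        "intein_bases_left": len(intein) - deleted_intein,
        "host_gene_restored": product == left_flank + right_flank,
    }


# Toy sequences (made up): conserved host flanks around a short "intein".
precise = classify_excision("ATGGGT", "TGCAAA", "TCTGGC")
too_much = classify_excision("ATGGGT", "TGCAAA", "TCTGGC", start_shift=-3)
too_little = classify_excision("ATGGGT", "TGCAAA", "TCTGGC", end_shift=-3)
```

Only the unshifted deletion restores the host gene. The other two cases either remove part of the conserved flank or leave an intein fragment inserted in it.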

The conserved nature of intein integration sites is also clear upon examining the structures of homologs of intein-containing proteins. Most sites are parts of either active sites or ligand/cofactor binding sites, but they share no common structure.
You can examine sites corresponding to intein integration points in the structures of proteins homologous to these intein-containing proteins:
Figure: intein integration points in DNA polymerase.
Structure of 1TGO, a close homolog of intein-containing archaeal DNA polymerases. Points corresponding to the three known intein integration sites in these proteins are indicated by cyan spheres. More details here.


See additional information on all inteins listed by species and by host proteins.


[Inteins home page]
Page last modified July 2001
Shmuel Pietrokovski <pietro@weizmann.ac.il>