Intein alleles (June '04)

Protein host (number of known inteins); Phylogenetic distribution of recognized host homologs [1]
Integration point (number of known alleles); Organism; Intein name; Intein length; Intein EN domain type [2]; Integration site
Vacuolar-type ATPase catalytic subunit (23) Eucarya,
Archaea,
Bacteria (some).
Also homologous to other types of bacterial and other ATPase β and α subunits.
VMA-a (16) Saccharomyces cerevisiae Sce VMA 454 aa DOD
DAIIYVG*CGERGNE  283/738
Saccharomyces castellii CBS-4309 Sca VMA 517 aa DOD
DSIIYVG*CGERGNE  283/801
Saccharomyces species DH1-1A SceDH1-1A VMA1 454 aa DOD
DAIIYVG*CGERGNE  28/483 [3]
Saccharomyces cariocanus UFRJ-50791 Scar VMA 454 aa DOD
DAIIYVG*CGERGNE  18/473 [3]
Saccharomyces dairenensis CBS-421 Sda VMA 501 aa DOD
DTIIYVG*CGERGNE  18/520 [3]
Saccharomyces exiguus CBS-379 Sex VMA 502 aa DOD
DAIIYVG*CGERGNE  18/521 [3]
Saccharomyces unisporus CBS-398 Sun VMA 414 aa DOD
DTIIYVG*CGERGNE  18/433 [3]
Kluyveromyces lactis CBS-683 Kla VMA 410 aa DOD
DAIIYVG*CGERGNE  18/429 [3]
Kluyveromyces polysporus CBS-2163 Kpo VMA 433 aa DOD
DAIIYVG*CGERGNE  18/452 [3]
Zygosaccharomyces bailii CBS-685 Zba VMA 456 aa DOD
DAIIYVG*CGERGNE  18/475 [3]
Zygosaccharomyces bisporus CBS-702 Zbi VMA 450 aa DOD
DAIIYVG*CGERGNE   9/460 [3]
Zygosaccharomyces rouxii CBS-688 Zro VMA 450 aa DOD
DAIVYVG*CGERGNE  18/469 [3]
Torulaspora globosa CBS-764 Tgl VMA 456 aa DOD
DVIIYVG*CGERGNE  18/47 [3]
Torulaspora pretoriensis CBS-5080 Tpr VMA 455 aa DOD
DAIIYVG*CGERGNE  18/474 [3]
Candida glabrata Cgl VMA 415 aa DOD
*  278/692
Candida tropicalis Ctr VMA 471 aa DOD
DVIIYVG*CGERGNE  283/755

VMA-b (7) Pyrococcus horikoshii OT3 Pho VMA 377 aa DOD
GPFGSGK*TVTQHQL  240/617
Pyrococcus furiosus Pfu VMA 424 aa DOD
GPFGSGK*TVTQHQL  240/665
Pyrococcus abyssi Pab VMA 429 aa DOD
GPFGSGK*TVTQHQL  240/670
Thermoplasma acidophilum Tac VMA 173 aa -
GPFGSGK*TVIQHQL 235/408
Thermoplasma volcanium Tvo VMA 186 aa -
GPFGSGK*TVIQHQL  236/421
Ferroplasma species type II FspII VMA 535 aa DOD
*  237/773
Picrophilus torridus DSM 9790 Pto VMA 333 aa DOD
*  236/570

DnaB helicase (16) Bacteria,
Eucarya (plastids).
dnaB-a (9) Porphyra purpurea (chloroplast) Ppu dnaB 150 aa -
SDLRESG*SIEQDAD  361/512
Guillardia theta (plastid) Gth dnaB 160 aa -
SDLKESG*SIEQDAD  376/537
Synechocystis spp. PCC6803 Ssp dnaB 429 aa DOD
SDLRESG*SIEQDAD  380/810
Nostoc spp. PCC7120 (Anabaena PCC7120) Asp dnaB 429 aa DOD
SDLRESG*SIEQDAD  388/818
Nostoc punctiforme Npu dnaB 429 aa DOD
SDLRESG*SIEQDAD  388/818
Trichodesmium erythraeum IMS101 Ter dnaB-2 177 aa -
SDLRESG*SIEQDAD  2030/2208
Mycobacterium tuberculosis Mtu dnaB 415 aa DOD
ADLRESG*SLEQDAD  399/816
Mycobacterium smegmatis Msm dnaB-2 426 aa DOD
SDLRESG*SLEQDAD  543/969
Rhodothermus marinus Rma dnaB 428 aa DOD
SDLRESG*SIEQDAD  421/850

dnaB-b (8) Mycobacterium leprae Mle dnaB 146 aa -
ARPGVGK*STLGLDF  233/379
Mycobacterium smegmatis Msm dnaB-1 140 aa -
ARPGVGK*STLGLDF  237/377
Mycobacterium avium Mav dnaB 337 aa DOD
ARPGVGK*STLGLDF  232/570
Mycobacterium avium paratuberculosis MavPT dnaB 418 aa DOD
ARPGVGK*STLGLDF  215/633 [3]
Mycobacterium intracellulare Min dnaB 335 aa DOD
ARPGVGK*STLGLDF   16/350 [3]
Trichodesmium erythraeum IMS101 Ter dnaB-1 >1650 aa DOD
GRPSMGK*TSFAVNI  212/1863
Crocosphaera watsonii WH8501 Cwa dnaB 496 aa DOD
GRPSMGK*TAFGLGV  213/708
Gloeobacter violaceus PCC7421 Gvi dnaB 258 aa DOD
GRPGMGK*TAFSLSI  221/480

dnaB-c (1) Coxiella burnetii Cbu dnaB 146 aa -
VMSDLRE*SGAIEQD  389/536

DNA gyrase subunit A (7) Bacteria,
Eucarya (DNA topoisomerase II),
Archaea (some)
gyrA-a (7) Mycobacterium leprae Mle gyrA 420 aa DOD
PPAAMRY*TEARLTP  130/551
Mycobacterium flavescens Mfl gyrA 421 aa DOD
PPAAMRY*TEARLTP   69/491 [3]
Mycobacterium gordonae Mgo gyrA 420 aa DOD
PPAAMRY*TEARLTP   66/487 [3]
Mycobacterium kansasii Mka gyrA 420 aa DOD
PPAAMRY*TEARLTP   65/486 [3]
Mycobacterium xenopi Mxe gyrA 198 aa -
PPAAMRY*TEAPLTP   65/264 [3]
Mycobacterium malmoense Mma gyrA 420 aa DOD
PPAAMRY*TEARRAI   49/470 [3]
Mycobacterium gastri Mga gyrA 420 aa DOD
PPAAMRY*TEARLTP   64/484 [3]

Ribonucleotide reductase - class I α subunit and class II (32) Eucarya (class I),
Archaea (class II),
Bacteria (class I and II).
RIR1-a (2) Pyrococcus furiosus Pfu RIR1-1 455 aa DOD
IQKMGGG*TGLNFSK  301/756
Pyrococcus abyssi Pab RIR1-1 399 aa DOD
IQKMGGG*TGLNFSK  301/702

RIR1-c (1) Pyrococcus abyssi Pab RIR1-2 438 aa DOD
GTTTGAA*SGPVSFM  722/1162

RIR1-h (1) Azotobacter vinelandii Avi RIR1 378 aa DOD
TAVAVNQ*GGKRKGA  442/821

RIR1-e (2) Thermus thermophilus Tth RIR1-1 439 aa DOD
RVVRQGG*TRRGAGM  300/739
Gloeobacter violaceus PCC7421 Gvi RIR1-1 413 aa DOD
KITTGNK*SRRGAFM  193/607

RIR1-d (2) Nostoc spp. PCC7120 (Anabaena PCC7120) Asp RIR1 407 aa DOD
VAGNIRR*SAGMRQF  275/683
Trichodesmium erythraeum IMS101 Ter RIR1-1 394 aa -
VAGNIRR*SAGIRQG  274/669

RIR1-b (20) Chilo iridescent virus (CIV) CIV RIR1 339 aa DOD
TIKQSNL*CSEIILP  271/611
prophage SPβ (integrated in B. subtilis) Spb RIR1 385 aa DOD
KVKFSNL*CSEVLQS  380/766
A prophage in Bacillus spp. M1918 (integrated in B. species M1918) BspM1918 RIR1 385 aa DOD
KVKFSNL*CSEVLQS   30/416 [3]
Trichodesmium erythraeum IMS101 Ter RIR1-2 373 aa DOD
ARLGLNP*CGEIIGS 698/1072
Gloeobacter violaceus PCC7421 Gvi RIR1-2 367 aa DOD
FITTTNP*CGEIWLP  755/1123
Crocosphaera watsonii WH8501 Cwa RIR1 347 aa DOD
YIPGVNL*CTESFSN  462/810
Synechococcus elongatus PCC7942 Sel RIR1 370 aa DOD
QRYGLNP*CGEILGA  409/780
Deinococcus radiodurans Dra RIR1 367 aa DOD
EIRSTNP*CGEIPLT  524/891
Thermus thermophilus Tth RIR1-2 408 aa DOD
QIRSTNP*CGEIPLT  887/1295
Methylococcus capsulatus Mca RIR3 381 aa DOD
RIEATNP*CAEQPLP  341/722
Carboxydothermus hydrogenoformans Chy RIR1 345 aa DOD
EIESTNP*CGEQPLL  276/621
Desulfitobacterium hafniense Dha RIR1 365 aa DOD
EIEATNP*CGEQPLL  283/650
Staphylococcus epidermidis Sep RIR1 384 aa DOD
EIKMSNL*CTEIFQY  377/762
Pyrococcus furiosus Pfu RIR1-2 383 aa DOD
PIRATNP*CGEEPLY  914/1297
Pyrococcus horikoshii OT3 Pho RIR1 385 aa DOD
PIRATNP*CGEEPLY  460/846
Pyrococcus abyssi Pab RIR1-3 382 aa DOD
PIRATNP*CGEEPLY 1297/1681
Methanobacterium thermoautotrophicum Mth RIR1 134 aa -
RIEATNP*CGEQPLL  274/409
Ferroplasma acidiphilum Fac RIR1 366 aa DOD
YIESTNP*CGEQPLL  437/804 [3]
Ferroplasma acidarmanus type I FacI RIR1 366 aa DOD
YIESTNP*CGEQPLL 437/804
Ferroplasma species type II FspII RIR1 366 aa DOD
NIESTNP*CGEQPLL 437/804

RIR1-f (1) Trichodesmium erythraeum IMS101 Ter RIR1-3 323 aa DOD
IIGSNFH*CNLSEIH 1081/1405

RIR1-g (1) Trichodesmium erythraeum IMS101 Ter RIR1-4 381 aa DOD
TTVQPSG*TKSLLTN 1534/1916

DNA reverse gyrase/Topoisomerase-I (topA) domain (4) Archaea,
Bacteria,
Eucarya (DNA topoisomerase III).
r-gyr-a (3) Methanococcus jannaschii Mja r-gyr 494 aa DOD
ELFELGL*CTYHRTS  866/1361
Pyrococcus horikoshii OT3 Pho r-gyr 410 aa DOD
DLFEAGL*CTYHRTD  953/1364
Pyrococcus furiosus Pfu topA 373 aa DOD
SLYEKGF*CSYPRTE  314/688

topA-a (1) Haloarcula marismortui ATCC43049 Hma TopA 494 aa DOD
DKGLGTK*STRHNSI  327/865

DNA polymerase family B (archaeal polA gene) (18) Archaea,
Eucarya (Dpols α, δ, and ε),
Bacteria (Dpol II).
pol-a [4] (4) Thermococcus fumicolans Tfu pol-1 360 aa DOD
IAYLDFR*SLYPSII  406/767
Thermococcus aggregans Tag pol-1 360 aa DOD
IAYLDFR*SLYPSII  409/770
Pyrococcus kodakaraensis (strain KOD1) Pko pol-1 360 aa DOD
IVYLDFR*SLYPSII  406/767
Methanococcus jannaschii Mja pol-1 369 aa DOD
IISMDFR*SLYPSII  425/795

pol-b [4] (9) Thermococcus spp. GE8 TspGE8 pol-1 536 aa DOD
AIKILAN*SYYGYYG  491/1027
Thermococcus hydrothermalis Thy pol-1 538 aa DOD
AIKILAN*SYYGYYG  458/996 [3]
Thermococcus litoralis Tli pol-1 538 aa DOD
AIKLLAN*SYYGYMG  494/1033
Thermococcus aggregans Tag pol-2 538 aa DOD
AVKLLAN*SYYGYMG  854/1393
Pyrococcus spp. GB-D Psp pol-1 537 aa DOD
AIKILAN*SYYGYYG  492/1030
Pyrococcus kodakaraensis (strain KOD1) Pko pol-2 537 aa DOD
AIKILAN*SYYGYYG  851/1389
Pyrococcus horikoshii OT3 Pho pol 460 aa DOD
AIKILAN*SYYGYYG  492/953
Methanococcus jannaschii Mja pol-2 476 aa DOD
SLKILAN*SVYGYLA  882/1359
Haloarcula marismortui ATCC43049 Hma PolB1 409 aa DOD
AVKVIMN*SLYGVLG 589/999

pol-c [4] (7) Thermococcus spp. GE8 TspGE8 pol-2 390 aa DOD
FKVLYAD*TDGFFAT 1075/1465
Thermococcus hydrothermalis Thy pol-2 390 aa DOD
FKVLYAD*TDGFFAT 1044/1434 [3]
Thermococcus litoralis Tli pol-2 390 aa DOD
FKVLYAD*TDGFYAT 1081/1472
Thermococcus fumicolans Tfu pol-2 389 aa DOD
FKVLYAD*TDGFFAT  900/1290
Thermococcus aggregans Tag pol-3 157 aa -
FKVLYAD*TDGFYAT 1441/1599
Haloferax volcanii DS2 ATCC29605 Hvo PolB1 437 aa DOD
YDVAYGD*TDSVMLE 625/1063
Nanoarchaeum equitans Kin4-M Neq pol 98+30 aa -
FKVIYGD*         556/
*TDSLFIS /31

KlbA virulence protein (4) Archaea (type II secretion system proteins),
Bacteria (virB proteins, ATPases involved in pili formation).
klbA-a (4) Pyrococcus horikoshii OT3 Pho klbA 521 aa DOD
MNTGHDG*CMGTIHS  451/972
Methanococcus jannaschii Mja klbA 169 aa -
MNTGHDG*CSGTLHA  404/573
Pyrococcus furiosus Pfu klbA 522 aa DOD
MNTGHDG*CMGTIHA  463/986
Pyrococcus abyssi Pab klbA 196 aa -
MNTGHDG*CMGTIHS  453/650

Translation initiation factor bIF-2 / EF2 translation elongation and release factor (5) Archaea,
Bacteria,
Eucarya.
IF2-a (5) Methanococcus jannaschii Mja IF2 547 aa DOD
GHVDHGK*TTLLDKI   30/577
Methanopyrus kandleri AV19 Mka EF2 523 aa DOD
AHIDHGK*TTLSDQL   34/557
Pyrococcus horikoshii OT3 Pho IF2 445 aa DOD
GHVDHGK*TTLLDKI   19/464
Pyrococcus furiosus Pfu IF2 387 aa DOD
GHVDHGK*TTLLDRI   19/407
Pyrococcus abyssi Pab IF2 394 aa DOD
GHVDHGK*TTLLDRI   20/415

Uncharacterized protein, InterPro family UPF0027 (E.coli rtcB homologs) (6) Archaea,
Eucarya,
Bacteria.
rtcB-a [5] (6) Methanococcus jannaschii Mja rtcB 489 aa DOD
GVGFDIN*CGVRLIR   97/586
Methanopyrus kandleri AV19 Mka rtcB 483 aa DOD
GVGYDIN*CGVRVMK  100/583
Pyrococcus horikoshii OT3 Pho rtcB 390 aa DOD
GIGYDIN*CGVRLIR   97/488
Pyrococcus furiosus Pfu rtcB 481 aa DOD
GIGYDIN*CGVRLIR   97/579
Pyrococcus abyssi Pab rtcB 437 aa DOD
GIGYDIN*CGVRLIR   97/534
Nostoc punctiforme Npu rtcB 323 aa HNH
AVGVDIG*CGMSAIK   77/400

Replication factor C, 37 kDa subunit / DNA polymerase III τ subunit (10) Archaea (RFC),
Eucarya (RFC),
Bacteria (DNA polymerase III γ/τ subunit).
RFC-a (4) Pyrococcus horikoshii OT3 Pho RFC 526 aa DOD
GPPGVGK*TTAALAL   58/584
Methanococcus jannaschii Mja RFC-1 549 aa DOD
GPPGVGK*TTAALCL   53/602
Pyrococcus furiosus Pfu RFC 525 aa DOD
GPPGVGK*TTAALAL   59/585
Pyrococcus abyssi Pab RFC-1 499 aa DOD
GPPGVGK*TTAALAL   61/561

RFC-b (2) Methanococcus jannaschii Mja RFC-2 437 aa DOD
NFLELNA*SDERGID  626/1063
Methanopyrus kandleri AV19 Mka RFC 306 aa -
NFLELNA*SDERGID   82/388

RFC-c (2) Methanococcus jannaschii Mja RFC-3 544 aa DOD
VCRFILS*CNYPSKI 1124/1668
Pyrococcus abyssi Pab RFC-2 608 aa DOD
NVRFILS*CNYSSKI  647/1256

dnaX-a (2) Synechocystis spp. PCC6803 Ssp dnaX 431 aa unknown
KVYVIDE*CHMLSTA  129/560
Spirulina platensis C1 Spl dnaX 136 aa -
KVYVIDE*CHMLSTA  129/266

recA / radA DNA repair protein (12) Bacteria (recA),
Archaea (radA),
Eucarya (rad51/dmc1).
recA-a (1) Mycobacterium tuberculosis Mtu recA 441 aa DOD
VKVVKNK*CSPPFKQ  251/692

recA-b (9) Mycobacterium leprae Mle recA 366 aa DOD
KIGVMFG*SPETTTG  205/571
Mycobacterium flavescens Mfl recA 364 aa DOD
KIGVMFG*SPETTTG  205/570
Mycobacterium flavescens ATCC14474 Mfl recA 14474 365 aa DOD
KIGVMFG*SPETTTG  [3]
Mycobacterium fallax Mfa recA 364 aa DOD
KIGVMFG*SPETTTG  [3]
Mycobacterium chitae Mch recA 365 aa DOD
KIGVMFG*SPETTTG  [3]
Mycobacterium gastri Mga recA 369 aa DOD
KIGVMFG*SPETTTG  [3]
Mycobacterium shimoidei Msh recA 365 aa DOD
KIGVMFG*SPETTTG  [3]
Mycobacterium thermoresistibile Mth recA 366 aa DOD
KIGVMFG*SPETTTG  [3]
Thermomonospora fusca Tfus recA-2 358 aa DOD
KVGVMFG*SPETTSG  626/984

recA-c (2) Pyrococcus horikoshii OT3 Pho radA 173 aa -
GEFGSGK*TQLAHTL  149/322
Thermomonospora fusca Tfus recA-1 423 aa DOD
GPESSGK*TTVALHA   72/495

Cell division control protein 21 (CDC21) (6) Archaea,
Eucarya,
Bacteria (distant homologs - E.coli f516, Aquifex AQ_291 and Rhizobium NifA).
CDC21-a (4) Pyrococcus horikoshii OT3 Pho CDC21-1 169 aa -
GDPGVAK*SQLLRYV  334/503
Pyrococcus abyssi Pab CDC21-1 164 aa -
GDPGVAK*SQLLRYI  334/499
Halobacterium spp. NRC-1 Hsp CDC21 183 aa -
GDPGTGK*SQMISYV  282/463
Haloarcula marismortui ATCC43049 Hma CDC21 477 aa DOD
GDPGTGK*SQMLSYI  331/809

CDC21-b (3) Pyrococcus horikoshii OT3 Pho CDC21-2 261 aa -
SSSAAGL*TAAVVRD  529/790
Pyrococcus furiosus Pfu CDC21 367 aa DOD
SSSAAGL*TAAAVRD  361/799
Pyrococcus abyssi Pab CDC21-2 268 aa -
SSSAAGL*TAAVVRD  525/794

ATP-dependent protease LA (lon gene) (3) Archaea,
Bacteria,
Eucarya.
lon-a (3) Pyrococcus horikoshii OT3 Pho lon 475 aa DOD
VRHDPFQ*SGGLGTP  203/678
Pyrococcus furiosus Pfu lon 401 aa DOD
VRHDPFQ*SGGLGTP  203/605
Pyrococcus abyssi Pab lon 333 aa DOD
VRHDPFQ*SGGLGTP  220/554

Archaeal DNA polymerase Pol II, DP2 subunit (Archaeal polC gene) (3) Archaea.
polC-a (4) Pyrococcus horikoshii OT3 Pho polC 167 aa -
HAAKRRN*CDGDEDA  951/1118
Pyrococcus abyssi Pab polC 185 aa -
HAAKRRN*CDGDEDA  954/1140
Halobacterium spp. NRC-1 Hsp polC 196 aa -
HAAKRRN*CDGDEDC  925/1121
Haloarcula marismortui ATCC43049 Hma polC 179 aa -
HAAKRRN*CDGDEDC  965/1146

Uncharacterized protein, InterPro UPF0051 family (E.coli sufD homologs, mycobacterial pps1 proteins) (6) Bacteria,
Eucarya (chloroplast, cyanelle),
Archaea.
pps1-a (1) Mycobacterium leprae Mle pps1 387 aa DOD
TAVWSGG*SFIYVPP  201/588

pps1-b (4) Mycobacterium tuberculosis Mtu pps1 360 aa DOD
YVHYVEG*CTAPIYK  252/612
Ferroplasma acidiphilum Fac pps1 356 aa DOD
KVHYIEG*CTAPKYN  242/599
Ferroplasma acidarmanus type I FacI pps1 356 aa DOD
KVHYIEG*CTAPKYN  242/599
Ferroplasma species type II FspII pps1 356 aa DOD
KVHYIEG*CTAPKYN  242/599

pps1-c (1) Mycobacterium gastri Mga pps1 378 aa DOD
GVAAQYE*SEVVYHQ   15/393

pre-mRNA splicing factor Prp8 (4) Eucarya.
PRP8-a (4) Filobasidiella (Cryptococcus) neoformans Cne PRP8 172 aa -
GLFWEKA*SGFEESM 1531/1704
Ajellomyces (Histoplasma) capsulatus Hca PRP8 530 aa DOD
GLFWERA*SGFEESM 1515/2046 [3]
Aspergillus fumigatus Afu PRP8 819 aa DOD
GLFWERA*SGFEESM  658/1478 [3]
Emericella (Aspergillus) nidulans FGSC-4A Ani PRP8 605 aa DOD
* 1525/2131

dTDP-glucose 4,6-dehydratase (3) Bacteria,
Eucarya,
Archaea.
rfbB-a (2) Synechococcus spp. PCC7002 Ssp2 RfbB 332 aa DOD
NPIGIRS*CYDEGKR  150/483
Trichodesmium erythraeum IMS101 Ter RfbB-1 336 aa DOD
NCIGIRS*CYDEGKR  144/481

rfbB-b (1) Trichodesmium erythraeum IMS101 Ter RfbB-2 430 aa DOD
RVARIFN*TYGPRML  511/942

DNA gyrase subunit B (2) Bacteria,
Archaea,
Eucarya (DNA topoisomerase II).
gyrB-a (2) Synechocystis spp. PCC6803 Ssp gyrB 436 aa HNH
EGDSAGG*SAKQGRD  436/872
Trichodesmium erythraeum IMS101 Ter gyrB 244 aa -
EGDSASG*SAKQGRD  439/684

SNF2 helicase Bacteria,
Eucarya.
snf2-a (2) Deinococcus radiodurans Dra snf2 343 aa DOD
DDMGLGK*TLQTLAH  693/1037
Trichodesmium erythraeum IMS101 Ter snf2 469 aa DOD
DDMGLGK*TIQTIAF  271/741

Conserved protein similar to halobacterial Thy1 thymidylate synthase complementing protein Bacteria,
Archaea.
Thy1-a (2) Synechococcus spp. PCC7002 Ssp2 Thy1 444 aa HNH
GVSFDVQ*SFRYTGQ   96/541
Trichodesmium erythraeum IMS101 Ter Thy1 298 aa -
RTHRIGC*SFDVQSY   91/390

Phosphoenol pyruvate synthase (2) Archaea,
Bacteria,
Eucarya.
PEPsyn-a (2) Methanococcus jannaschii Mja PEPsyn 412 aa DOD
TDEGGLT*CHAAIVS  410/823
Crocosphaera watsonii WH8501 Cwa PEPsyn 394 aa DOD
TNQGGRT*CHAAIIA  453/848

DNA polymerase III α subunit (18) Bacteria.
dnaE-d (1) Trichodesmium erythraeum IMS101 Ter DnaE1-1 1308 aa DOD
KETYGVL*CYQEQIM  719/2028

dnaE-b (2) Thermus thermophilus Tth DnaE-1 424 aa DOD
EADLLRR*SMGKKKV  767/1191
Trichodesmium erythraeum IMS101 Ter DnaE1-2 428 aa DOD
EADLLRR*CMGKKKV  2053/2482

dnaE-a (13) Synechocystis spp. PCC6803 Ssp DnaE [6] 123+36 aa -
MVKFAEY*         774/
*CFNKSHS /37
Synechococcus spp. PCC7002 Ssp2 DnaE [6] 106+36 aa -
MVKFAEY*         775/
*CFNKSHS /37
Nostoc spp. PCC7120 (Anabaena PCC7120) Asp DnaE [6] 102+36 aa -
MLKFAEY*         775/
*CFNKSHS /37
Nostoc punctiforme Npu DnaE [6] 102+36 aa -
MLKFAEY*         774/
*CFNKSHS /37
Trichodesmium erythraeum IMS101 Ter DnaE-3 [6] 102+36 aa -
MIKFAEY*        2525/
*CFNKSHS /37
Thermosynechococcus elongatus BP-1 Tel DnaE [6] 117+35 aa -
MLDFAEY*         871/
*CFNKSHS /36
Crocosphaera watsonii WH8501 Cwa DnaE [6] 106+36 aa -
MIKFAEY*         774/
*CFNKSHS /37
Synechococcus elongatus PCC7942 Sel DnaE [6] 113+36 aa -
MVLFAEY*         757/
*CFNKSHS /37
Anabaena variabilis ATCC29413 Ava DnaE [6] 102+36 aa -
MLKFAEY*         775/
*CFNKSHS /37
Aphanizomenon ovalisporum Aov DnaE [6] 101+36 aa -
MLNFAEY*         ???/
*CFNKSHS /37
Oscillatoria limnetica Oli DnaE [6] 112+36 aa -
MVKFAEY*         ???/
*CFNKSHS /37
Thermosynechococcus vulcanus Tvu DnaE [6] 117+35 aa -
MLDFAEY*         ???/
*CFNKSHS /36
Aphanothece halophytica Aha DnaE [6] 110+36 aa -
MIKFAEY*         ???/
*CFNKSHS /37

dnaE-c (2) Thermus thermophilus Tth DnaE-2 424 aa DOD
ANYGFNK*SHAAAYS 1238/1662
Gemmata obscuriglobus UQM 2246 Gob DnaE 446 aa DOD
GGYGFNK*SHTAAYA  761/1208
Protein types with more than one intein (in the same polypeptide or in homologs), all at different integration points, so the inteins are not alleles
class-III (anaerobic) Ribonucleotide reductase (2) Archaea,
Bacteria.
(Eucarya - similarity to a 60 aa region at the N-termini of α subunits of class I ribonucleotide reductase.)
RNR-a (1) Methanococcus jannaschii Mja RNR-1 454 aa DOD
YVARGGQ*TIFSSIN  337/791

RNR-b (1) Methanococcus jannaschii Mja RNR-2 534 aa DOD
TQTPAES*TAGRFAR 1058/1592
Protein types with only one known intein
Protein similar to a type of phage terminase large subunits (E.coli ymfN gene) Bacteria.
terA-a (1) Clostridium thermocellum (probably a prophage) Cth terA 334 aa DOD
IPKKNGK*SELAAAV   84/418

Protein similar to a type of phage terminase large subunits (i.e. Rhodobacter capsulatus putative terminase large subunit) Bacteria (some),
Archaea (some).
terA2-a (1) Methanococcoides burtonii Mbu terA2 318 aa -
EGRSPDR*CDALVWA  126/445

Bacterial conserved hypothetical protein of unknown function; has N' ParB nuclease domain and C' DNA methylase domain (see Interpro domains ParB nuclease and DNA methylase) Bacteria (some),
Archaea (some).
o681-a (1) Gemmata obscuriglobus UQM 2246 Gob o681 287 aa unknown
YHWKHEP*CLYGWID  286/574

TrbC - a protein with an AAA ATPase domain (see Interpro domain AAA ATPases) Bacteria (some).
trbC-a (1) Methylobacterium extorquens Mex trbC 366 aa DOD
GTTGAGK*TETLLGF   51/417

Some type of helicase (see Interpro families Helicases and DEAD/DEAH box helicases) Bacteria,
Eucarya (some).
00302-a (1) Methylobacterium extorquens Mex Helic 270 aa -
DIVVVDE*CHRWFEM  119/390

SpoVR - function unknown, family includes B.subtilis protein involved in spore cortex synthesis, related proteins from other bacilli, enterobacteria (E.coli YcgB), Pseudomonads and an Archaeon (Halobacterium) Bacteria (some),
Archaea (some).
SpoVR-a (1) Chloroflexus aurantiacus Cau SpoVR 277 aa -
ILNEGWA*SYWHSTI  278/555

Clp protease catalytic subunit (1) Eucarya,
Bacteria.
clpP-a (1) Chlamydomonas eugametos (chloroplast) Ceu clpP 457 aa DOD
VMIHQPE*SSIQGQA  447/904

class-I Ribonucleotide reductase β subunit (1) Bacteria,
Eucarya.
RIR2-a (1) Aquifex aeolicus Aae RIR2 347 aa DOD
YINRDEL*CHVTLFR  229/576

ATP dependent helicase; LHR - large helicase-related protein (1) Archaea,
Eucarya,
Bacteria.
LHR-a (1) Pyrococcus horikoshii OT3 Pho LHR 476 aa DOD
GELRAVV*SSTSLEL  345/821

DEAD/DEAH-box helicase (MJ1124) (1) Archaea,
Eucarya,
Bacteria.
helicase-a (1) Methanococcus jannaschii Mja helicase 502 aa DOD
ICCTPTL*SAGLNLP  337/839

Molybdenum cofactor biosynthesis A (MoaA) like protein Archaea,
Bacteria,
Eucarya.
moaA-a (1) Pyrococcus abyssi Pab moaA 437 aa DOD
CNLNCWY*CFFYARE   97/534

Archaeal uncharacterized protein (M.jannaschii MJ0043) (1) Archaea,
Bacteria (Bsu YqxK).
hyp1-a (1) Methanococcus jannaschii Mja hyp1 393 aa DOD
GLIGPAH*CFTPWTS  128/521

Glutamine fructose 6 phosphate transaminase (1) Archaea,
Bacteria,
Eucarya.
GF6P-a (1) Methanococcus jannaschii Mja GF6P 500 aa DOD
GNIGIGH*SRWATHG   74/574

RNA polymerase subunit A' (1) Archaea (RNA pol A' subunit),
Eucarya (RNA pol I/II/III A subunit),
Bacteria (RNA pol β' subunit).
RpolA'-a (1) Methanococcus jannaschii Mja RpolA' 453 aa DOD
FRHNLCV*CPPYNAD  463/916

RNA polymerase subunit A" (1) Archaea (RNA pol A" subunit),
Eucarya (RNA pol I/II/III A subunit),
Bacteria (RNA pol β' subunit).
RpolA"-a (1) Methanococcus jannaschii Mja RpolA" 472 aa DOD
GEPGTQM*TMRTFHY   75/547

Transcription factor IIB (1) Archaea,
Eucarya.
TFIIB-a (1) Methanococcus jannaschii Mja TFIIB 336 aa DOD
VGAPMTY*TIHDKGL   99/435

UDP-glucose dehydrogenase (1) Archaea,
Bacteria,
Eucarya.
UDPGD-a (1) Methanococcus jannaschii Mja UDPGD 455 aa DOD
GIGYGGS*CFPKDVK  260/715

Archaeal uncharacterized protein (A.pernix APE0745) Archaea,
Bacteria.
hyp3-a (1) Aeropyrum pernix Ape hyp3 468 aa DOD
QYAITTQ*SAFGWGL  175/644

DNA polymerase I Bacteria,
Eucarya.
dpo1-a (1) Bacteriophage APSE-1 APSE1 dpo1 306 aa -
KTYGGKS*CENICQA  608/915

Phage Mu protein F homolog - function unknown Bacteria.
MupF-a (1) Methylococcus capsulatus Mca MupF 315 aa unknown
WPPNGFN*CRCRVRP  183/498

CDC48 ATPase cell division control protein 48 Archaea,
Eucarya,
Bacteria (FtsH).
CDC48-a (1) Methanopyrus kandleri AV19 Mka CDC48 395 aa DOD
LSKWVGE*SEKKIRE  634/1029

Vacuolar-type ATPase b subunit Archaea,
Eucarya,
Bacteria.
VatB-a (1) Methanopyrus kandleri AV19 Mka VatB 518 aa DOD
LTDMTNY*CEALREI  260/778
Totals:
46 types of protein family hosts [7].
187 separate proteins with inteins:
150 with a single intein,
14 pairs each with one split intein 2,
18 with 2 inteins,
3 with 3 inteins,
1 with 4 inteins, and
1 pair: one with two and half inteins and the other with half of an intein (the two halves being parts of a split intein)
25/46 host groups found in Archaea, Bacteria and Eucarya,
8/46 found in Archaea and Bacteria,
1/46 found in Archaea and Eucarya,
6/46 found in Bacteria and Eucarya,
1/46 found just in Archaea,
1/46 found just in Eucarya,
4/46 found just in Bacteria.
71 different integration points, 36 with more than one allele.
99 different species and strains:
24 eukaryotes (20 in unicellular organisms, 3 in plastids and 1 in viruses),
50 bacteria (4 in bacterio- and pro- phages),
25 archaea.
216 inteins [6],
2 known EN domain types [2],
47 inteins without an EN domain
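
The figures in the Totals list can be cross-checked with a few sums. The sketch below is only an illustration: the counts are copied from the list, while the variable names and the rule that each split-intein pair counts as one protein carrying one intein (and the last pair as one protein carrying three) are assumptions chosen because they make the stated totals agree.

```python
# Counts copied from the Totals list. Each tuple is
# (description, number of proteins, inteins per protein).
# ASSUMPTION: a pair sharing a split intein is counted as one protein
# with one intein; the final pair (two and a half + half) as one with three.
PROTEIN_CLASSES = [
    ("single intein", 150, 1),
    ("pair with one split intein", 14, 1),
    ("two inteins", 18, 2),
    ("three inteins", 3, 3),
    ("four inteins", 1, 4),
    ("pair with two inteins plus a split intein", 1, 3),
]

# Host groups by the domains of life in which homologs are found.
HOST_DISTRIBUTION = {
    "Archaea+Bacteria+Eucarya": 25,
    "Archaea+Bacteria": 8,
    "Archaea+Eucarya": 1,
    "Bacteria+Eucarya": 6,
    "Archaea only": 1,
    "Eucarya only": 1,
    "Bacteria only": 4,
}

# Species and strains carrying inteins.
SPECIES = {"eukaryotes": 24, "bacteria": 50, "archaea": 25}


def tally():
    """Return the four sums implied by the Totals list."""
    return {
        "proteins": sum(n for _, n, _ in PROTEIN_CLASSES),
        "inteins": sum(n * k for _, n, k in PROTEIN_CLASSES),
        "host_groups": sum(HOST_DISTRIBUTION.values()),
        "species": sum(SPECIES.values()),
    }


if __name__ == "__main__":
    for name, value in tally().items():
        print(f"{name}: {value}")
```

Under these assumptions the sums come to 187 proteins, 216 inteins, 46 host groups and 99 species and strains, matching the stated totals.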

[1] Phylogenetic distribution was determined by searching the NCBI non-redundant protein database with the blastp program using the intein host proteins as queries.
[2] Known intein endonuclease (EN) domain types: DOD - dodecapeptide (LAGLIDADG) and HNH (see Belfort and Roberts '97).
[3] Partial sequences.
[4] The intein integration points in the type B DNA polymerases were renamed in April '99. I joined the NEB Intein database and renamed the integration points according to the type B DNA polymerase domain in which they are present. See positions of inteins in their host proteins. Integration point pol-a used to be named pol-c here, pol-b used to be named pol-a, and pol-c used to be named pol-b. The colors of the integration points are unchanged.
[5] InterPro family UPF0027 (E.coli rtcB homologs) was previously named "hyp2" here. The function of these proteins is still unknown, but they are now named and better characterized. Details can be found in Genschik, Drabikowski & Filipowicz, J Biol Chem 273:25516-25526 (1998).
[6] Cyanobacterial DnaEs and Nanoarchaeum equitans pol are split inteins - the N' and C' parts of the inteins are found on different protein molecules (e.g., the N' and C' parts of the DNA polymerase III α subunit [DnaE or PolC], respectively, and the pol1 and pol2 N' and C' parts of the Neq DNA polymerase type B). These two molecules are probably trans protein-spliced to form the mature DnaE or PolA proteins.
[7] In counting the types of protein families with inteins, DNA reverse gyrases are grouped together with Topoisomerase type I, and Replication factor C 37 kDa subunits are grouped together with DNA polymerase III τ subunits.
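
For readers who want to process the table programmatically, a minimal sketch of parsing one integration-site entry (e.g. `DAIIYVG*CGERGNE  283/738`) follows. The names `Site`, `parse_site` and `implied_length` are hypothetical. The table does not define the two numbers; the code assumes they are the last residue before the intein and the first residue after it in the precursor, an interpretation inferred from entries such as Sce VMA (454 aa, 283/738). Some entries differ from their listed intein length by a residue or two under this reading, so treat it as a rough check only. Split-intein entries, which spread the site over two lines, are not handled.

```python
import re
from typing import NamedTuple


class Site(NamedTuple):
    n_flank: str   # host residues before the insertion point ("*")
    c_flank: str   # residues after the insertion point
    n_end: int     # first number of the "N/M" pair
    c_start: int   # second number of the "N/M" pair


# Flanking residues around "*", whitespace, then "N/M".
# Flanks may be empty, as in entries of the form "*  278/692".
_SITE_RE = re.compile(r"^([A-Z]*)\*([A-Z]*)\s+(\d+)/(\d+)$")


def parse_site(entry: str) -> Site:
    """Parse a single-line integration-site entry from the table."""
    match = _SITE_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognized integration-site entry: {entry!r}")
    n_flank, c_flank, n_end, c_start = match.groups()
    return Site(n_flank, c_flank, int(n_end), int(c_start))


def implied_length(site: Site) -> int:
    """Intein length implied by the coordinates.

    ASSUMPTION: n_end is the last residue before the intein and c_start
    the first residue after it; the source table does not state this.
    """
    return site.c_start - site.n_end - 1


if __name__ == "__main__":
    sce_vma = parse_site("DAIIYVG*CGERGNE  283/738")
    print(sce_vma, implied_length(sce_vma))
```

For the Sce VMA entry the implied length equals the listed 454 aa.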


See additional information on inteins listed by species.


Page last modified June 2004
Shmuel Pietrokovski <pietro@weizmann.ac.il>