| Protein host (number of known inteins) | Phylogenetic distribution of recognized host homologs [1] | Integration point (number of known alleles) | Organism | Intein name | Intein length | Intein EN domain type [2] | Integration site |
|---|---|---|---|---|---|---|---|
| Vacuolar-type ATPase catalytic subunit (23) | Eucarya, Archaea, Bacteria (some). Also homologous to other types of bacterial and other ATPase β and α subunits. | VMA-a (16) | Saccharomyces cerevisiae | Sce VMA | 454 aa | DOD | DAIIYVG*CGERGNE 283/738 |
| Saccharomyces castellii CBS-4309 | Sca VMA | 517 aa | DOD | DSIIYVG*CGERGNE 283/801 |
|||
| Saccharomyces species DH1-1A | SceDH1-1A VMA1 | 454 aa | DOD | DAIIYVG*CGERGNE [3] 28/483 |
|||
| Saccharomyces cariocanus UFRJ-50791 | Scar VMA | 454 aa | DOD | DAIIYVG*CGERGNE [3] 18/473 |
|||
| Saccharomyces dairenensis CBS-421 | Sda VMA | 501 aa | DOD | DTIIYVG*CGERGNE [3] 18/520 |
|||
| Saccharomyces exiguus CBS-379 | Sex VMA | 502 aa | DOD | DAIIYVG*CGERGNE [3] 18/521 |
|||
| Saccharomyces unisporus CBS-398 | Sun VMA | 414 aa | DOD | DTIIYVG*CGERGNE [3] 18/433 |
|||
| Kluyveromyces lactis CBS-683 | Kla VMA | 410 aa | DOD | DAIIYVG*CGERGNE [3] 18/429 |
|||
| Kluyveromyces polysporus CBS-2163 | Kpo VMA | 433 aa | DOD | DAIIYVG*CGERGNE [3] 18/452 |
|||
| Zygosaccharomyces bailii CBS-685 | Zba VMA | 456 aa | DOD | DAIIYVG*CGERGNE [3] 18/475 |
|||
| Zygosaccharomyces bisporus CBS-702 | Zbi VMA | 450 aa | DOD | DAIIYVG*CGERGNE [3] 9/460 |
|||
| Zygosaccharomyces rouxii CBS-688 | Zro VMA | 450 aa | DOD | DAIVYVG*CGERGNE [3] 18/469 |
|||
| Torulaspora globosa CBS-764 | Tgl VMA | 456 aa | DOD | DVIIYVG*CGERGNE [3] 18/47 |
|||
| Torulaspora pretoriensis CBS-5080 | Tpr VMA | 455 aa | DOD | DAIIYVG*CGERGNE [3] 18/474 |
|||
| Candida glabrata | Cgl VMA | 415 aa | DOD | * 278/692 |
|||
| Candida tropicalis | Ctr VMA | 471 aa | DOD | DVIIYVG*CGERGNE 283/755 |
|||
| VMA-b (7) | Pyrococcus horikoshii OT3 | Pho VMA | 377 aa | DOD | GPFGSGK*TVTQHQL 240/617 |
||
| Pyrococcus furiosus | Pfu VMA | 424 aa | DOD | GPFGSGK*TVTQHQL 240/665 |
|||
| Pyrococcus abyssi | Pab VMA | 429 aa | DOD | GPFGSGK*TVTQHQL 240/670 |
|||
| Thermoplasma acidophilum | Tac VMA | 173 aa | - | GPFGSGK*TVIQHQL 235/408 |
|||
| Thermoplasma volcanium | Tvo VMA | 186 aa | - | GPFGSGK*TVIQHQL 236/421 |
|||
| Ferroplasma species type II | FspII VMA | 535 aa | DOD | * 237/773 |
|||
| Picrophilus torridus DSM 9790 | Pto VMA | 333 aa | DOD | * 236/570 |
|||
| DnaB helicase (16) | Bacteria, Eucarya (plastids). | dnaB-a (9) | Porphyra purpurea (chloroplast) | Ppu dnaB | 150 aa | - | SDLRESG*SIEQDAD 361/512 |
| Guillardia theta (plastid) | Gth dnaB | 160 aa | - | SDLKESG*SIEQDAD 376/537 |
|||
| Synechocystis spp. PCC6803 | Ssp dnaB | 429 aa | DOD | SDLRESG*SIEQDAD 380/810 |
|||
| Nostoc spp. PCC7120 (Anabaena PCC7120) | Asp dnaB | 429 aa | DOD | SDLRESG*SIEQDAD 388/818 |
|||
| Nostoc punctiforme | Npu dnaB | 429 aa | DOD | SDLRESG*SIEQDAD 388/818 |
|||
| Trichodesmium erythraeum IMS101 | Ter dnaB-2 | 177 aa | - | SDLRESG*SIEQDAD 2030/2208 |
|||
| Mycobacterium tuberculosis | Mtu dnaB | 415 aa | DOD | ADLRESG*SLEQDAD 399/816 |
|||
| Mycobacterium smegmatis | Msm dnaB-2 | 426 aa | DOD | SDLRESG*SLEQDAD 543/969 |
|||
| Rhodothermus marinus | Rma dnaB | 428 aa | DOD | SDLRESG*SIEQDAD 421/850 |
|||
| dnaB-b (8) | Mycobacterium leprae | Mle dnaB | 146 aa | - | ARPGVGK*STLGLDF 233/379 |
||
| Mycobacterium smegmatis | Msm dnaB-1 | 140 aa | - | ARPGVGK*STLGLDF 237/377 |
|||
| Mycobacterium avium | Mav dnaB | 337 aa | DOD | ARPGVGK*STLGLDF 232/570 |
|||
| Mycobacterium avium paratuberculosis | MavPT dnaB | 418 aa | DOD | ARPGVGK*STLGLDF [3] 215/633 |
|||
| Mycobacterium intracellulare | Min dnaB | 335 aa | DOD | ARPGVGK*STLGLDF [3] 16/350 |
|||
| Trichodesmium erythraeum IMS101 | Ter dnaB-1 | >1650 aa | DOD | GRPSMGK*TSFAVNI 212/1863 |
|||
| Crocosphaera watsonii WH8501 | Cwa dnaB | 496 aa | DOD | GRPSMGK*TAFGLGV 213/708 |
|||
| Gloeobacter violaceus PCC7421 | Gvi dnaB | 258 aa | DOD | GRPGMGK*TAFSLSI 221/480 |
|||
| dnaB-c (1) | Coxiella burnetii | Cbu dnaB | 146 aa | - | VMSDLRE*SGAIEQD 389/536 |
||
| DNA gyrase subunit A (7) | Bacteria, Eucarya (DNA topoisomerase II), Archaea (some) | gyrA-a (7) | Mycobacterium leprae | Mle gyrA | 420 aa | DOD | PPAAMRY*TEARLTP 130/551 |
| Mycobacterium flavescens | Mfl gyrA | 421 aa | DOD | PPAAMRY*TEARLTP [3] 69/491 |
|||
| Mycobacterium gordonae | Mgo gyrA | 420 aa | DOD | PPAAMRY*TEARLTP [3] 66/487 |
|||
| Mycobacterium kansasii | Mka gyrA | 420 aa | DOD | PPAAMRY*TEARLTP [3] 65/486 |
|||
| Mycobacterium xenopi | Mxe gyrA | 198 aa | - | PPAAMRY*TEAPLTP [3] 65/264 |
|||
| Mycobacterium malmoense | Mma gyrA | 420 aa | DOD | PPAAMRY*TEARRAI [3] 49/470 |
|||
| Mycobacterium gastri | Mga gyrA | 420 aa | DOD | PPAAMRY*TEARLTP [3] 64/484 |
|||
| Ribonucleotide reductase - class I α subunit and class II (32) | Eucarya (class I), Archaea (class II), Bacteria (class I and II). | RIR1-a (2) | Pyrococcus furiosus | Pfu RIR1-1 | 455 aa | DOD | IQKMGGG*TGLNFSK 301/756 |
| Pyrococcus abyssi | Pab RIR1-1 | 399 aa | DOD | IQKMGGG*TGLNFSK 301/702 |
|||
| RIR1-c (1) | Pyrococcus abyssi | Pab RIR1-2 | 438 aa | DOD | GTTTGAA*SGPVSFM 722/1162 |
||
| RIR1-h (1) | Azotobacter vinelandii | Avi RIR1 | 378 aa | DOD | TAVAVNQ*GGKRKGA 442/821 |
||
| RIR1-e (2) | Thermus thermophilus | Tth RIR1-1 | 439 aa | DOD | RVVRQGG*TRRGAGM 300/739 |
||
| Gloeobacter violaceus PCC7421 | Gvi RIR1-1 | 413 aa | DOD | KITTGNK*SRRGAFM 193/607 |
|||
| RIR1-d (2) | Nostoc spp. PCC7120 (Anabaena PCC7120) | Asp RIR1 | 407 aa | DOD | VAGNIRR*SAGMRQF 275/683 |
||
| Trichodesmium erythraeum IMS101 | Ter RIR1-1 | 394 aa | - | VAGNIRR*SAGIRQG 274/669 |
|||
| RIR1-b (20) | Chilo iridescent virus (CIV) | CIV RIR1 | 339 aa | DOD | TIKQSNL*CSEIILP 271/611 |
||
| prophage SPb (integrated in B.subtilis) | Spb RIR1 | 385 aa | DOD | KVKFSNL*CSEVLQS 380/766 |
|||
| A prophage in Bacillus spp. M1918 (integrated in B.species M1918) | BspM1918 RIR1 | 385 aa | DOD | KVKFSNL*CSEVLQS [3] 30/416 |
|||
| Trichodesmium erythraeum IMS101 | Ter RIR1-2 | 373 aa | DOD | ARLGLNP*CGEIIGS 698/1072 |
|||
| Gloeobacter violaceus PCC7421 | Gvi RIR1-2 | 367 aa | DOD | FITTTNP*CGEIWLP 755/1123 |
|||
| Crocosphaera watsonii WH8501 | Cwa RIR1 | 347 aa | DOD | YIPGVNL*CTESFSN 462/810 |
|||
| Synechococcus elongatus PCC7942 | Sel RIR1 | 370 aa | DOD | QRYGLNP*CGEILGA 409/780 |
|||
| Deinococcus radiodurans | Dra RIR1 | 367 aa | DOD | EIRSTNP*CGEIPLT 524/891 |
|||
| Thermus thermophilus | Tth RIR1-2 | 408 aa | DOD | QIRSTNP*CGEIPLT 887/1295 |
|||
| Methylococcus capsulatus | Mca RIR3 | 381 aa | DOD | RIEATNP*CAEQPLP 341/722 |
|||
| Carboxydothermus hydrogenoformans | Chy RIR1 | 345 aa | DOD | EIESTNP*CGEQPLL 276/621 |
|||
| Desulfitobacterium hafniense | Dha RIR1 | 365 aa | DOD | EIEATNP*CGEQPLL 283/650 |
|||
| Staphylococcus epidermidis | Sep RIR1 | 384 aa | DOD | EIKMSNL*CTEIFQY 377/762 |
|||
| Pyrococcus furiosus | Pfu RIR1-2 | 383 aa | DOD | PIRATNP*CGEEPLY 914/1297 |
|||
| Pyrococcus horikoshii OT3 | Pho RIR1 | 385 aa | DOD | PIRATNP*CGEEPLY 460/846 |
|||
| Pyrococcus abyssi | Pab RIR1-3 | 382 aa | DOD | PIRATNP*CGEEPLY 1297/1681 |
|||
| Methanobacterium thermoautotrophicum | Mth RIR1 | 134 aa | - | RIEATNP*CGEQPLL 274/409 |
|||
| Ferroplasma acidiphilum | Fac RIR1 | 366 aa | DOD | YIESTNP*CGEQPLL [3] 437/804 |
|||
| Ferroplasma acidarmanus type I | FacI RIR1 | 366 aa | DOD | YIESTNP*CGEQPLL 437/804 |
|||
| Ferroplasma species type II | FspII RIR1 | 366 aa | DOD | NIESTNP*CGEQPLL 437/804 |
|||
| RIR1-f (1) | Trichodesmium erythraeum IMS101 | Ter RIR1-3 | 323 aa | DOD | IIGSNFH*CNLSEIH 1081/1405 |
||
| RIR1-g (1) | Trichodesmium erythraeum IMS101 | Ter RIR1-4 | 381 aa | DOD | TTVQPSG*TKSLLTN 1534/1916 |
||
| DNA reverse gyrase/Topoisomerase-I (topA) domain (4) | Archaea, Bacteria, Eucarya (DNA topoisomerase III). | r-gyr-a (3) | Methanococcus jannaschii | Mja r-gyr | 494 aa | DOD | ELFELGL*CTYHRTS 866/1361 |
| Pyrococcus horikoshii OT3 | Pho r-gyr | 410 aa | DOD | DLFEAGL*CTYHRTD 953/1364 |
|||
| Pyrococcus furiosus | Pfu topA | 373 aa | DOD | SLYEKGF*CSYPRTE 314/688 |
|||
| topA-a (1) | Haloarcula marismortui ATCC43049 | Hma TopA | 494 aa | DOD | DKGLGTK*STRHNSI 327/865 |
||
| DNA polymerase family B (archaeal polA gene) (18) | Archaea, Eucarya (Dpols α, δ, and ε), Bacteria (Dpol II). |
| pol-a [4] (4) | Thermococcus fumicolans | Tfu pol-1 | 360 aa | DOD | IAYLDFR*SLYPSII 406/767 |
||
| Thermococcus aggregans | Tag pol-1 | 360 aa | DOD | IAYLDFR*SLYPSII 409/770 |
|||
| Pyrococcus kodakaraensis (strain KOD1) | Pko pol-1 | 360 aa | DOD | IVYLDFR*SLYPSII 406/767 |
|||
| Methanococcus jannaschii | Mja pol-1 | 369 aa | DOD | IISMDFR*SLYPSII 425/795 |
|||
| pol-b [4] (9) | Thermococcus spp. GE8 | TspGE8 pol-1 | 536 aa | DOD | AIKILAN*SYYGYYG 491/1027 |
||
| Thermococcus hydrothermalis | Thy pol-1 | 538 aa | DOD | AIKILAN*SYYGYYG [3] 458/996 |
|||
| Thermococcus litoralis | Tli pol-1 | 538 aa | DOD | AIKLLAN*SYYGYMG 494/1033 |
|||
| Thermococcus aggregans | Tag pol-2 | 538 aa | DOD | AVKLLAN*SYYGYMG 854/1393 |
|||
| Pyrococcus spp. GB-D | Psp pol-1 | 537 aa | DOD | AIKILAN*SYYGYYG 492/1030 |
|||
| Pyrococcus kodakaraensis (strain KOD1) | Pko pol-2 | 537 aa | DOD | AIKILAN*SYYGYYG 851/1389 |
|||
| Pyrococcus horikoshii OT3 | Pho pol | 460 aa | DOD | AIKILAN*SYYGYYG 492/953 |
|||
| Methanococcus jannaschii | Mja pol-2 | 476 aa | DOD | SLKILAN*SVYGYLA 882/1359 |
|||
| Haloarcula marismortui ATCC43049 | Hma PolB1 | 409 aa | DOD | AVKVIMN*SLYGVLG 589/999 |
|||
| pol-c [4] (7) | Thermococcus spp. GE8 | TspGE8 pol-2 | 390 aa | DOD | FKVLYAD*TDGFFAT 1075/1465 |
||
| Thermococcus hydrothermalis | Thy pol-2 | 390 aa | DOD | FKVLYAD*TDGFFAT [3] 1044/1434 |
|||
| Thermococcus litoralis | Tli pol-2 | 390 aa | DOD | FKVLYAD*TDGFYAT 1081/1472 |
|||
| Thermococcus fumicolans | Tfu pol-2 | 389 aa | DOD | FKVLYAD*TDGFFAT 900/1290 |
|||
| Thermococcus aggregans | Tag pol-3 | 157 aa | - | FKVLYAD*TDGFYAT 1441/1599 |
|||
| Haloferax volcanii DS2 ATCC29605 | Hvo PolB1 | 437 aa | DOD | YDVAYGD*TDSVMLE 625/1063 |
|||
| Nanoarchaeum equitans Kin4-M | Neq pol | 98+30 aa | - | FKVIYGD* 556/ |
|||
| KlbA virulence protein (4) | Archaea (type II secretion system proteins), Bacteria (virB proteins, ATPases involved in pili formation). | klbA-a (4) | Pyrococcus horikoshii OT3 | Pho klbA | 521 aa | DOD | MNTGHDG*CMGTIHS 451/972 |
| Methanococcus jannaschii | Mja klbA | 169 aa | - | MNTGHDG*CSGTLHA 404/573 |
|||
| Pyrococcus furiosus | Pfu klbA | 522 aa | DOD | MNTGHDG*CMGTIHA 463/986 |
|||
| Pyrococcus abyssi | Pab klbA | 196 aa | - | MNTGHDG*CMGTIHS 453/650 |
|||
| Translation initiation factor bIF-2 / EF2 translation elongation and release factor (5) | Archaea, Bacteria, Eucarya. | IF2-a (5) | Methanococcus jannaschii | Mja IF2 | 547 aa | DOD | GHVDHGK*TTLLDKI 30/577 |
| Methanopyrus kandleri AV19 | Mka EF2 | 523 aa | DOD | AHIDHGK*TTLSDQL 34/557 |
|||
| Pyrococcus horikoshii OT3 | Pho IF2 | 445 aa | DOD | GHVDHGK*TTLLDKI 19/464 |
|||
| Pyrococcus furiosus | Pfu IF2 | 387 aa | DOD | GHVDHGK*TTLLDRI 19/407 |
|||
| Pyrococcus abyssi | Pab IF2 | 394 aa | DOD | GHVDHGK*TTLLDRI 20/415 |
|||
| Uncharacterized protein, InterPro family UPF0027 (E.coli rtcB homologs) (6) | Archaea, Eucarya, Bacteria. | rtcB-a [5] (6) | Methanococcus jannaschii | Mja rtcB | 489 aa | DOD | GVGFDIN*CGVRLIR 97/586 |
| Methanopyrus kandleri AV19 | Mka rtcB | 483 aa | DOD | GVGYDIN*CGVRVMK 100/583 |
|||
| Pyrococcus horikoshii OT3 | Pho rtcB | 390 aa | DOD | GIGYDIN*CGVRLIR 97/488 |
|||
| Pyrococcus furiosus | Pfu rtcB | 481 aa | DOD | GIGYDIN*CGVRLIR 97/579 |
|||
| Pyrococcus abyssi | Pab rtcB | 437 aa | DOD | GIGYDIN*CGVRLIR 97/534 |
|||
| Nostoc punctiforme | Npu rtcB | 323 aa | HNH | AVGVDIG*CGMSAIK 77/400 |
|||
| Replication factor C, 37 kDa subunit / DNA polymerase III τ subunit (10) | Archaea (RFC), Eucarya (RFC), Bacteria (DNA polymerase III γ/τ subunit). | RFC-a (4) | Pyrococcus horikoshii OT3 | Pho RFC | 526 aa | DOD | GPPGVGK*TTAALAL 58/584 |
| Methanococcus jannaschii | Mja RFC-1 | 549 aa | DOD | GPPGVGK*TTAALCL 53/602 |
|||
| Pyrococcus furiosus | Pfu RFC | 525 aa | DOD | GPPGVGK*TTAALAL 59/585 |
|||
| Pyrococcus abyssi | Pab RFC-1 | 499 aa | DOD | GPPGVGK*TTAALAL 61/561 |
|||
| RFC-b (2) | Methanococcus jannaschii | Mja RFC-2 | 437 aa | DOD | NFLELNA*SDERGID 626/1063 |
||
| Methanopyrus kandleri AV19 | Mka RFC | 306 aa | - | NFLELNA*SDERGID 82/388 |
|||
| RFC-c (2) | Methanococcus jannaschii | Mja RFC-3 | 544 aa | DOD | VCRFILS*CNYPSKI 1124/1668 |
||
| Pyrococcus abyssi | Pab RFC-2 | 608 aa | DOD | NVRFILS*CNYSSKI 647/1256 |
|||
| dnaX-a (2) | Synechocystis spp. PCC6803 | Ssp dnaX | 431 aa | unknown | KVYVIDE*CHMLSTA 129/560 |
||
| Spirulina platensis C1 | Spl dnaX | 136 aa | - | KVYVIDE*CHMLSTA 129/266 |
|||
| recA / radA DNA repair protein (12) | Bacteria (recA), Archaea (radA), Eucarya (rad51/dmc1). | recA-a (1) | Mycobacterium tuberculosis | Mtu recA | 441 aa | DOD | VKVVKNK*CSPPFKQ 251/692 |
| recA-b (9) | Mycobacterium leprae | Mle recA | 366 aa | DOD | KIGVMFG*SPETTTG 205/571 |
||
| Mycobacterium flavescens | Mfl recA | 364 aa | DOD | KIGVMFG*SPETTTG 205/570 |
|||
| Mycobacterium flavescens ATCC14474 | Mfl recA 14474 | 365 aa | DOD | KIGVMFG*SPETTTG [3] |
|||
| Mycobacterium fallax | Mfa recA | 364 aa | DOD | KIGVMFG*SPETTTG [3] |
|||
| Mycobacterium chitae | Mch recA | 365 aa | DOD | KIGVMFG*SPETTTG [3] |
|||
| Mycobacterium gastri | Mga recA | 369 aa | DOD | KIGVMFG*SPETTTG [3] |
|||
| Mycobacterium shimoidei | Msh recA | 365 aa | DOD | KIGVMFG*SPETTTG [3] |
|||
| Mycobacterium thermoresistibile | Mth recA | 366 aa | DOD | KIGVMFG*SPETTTG [3] |
|||
| Thermomonospora fusca | Tfus recA-2 | 358 aa | DOD | KVGVMFG*SPETTSG 626/984 |
|||
| recA-c (2) | Pyrococcus horikoshii OT3 | Pho radA | 173 aa | - | GEFGSGK*TQLAHTL 149/322 |
||
| Thermomonospora fusca | Tfus recA-1 | 423 aa | DOD | GPESSGK*TTVALHA 72/495 |
|||
| Cell division control protein 21 (CDC21) (6) | Archaea, Eucarya, Bacteria (distant homologs - E.coli f516, Aquifex AQ_291 and Rhizobium NifA). | CDC21-a (4) | Pyrococcus horikoshii OT3 | Pho CDC21-1 | 169 aa | - | GDPGVAK*SQLLRYV 334/503 |
| Pyrococcus abyssi | Pab CDC21-1 | 164 aa | - | GDPGVAK*SQLLRYI 334/499 |
|||
| Halobacterium spp. NRC-1 | Hsp CDC21 | 183 aa | - | GDPGTGK*SQMISYV 282/463 |
|||
| Haloarcula marismortui ATCC43049 | Hma CDC21 | 477 aa | DOD | GDPGTGK*SQMLSYI 331/809 |
|||
| CDC21-b (3) | Pyrococcus horikoshii OT3 | Pho CDC21-2 | 261 aa | - | SSSAAGL*TAAVVRD 529/790 |
||
| Pyrococcus furiosus | Pfu CDC21 | 367 aa | DOD | SSSAAGL*TAAAVRD 361/799 |
|||
| Pyrococcus abyssi | Pab CDC21-2 | 268 aa | - | SSSAAGL*TAAVVRD 525/794 |
|||
| ATP-dependent protease LA (lon gene) (3) | Archaea, Bacteria, Eucarya. | lon-a (3) | Pyrococcus horikoshii OT3 | Pho lon | 475 aa | DOD | VRHDPFQ*SGGLGTP 203/678 |
| Pyrococcus furiosus | Pfu lon | 401 aa | DOD | VRHDPFQ*SGGLGTP 203/605 |
|||
| Pyrococcus abyssi | Pab lon | 333 aa | DOD | VRHDPFQ*SGGLGTP 220/554 |
|||
| Archaeal DNA polymerase Pol II, DP2 subunit (Archaeal polC gene) (3) | Archaea. | polC-a (4) | Pyrococcus horikoshii OT3 | Pho polC | 167 aa | - | HAAKRRN*CDGDEDA 951/1118 |
| Pyrococcus abyssi | Pab polC | 185 aa | - | HAAKRRN*CDGDEDA 954/1140 |
|||
| Halobacterium spp. NRC-1 | Hsp polC | 196 aa | - | HAAKRRN*CDGDEDC 925/1121 |
|||
| Haloarcula marismortui ATCC43049 | Hma polC | 179 aa | - | HAAKRRN*CDGDEDC 965/1146 |
|||
| Uncharacterized protein, InterPro UPF0051 family (E.coli sufD homologs, mycobacterial pps1 proteins) (6) | Bacteria, Eucarya (chloroplast, cyanelle), Archaea. | pps1-a (1) | Mycobacterium leprae | Mle pps1 | 387 aa | DOD | TAVWSGG*SFIYVPP 201/588 |
| pps1-b (4) | Mycobacterium tuberculosis | Mtu pps1 | 360 aa | DOD | YVHYVEG*CTAPIYK 252/612 |
||
| Ferroplasma acidiphilum | Fac pps1 | 356 aa | DOD | KVHYIEG*CTAPKYN 242/599 |
|||
| Ferroplasma acidarmanus type I | FacI pps1 | 356 aa | DOD | KVHYIEG*CTAPKYN 242/599 |
|||
| Ferroplasma species type II | FspII pps1 | 356 aa | DOD | KVHYIEG*CTAPKYN 242/599 |
|||
| pps1-c (1) | Mycobacterium gastri | Mga pps1 | 378 aa | DOD | GVAAQYE*SEVVYHQ 15/393 |
||
| pre-mRNA splicing factor Prp8 (4) | Eucarya. | PRP8-a (4) | Filobasidiella (Cryptococcus) neoformans | Cne PRP8 | 172 aa | - | GLFWEKA*SGFEESM 1531/1704 |
| Ajellomyces (Histoplasma) capsulatus | Hca PRP8 | 530 aa | DOD | GLFWERA*SGFEESM [3] 1515/2046 |
|||
| Aspergillus fumigatus | Afu PRP8 | 819 aa | DOD | GLFWERA*SGFEESM [3] 658/1478 |
|||
| Emericella (Aspergillus) nidulans FGSC-4A | Ani PRP8 | 605 aa | DOD | * 1525/2131 |
|||
| dTDP-glucose 4-6-dehydratase (3) | Bacteria, Eucarya, Archaea. | rfbB-a (2) | Synechococcus spp. PCC7002 | Ssp2 RfbB | 332 aa | DOD | NPIGIRS*CYDEGKR 150/483 |
| Trichodesmium erythraeum IMS101 | Ter RfbB-1 | 336 aa | DOD | NCIGIRS*CYDEGKR 144/481 |
|||
| rfbB-b (1) | Trichodesmium erythraeum IMS101 | Ter RfbB-2 | 430 aa | DOD | RVARIFN*TYGPRML 511/942 |
||
| DNA gyrase subunit B (2) | Bacteria, Archaea, Eucarya (DNA topoisomerase II). | gyrB-a (2) | Synechocystis spp. PCC6803 | Ssp gyrB | 436 aa | HNH | EGDSAGG*SAKQGRD 436/872 |
| Trichodesmium erythraeum IMS101 | Ter gyrB | 244 aa | - | EGDSASG*SAKQGRD 439/684 |
|||
| SNF2 helicase | Bacteria, Eucarya. | snf2-a (2) | Deinococcus radiodurans | Dra snf2 | 343 aa | DOD | DDMGLGK*TLQTLAH 693/1037 |
| Trichodesmium erythraeum IMS101 | Ter snf2 | 469 aa | DOD | DDMGLGK*TIQTIAF 271/741 |
|||
| Conserved protein similar to halobacterial Thy1 thymidylate synthase complementing protein | Bacteria, Archaea. | Thy1-a (2) | Synechococcus spp. PCC7002 | Ssp2 Thy1 | 444 aa | HNH | GVSFDVQ*SFRYTGQ 96/541 |
| Trichodesmium erythraeum IMS101 | Ter Thy1 | 298 aa | - | RTHRIGC*SFDVQSY 91/390 |
|||
| Phosphoenol pyruvate synthase (2) | Archaea, Bacteria, Eucarya. | PEPsyn-a (2) | Methanococcus jannaschii | Mja PEPsyn | 412 aa | DOD | TDEGGLT*CHAAIVS 410/823 |
| Crocosphaera watsonii WH8501 | Cwa PEPsyn | 394 aa | DOD | TNQGGRT*CHAAIIA 453/848 |
|||
| DNA polymerase III α subunit (18) | Bacteria. | dnaE-d (1) | Trichodesmium erythraeum IMS101 | Ter DnaE1-1 | 1308 aa | DOD | KETYGVL*CYQEQIM 719/2028 |
| dnaE-b (2) | Thermus thermophilus | Tth DnaE-1 | 424 aa | DOD | EADLLRR*SMGKKKV 767/1191 |
||
| Trichodesmium erythraeum IMS101 | Ter DnaE1-2 | 428 aa | DOD | EADLLRR*CMGKKKV 2053/2482 |
|||
| dnaE-a (13) | Synechocystis spp. PCC6803 | Ssp DnaE [6] | 123+36 aa | - | MVKFAEY* 774/ |
||
| Synechococcus spp. PCC7002 | Ssp2 DnaE [6] | 106+36 aa | - | MVKFAEY* 775/ |
|||
| Nostoc spp. PCC7120 (Anabaena PCC7120) | Asp DnaE [6] | 102+36 aa | - | MLKFAEY* 775/ |
|||
| Nostoc punctiforme | Npu DnaE [6] | 102+36 aa | - | MLKFAEY* 774/ |
|||
| Trichodesmium erythraeum IMS101 | Ter DnaE-3 [6] | 102+36 aa | - | MIKFAEY* 2525/ |
|||
| Thermosynechococcus elongatus BP-1 | Tel DnaE [6] | 117+35 aa | - | MLDFAEY* 871/ |
|||
| Crocosphaera watsonii WH8501 | Cwa DnaE [6] | 106+36 aa | - | MIKFAEY* 774/ |
|||
| Synechococcus elongatus PCC7942 | Sel DnaE [6] | 113+36 aa | - | MVLFAEY* 757/ |
|||
| Anabaena variabilis ATCC29413 | Ava DnaE [6] | 102+36 aa | - | MLKFAEY* 775/ |
|||
| Aphanizomenon ovalisporum | Aov DnaE [6] | 101+36 aa | - | MLNFAEY* ???/ |
|||
| Oscillatoria limnetica | Oli DnaE [6] | 112+36 aa | - | MVKFAEY* ???/ |
|||
| Thermosynechococcus vulcanus | Tvu DnaE [6] | 117+35 aa | - | MLDFAEY* ???/ |
|||
| Aphanothece halophytica | Aha DnaE [6] | 110+36 aa | - | MIKFAEY* ???/ |
|||
| dnaE-c (2) | Thermus thermophilus | Tth DnaE-2 | 424 aa | DOD | ANYGFNK*SHAAAYS 1238/1662 |
||
| Gemmata obscuriglobus UQM 2246 | Gob DnaE | 446 aa | DOD | GGYGFNK*SHTAAYA 761/1208 |
|||
| Protein types with more than one intein (in the same polypeptide or on homologs) but all are in different integration points - inteins are not alleles |
| class-III (anaerobic) Ribonucleotide reductase (2) | Archaea, Bacteria. (Eucarya - similarity to a 60 aa region at the N-termini of α subunits of class I ribonucleotide reductase.) | RNR-a (1) | Methanococcus jannaschii | Mja RNR-1 | 454 aa | DOD | YVARGGQ*TIFSSIN 337/791 |
| RNR-b (1) | Methanococcus jannaschii | Mja RNR-2 | 534 aa | DOD | TQTPAES*TAGRFAR 1058/1592 |
||
| Protein types with only one known intein | |||||||
| Protein similar to a type of phage terminase large subunits (E.coli ymfN gene) | Bacteria. | terA-a (1) | Clostridium thermocellum (probably a prophage) | Cth terA | 334 aa | DOD | IPKKNGK*SELAAAV 84/418 |
| Protein similar to a type of phage terminase large subunits (i.e. Rhodobacter capsulatus putative terminase large subunit) | Bacteria (some), Archaea (some). | terA2-a (1) | Methanococcoides burtonii | Mbu terA2 | 318 aa | - | EGRSPDR*CDALVWA 126/445 |
| Bacterial conserved hypothetical protein of unknown function; has N' ParB nuclease domain and C' DNA methylase domain (see Interpro domains ParB nuclease and DNA methylase) | Bacteria (some), Archaea (some). | o681-a (1) | Gemmata obscuriglobus UQM 2246 | Gob o681 | 287 aa | unknown | YHWKHEP*CLYGWID 286/574 |
| TrbC - a protein with an AAA ATPase domain (see Interpro domain AAA ATPases) | Bacteria (some). | trbC-a (1) | Methylobacterium extorquens | Mex trbC | 366 aa | DOD | GTTGAGK*TETLLGF 51/417 |
| Some type of helicase (see Interpro families Helicases and DEAD/DEAH box helicases) | Bacteria, Eucarya (some). | 00302-a (1) | Methylobacterium extorquens | Mex Helic | 270 aa | - | DIVVVDE*CHRWFEM 119/390 |
| SpoVR - function unknown, family includes B.subtilis protein involved in spore cortex synthesis, related proteins from other bacilli, enterobacteria (E.coli YcgB), Pseudomonads and an Archaeon (Halobacterium) | Bacteria (some), Archaea (some). | SpoVR-a (1) | Chloroflexus aurantiacus | Cau SpoVR | 277 aa | - | ILNEGWA*SYWHSTI 278/555 |
| Clp protease catalytic subunit (1) | Eucarya, Bacteria. | clpP-a (1) | Chlamydomonas eugametos (chloroplast) | Ceu clpP | 457 aa | DOD | VMIHQPE*SSIQGQA 447/904 |
| class-I Ribonucleotide reductase β subunit (1) | Bacteria, Eucarya. | RIR2-a (1) | Aquifex aeolicus | Aae RIR2 | 347 aa | DOD | YINRDEL*CHVTLFR 229/576 |
| ATP dependent helicase; LHR - large helicase-related protein (1) | Archaea, Eucarya, Bacteria. | LHR-a (1) | Pyrococcus horikoshii OT3 | Pho LHR | 476 aa | DOD | GELRAVV*SSTSLEL 345/821 |
| DEAD/DEAH-box helicase (MJ1124) (1) | Archaea, Eucarya, Bacteria. | helicase-a (1) | Methanococcus jannaschii | Mja helicase | 502 aa | DOD | ICCTPTL*SAGLNLP 337/839 |
| Molybdenum cofactor biosynthesis A (MoaA) like protein | Archaea, Bacteria, Eucarya. | moaA-a (1) | Pyrococcus abyssi | Pab moaA | 437 aa | DOD | CNLNCWY*CFFYARE 97/534 |
| Archaeal uncharacterized protein (M.jannaschii MJ0043) (1) | Archaea, Bacteria (Bsu YqxK). | hyp1-a (1) | Methanococcus jannaschii | Mja hyp1 | 393 aa | DOD | GLIGPAH*CFTPWTS 128/521 |
| Glutamine fructose 6 phosphate transaminase (1) | Archaea, Bacteria, Eucarya. | GF6P-a (1) | Methanococcus jannaschii | Mja GF6P | 500 aa | DOD | GNIGIGH*SRWATHG 74/574 |
| RNA polymerase subunit A' (1) | Archaea (RNA pol A' subunit), Eucarya (RNA pol I/II/III A subunit), Bacteria (RNA pol β' subunit). | RpolA'-a (1) | Methanococcus jannaschii | Mja RpolA' | 453 aa | DOD | FRHNLCV*CPPYNAD 463/916 |
| RNA polymerase subunit A" (1) | Archaea (RNA pol A" subunit), Eucarya (RNA pol I/II/III A subunit), Bacteria (RNA pol β' subunit). | RpolA"-a (1) | Methanococcus jannaschii | Mja RpolA" | 472 aa | DOD | GEPGTQM*TMRTFHY 75/547 |
| Transcription factor IIB (1) | Archaea, Eucarya. | TFIIB-a (1) | Methanococcus jannaschii | Mja TFIIB | 336 aa | DOD | VGAPMTY*TIHDKGL 99/435 |
| UDP-glucose dehydrogenase (1) | Archaea, Bacteria, Eucarya. | UDPGD-a (1) | Methanococcus jannaschii | Mja UDPGD | 455 aa | DOD | GIGYGGS*CFPKDVK 260/715 |
| Archaeal uncharacterized protein (A.pernix APE0745) | Archaea, Bacteria. | hyp3-a (1) | Aeropyrum pernix | Ape hyp3 | 468 aa | DOD | QYAITTQ*SAFGWGL 175/644 |
| DNA polymerase I | Bacteria, Eucarya. | dpo1-a (1) | Bacteriophage APSE-1 | APSE1 dpo1 | 306 aa | - | KTYGGKS*CENICQA 608/915 |
| Phage Mu protein F homolog - function unknown | Bacteria | MupF-a (1) | Methylococcus capsulatus | Mca MupF | 315 aa | unknown | WPPNGFN*CRCRVRP 183/498 |
| CDC48 ATPase cell division control protein 48 | Archaea, Eucarya, Bacteria (FtsH). | CDC48-a (1) | Methanopyrus kandleri AV19 | Mka CDC48 | 395 aa | DOD | LSKWVGE*SEKKIRE 634/1029 |
| Vacuolar-type ATPase b subunit | Archaea, Eucarya, Bacteria. | VatB-a (1) | Methanopyrus kandleri AV19 | Mka VatB | 518 aa | DOD | LTDMTNY*CEALREI 260/778 |
| Totals: | |||||||
| 46 types of protein family hosts [7]. 187 separate proteins with inteins: 150 with a single intein, 14 pairs each with one split intein 2, 18 with 2 inteins, 3 with 3 inteins, 1 with 4 inteins, and 1 pair in which one protein has two and a half inteins and the other has half of an intein (the two halves being parts of a split intein). |
25/46 host groups found in Archaea, Bacteria and Eucarya, 8/46 found in Archaea and Bacteria, 1/46 found in Archaea and Eucarya, 6/46 found in Bacteria and Eucarya, 1/46 found just in Archaea, 1/46 found just in Eucarya, 4/46 found just in Bacteria. |
71 different integration points, 36 with more than one allele | 99 different species and strains: 24 eukaryotes (20 in unicellular organisms, 3 in plastids and 1 in viruses), 50 bacteria (4 in bacteriophages and prophages), 25 archaea. |
216 inteins [6] | 2 known EN domain types [2], 47 inteins without an EN domain |
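
The protein and intein totals above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers quoted in the Totals rows; the variable names are illustrative, and it assumes that each pair of proteins sharing a split intein is counted as a single entry, which is the reading under which the stated sums agree.

```python
# Counts copied from the Totals rows. Each tuple is
# (number of entries, inteins per entry); the names are illustrative.
protein_breakdown = [
    (150, 1),  # proteins with a single intein
    (14, 1),   # pairs sharing one split intein (assumed: one entry per pair)
    (18, 2),   # proteins with 2 inteins
    (3, 3),    # proteins with 3 inteins
    (1, 4),    # protein with 4 inteins
    (1, 3),    # pair with 2.5 + 0.5 inteins, i.e. 3 inteins in total
]
total_entries = sum(count for count, _ in protein_breakdown)
total_inteins = sum(count * inteins for count, inteins in protein_breakdown)

# Host groups by the domains of life in which homologs were found.
host_groups = {
    "Archaea, Bacteria and Eucarya": 25,
    "Archaea and Bacteria": 8,
    "Archaea and Eucarya": 1,
    "Bacteria and Eucarya": 6,
    "Archaea only": 1,
    "Eucarya only": 1,
    "Bacteria only": 4,
}
total_host_groups = sum(host_groups.values())

# Species and strains: eukaryotes are listed as 20 + 3 + 1.
species = {"eukaryotes": 20 + 3 + 1, "bacteria": 50, "archaea": 25}
total_species = sum(species.values())

print(total_entries, total_inteins, total_host_groups, total_species)
# prints: 187 216 46 99
```

The four printed values match the 187 proteins, 216 inteins, 46 host groups and 99 species and strains stated in the totals.
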
[1] Phylogenetic distribution was determined by searching the NCBI non-redundant protein database with the blastp program, using the intein host proteins as queries.
[2] Known intein endonuclease (EN) domain types: DOD - dodecapeptide (LAGLI-DADG) and HNH (see Belfort and Roberts '97).
[3] Partial sequences.
[4] The intein integration points in the type B DNA polymerases were renamed in April '99. I joined the NEB Intein database and renamed the integration points according to the type B DNA polymerase domain in which they are present. See positions of inteins in their host proteins. Integration point pol-a used to be named here pol-c, pol-b used to be named pol-a, and pol-c used to be named pol-b. The colors of the integration points are unchanged.
[5] InterPro family UPF0027 (E.coli rtcB homologs) was previously named here "hyp2". The function of these proteins is still unknown, but they are now named and better characterized. Details can be found in Genschik, Drabikowski & Filipowicz, J Biol Chem 273:25516-25526 (1998).
[6] The cyanobacterial DnaE inteins and the Nanoarchaeum equitans pol intein are split inteins: the N' and C' parts of the inteins are found on different protein molecules (e.g., the N' and C' parts of the DNA polymerase III α subunit [DnaE or PolC], respectively, and the pol1 and pol2 N' and C' parts of the Neq DNA polymerase type B). These two molecules are probably trans protein-spliced to form the mature DnaE or PolA proteins.
[7] In counting the types of protein families with inteins, DNA reverse gyrases are grouped together with Topoisomerase type I, and Replication factor C 37 kDa subunits are grouped together with DNA pol-3 τ subunits.
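
The Intein length column mixes three forms: a plain length (454 aa), a two-part length for the split inteins described in the note on split inteins (123+36 aa), and a lower bound (>1650 aa). The following is a minimal sketch of how such cells might be parsed. The class and function names are hypothetical, and reading the two numbers of a split intein as the N' part followed by the C' part is an assumption based on that note, not something the table states explicitly.

```python
import re
from typing import NamedTuple, Optional


class InteinLength(NamedTuple):
    """Parsed 'Intein length' cell (hypothetical helper type)."""
    n_part: int               # full length, or first (assumed N') part if split
    c_part: Optional[int]     # second (assumed C') part; None if not split
    lower_bound: bool         # True for cells such as '>1650 aa'

    @property
    def total(self) -> int:
        # Sum of both parts; equals n_part for contiguous inteins.
        return self.n_part + (self.c_part or 0)


# Optional '>', a number, an optional '+number', then the 'aa' unit.
_LENGTH_RE = re.compile(r"^(>?)\s*(\d+)(?:\+(\d+))?\s*aa$")


def parse_intein_length(cell: str) -> InteinLength:
    """Parse cells like '454 aa', '123+36 aa' or '>1650 aa'."""
    match = _LENGTH_RE.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized intein length: {cell!r}")
    gt, first, second = match.groups()
    return InteinLength(
        n_part=int(first),
        c_part=int(second) if second is not None else None,
        lower_bound=(gt == ">"),
    )


if __name__ == "__main__":
    for cell in ("454 aa", "123+36 aa", ">1650 aa"):
        parsed = parse_intein_length(cell)
        print(cell, "->", parsed, "total:", parsed.total)
```

Keeping the two halves separate, rather than storing only their sum, preserves the distinction between split and contiguous inteins that the table draws.
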