DNA-binding sites of Tc1, mariner and pogo transposases

(For a full published version see
Molecular & General Genetics 254, 689-695, 1997,
Pietrokovski, S. & Henikoff, S.
"A helix-turn-helix DNA-binding motif predicted for DNA-mediated transposases"

Medline ID: 97345893)


Introduction

DNA-mediated transposons are ubiquitous, occurring in bacteria (1,2), protozoa (3), plants (4), fungi (5) and most animal phyla (6,7), including humans (8,9,10,11,12,13). Tc1 and mariner are the best-studied animal transposon families (6,7). In vitro transposition systems were developed for members of both (14,15), and a Tc1-like transposon was used as a vector for germ-line transformation between genera (16). These transposons are small (1-2 kb) and encode a single protein. This protein is a transposase that is apparently both responsible and sufficient for transposition in the absence of host factors.

Tc1 and mariner transposons (ITR - Inverted Terminal Repeats)


Transposases from both families contain four conserved acidic residues in their C-terminal half. These residues are believed to form the catalytic site because of their similarity to those of retroviral integrases and bacterial and phage transposases (17). This is supported by experiments in which some of these residues were point-mutated (18, 19, 20).

Crystal structure of the catalytic domain from Avian Sarcoma Virus (ASV) integrase


The DNA binding site of these transposases was unknown. It has been experimentally localized to the N-terminal region of two Tc1-type transposases (21, 22). A few Tc1 transposases were also found to have a helix-turn-helix (HTH) DNA binding motif in their N-terminal half (23, 24, 25). However, there were also claims for a leucine zipper and an ATP binding motif in the corresponding regions of other Tc1 transposases (26, 27).

Results

We have been able to multiply align sequences from each family. Local multiple alignments were first found by the BlockMaker and MEME systems and later refined by the MACAW program, identifying a number of conserved regions in each family. The C-terminal regions correspond to previously identified motifs around the conserved acidic residues of the catalytic domain. The N-terminal region of both families is less conserved than the C-terminal region and was previously aligned in various ways (21, 27, 20).

Conserved sequence regions (blocks) in Tc1 and mariner transposases


Block-to-sequence and block-to-block database searches found one block in the N-terminal region of each family to be significantly similar to HTH DNA binding regions of various proteins (paired domain proteins, bacterial insertion sequence transposases, transcriptional regulators, RNA polymerase sigma factors, resolvases, etc.). This pattern of similarity was previously found for other HTH motifs (28).
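As an illustration of what a block-to-block comparison involves, the following minimal sketch scores ungapped alignments of two blocks by summing Pearson correlations between their column residue frequencies. It is a simplified stand-in and not the procedure of reference 28, which may differ in sequence weighting, scoring matrices and the derivation of Z-scores. The function names, the `min_overlap` parameter and the toy block are hypothetical.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profiles(block):
    """Per-column residue frequencies of an ungapped block (equal-length sequences)."""
    profiles = []
    for i in range(len(block[0])):
        column = [seq[i] for seq in block]
        profiles.append([column.count(aa) / len(column) for aa in AMINO_ACIDS])
    return profiles

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def best_ungapped_alignment(block_a, block_b, min_overlap=4):
    """Try every ungapped offset; return (score, offset, overlap) of the best one.

    Column i of block_a is paired with column i - offset of block_b.
    """
    pa, pb = column_profiles(block_a), column_profiles(block_b)
    best = None
    for offset in range(-(len(pb) - min_overlap), len(pa) - min_overlap + 1):
        pairs = [(i, i - offset) for i in range(len(pa)) if 0 <= i - offset < len(pb)]
        score = sum(pearson(pa[i], pb[j]) for i, j in pairs)
        if best is None or score > best[0]:
            best = (score, offset, len(pairs))
    return best

# Toy block (made-up sequences, for illustration only).
toy_block = ["MKTAY", "MRSAY", "LKTGF"]
print(best_ungapped_alignment(toy_block, toy_block))
```

Comparing a block with itself gives the maximal score at offset 0, which is a quick sanity check. Turning raw scores into Z-scores would additionally require a distribution of chance scores, which is not modelled here.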

Similarity of Tc1 and mariner blocks to blocks of HTH DNA binding regions

Each box corresponds to a block from the Blocks database v9.1 or from the Tc1 and mariner 
families. The color of each box corresponds to the type of protein family. Lines between blocks 
show a significant similarity (Z-scores > 5.6, expected to occur by chance < 0.25 times). 
Thick lines denote very significant similarity, expected by chance < 1e-6. Similarities to 
the Tc1 and mariner blocks are shown as red lines. The only other similarity between the Tc1 and 
mariner blocks and blocks from the database was between a LacI family HTH block and block Tc1 C.

Comparing the Tc1 and mariner blocks with each other confirmed the similarity between their catalytic domains, showed that their HTH motifs are similar to each other, and indicated another DNA binding region in the Tc1 transposases. Two regions between the Tc1 HTH motif and the catalytic domain were similar to the N' and C' helices of the mariner HTH motif:

Similarity between Tc1 and mariner blocks

Tc1 block   mariner block   positions (Tc1/mariner)   Z-score   expected (a)
Tc1-A       mariner-C       10-34 /  8-32                8.6     <1.0e-6
Tc1-B       mariner-C       18-34 /  1-17                4.6      6.3e-2
Tc1-C       mariner-C        1-15 / 18-32                8.5     <1.0e-6
Tc1-E       mariner-E        1-21 /  5-25                4.6      6.6e-2
Tc1-G       mariner-H        4-22 /  1-19               12.7     <1.0e-6
Tc1-H       mariner-I        6-16 /  1-11                6.3      1.9e-3
Tc1-I       mariner-K       12-33 /  2-23                7.5      1.0e-4

(a) The expected occurrences are for searching 100 blocks and 
    the Z-score cutoff is 4.5 (corresponding to the top 7.3e-4 percentile 
    of chance scores).
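The arithmetic behind footnote (a) can be sketched as follows, assuming that the expected number of chance occurrences is simply the tail fraction of chance scores multiplied by the number of blocks searched, and reading "7.3e-4 percentile" as a fraction of 7.3e-4. Both readings are assumptions, and the function name is hypothetical.

```python
def expected_chance_hits(tail_fraction, n_blocks_searched):
    """Expected number of chance scores at or above a cutoff.

    Assumes independent comparisons, each exceeding the cutoff with
    probability tail_fraction (an assumption, not stated in the text).
    """
    return tail_fraction * n_blocks_searched

# Z-score cutoff of 4.5: top 7.3e-4 of chance scores, 100 blocks searched.
at_cutoff = expected_chance_hits(7.3e-4, 100)
print(f"{at_cutoff:.3f}")  # 0.073
```

Under these assumptions the cutoff corresponds to about 0.073 expected chance hits, which agrees with the values of 6.3e-2 and 6.6e-2 listed for Z-scores of 4.6, just above the cutoff.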

The Dodd and Egan method (29) for identifying HTH motifs confirmed our findings for most but not all sequences. The identification failures might be due to the mostly prokaryotic sequences used to derive the parameters of the Dodd and Egan scoring matrix. Our findings are supported by experimental data as well. Plasterk and co-workers have shown that the N-terminal regions of the Tc1A and Tc3A transposases specifically bind to their respective inverted repeats (21, 22). The number of mutations in the predicted HTH region of the D. mauritiana mariner transposase that reduced or disabled its activity (20) was greater than would be expected by chance alone.
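The general idea of a weight-matrix scan of this kind can be sketched as below: every window of a sequence is scored against a position-specific weight matrix, and the best raw score is expressed in standard deviations from the mean of a reference score distribution. This is only an illustration. The weights, the three-residue window, and the mean and standard deviation are made-up placeholders, not the published Dodd and Egan parameters, which use a longer window.

```python
def best_hth_window(sequence, weights, ref_mean, ref_sd):
    """Return (start, z_score, window) of the best-scoring window.

    weights  : one dict per window position, mapping residue -> weight
               (hypothetical values; residues not listed score 0.0)
    ref_mean : mean raw score of a reference set (placeholder)
    ref_sd   : standard deviation of that reference set (placeholder)
    """
    width = len(weights)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        raw = sum(w.get(res, 0.0) for w, res in zip(weights, window))
        z_score = (raw - ref_mean) / ref_sd
        if best is None or z_score > best[1]:
            best = (start, z_score, window)
    return best

# Toy three-position matrix, for illustration only.
toy_weights = [{"G": 2.0}, {"I": 1.0}, {"P": 1.0}]
print(best_hth_window("AAGIPAA", toy_weights, ref_mean=1.0, ref_sd=1.5))
# (2, 2.0, 'GIP')
```

Because the reference distribution determines the Z-score, a matrix calibrated mostly on prokaryotic proteins can give low scores to genuine eukaryotic motifs, which is the kind of failure discussed above.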


Our results predict a bipartite DNA binding domain in Tc1 transposases. We suggest that the N-terminal part is a 'typical' HTH DNA binding domain and that the C-terminal part is an 'HTH-like' domain with a longer region (perhaps a loop) between two DNA binding helices. This C-terminal part of the DNA binding domain corresponds to the region observed to have non-specific DNA binding activity in Tc1A and Tc3A transposases (21, 22).

Tc1-type proteins block logos

DNA binding domain


2-20 > Tc1 block A < 1-8 aa > Tc1 block B < 0-4 aa > Tc1 block C < 0-1 aa >


Bipartite DNA binding domains with HTH structures are known for a number of different proteins, for example the yeast RAP1 telomere binding protein (30), the bacterial PurR and lac repressors (31,32) and the Myb oncoprotein (33). In all these proteins it is probably the N-terminal part of the structure that determines the DNA binding specificity (ibid). The paired domain of the Pax transcription factors is also a bipartite DNA binding structure (34). The domain also has limited sequence similarity to Tc1 transposases. The sequence similarity between the Prd Pax protein and the Tc1A transposase confirms our prediction for the structure of the Tc1 DNA binding site.

Structure of Prd PAX domain (1PDN) bound to DNA

The two Prd (SwissProt entry HMPR_DROME, PDB entry 1PDN) regions with sequence similarity 
to the Tc1A transposase are in color. In light blue is the region predicted in Tc1A as a 
HTH structure (Tc1 block A). The colored region on the right corresponds to Tc1 block B, 
with the start of the predicted DNA binding helix at the far right.

The two aligned regions (alignment by BLASTP) are:

Score = 43 (19.9 bits), Expect = 0.0061, Sum P(2) = 0.0061  Score = 36 (16.6 bits), Expect = 0.0061, Sum P(2) = 0.0061
Identities = 10/43 (23%), Positives = 23/43 (53%)           Identities = 8/33 (24%), Positives = 15/33 (45%)
 
Prd  47 NIRLKIVEMAADGIRPCVISRQLRVSHGCVSKILNRYQETGSI 89      Prd  94 IGGSKPRIATPEIENRIEEYKRSSPGMFSWEIR 126
        +++  IV     GI   +++ Q++ S   + K++ +YQ   S+                 I   +PR+ T  ++  I    R  P   + +I+
Tc1A 13 DVKKAIVAGFEQGIPTKMLALQIQRSPSTIWKVIKKYQTEKSV 55      Tc1A 59 ISPGRPRVTTHRMDRNILRSAREDPHRTATDIQ 91
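The identity counts reported by BLASTP can be re-derived directly from the aligned segments. The short sketch below does this for the two ungapped Prd/Tc1A segments shown above; the function name is hypothetical and the sequences are copied from the alignment.

```python
def identity(aligned_a, aligned_b):
    """Count identical positions in two equal-length, ungapped aligned segments."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned segments must have the same length")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return matches, len(aligned_a)

# First aligned region: Prd 47-89 and Tc1A 13-55.
PRD_1 = "NIRLKIVEMAADGIRPCVISRQLRVSHGCVSKILNRYQETGSI"
TC1A_1 = "DVKKAIVAGFEQGIPTKMLALQIQRSPSTIWKVIKKYQTEKSV"

# Second aligned region: Prd 94-126 and Tc1A 59-91.
PRD_2 = "IGGSKPRIATPEIENRIEEYKRSSPGMFSWEIR"
TC1A_2 = "ISPGRPRVTTHRMDRNILRSAREDPHRTATDIQ"

for prd, tc1a in [(PRD_1, TC1A_1), (PRD_2, TC1A_2)]:
    matches, length = identity(prd, tc1a)
    print(f"{matches}/{length} ({100 * matches / length:.0f}%)")
# 10/43 (23%)
# 8/33 (24%)
```

The "Positives" counts also depend on the substitution matrix used by BLASTP and are therefore not reproduced here.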

Pogo is a DNA-mediated transposon family (named after its first member, which was identified in a fruit fly). The family is distantly related to the mariner and Tc1 groups but also contains non-transposase proteins (13). The mariner and Tc1 blocks could identify similar regions in pogo sequences. Of particular interest was identifying the pogo DNA binding region. The catalytic site was predicted to be in the C-terminal part of pogo proteins, which could be aligned with mariner and Tc1 sequences. However, the N-terminal regions showed no apparent similarity to these sequences (13). The DNA binding site was identified and predicted to be an HTH structure by block comparison. This was confirmed with the Dodd and Egan method.

HTH motifs in pogo sequences

                                                                                     HTH
                                                    HHHHHHHHtttHHHHHHHHH          Z-score
Tigger1             MASKCSSERKSXTSLTLNQKLEMIKLSEEGMSKAEIGQKLGLLRQTVSQVVNAKEKFLKE  5.8
Pogo                     MGKTKRVVGLTLKEKLQIIELVTNKVDKKEICAKFKCDRSTVNRILQKTNEIHEA  3.0
Aot1         MIKTSAIPPKIPKSKKSRVEQEGRILLAISAIKKQEISSFRKAAEIFNIPIATLRYRLNGGSFRNDT  4.2
Pot2                           MKQYTEKQLISAINDVNNGNPIAKTSRKWGIPRSTLQSRLKGSQPYKKA  3.2
Fcc1                  MPQQQRSIQTSCEGRISLAIASYRNNPKQSVRALAVAYDVPKSTLQRRLHGTHARSEI  2.7
CENP-B Hs                MGPKRRQLTFREKSRIIQEVEENPDLRKGEIARRFNIPPSTLSTILKNKRAILAS  5.6

Sequences in bold were each found similar to the mariner C block. Secondary structure 
prediction is shown above the sequences; H stands for helix and t for turn. Segments in color 
mark probable HTH motifs identified by the hth program (29). The hth program Z-scores 
(standard deviations from the mean) are shown for each sequence. All sequences begin at their 
N-terminus. Similar results were obtained with Tc1 block A. Note that the HTH segments predicted 
by the Dodd and Egan method are all in the same relative position inside the segments similar to 
the block, and that these positions in the mariner block were also predicted to be HTH motifs.

The pogo sequences could also be aligned in blocks, and block A was similar to various blocks of HTH motifs. However, three non-transposase pogo proteins are missing block A and are not predicted to have an HTH motif. These proteins are two fungal transcription factors (PDC2 and RAG3) and the jerky protein. Inactivation of jerky in mice caused them to have epileptic seizures (35). All other pogo blocks are found in these three proteins.

pogo-like proteins. 9 blocks

pogo_drome  370 ----AAAA---BBBBB--CC-----DDDD---EEEE-FFFF--GG--HHHH-----II--
Fot1_fusox  542 ---AAAA--BBBBB--CC-----DDDD---EEEE--FFFF--GG---HHHH-----II---------------------------
Pot2_maggr  535 ---AAAA--BBBBB--CC-----DDDD---EEEE--FFFF--GG---HHHH-----II---------------------------
Pot3_maggr  554 ---AAAA--BBBBB-CC------DDDD---EEEE--FFFF--GG---HHHH-----II-----------------------------
Fcc1_cocca  534 ----AAAA--BBBBB--CC------DDDD---EEEE--FFFF--GG---HHHH-----II-------------------------
Aot1_aspor  382 -----AAAA--BBBBB--CC------DDDD---EEEE--FFFF--GG---HHHHII---
Tigger1_hum 430 ----AAAA-----BBBBB--CC-----DDDD----EEEE-FFFF---GG----HHHH-----II-----
Tigger2_hum 650 ---------------------------------------BBBBB---CC-----DDDD---EEEE--FFFF---GG--HHHH---II-----------------
CENB_HUMAN  599 ----AAAA----BBBBB--CC--------DDDD--EEEE-FFFF--GG--HHHH----II-----------------------------------
ABP1_SCHPO  522 ------------BBBBB--CC-----DDDD--EEEE--FFFF-GG---HHHH-----------------------------
jerky_mouse 509 ------BBBBB--CC-----DDDD--EEEE--FFFF---GG--HHHH---II-----------------------------
PDC2_YEAST  925 -----------BBBBB--CC----DDDD--EEEE-----FFFF--GG--HHHH----------II----------------------------------------------------------------------------------
RAG3_klact  765 -----------BBBBB---CC----DDDD--EEEE-----FFFF--GG--HHHH--------II----------------------------------------------------------

50 aa
--------

The tigger2 sequence might be shorter than shown. It was translated from a consensus
of several poorly conserved sequences.
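Block maps in this text layout can be read programmatically, for example to list which blocks a protein lacks. The sketch below parses three lines copied from the map and uses the scale bar (8 characters for 50 aa) to estimate block start positions. The function name is hypothetical, and treating the number after each name as the sequence length is an assumption.

```python
import re

# Scale bar from the map: 8 characters correspond to about 50 aa.
AA_PER_CHAR = 50 / 8

def parse_block_map(line):
    """Split one map line into (name, number, {block letter: approx. start in aa}).

    The number after the name is assumed to be the sequence length.
    """
    name, number, track = line.split()
    starts = {}
    for run in re.finditer(r"([A-Z])\1*", track):  # runs of one repeated letter
        starts.setdefault(run.group(1), round(run.start() * AA_PER_CHAR))
    return name, int(number), starts

MAP_LINES = [
    "pogo_drome  370 ----AAAA---BBBBB--CC-----DDDD---EEEE-FFFF--GG--HHHH-----II--",
    "jerky_mouse 509 ------BBBBB--CC-----DDDD--EEEE--FFFF---GG--HHHH---II-----------------------------",
    "PDC2_YEAST  925 -----------BBBBB--CC----DDDD--EEEE-----FFFF--GG--HHHH----------II------",
]

for line in MAP_LINES:
    name, number, starts = parse_block_map(line)
    missing = "".join(sorted(set("ABCDEFGHI") - set(starts))) or "none"
    print(f"{name}: blocks {''.join(starts)}, missing {missing}")
```

The PDC2_YEAST track is truncated here for brevity; its trailing dashes carry no block information. Start positions are only as precise as the character grid of the map allows.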

The CENP-B centromere-binding protein does have a predicted HTH motif, but is not a transposase. This implies that it functions more like the transposases than like jerky and the transcription factors. A tree constructed from the sequences aligned in the pogo blocks shows CENP-B to be only distantly related to jerky. Surprisingly, it was not found to be similar to the yeast (S. pombe) centromere-binding protein ABP1. The fungal transposases (Fot1, Pot2, Pot3 and Fcc1) are all clustered together, separate from the animal transposases (pogo and tiggers).

pogo-like proteins tree


Postscript - Experimental verifications

Our prediction for the position and type of DNA-binding domain in these various eucaryotic transposases was confirmed in molecular (Dm pogo, 36) and structural (Tc3, 37) experiments and is supported by a genetic screen study (Dm mariner, 38). This is further proof that our results can provide specific guidelines for genetic engineering of Tc1, mariner, pogo and related transposases.

Co-crystal structure of the N-terminal end of the Tc3 transposase (residues 2-52) and its target DNA (1TC3). The predicted HTH DNA binding motif (corresponding to positions 11-30 in block A of the Tc1 sequence family, aa 22-44 in Tc3 transposase) is shown in cyan. It precisely corresponds to a helix-turn-helix structure that indeed binds the DNA.


More information

The Robertson lab mariner transposons page.



Bibliography
  1. Henikoff, S. (1992). Detection of Caenorhabditis Transposon homologs in diverse organisms. The New Biologist 4, 382-388.
  2. Selbitschka, W., Arnold, W., Jording, D., Kosier, B., Toro, N. & Puhler, A. (1995). The insertion sequence element ISRm2011-2 belongs to the IS630-Tc1 family of transposable elements and is abundant in Rhizobium meliloti. Gene 163, 59-64.
  3. Flavell, A. J., Pearce, S. R. & Kumar, A. (1994). Plant transposable elements and the genome. Curr Opin Genet Dev 4, 838-844.
  4. Daboussi, M. J., Langin, T. & Brygoo, Y. (1992). Fot1, a new family of fungal transposable elements. Mol Gen Genet 232, 12-16.
  5. Lam, W. L., Seo, P., Robison, K., Virk, S. & Gilbert, W. (1996). Discovery of amphibian Tc1-like transposon families. J Mol Biol. 257, 359-366.
  6. Plasterk, R. H. (1996). The Tc1/mariner transposon family. Curr. Top. Microbiol. Immunol. 204, 125-143.
  7. Robertson, H. M. (1995). The Tc1-mariner superfamily of transposons in animals. Journal of Insect Physiology 41, 99-105.
  8. Auge-Gouillou, C., Bigot, Y., Pollet, N., Hamelin, M. H., Meunier-Rotival, M. & Periquet, G. (1995). Human and other mammalian genomes contain transposons of the mariner family. FEBS Lett 368, 541-546.
  9. Morgan, G. T. (1995). Identification in the human genome of mobile elements spread by DNA-mediated transposition. J Mol Biol 254, 1-5.
  10. Oosumi, T., Belknap, W. R. & Garlick, B. (1995). Mariner transposons in humans. Nature 378, 672.
  11. Robertson, H. M. (1996). Members of the pogo superfamily of DNA-mediated transposons in the human genome. Mol Gen Genet 252, 761-766.
  12. Robertson, H. M., Zumpano, K. L., Lohe, A. R. & Hartl, D. L. (1996). Reconstructing the ancient mariners of humans. Nature Genet. 12, 360-361.
  13. Smit, A. F. & Riggs, A. D. (1996). Tiggers and other DNA transposon fossils in the human genome. Proc Natl Acad Sci USA 93, 1443-1448.
  14. Lampe, D. J., Churchill, M. E. A. & Robertson, H. M. (1996). Purified mariner transposase is sufficient to mediate transposition in vitro. EMBO J. 15, 5470-5479.
  15. Vos, J. C., De Baere, I. & Plasterk, R. H. (1996). Transposase is the only nematode protein required for in vitro transposition of Tc1. Genes Dev. 10, 755-761.
  16. Loukeris, T. G., Livadaras, I., Arca, B., Zabalou, S. & Savakis, C. (1995). Gene transfer into the medfly, Ceratitis capitata, with a Drosophila hydei transposable element. Science 270, 2002-2005.
  17. Doak, T. G., Doerder, F. P., Jahn, C. L. & Herrick, G. (1994). A proposed superfamily of transposase genes: transposon-like elements in ciliated protozoa and a common "D35E" motif. Proc Natl Acad Sci USA 91, 942-946.
  18. van Luenen, H. G., Colloms, S. D. & Plasterk, R. H. (1994). The mechanism of transposition of Tc3 in C. elegans. Cell 79, 293-301.
  19. Vos, J. C. & Plasterk, R. H. (1994). Tc1 transposase of Caenorhabditis elegans is an endonuclease with a bipartite DNA binding domain. EMBO J 13, 6125-6132.
  20. Lohe, A. R., De Aguiar, D. & Hartl, D. L. (1997). Mutations in the mariner transposase: The D,D(35)E consensus sequence is nonfunctional. Proc Natl Acad Sci USA 94, 1293-1297.
  21. Colloms, S. D., van Luenen, H. G. & Plasterk, R. H. (1994). DNA binding activities of the Caenorhabditis elegans Tc3 transposase. Nucleic Acids Res 22, 5548-5554.
  22. Vos, J. C. & Plasterk, R. H. (1994). Tc1 transposase of Caenorhabditis elegans is an endonuclease with a bipartite DNA binding domain. EMBO J 13, 6125-6132.
  23. Franz, G., Loukeris, T. G., Dialektaki, G., Thompson, C. R. & Savakis, C. (1994). Mobile Minos elements from Drosophila hydei encode a two-exon transposase with similarity to the paired DNA-binding domain. Proc Natl Acad Sci USA 91, 4746-4750.
  24. Selbitschka, W., Arnold, W., Jording, D., Kosier, B., Toro, N. & Puhler, A. (1995). The insertion sequence element ISRm2011-2 belongs to the IS630-Tc1 family of transposable elements and is abundant in Rhizobium meliloti. Gene 163, 59-64.
  25. Vos, J. C., van Luenen, H. G. & Plasterk, R. H. (1993). Characterization of the Caenorhabditis elegans Tc1 transposase in vivo and in vitro. Genes Dev 7, 1244-1253.
  26. Ivics, Z., Izsvak, Z., Minter, A. & Hackett, P. B. (1996). Identification of functional domains and evolution of Tc1-like transposable elements. Proc Natl Acad Sci USA 93, 5008-5013.
  27. Ke, Z., Grossman, G. L., Cornel, A. J. & Collins, F. H. (1996). Quetzal: a transposon of the Tc1 family in the mosquito Anopheles albimanus. Genetica 98, 141-147.
  28. Pietrokovski, S. (1996). Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucl Acids Res 24, 3836-3845.
  29. Dodd, I. B. & Egan, J. B. (1990). Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucl Acids Res 18, 5019-5026.
  30. Konig, P., Giraldo, R., Chapman, L. & Rhodes, D. (1996). The Crystal Structure of the DNA-Binding Domain of Yeast RAP1 in Complex with Telomeric DNA. Cell 85, 125-136.
  31. Schumacher, M. A., Choi, K. Y., Zalkin, H. & Brennan, R. G. (1994). Crystal structure of LacI member, PurR, bound to DNA: minor groove binding by alpha helices. Science 266, 763-770.
  32. Lewis, M., Chang, G., Horton, N. C., Kercher, M. A., Pace, H. C., Schumacher, M. A., Brennan, R. G. & Lu, P. (1996). Crystal Structure of the Lactose Operon Repressor and Its Complexes with DNA and Inducer. Science 271, 1247-1254.
  33. Ording, E., Kvavik, W., Bostad, A. & Gabrielsen, O. S. (1994). Two functionally distinct half sites in the DNA-recognition sequence of the Myb oncoprotein. Eur J Biochem 222, 113-120.
  34. Xu, W., Rould, M. A., Jun, S., Desplan, C. & Pabo, C. O. (1995). Crystal structure of a paired domain-DNA complex at 2.5 A resolution reveals structural basis for Pax developmental mutations. Cell 80, 639-650.
  35. Toth, M., Grimsby, J., Buzsaki, G. & Donovan, G. P. (1995). Epileptic seizures caused by inactivation of a novel gene, jerky, related to centromere binding protein-B in transgenic mice. Nature Genet 11, 71-75.
  36. Wang, H. & Finnegan, D. J. (1997). Personal communication.
  37. van Pouderoyen, G., Ketting, R. F., Perrakis, A., Plasterk, R. H. A. & Sixma, T. K. (1997). Crystal structure of the specific DNA-binding domain of Tc3 transposase of C. elegans in complex with transposon DNA. EMBO J 16, 6044-6054.
  38. Lohe, A. R., De Aguiar, D. & Hartl, D. L. (1997). Mutations in the mariner transposase: The D,D(35)E consensus sequence is nonfunctional. Proc Natl Acad Sci USA 94, 1293-1297.


Page last modified September 24, 1998
Shmuel Pietrokovski <pietro@bioinfo.weizmann.ac.il>