IS elements in Shigella flexneri 5a invasion-associated plasmid pWR501

This table presents a detailed characterization of the IS elements and associated ORFs in the Shigella flexneri 5a invasion-associated plasmid pWR501, and is a supplement to the analysis of the plasmid sequence described by Venkatesan et al. The table lists (a) known IS elements, (b) new IS elements, and (c) unknown IS-related ORFs. For further information on known IS elements, see the IS Database. Each ORF in pWR501 is referred to by its unique identifier (ID; S number), as assigned in the sequence annotations. In the table, ORFs constituting complete IS elements are indicated in bold. Abbreviations used: aa = amino acids, hp = hypothetical protein, id = identical, GB = GenBank accession no., IR = inverted repeat, DR = direct repeat, tnp = transposase, URF = unidentified open reading frame.

Questions regarding content or interpretation should be addressed to Dr. Malabi M. Venkatesan [ Malabi.Venkatesan@NA.AMEDD.ARMY.MIL ].

Known IS elements: pWR501 ORFs with significant homology only, defined as >50% amino acid identity over >60% of the length of the query sequence.
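The amino acid cut-off for this category can be written as a simple test. The sketch below is a minimal Python illustration, assuming coverage is measured as aligned length divided by the length of the pWR501 ORF (the query); `is_significant` is a hypothetical helper name. It captures only the amino acid criterion stated here, not the DNA-level homology reported in the comments column.

```python
def is_significant(pct_identity: float, aligned_aa: int, query_aa: int) -> bool:
    """Hypothetical helper: >50% amino acid identity over >60% of the
    query (pWR501 ORF) length, as stated for the known-IS category."""
    if query_aa <= 0:
        raise ValueError("query length must be positive")
    coverage = aligned_aa / query_aa  # fraction of the ORF covered by the alignment
    return pct_identity > 50 and coverage > 0.60


# Values taken from table rows: (% identity, aligned aa, ORF size in aa)
print(is_significant(86, 122, 122))  # S0037 vs NuXi InsB
print(is_significant(86, 145, 152))  # S0068 vs IS1SB InsB
print(is_significant(36, 74, 111))   # S0110 vs IS10 transposase
```

Both conditions must hold, so a high-identity hit over a short stretch of the ORF, or a full-length hit at low identity, falls outside this category.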

Name and description of IS element pWR501 ORF ID size (aa) homology to protein in database additional comments regarding similarity
IS1

S. dysenteriae iso-IS1 (NuXi), 803 bp  [GB: J01737]
   InsA, orfA: 90 aa
   InsB, orfB: 131 aa
S. boydii IS1SB, 768 bp  [GB: U96388]
   InsA, orfA: 91 aa
   InsB, orfB: 167 aa
(InsAB' translational fusion protein via frameshift at A6C motif)
S0037 122 86% id to NuXi InsB over 122 aa homology to NuXi bp 431-778
S0053 75 96% id to IS1SB InsA S0053-S0054 = complete IS1 with 9 bp target site duplication (CCTCGATAC)
S0054 167 98% id to IS1SB InsB  
S0067 75 96% id to IS1SB InsA over 75 aa + LIR  
S0068 152 86% id to IS1SB InsB over 145 aa missing the terminal 87 bp of 768 bp IS1 seq
S0082 94 71% id to NuXi InsA over 81 aa  
S0083 61 84% id to NuXi InsB over 56 aa  
S0226 62 80% id to NuXi InsB S0226-S0227 homology to NuXi bp 183-522
S0227 55 96% id to 33 aa of IS1-cat IS1-cat protein fragment, 33 aa  [GB: M24180]
S0228 167 98% id to IS1SB InsB S0228-S0229 = complete IS1 with no target site duplication
S0229 92 IS1SB InsA  
S0239 167 IS1SB InsB S0239-S0241 = complete IS1 with 8 bp target site duplication (CACTATCG)
S0240 92 IS1SB InsA  
S0241 70 77% id to IS1 hp IS1 hp, 70 aa  [GB: BAA01301], overlaps InsA (S0240)
S0284 91 79% id to NuXi InsB over 48 aa homology to NuXi InsB aa 76-131
Total IS1 ORFs: 15, complete elements: 3
IS2

E. coli IS2, 1330 bp
   orf1, TnpG: 136 aa  [GB: BAA15011]
   orf2, TnpF: 301 aa  [GB: BAA15010]
S0001 101 88% id to orf2 over 85 aa homology to IS2 bp 1-36, 52-393
S0017 39 56% id to orf2 over 41 aa homology to IS2 bp 4-227
S0052 95 71% id to orf2 over 91 aa homology to IS2 bp 53-235
S0106 301 99.7% id to orf2 S0106-S0107 = complete IS2 with 5 bp target site duplication (GGACA)
S0107 136 100% id to orf1  
S0220 283 99% id to orf2 over 240 aa S0220-S0221 = incomplete IS2, bp 197-1284
S0221 151 98% id to orf1 over 136 aa  
S0222 301 99% id to orf2 S0222-S0223 = complete IS2 with 5 bp target site duplication (TGAGG)
S0223 136 94% id to orf1  
S0224 72 63% id to orf1 over 48 aa  
Total IS2 ORFs: 10, complete elements: 2
IS3

E. coli, 1258 bp  [GB: X02311]
   orfA/hp, TnpA: 102 aa  [GB: BAA97886]
   transposase: 288 aa  [GB: BAA97887]
S0026 71 90% id to transposase over 50 aa homology to IS3 bp 1062-1258
S0033 195 96% id to transposase over 184 aa homology to IS3 bp 672-1258
S0174 121 86% id to transposase over 116 aa S0174-S0175 shows homology to IS3 bp 1-668
S0175 102 100% id to orfA/hp  
S0188 80 90% id to orfA/hp over 80 aa homology to IS3 bp 403-776
S0248 73 97% id to transposase over 51 aa homology to IS3 bp 452-517, 530-791
S0291 73 88% id to transposase over 50 aa homology to IS3 bp 1074-1258, 1215-1258
Total IS3 ORFs: 7
IS4

E. coli, 1426 bp  [GB: J01733]
   orf: 442 aa  [GB: AAA61828]
S0022 442 99% id 10 bp target site duplication (CCGGCGGCCT)
S0243 422 97% id homology to IS4 bp 1-1337
Total IS4 ORFs: 2, complete elements: 1
IS21

P. aeruginosa, 2131 bp  [GB: X14793]
   IstA: 391 aa
   IstB: 266-272 aa
S0018 75 88% id to IstB over 43 aa homology to IS21 bp 1824-1941, 1951-2098
Total IS21 ORFs: 1
IS91

E. coli, 1829 bp  [GB: X17114]
   transposase, orfb: 426 aa  [GB: CAA34970]
   orfa, orf121: 133 aa  [GB: CAA34969]
S0019 133 88% id to orfb over 60 aa S0019-S0020 shows homology to IS91 bp 191-522, followed by CCGG, then to bp 1224-1829
S0020 62 90% id to orfa over 60 aa  
S0184 133 88% id to orfb over 60 aa S0184-S0185 shows homology to IS91 bp 1-522 followed by CCGG, then to bp 1224-1829
S0185 140 94% id to orfa over 120 aa  
Total IS91 ORFs: 4
IS100

Y. pestis, 1954 bp  [GB: Z32853]  IS21 family
   orfB, orf11: 259 aa  [GB: CAA21334]
   orfA, orf12: 340 aa  [GB: CAA21335]
S0035 144 93% id to orfA over 118 aa S0035-S0036 shows homology to IS100 bp 756-1954
S0036 221 99% id to orfB over 220 aa  
S0050 107 98% id to orfA S0050 region shows homology to IS100 bp 1172-1486
S0089 77 83% id to orfB over 77 aa S0089 region shows homology to IS100 bp 1241-1445
S0094 221 100% id to orfB over 220 aa S0094 region shows homology to IS100 bp 1-61, 1173-1954
S0129 340 100% id to orfA S0129 region shows homology to IS100 bp 1-1051
Total IS100 ORFs: 6
IS150

E. coli, 1443 bp  [GB: X07037]
   orfA: 174 aa  [GB: CAA30085]
   orfB: 284 aa  [GB: CAA30086]
S0046 135 97% id to orfB over 100 aa S0046 region shows homology to IS150 bp 800-1099
S0048 81 96% id to orfB over 79 aa S0048-S0049 shows homology to IS150 bp 1-804
S0049 174 100% id to orfA  
Total IS150 ORFs: 3
IS600

S. sonnei, 1264 bp  [GB: X05952]
   orfA, TnpJ: 100 aa  [GB: CAA29384]
   orfB, TnpI: 272 aa  [GB: CAA29385]
S0009 140 91% id to orfB over 116 aa S0009-S0010 region shows homology to IS600 bp 1-743
S0010 101 92% id to orfA  
S0060 63 90% id to orfB over 63 aa S0060-S0061 region shows homology to IS600 bp 44-569
S0061 101 orfA  
S0108 273 orfB S0108-S0109 region shows homology to IS600 bp 1-1244, missing inverted repeat
S0109 101 orfA  
S0123 101 94% id to orfA S0123-S0124 = complete IS600
S0124 236 92% id to orfB over 235 aa  
S0168 250 91% id to orfB over 229 aa S0168 region shows homology to IS600 bp 408-1098
S0169 273 orfB S0169-S0170 = complete IS600
S0170 101 orfA  
S0171 101 orfA S0171 region shows homology to IS600 bp 1-408
S0172 63 95% id to orfB over 62 aa S0172 region shows homology to IS600 bp 1012-1264
S0201 67 97% id to orfB over 61 aa  
S0255 273 orfB S0255-S0256 = complete IS600
S0256 101 orfA  
S0266 273 orfB S0266-S0267 region shows homology to IS600 bp 183-1219; Tn501 insertion site begins immediately upstream of IS600 bp 183
S0267 42 98% id to orfA over 42 aa  
S0275 60 92% id to orfA over 42 aa S0275 region shows homology to IS600 bp 1-189; bp 183-189 (CCTAAAG) appears to be the target site for insertion of Tn501
S0285 100 94% id to orfA S0285-S0286 region shows homology to IS600 bp 1-588; immediately upstream is an IS600 sequence from bp 1157-1264
S0286 62 90% id to orfB over 62 aa  
Total IS600 ORFs: 21, complete elements: 3 (an additional complete element is disrupted by Tn501)
IS629

S. sonnei, 1310 bp  [GB: X51586]
   TnpC: 296 aa  [GB: CAA35936]
   TnpE: 108 aa  [GB: CAA35935]
S. flexneri, 1306 bp  [from GB: AF141323]
(Note: four bases from the S. sonnei IS629 sequence (bp 579-582) are deleted in the S. flexneri IS629)
   TnpC: 206 aa  [GB: AAD44736]; corresponds to aa 91-296 of S. sonnei TnpC
   TnpD: 118 aa  [GB: AAD44737]
   TnpE: 108 aa  [GB: AAD44738]
S0043 223 98% id to S. sonnei TnpC over 222 aa S0043-S0045 = complete IS629, no target site duplication
S0044 119 TnpD  
S0045 109 TnpE  
S0057 108 TnpE S0057-S0059 = partial IS629, homology to IS629 bp 1-410, bp 507-1310
S0058 83 100% id to TnpD over 74 aa  
S0059 223 98% id to TnpC over 223 aa  
S0069 108 TnpE S0069-S0071 region = partial IS629, similar to that seen at S0057-S0059
S0070 84 TnpD  
S0071 215 98% id to TnpC over 215 aa  
S0086 108 TnpE  
S0091 263 98% id to S. flexneri TnpC over 193 aa S0091-S0093 region has a complete IS629 sequence, but the TnpC stop codon lies beyond the right inverted repeat of IS629 because of a -1 frameshift; no target site duplication
S0092 119 TnpD  
S0093 109 TnpE  
S0096 85 87% id to TnpE  
S0180 109 TnpE S0180-S0182 = complete IS629 with 3 bp target site duplication (TTC)
S0181 118 TnpD  
S0182 206 TnpC in S. flexneri  
S0189 187 100% id to TnpC over 187 aa  
S0190 72 90% id to TnpE over 64 aa  
S0195 108 TnpE S0195-S0197 = complete IS629, no target site duplication
S0196 106 TnpD  
S0197 215 TnpC in S. flexneri  
S0244 109 TnpE S0244-S0246 = complete IS629 with 3 bp target site duplication (AAG)
S0245 118 TnpD  
S0246 223 TnpC  
S0287 122 99% id to TnpC over 115 aa  
S0293 55 94% id to TnpE over 35 aa  
Total IS629 ORFs: 27, complete elements: 5
IS630

S. sonnei, 1153 bp  [GB: X05955, bp 4-1156]
   orf: 343 aa  [GB: CAA29389]
S0029 47 78% id to IS630 orf over 45 aa homology to IS630 bp 269-323 and bp 999-1153, with 15 bp in between lacking homology
S0080 344 99.4% id to IS630 orf S0080 = complete IS630 with 2 bp target site duplication (TA)
S0254 318 97% id to IS630 orf over 313 aa homology to IS630 bp 209-1153
Total IS630 ORFs: 3, complete elements: 1
IS911

S. dysenteriae, 1250 bp  [GB: X17613]  IS3 family
   orfA: 120 aa
   orfB: 273 aa
S0288 113 96% id to orfA over 112 aa S0288-S0289 = complete IS911 with one base missing at IS911 bp 1209; no target site duplication
S0289 280 99% id to orfB over 273 aa  
Total IS911 ORFs: 2, complete elements: 1
IS1294

E. coli plasmid pUB2380, 1688 bp
   tnp1294: 351 aa  [GB: CAA07835]
   translational fusion protein: 389 aa
(Note: sequence of IS1294 is from plasmid pUB2380 [GB: AJ008006, bp 5401-7088]; in the GenBank entry for the E. coli IS1294 sequence itself [GB: X82430] the IS1294 element is from bp 2100-3787, not from bp 363-2011 as annotated)
S0047 399 98% id to 345 aa complete IS element; target site duplication (TGAAC) within the inverted repeat portion of the IS; IS1294 bp 344 missing in homology
S0064 402 94% id over 341 aa IS1294 bp 1353-1659 missing and replaced by 332 bp of non-homologous sequence
S0065 76 55% id over 35 aa to IS1650 transposase homology to IS1294 region in pUB2380, bp 5397-5499. Part of S0065 is located outside the IS1294 sequence; S0065 aa 35-70 of 76 are 56% identical to aa 111-146 of 148 of IS1650 transposase [Streptomyces coelicolor A3(2), see GB: AL117669]
S0075 55 portion of IS1294 S0075-S0077 = complete IS element; target site duplication (TGAAC) within the inverted repeat portion of the IS; IS1294 bp 344 missing in homology
S0076 63 portion of IS1294  
S0077 399 95% id over 345 aa  
S0095 402 94% id over 341 aa IS1294 bp 1353-1659 missing and replaced by 332 bp of non-homologous sequence
S0173 400 95% id over 345 aa complete IS element; target site duplication (GAAC)
S0281 398 94% id over 345 aa IS1294 bp 1-274 missing
Total IS1294 ORFs: 9, complete elements: 3
IS1328

Y. enterocolitica, 1359 bp  [GB: Z48244]
   transposase: 334 aa  [GB: CAA88289]
S0215 217 71% id over 217 aa, from aa 106-322 no DNA homology seen except to IS1328 bp 991-1049; possibly a new transposase
Total IS1328 ORFs: 1
IS1353

IS1353 is located in integron In2, which is part of Tn21; Tn21 was originally described in Shigella flexneri.
1613 bp  [GB: U40482]
   ORFA: 202 aa  [GB: AAC44291]
   ORFB: 304 aa  [GB: AAC44292]
   translational fusion protein: 514 aa  [GB: BAA78795]
S0125 80 56% id over 80 aa of fusion protein no DNA homology to IS1353
S0126 94 73% id over 53 aa of the fusion protein  
Total IS1353 ORFs: 2
Total known IS ORFs: 114, complete elements: 20

New IS elements: pWR501 ORFs with >30% amino acid homology to database sequences over part of the query sequence, but no nucleotide homology to the database (target) sequence. For each new element, at least one complete copy shows IRs and/or DRs.
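The IR/DR evidence used to recognize a complete element can be checked mechanically. Below is a minimal Python sketch, assuming a plain A/C/G/T string and 0-based element coordinates; the helper names are hypothetical. It uses the ISSfl4 IR pair reported in the table (GTAAGCGCCCC upstream of S0116, GGGGCGCTTAC downstream of S0119), which are exact reverse complements, and a made-up flanking sequence to show a 2 bp target site duplication.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def irs_match(left_ir: str, right_ir: str) -> bool:
    """Inverted repeats: the right IR, read on the same strand, should be
    the reverse complement of the left IR."""
    return left_ir.upper() == reverse_complement(right_ir).upper()


def flanking_dr(seq: str, start: int, end: int, k: int) -> Optional[str]:
    """Return the k-bp direct repeat (target site duplication) if the k bases
    just before seq[start:end] equal the k bases just after it, else None."""
    if start - k < 0 or end + k > len(seq):
        return None
    left, right = seq[start - k:start], seq[end:end + k]
    return left if left == right else None


# ISSfl4 IR ends as given in the table
left_ir, right_ir = "GTAAGCGCCCC", "GGGGCGCTTAC"
print(irs_match(left_ir, right_ir))

# Hypothetical element (made-up interior) inserted with a TA duplication
element = left_ir + "ACGTACGT" + right_ir
seq = "GGG" + "TA" + element + "TA" + "CCC"
start = len("GGG" + "TA")
end = start + len(element)
print(flanking_dr(seq, start, end, 2))
```

The DR check compares equal-length windows on both sides of the element, which is how a target site duplication such as TA for ISSfl1 or CCCTTCAGG for ISSfl3 appears in the surrounding plasmid sequence.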

Name and description of IS element pWR501 ORF ID size (aa) homology to orf in known IS element additional comments regarding homology & target sequence
ISSfl1

929 bp; 20 bp IR, 3 bp DR

ORFs are homologous to Streptomyces coelicolor IS1650 ORFs, but there is no nucleotide homology to IS1650 [GB: AL117669, bp 6054-6951]; IS1650 has two ORFs, possibly encoding a transposase via translational frameshifting:
   IS1650 orfA: 136 aa  [GB: CAB56132]
   IS1650 orfB: 148 aa  [GB: CAB56133]
S0078 110 48% id to orfA over 77 aa S0078-S0079 constitutes bp 16-869 of ISSfl1
S0079 172 44% id to orfB over 139 aa  
S0101 171 43% id to orfB over 139 aa S0101-S0102 constitutes bp 4-827 of ISSfl1
S0102 180 45% id to orfA over 101 aa  
S0203 171 44% id to orfB over 139 aa S0203-S0204 constitutes complete ISSfl1 with duplication of TA at target site
S0204 149 46% id to orfA over 115 aa  
Total ISSfl1 ORFs: 6, complete elements: 1
ISSfl2

1373-1376 bp; 8 bp IR: CCCCCCAC

ORF homology to Streptomyces coelicolor IS110 ORF, but no nucleotide homology to IS110 [GB: Y00434]
   IS110 major ORF (site specific recombinase?): 405 aa  [GB: CAB51528]

S0055 398 58% id over 391 aa, aa 3-393  
S0128 399 59% id over 391 aa, from aa 3-393  
Total ISSfl2 ORFs: 2, complete elements: 2
ISSfl3

1301 bp; 18 bp IR: CTGAGGGATCCCCACAAA, 9 bp DR: CCCTTCAGG

ORF homology to IS10 transposase: 402 aa  [GB: I67760]; DR homologous to bp 756-764 of Y. pestis IS100
S0034 397 33% identity over 361 aa  
Total ISSfl3 ORFs: 1, complete elements: 1
ISSfl4

2728 bp; 11 bp IR, no DR seen
ISSfl4 is a member of the IS66 family; it is composed of 3 ORFs homologous to the ISEc8 ORFs of EHEC O157:H7 strain EDL933:
   ORF L0013: 133 aa  [GB: AAC31492]
   ORF L0014: 115 aa  [GB: AAC31493]
   ORF L0015: 512 aa  [GB: AAC31494]
S0008 183 51% id over 178 aa from aa 32-210 of L0015  
S0023 112 36% id over 104 aa from aa 32-132 of L0013  
S0024 34 73% id over 33 aa from aa 1-33 of L0015  
S0025 435 56% id to 380 aa from aa 26-406 of L0015  
S0072 198 65% id over 133 aa from aa 377-509 of L0015  
S0081 83 59% id to 32 aa from aa 435-466 of L0015  
S0090 119 62% id over 164 aa from aa 3-166 of L0015  
S0116 350 65% id over 340 aa from aa 166-508 of L0015 has IR sequence GTAAGCGCCCC upstream of start codon
S0117 190 38% id to 185 aa from aa 1-163 of L0015 S0116-S0119 constitutes a complete ISSfl4 sequence
S0118 114 69% id over 83 aa from aa 1-81 of L0014  
S0119 224 40% id over 52 aa from aa 8-59 of L0013 has IR sequence GGGGCGCTTAC downstream of stop codon
S0216 350 65% id over 340 aa from aa 166-508 of L0015 has IR sequence GTAAGCGCCCC downstream of stop codon
S0217 190 39% id over 185 aa from aa 1-163 of L0015 S0216-S0219 constitutes a complete ISSfl4 sequence
S0218 117 68% id over 83 aa from aa 1-81 of L0014  
S0219 224 40% id over 52 aa from aa 8-59 of L0013 has IR sequence GGGGCGCTTAC upstream of start codon
Total ISSfl4 ORFs: 16, complete elements: 2
Total New IS ORFs: 25, complete elements: 6
Known + New IS ORFs: 140, complete elements: 26

Unknown IS-related ORFs: pWR501 ORFs with some homology (less than significant) to known or putative transposases.

bacterial spp. & database ID# pWR501 ORF ID size (aa) homology to database sequence comments regarding homology to database sequences
E. coli transposase: 406 aa  [GB: AAF60967] S0014 124 87% id over 74 aa from aa 333-406  
S0015 125 86% id over 111 aa from aa 220-330 also shows 32% id over 93 aa to aa 227-312 from A. eutrophus ISAE1 transposase [406 aa;  GB: AAC13658]
ETEC 92 kb plasmid, IS1414 orf: 402 aa  [GB: AAG18473] S0027 52 63% id over 36 aa from aa 1-31 S0027-S0028 is frameshifted, and both ORFs show homology to the same set of sequences in the database
S0028 62 70% id over 62 aa from aa 36-67  
uropathogenic E. coli, orfB: 195 aa  [GB: AAC61729] S0073 130 99% id over 130 aa from aa 66-195 S0073 also shows 34% identity over 126 aa to E. coli IS21 IstB ATP binding protein [265 aa;  GB: P15026], 40% id over 115 aa from aa 123-237 to IS5376 from Bacillus stearothermophilus [251 aa;  GB: CAA48046], and to Rhizobium proteins Y4sD/Y4nD/Y4iQ
E. coli IS10 transposase: 402 aa  [GB: I67760] S0110 111 36% id over 74 aa  
E. coli genome, Kohara clone o230#3: 96 aa  [GB: BAA35811] S0247 91 94% id over 18 aa from aa 2-18 beyond aa 1-17, S0247 is 42% id over 47 aa to putative transposases from Rhizobium (Y4bL, Y4tB, Y4kJ: 516 aa;  GB: P55379); S0247 also shows homology to S0021, S0242 (91% id over 59 aa) and to S0119 and S0219 (51% id over 35 aa)
E. coli IS21 tnp: 390 aa  [GB: CAA32898] S0127 316 84% id over 57 aa  
Mesorhizobium loti transposase: 347 aa  [GB: NP_105082] S0178 112 46% id to 103 aa from aa 241-343 S0178 also shows 31% id over 110 aa to ISSfl2 orf S0055 (and S0128), and 38% id over 107 aa to Pseudomonas atlantica IS492 (318 aa)
A. eutrophus ISAE1 transposase [406 aa;  GB: AAC13658] S0183 76 46% id over 46 aa from aa 212-255 start codon GTG of S0183 is embedded within the 3' end of an IS91 element; S0183 also shows homology to tnp/Thiobacillus, urf/p0157, putative tnp/Nocardia
Rhizobium sp. NGR234 putative transposase Y4bL (Y4kJ, Y4tB): 516 aa  [GB: AAB91627] S0021 94 46% id over 72 aa from aa 15-86 S0021 and S0242 also show homology to aa 17-75 of S0247
S0242 80 46% id to 52 aa from aa 35-86  
Mesorhizobium loti probable transposase: 185 aa  [GB: NP_102655] S0041 121 44% id over 111 aa from aa 2-112 also shows 41% id over 113 aa to IS1111A/IS1328/IS1533 family transposase from Caulobacter crescentus (354 aa;  GB: AAK24705), 36% id over 113 aa to Rhizobium sp. NGR234 putative transposase Y4pF/Y4sB (387 aa;  GB: AAB91816/AAB91842)
E. coli O157:H7 EDL933 ISEc8-related ORF: 537 aa  [GB: AAG55306] S0038 104 80% id over 45 aa from aa 111-155 also shows 52% id over 42 aa to L0015, aa 91-132 [L0015: 512 aa;  GB: AAC31494]
E. coli O157:H7 EDL933 ISEc8-related ORF: 285 aa  [GB: AAG55307] S0230 189 98% id over 80 aa from aa 198-277 also shows 34% id over 182 aa to L0015, aa 196-374 [L0015: 512 aa;  GB: AAC31494]
E. coli O157:H7 EDL933 ISEc8-related ORF: 161 aa  [GB: AAG55303] S0231 126 95% id over 126 aa from aa 36-161 also shows 40% id over 102 aa to L0015, aa 387-504 [L0015: 512 aa;  GB: AAC31494]
Total Unknown IS ORFs: 15
Known + New + Unknown IS ORFs: 153


Last Updated on May 10, 2001