Escherichia coli K-12 MG1655 sequence and annotations
U00096.1 (aka versions m49 - m54)


The Complete Genome Sequence of Escherichia coli K-12

    Determining the complete E. coli K-12 sequence was among the first targets for whole-genome sequencing (see About Us). Between 1992 and 1995 we published segments of the genome that together comprised over 1 Mbp of contiguous finished sequence [1 - 6]. As 1996 neared its end, the circle was closed, and we deposited the complete 4,639,221 bp genome sequence of Escherichia coli K-12 MG1655 in GenBank on January 16, 1997. The sequence was assigned the accession number U00096 (later U00096.1), although it was released as a series of 400 overlapping sections (accession numbers AE000111 - AE000510) due to GenBank entry size limitations at the time. Our analysis of the genome sequence was reported in the September 5, 1997 issue of Science [7].

Press release from the University of Wisconsin-Madison.

References

  1. D. L. Daniels, G. Plunkett III, V. Burland, & F. R. Blattner (1992) Analysis of the Escherichia coli genome: DNA sequence of the region from 84.5 to 86.5 minutes. Science 257(5071):771-778. [PMID: 1379743].
  2. V. Burland, G. Plunkett III, D. L. Daniels, & F. R. Blattner (1993) DNA sequence and analysis of 136 kilobases of the Escherichia coli genome: organizational symmetry around the origin of replication. Genomics 16(3):551-561. [PMID: 7686882].
  3. G. Plunkett III, V. Burland, D. L. Daniels, & F. R. Blattner (1993) Analysis of the Escherichia coli genome. III. DNA sequence of the region from 87.2 to 89.2 minutes. Nucleic Acids Res 21(15):3391-3398. [PMID: 8346018].
  4. F. R. Blattner, V. Burland, G. Plunkett III, H. J. Sofia, & D. L. Daniels (1993) Analysis of the Escherichia coli genome. IV. DNA sequence of the region from 89.2 to 92.8 minutes. Nucleic Acids Res 21(23):5408-5417. [PMID: 8265357].
  5. H. J. Sofia, V. Burland, D. L. Daniels, G. Plunkett III, & F. R. Blattner (1994) Analysis of the Escherichia coli genome. V. DNA sequence of the region from 76.0 to 81.5 minutes. Nucleic Acids Res 22(13):2576-2586. [PMID: 8041620].
  6. V. Burland, G. Plunkett III, H. J. Sofia, D. L. Daniels, & F. R. Blattner (1995) Analysis of the Escherichia coli genome VI: DNA sequence of the region from 92.8 through 100 minutes. Nucleic Acids Res 23(12):2105-2119. [PMID: 7610040].
  7. F. R. Blattner, G. Plunkett III, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, & Y. Shao (1997) The complete genome sequence of Escherichia coli K-12. Science 277(5331):1453-1474. [PMID: 9278503].

Subsequent annotation updates

    The E. coli K-12 MG1655 annotations were updated numerous times. We designated version numbers for the annotated sequence to assist in keeping track of corrections, updates, and other changes. The version described in the original 1997 Science paper was designated m49. An annotation update deposited September 2, 1997 was designated m52 (intermediate working versions were available on our ftp site, but not submitted to GenBank). Version m54 was deposited October 13, 1998. Those annotation updates had been managed via an essentially manual process. In order to better deal with that ongoing task, as well as with other genomes we were sequencing, we adopted the ASAP relational database as the venue for maintaining and updating the annotations, as well as enabling community input towards that goal. We also worked with other groups to correlate and reconcile the various lists and databases containing E. coli genomic information.

    Within ASAP, annotation updates included revisions to gene boundaries, gene names, and known or predicted gene products. In addition, new genes were added to the annotations and, perhaps inevitably, several previously annotated genes were deaccessioned. Some of these changes were reported elsewhere as personal communications (see Serres et al. 2001). Users were directed to ASAP for the most current annotations, but we provided summary information on this page from time to time.

    This version of the sequence was replaced by a corrected sequence (m56, corresponding to GenBank record U00096.2) on June 21, 2004.

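These genes were added to the annotations (also see RNA genes, below). In the tables on this page, lend and rend are the left and right end coordinates (in bp), dir is the strand (+ or -), bnum is the assigned b-number, and syn lists synonyms: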

lend rend dir bnum type gene syn product
16751 16903 - b4412 CDS hokC gef small toxic membrane polypeptide; component of addiction module
213925 214125 - b4406 CDS yaeP   conserved hypothetical protein
607059 607211 + b4415 CDS hokE   small toxic membrane polypeptide; component of addiction module
1268391 1268498 - b4419 CDS ldrA   small toxic polypeptide; component of addiction module
1268926 1269033 - b4421 CDS ldrB   small toxic polypeptide; component of addiction module
1269461 1269568 - b4423 CDS ldrC   small toxic polypeptide; component of addiction module
1489946 1490095 - b4428 CDS hokB ydcB small toxic membrane polypeptide; component of addiction module
1702575 1702700 + b4409 CDS blr   beta-lactam resistance protein
3697609 3697716 - b4453 CDS ldrD   small toxic polypeptide; component of addiction module
3718077 3718229 - b4455 CDS hokA yiaZ small toxic membrane polypeptide; component of addiction module
4190215 4190415 - b4407 CDS thiS thiG1 sulfur carrier protein
4373895 4374020 + b4410 CDS ecnA   entericidin A (antidote to entericidin B); component of addiction module
4374131 4374277 + b4411 CDS ecnB yjeU bacteriolytic lipoprotein entericidin B; component of addiction module
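As a quick consistency check on the table above, the sketch below computes each added CDS's length and tests whether it is a whole number of codons. It assumes lend and rend are 1-based, inclusive coordinates (the usual GenBank convention), so length = rend - lend + 1, and that each interval includes the stop codon. The names ADDED_CDS, Feature, and codon_report are illustrative only and are not part of ASAP or any published tool.

```python
from dataclasses import dataclass

# Rows transcribed from the added-genes table: (lend, rend, dir, bnum, gene).
# Assumption: lend/rend are 1-based, inclusive coordinates.
ADDED_CDS = [
    (16751, 16903, "-", "b4412", "hokC"),
    (213925, 214125, "-", "b4406", "yaeP"),
    (607059, 607211, "+", "b4415", "hokE"),
    (1268391, 1268498, "-", "b4419", "ldrA"),
    (1268926, 1269033, "-", "b4421", "ldrB"),
    (1269461, 1269568, "-", "b4423", "ldrC"),
    (1489946, 1490095, "-", "b4428", "hokB"),
    (1702575, 1702700, "+", "b4409", "blr"),
    (3697609, 3697716, "-", "b4453", "ldrD"),
    (3718077, 3718229, "-", "b4455", "hokA"),
    (4190215, 4190415, "-", "b4407", "thiS"),
    (4373895, 4374020, "+", "b4410", "ecnA"),
    (4374131, 4374277, "+", "b4411", "ecnB"),
]


@dataclass(frozen=True)
class Feature:
    lend: int
    rend: int
    strand: str
    bnum: str
    gene: str

    @property
    def length(self) -> int:
        # Inclusive coordinates: both end positions belong to the feature.
        return self.rend - self.lend + 1


def codon_report(rows):
    """Return (gene, length_bp, codons, in_frame) for each CDS row."""
    report = []
    for row in rows:
        f = Feature(*row)
        report.append((f.gene, f.length, f.length // 3, f.length % 3 == 0))
    return report


if __name__ == "__main__":
    for gene, length, codons, in_frame in codon_report(ADDED_CDS):
        print(f"{gene:5s} {length:4d} bp {codons:3d} codons in_frame={in_frame}")
```

Under these assumptions all 13 entries come out as exact multiples of three; for example, hokC spans 153 bp, or 51 codons counting the stop codon.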

These previously annotated genes were deaccessioned:

lend rend dir bnum type gene
317526 317795 + b0302 CDS  
338993 339313 + b0322 CDS yahH
348742 349188 - b0332 CDS  
410255 410497 + b0395 CDS  
695581 695916 + b0663 CDS  
695931 696068 + b0667 CDS  
696065 696184 + b0669 CDS  
696185 696337 + b0671 CDS  
1096171 1096422 + b1030 CDS  
1096190 1096603 + b1031 CDS ycdV
2509085 2509429 + b2391 CDS  
3267304 3267468 + b3122 CDS  
4019665 4020006 + b3837 CDS  
4139437 4139766 + b3948 CDS yijI
4311389 4311796 - b4091 CDS phnQ
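The page does not say why individual entries were withdrawn. As one example of working with the table, the sketch below flags deaccessioned entries whose coordinate ranges overlap one another, again assuming 1-based, inclusive coordinates. DEACCESSIONED and overlapping_pairs are hypothetical names used only for illustration, and an overlap reported here says nothing by itself about the reason for deaccessioning.

```python
# (lend, rend, dir, bnum) rows transcribed from the deaccessioned-genes table.
# Assumption: lend/rend are 1-based, inclusive coordinates.
DEACCESSIONED = [
    (317526, 317795, "+", "b0302"),
    (338993, 339313, "+", "b0322"),
    (348742, 349188, "-", "b0332"),
    (410255, 410497, "+", "b0395"),
    (695581, 695916, "+", "b0663"),
    (695931, 696068, "+", "b0667"),
    (696065, 696184, "+", "b0669"),
    (696185, 696337, "+", "b0671"),
    (1096171, 1096422, "+", "b1030"),
    (1096190, 1096603, "+", "b1031"),
    (2509085, 2509429, "+", "b2391"),
    (3267304, 3267468, "+", "b3122"),
    (4019665, 4020006, "+", "b3837"),
    (4139437, 4139766, "+", "b3948"),
    (4311389, 4311796, "-", "b4091"),
]


def overlapping_pairs(rows):
    """Return (bnum_a, bnum_b, overlap_bp) for every pair of overlapping intervals.

    Rows are sorted by left end, so for each feature we only scan forward
    until a feature starts beyond the current feature's right end.
    """
    ordered = sorted(rows)
    pairs = []
    for i, (lend_a, rend_a, _, bnum_a) in enumerate(ordered):
        for lend_b, rend_b, _, bnum_b in ordered[i + 1:]:
            if lend_b > rend_a:
                break  # every later feature starts even further to the right
            overlap = min(rend_a, rend_b) - lend_b + 1
            pairs.append((bnum_a, bnum_b, overlap))
    return pairs


if __name__ == "__main__":
    for a, b, bp in overlapping_pairs(DEACCESSIONED):
        print(f"{a} overlaps {b} by {bp} bp")
```

For this table the sketch reports two pairs: b0667 and b0669 share 4 bp, and b1030 and b1031 share 233 bp.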


RNA genes, including small regulatory RNAs

    Although over 85% of the genome consists of protein-encoding genes, other genes encode RNAs that function without being translated into proteins. These "RNA genes" are often referred to as noncoding or non-coding RNAs (ncRNA); other designations include small RNA (sRNA), non-messenger RNA (nmRNA), small non-messenger RNA (snmRNA), functional RNA (fRNA), and the generic miscellaneous RNA (misc_RNA) used in GenBank. The best-known RNA genes encode transfer RNAs (tRNA) and ribosomal RNAs (rRNA), but since the late 1990s many new noncoding RNAs have been found to play significant roles in the cell.

    In addition to 22 rRNAs and 86 tRNAs, a handful of misc_RNAs were already annotated in our GenBank entry. In consultation with Susan Gottesman, Gisela Storz, and Karen Wassarman, we have begun an effort to add a number of other RNA genes to the annotations, initially in ASAP and eventually in GenBank as well. The following table lists these RNA genes and their assigned b-numbers, including those previously annotated (updated October 30, 2003).

lend rend dir bnum name (synonyms) notes Gene Expression Profile* reference(s)
16952 17006 + b4413 sokC (sof) antisense RNA blocking mokC and hokC translation; component of addiction module   22
189712 189847 + b4414 t44 identified in a large scale screen; function unknown b4414 03
475672 475785 + b0455 ffs 4.5S RNA, component of Signal Recognition Particle (SRP) with the Ffh protein; involved in co-translational targeting of proteins to membranes   08
852175 852263 - b4416 rybA identified in a large scale screen; function unknown b4416 02
887199 887277 - b4417 rybB (p25) identified in a large scale screen; function unknown b4417 02
1145812 1145980 + b4418 sraB (pke20) identified in a large scale screen; function unknown b4418 01
1268546 1268612 + b4420 rdlA antisense RNA, trans-acting regulator of ldrA translation; component of addiction module   20
1269081 1269146 + b4422 rdlB antisense RNA, trans-acting regulator of ldrB translation; component of addiction module   20
1269616 1269683 + b4424 rdlC antisense RNA, trans-acting regulator of ldrC translation; component of addiction module   20
1286289 1286459 - b4425 rtT (rttR) (rtV1) released from primary tyrT transcript during tRNA processing; encodes putative Tpr protein; the RNA itself may modulate the stringent response b4425 21
1403676 1403833 - b4426 IS061 identified in a large scale screen; function unknown b4426 04
1435145 1435252 + b4427 tke8 (IS063) identified in a large scale screen; function unknown b4427 04
1490143 1490195 + b4429 sokB antisense RNA blocking mokB and hokB translation; component of addiction module   22
1647406 1647458 + b1574 dicF DicF antisense RNA; inhibits ftsZ translation   09
1762737 1762804 - b4430 rydB (tpe7) (IS082) identified in a large scale screen; function unknown   02
1768396 1768500 + b4431 rprA (IS083) positive regulatory RNA for RpoS translation   01,07
1921090 1921338 + b4432 ryeA (sraC) (tpke79) (IS091) identified in a large scale screen; function unknown b4432 01,02
1921188 1921308 - b4433 ryeB (tpke79) identified in a large scale screen; function unknown   02
1985862 1986021 - b4434 IS092 identified in a large scale screen; function unknown b4434 04
2023249 2023335 - b1954 dsrA (IS095) regulatory RNA; regulates transcriptional silencing by H-NS protein, and enhances translation of RpoS   10
2069337 2069540 + b4435 IS102 identified in a large scale screen; function unknown b4435 04
2151297 2151445 + b4436 ryeC (tp11) (QUAD1a) identified in a large scale screen; function unknown   02
2151632 2151774 + b4437 ryeD (tpe60) (QUAD1b) identified in a large scale screen; function unknown   02
2165134 2165219 + b4438 ryeE identified in a large scale screen; function unknown b4438 02
2311104 2311196 + b4439 micF (IS113) regulatory antisense RNA affecting ompF expression   11
2651875 2652178 + b4440 ryfA (tp1) (PAIR3) identified in a large scale screen; function unknown b4440 02,03
2689212 2689360 - b4441 tke1 identified in a large scale screen; function unknown b4441 03
2753614 2753976 + b2621 ssrA (sipB) 10Sa RNA; tmRNA, acts as both tRNA-Ala and mRNA template for tagging proteins resulting from premature transcription termination   12
2812822 2812897 + b4442 sraD identified in a large scale screen; function unknown b4442 01
2922178 2922537 - b4408 csrB CsrA-binding RNA, antagonizes csrA regulation   13
2940718 2940922 + b4443 gcvB (IS145) small RNA gene divergent from gcvA; represses oppA, dppA, gltI and livJ expression b4443 01,06
2974124 2974211 - b4444 rygA (sraE) (t59) (PAIR2) identified in a large scale screen; function unknown   01,02
2974332 2974407 - b4445 rygB (t59) (PAIR2) identified in a large scale screen; function unknown   02
3054003 3054185 + b2911 ssrS (ssr) 6S RNA; modulates promoter use   14
3054835 3054985 + b4446 rygC (t27) (QUAD1c) identified in a large scale screen; function unknown   02
3192767 3192916 - b4447 rygD (tp8) (C0730) (QUAD1d) (IS156) identified in a large scale screen; function unknown   03,05
3236015 3236203 + b4448 sraF (tpk1) (IS160) identified in a large scale screen; function unknown b4448 01,03
3267857 3268233 - b3123 rnpB M1 RNA; RNA component of RNase P, involved in tRNA and 4.5S RNA processing   15
3308866 3309039 + b4449 sraG (p3) identified in a large scale screen; function unknown b4449 01
3348218 3348325 + b4450 ryhA (sraH) identified in a large scale screen; function unknown b4450 01,02
3578554 3578647 - b4451 ryhB (sraI) (IS176) regulatory RNA mediating Fur regulon response; Fur represses this inhibitory RNA, relieving sdhABCD, sodB, ftnA, bfr, and fumA from RyhB-mediated repression   01,02,16
3662494 3662598 + b4452 IS183 identified in a large scale screen; function unknown b4452 04
3697765 3697828 + b4454 rdlD antisense RNA, trans-acting regulator of ldrD translation; component of addiction module   20
3984045 3984216 + b4456 ryiA (sraJ) (k19) identified in a large scale screen; function unknown b4456 01,02
4047479 4047587 + b3864 spf (IS197) Spot 42 RNA; antisense regulator of galK translation   17
4048616 4048860 + b4457 csrC (sraK) (ryiB) (tpk2) (IS198) CsrA-binding RNA, antagonizes csrA regulation b4457 01,02,03,19
4155864 4155973 - b4458 oxyS global regulatory RNA, induced in response to oxidative stress; activates or represses expression of many genes b4458 18
4275506 4275645 - b4459 ryjA (sraL) identified in a large scale screen; function unknown   01,02

*The gene expression profiles of selected sRNAs were based on data from Affymetrix E. coli antisense genome arrays. The chip design file was modified to match the newest annotation, and data were extracted with dChip software.
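Many RNA genes in the table above carry several synonyms in parentheses, and some synonyms are shared between neighboring genes (tpke79 is listed for both ryeA and ryeB; t59 and PAIR2 for both rygA and rygB). A small alias index makes it easy to map any published name back to its b-number(s). This is a sketch over a hand-copied subset of rows; NAME_FIELDS, parse_name_field, and build_alias_index are hypothetical names, and the parser assumes each synonym sits in its own pair of parentheses, as in the table.

```python
import re

# Subset of the "name (synonyms)" column, keyed by b-number (illustrative only).
NAME_FIELDS = {
    "b4431": "rprA (IS083)",
    "b4432": "ryeA (sraC) (tpke79) (IS091)",
    "b4433": "ryeB (tpke79)",
    "b4444": "rygA (sraE) (t59) (PAIR2)",
    "b4445": "rygB (t59) (PAIR2)",
    "b4447": "rygD (tp8) (C0730) (QUAD1d) (IS156)",
    "b4457": "csrC (sraK) (ryiB) (tpk2) (IS198)",
    "b4458": "oxyS",
}

# Matches the text inside each "( ... )" group.
SYNONYM = re.compile(r"\(([^)]+)\)")


def parse_name_field(field):
    """Split 'name (syn1) (syn2) ...' into (primary_name, [synonyms])."""
    primary = field.split("(", 1)[0].strip()
    return primary, SYNONYM.findall(field)


def build_alias_index(fields):
    """Map every primary name and synonym to the list of b-numbers using it."""
    index = {}
    for bnum, field in fields.items():
        primary, synonyms = parse_name_field(field)
        for alias in [primary, *synonyms]:
            index.setdefault(alias, []).append(bnum)
    return index


if __name__ == "__main__":
    aliases = build_alias_index(NAME_FIELDS)
    print(aliases["IS083"])   # single gene
    print(aliases["tpke79"])  # shared synonym maps to two genes
```

Returning a list per alias, rather than a single b-number, keeps shared synonyms visible instead of silently letting the last row win.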


© 2002-2021 UW E. coli Genome Project