Escherichia coli K-12 MG1655 sequence and annotations U00096.1 (aka versions m49 - m54)
The Complete Genome Sequence of Escherichia coli K-12
The E. coli K-12 genome was among the first targets chosen
for whole-genome sequencing (see About Us). Between 1992 and 1995 we published segments of the genome
that together comprised over 1 Mbp of contiguous finished sequence [1 - 6]. As 1996 neared its end, the circle was closed, and we
deposited the complete 4,639,221 bp genome sequence of Escherichia coli K-12 MG1655 in GenBank on January 16, 1997. The
sequence was assigned the accession number U00096 (later U00096.1),
although it was released as a series of 400 overlapping sections (accession numbers AE000111 - AE000510) due to GenBank entry size
limitations at the time. Our analysis of the genome sequence was reported in the September 5, 1997 issue of Science [7].
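As a quick check on the section count, the inclusive accession range quoted above can be expanded programmatically. This is only an illustrative sketch: the helper name `accession_range` is hypothetical, and it assumes the accessions share a letter prefix followed by a zero-padded number.

```python
def accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive accession range with a shared letter prefix."""
    prefix = first.rstrip("0123456789")          # e.g. "AE"
    width = len(first) - len(prefix)             # digits are zero-padded to this width
    start = int(first[len(prefix):])
    stop = int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


sections = accession_range("AE000111", "AE000510")
print(len(sections), sections[0], sections[-1])  # 400 AE000111 AE000510
```

The range AE000111 through AE000510 therefore accounts for exactly the 400 sections mentioned in the text.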
Press release from the University of Wisconsin-Madison.
References
[1] D. L. Daniels, G. Plunkett III, V. Burland, & F. R. Blattner (1992) Analysis of the Escherichia coli genome: DNA
sequence of the region from 84.5 to 86.5 minutes. Science 257(5071):771-778. [PMID: 1379743].
[2] V. Burland, G. Plunkett III, D. L. Daniels, & F. R. Blattner (1993) DNA sequence and analysis of 136 kilobases of the
Escherichia coli genome: organizational symmetry around the origin of replication. Genomics 16(3):551-561. [PMID: 7686882].
[3] G. Plunkett III, V. Burland, D. L. Daniels, & F. R. Blattner (1993) Analysis of the Escherichia coli genome.
III. DNA sequence of the region from 87.2 to 89.2 minutes. Nucleic Acids Res 21(15):3391-3398. [PMID: 8346018].
[4] F. R. Blattner, V. Burland, G. Plunkett III, H. J. Sofia, & D. L. Daniels (1993) Analysis of the Escherichia coli
genome. IV. DNA sequence of the region from 89.2 to 92.8 minutes. Nucleic Acids Res 21(23):5408-5417. [PMID: 8265357].
[5] H. J. Sofia, V. Burland, D. L. Daniels, G. Plunkett III, & F. R. Blattner (1994) Analysis of the Escherichia coli
genome. V. DNA sequence of the region from 76.0 to 81.5 minutes. Nucleic Acids Res 22(13):2576-2586. [PMID: 8041620].
[6] V. Burland, G. Plunkett III, H. J. Sofia, D. L. Daniels, & F. R. Blattner (1995) Analysis of the Escherichia coli
genome VI: DNA sequence of the region from 92.8 through 100 minutes. Nucleic Acids Res 23(12):2105-2119. [PMID: 7610040].
[7] F. R. Blattner, G. Plunkett III, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides,
J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose,
B. Mau, & Y. Shao (1997) The complete genome sequence of Escherichia coli K-12. Science
277(5331):1453-1474. [PMID: 9278503].
Subsequent annotation updates
The E. coli K-12 MG1655 annotations were updated numerous times. We designated version numbers
for the annotated sequence to assist in keeping track of corrections, updates, and other changes. The version described in the
original 1997 Science paper was designated m49. An annotation update deposited September 2, 1997 was designated m52
(intermediate working versions were available on our ftp site, but not submitted to GenBank). Version m54 was deposited
October 13, 1998. Those annotation updates had been managed via an essentially manual process. In order to better deal with that
ongoing task, as well as with other genomes we were sequencing, we adopted the ASAP relational database
as the venue for maintaining and updating the annotations, as well as enabling community input towards that goal. We also worked with
other groups to correlate and reconcile the various lists and databases containing E. coli genomic information.
Within ASAP, annotation updates included revisions to gene boundaries, gene names, and known or
predicted gene products. In addition, new genes were added to the annotations, and perhaps inevitably, several previously
annotated genes were deaccessioned. Some of these changes were reported elsewhere as personal communications
(see Serres et al. 2001). Users were directed to ASAP for the most
current annotations, but we provided summary information on this page from time to time.
This version of the sequence was replaced by a corrected sequence (m56, corresponding to GenBank record
U00096.2) on June 21, 2004.
These genes were added to
the annotations (also see RNA genes, below):
lend | rend | dir | bnum | type | gene | syn | product |
16751 | 16903 | - | b4412 | CDS | hokC | gef | small toxic membrane polypeptide; component of addiction module |
213925 | 214125 | - | b4406 | CDS | yaeP | | conserved hypothetical protein |
607059 | 607211 | + | b4415 | CDS | hokE | | small toxic membrane polypeptide; component of addiction module |
1268391 | 1268498 | - | b4419 | CDS | ldrA | | small toxic polypeptide; component of addiction module |
1268926 | 1269033 | - | b4421 | CDS | ldrB | | small toxic polypeptide; component of addiction module |
1269461 | 1269568 | - | b4423 | CDS | ldrC | | small toxic polypeptide; component of addiction module |
1489946 | 1490095 | - | b4428 | CDS | hokB | ydcB | small toxic membrane polypeptide; component of addiction module |
1702575 | 1702700 | + | b4409 | CDS | blr | | beta-lactam resistance protein |
3697609 | 3697716 | - | b4453 | CDS | ldrD | | small toxic polypeptide; component of addiction module |
3718077 | 3718229 | - | b4455 | CDS | hokA | yiaZ | small toxic membrane polypeptide; component of addiction module |
4190215 | 4190415 | - | b4407 | CDS | thiS | thiG1 | sulfur carrier protein |
4373895 | 4374020 | + | b4410 | CDS | ecnA | | entericidin A (antidote to entericidin B); component of addiction module |
4374131 | 4374277 | + | b4411 | CDS | ecnB | yjeU | bacteriolytic lipoprotein entericidin B; component of addiction module |
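The lend and rend columns can be turned into feature lengths with a little arithmetic. The sketch below is illustrative only: it assumes lend and rend are 1-based inclusive coordinates and that each CDS span includes its stop codon (the usual GenBank convention, not stated on this page), and the names `ADDED_GENES`, `feature_length`, and `codon_count` are hypothetical. Only a few rows are copied from the table of added genes.

```python
# (lend, rend, strand, b-number, gene) copied from the table of added genes.
ADDED_GENES = [
    (16751, 16903, "-", "b4412", "hokC"),
    (213925, 214125, "-", "b4406", "yaeP"),
    (1268391, 1268498, "-", "b4419", "ldrA"),
    (4190215, 4190415, "-", "b4407", "thiS"),
    (4374131, 4374277, "+", "b4411", "ecnB"),
]


def feature_length(lend: int, rend: int) -> int:
    """Length in bp, assuming 1-based inclusive coordinates."""
    return rend - lend + 1


def codon_count(lend: int, rend: int) -> int:
    """Number of codons in a CDS span (stop codon assumed to be included)."""
    length = feature_length(lend, rend)
    if length % 3 != 0:
        raise ValueError(f"span of {length} bp is not a whole number of codons")
    return length // 3


for lend, rend, strand, bnum, gene in ADDED_GENES:
    print(bnum, gene, strand, feature_length(lend, rend), codon_count(lend, rend))
```

Under these assumptions hokC spans 153 bp (51 codons) and ldrA spans 108 bp (36 codons), which illustrates how small these added genes are.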
These previously annotated genes were deaccessioned:
lend | rend | dir | bnum | type | gene |
317526 | 317795 | + | b0302 | CDS | |
338993 | 339313 | + | b0322 | CDS | yahH |
348742 | 349188 | - | b0332 | CDS | |
410255 | 410497 | + | b0395 | CDS | |
695581 | 695916 | + | b0663 | CDS | |
695931 | 696068 | + | b0667 | CDS | |
696065 | 696184 | + | b0669 | CDS | |
696185 | 696337 | + | b0671 | CDS | |
1096171 | 1096422 | + | b1030 | CDS | |
1096190 | 1096603 | + | b1031 | CDS | ycdV |
2509085 | 2509429 | + | b2391 | CDS | |
3267304 | 3267468 | + | b3122 | CDS | |
4019665 | 4020006 | + | b3837 | CDS | |
4139437 | 4139766 | + | b3948 | CDS | yijI |
4311389 | 4311796 | - | b4091 | CDS | phnQ |
RNA genes, including small regulatory RNAs
Although over 85% of the genome consists of protein-encoding genes, other
genes encode RNAs that function without being translated into proteins.
These "RNA genes" are often referred to as noncoding or non-coding RNAs
(ncRNA); other designations include small RNA (sRNA), non-messenger RNA
(nmRNA), small non-messenger RNA (snmRNA), functional RNA (fRNA), and
the generic miscellaneous RNA (misc_RNA) used in GenBank. The best-known
RNA genes encode transfer RNAs (tRNA) and ribosomal RNAs (rRNA), but since
the late 1990s many new noncoding RNAs have been found to play significant
roles in the cell.
In addition to 22 rRNAs and 86 tRNAs, a handful of misc_RNAs were already
annotated in our GenBank entry. In consultation with Susan Gottesman,
Gisela Storz, and Karen Wassarman, we have begun an effort to add a number
of other RNA genes to the annotations, initially in ASAP and eventually
in GenBank as well. The following table, last updated October 30, 2003,
lists these RNA genes with their assigned b-numbers, including those that
were previously annotated.
lend | rend | dir | bnum | name (synonyms) | notes | Gene Expression Profile* | reference(s) |
16952 | 17006 | + | b4413 | sokC (sof) | antisense RNA blocking mokC and hokC translation; component of addiction module | | 22 |
189712 | 189847 | + | b4414 | t44 | identified in a large scale screen; function unknown | b4414 | 03 |
475672 | 475785 | + | b0455 | ffs | 4.5S RNA, component of Signal Recognition Particle (SRP) with the Ffh protein; involved in co-translational targeting of proteins to membranes | | 08 |
852175 | 852263 | - | b4416 | rybA | identified in a large scale screen; function unknown | b4416 | 02 |
887199 | 887277 | - | b4417 | rybB (p25) | identified in a large scale screen; function unknown | b4417 | 02 |
1145812 | 1145980 | + | b4418 | sraB (pke20) | identified in a large scale screen; function unknown | b4418 | 01 |
1268546 | 1268612 | + | b4420 | rdlA | antisense RNA, trans-acting regulator of ldrA translation; component of addiction module | | 20 |
1269081 | 1269146 | + | b4422 | rdlB | antisense RNA, trans-acting regulator of ldrB translation; component of addiction module | | 20 |
1269616 | 1269683 | + | b4424 | rdlC | antisense RNA, trans-acting regulator of ldrC translation; component of addiction module | | 20 |
1286289 | 1286459 | - | b4425 | rtT (rttR) (rtV1) | released from primary tyrT transcript during tRNA processing; encodes putative Tpr protein; the RNA itself may modulate the stringent response | b4425 | 21 |
1403676 | 1403833 | - | b4426 | IS061 | identified in a large scale screen; function unknown | b4426 | 04 |
1435145 | 1435252 | + | b4427 | tke8 (IS063) | identified in a large scale screen; function unknown | b4427 | 04 |
1490143 | 1490195 | + | b4429 | sokB | antisense RNA blocking mokB and hokB translation; component of addiction module | | 22 |
1647406 | 1647458 | + | b1574 | dicF | DicF antisense RNA; inhibits ftsZ translation | | 09 |
1762737 | 1762804 | - | b4430 | rydB (tpe7) (IS082) | identified in a large scale screen; function unknown | | 02 |
1768396 | 1768500 | + | b4431 | rprA (IS083) | positive regulatory RNA for RpoS translation | | 01,07 |
1921090 | 1921338 | + | b4432 | ryeA (sraC) (tpke79) (IS091) | identified in a large scale screen; function unknown | b4432 | 01,02 |
1921188 | 1921308 | - | b4433 | ryeB (tpke79) | identified in a large scale screen; function unknown | | 02 |
1985862 | 1986021 | - | b4434 | IS092 | identified in a large scale screen; function unknown | b4434 | 04 |
2023249 | 2023335 | - | b1954 | dsrA (IS095) | regulatory RNA; regulates transcriptional silencing by H-NS protein, and enhances translation of RpoS | | 10 |
2069337 | 2069540 | + | b4435 | IS102 | identified in a large scale screen; function unknown | b4435 | 04 |
2151297 | 2151445 | + | b4436 | ryeC (tp11) (QUAD1a) | identified in a large scale screen; function unknown | | 02 |
2151632 | 2151774 | + | b4437 | ryeD (tpe60) (QUAD1b) | identified in a large scale screen; function unknown | | 02 |
2165134 | 2165219 | + | b4438 | ryeE | identified in a large scale screen; function unknown | b4438 | 02 |
2311104 | 2311196 | + | b4439 | micF (IS113) | regulatory antisense RNA affecting ompF expression | | 11 |
2651875 | 2652178 | + | b4440 | ryfA (tp1) (PAIR3) | identified in a large scale screen; function unknown | b4440 | 02,03 |
2689212 | 2689360 | - | b4441 | tke1 | identified in a large scale screen; function unknown | b4441 | 03 |
2753614 | 2753976 | + | b2621 | ssrA (sipB) | 10Sa RNA; tmRNA, acts as both tRNA-Ala and mRNA template for tagging proteins resulting from premature transcription termination | | 12 |
2812822 | 2812897 | + | b4442 | sraD | identified in a large scale screen; function unknown | b4442 | 01 |
2922178 | 2922537 | - | b4408 | csrB | CsrA-binding RNA, antagonizes csrA regulation | | 13 |
2940718 | 2940922 | + | b4443 | gcvB (IS145) | small RNA gene divergent from gcvA; represses oppA, dppA, gltI and livJ expression | b4443 | 01,06 |
2974124 | 2974211 | - | b4444 | rygA (sraE) (t59) (PAIR2) | identified in a large scale screen; function unknown | | 01,02 |
2974332 | 2974407 | - | b4445 | rygB (t59) (PAIR2) | identified in a large scale screen; function unknown | | 02 |
3054003 | 3054185 | + | b2911 | ssrS (ssr) | 6S RNA; modulates promoter use | | 14 |
3054835 | 3054985 | + | b4446 | rygC (t27) (QUAD1c) | identified in a large scale screen; function unknown | | 02 |
3192767 | 3192916 | - | b4447 | rygD (tp8) (C0730) (QUAD1d) (IS156) | identified in a large scale screen; function unknown | | 03,05 |
3236015 | 3236203 | + | b4448 | sraF (tpk1) (IS160) | identified in a large scale screen; function unknown | b4448 | 01,03 |
3267857 | 3268233 | - | b3123 | rnpB | M1 RNA; RNA component of RNase P, involved in tRNA and 4.5S RNA processing | | 15 |
3308866 | 3309039 | + | b4449 | sraG (p3) | identified in a large scale screen; function unknown | b4449 | 01 |
3348218 | 3348325 | + | b4450 | ryhA (sraH) | identified in a large scale screen; function unknown | b4450 | 01,02 |
3578554 | 3578647 | - | b4451 | ryhB (sraI) (IS176) | regulatory RNA mediating Fur regulon response; Fur represses this inhibitory RNA, relieving sdhABCD, sodB, ftnA, bfr, and fumA from RyhB-mediated repression | | 01,02,16 |
3662494 | 3662598 | + | b4452 | IS183 | identified in a large scale screen; function unknown | b4452 | 04 |
3697765 | 3697828 | + | b4454 | rdlD | antisense RNA, trans-acting regulator of ldrD translation; component of addiction module | | 20 |
3984045 | 3984216 | + | b4456 | ryiA (sraJ) (k19) | identified in a large scale screen; function unknown | b4456 | 01,02 |
4047479 | 4047587 | + | b3864 | spf (IS197) | Spot 42 RNA; antisense regulator of galK translation | | 17 |
4048616 | 4048860 | + | b4457 | csrC (sraK) (ryiB) (tpk2) (IS198) | CsrA-binding RNA, antagonizes csrA regulation | b4457 | 01,02,03,19 |
4155864 | 4155973 | - | b4458 | oxyS | global regulatory RNA, induced in response to oxidative stress; activates or represses expression of many genes | b4458 | 18 |
4275506 | 4275645 | - | b4459 | ryjA (sraL) | identified in a large scale screen; function unknown | | 01,02 |
*The gene expression profiles of selected sRNAs were based on data from
Affymetrix E. coli antisense genome arrays. The chip design file
was modified to fit the newest annotation, and data were extracted with dChip
software.
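Some RNA genes in the table occupy the same region on opposite strands, for example ryeA and ryeB. The following sketch shows one way to find such pairs from the lend, rend, and dir columns. It assumes 1-based inclusive coordinates, uses only four rows copied from the table, and the names `RNA_GENES`, `overlap_bp`, and `opposite_strand_overlaps` are hypothetical.

```python
from itertools import combinations

# (lend, rend, strand, name) copied from the RNA gene table.
RNA_GENES = [
    (1921090, 1921338, "+", "ryeA"),
    (1921188, 1921308, "-", "ryeB"),
    (2974124, 2974211, "-", "rygA"),
    (2974332, 2974407, "-", "rygB"),
]


def overlap_bp(a, b) -> int:
    """Shared base pairs between two spans (1-based inclusive); 0 if disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def opposite_strand_overlaps(features):
    """Return (name, name, shared bp) for overlapping features on opposite strands."""
    return [
        (a[3], b[3], overlap_bp(a, b))
        for a, b in combinations(features, 2)
        if a[2] != b[2] and overlap_bp(a, b) > 0
    ]


print(opposite_strand_overlaps(RNA_GENES))  # [('ryeA', 'ryeB', 121)]
```

With these rows, ryeB lies entirely within ryeA on the opposite strand, while rygA and rygB share a strand and do not overlap.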