E. coli K-12 sequence and annotations
Update released June 10, 2004: The
Escherichia coli K-12 strain MG1655 sequence and annotations have
been updated; see this announcement for further
information. An Excel spreadsheet is also available that summarizes
the MG1655 update in terms of nucleotide sequence corrections and the
resulting protein sequence changes.
Contribute to annotation updates!
A wealth of new information has become available recently for the annotation
of Escherichia coli K-12 strain MG1655. Annotation of the genome
is an ongoing task that benefits from the work of all end-users of the
sequence. To this end, we have adopted the ASAP
relational database as the venue for maintaining and updating the annotations,
as well as enabling community input towards that goal. Please note that
while you are invited to become a registered annotator and contribute
to the information within ASAP, there is no requirement to register in
order to view the current MG1655 annotations -- simply log on as a guest.
Furthermore, the "Add a note for the curator" function allows even guest
users to suggest additional annotation updates and corrections. Finally,
we are working with other groups to correlate and reconcile the various
lists and databases containing E. coli genomic information (see,
for example, the sites listed at the E.
coli Database Portal).
Some annotations have already been updated within ASAP, including a number
of revised gene boundaries, gene names, and known or predicted gene products.
In addition, several new genes have been added to the annotations and,
perhaps inevitably, several previously annotated genes have been deaccessioned.
Some of these changes were previously reported as personal communications
(see Serres et al. 2001). While you are directed to ASAP for the current
annotations, we will provide summary information on this page from time
to time (last updated December 8, 2003).
The following genes have been added to the annotations
(also see RNA genes, below):
| lend | rend | dir | bnum | type | gene | syn | product |
|---|---|---|---|---|---|---|---|
| 16751 | 16903 | - | b4412 | CDS | hokC | gef | small toxic membrane polypeptide; component of addiction module |
| 213925 | 214125 | - | b4406 | CDS | yaeP | | conserved hypothetical protein |
| 607059 | 607211 | + | b4415 | CDS | hokE | | small toxic membrane polypeptide; component of addiction module |
| 1268391 | 1268498 | - | b4419 | CDS | ldrA | | small toxic polypeptide; component of addiction module |
| 1268926 | 1269033 | - | b4421 | CDS | ldrB | | small toxic polypeptide; component of addiction module |
| 1269461 | 1269568 | - | b4423 | CDS | ldrC | | small toxic polypeptide; component of addiction module |
| 1489946 | 1490095 | - | b4428 | CDS | hokB | ydcB | small toxic membrane polypeptide; component of addiction module |
| 1702575 | 1702700 | + | b4409 | CDS | blr | | beta-lactam resistance protein |
| 3697609 | 3697716 | - | b4453 | CDS | ldrD | | small toxic polypeptide; component of addiction module |
| 3718077 | 3718229 | - | b4455 | CDS | hokA | yiaZ | small toxic membrane polypeptide; component of addiction module |
| 4190215 | 4190415 | - | b4407 | CDS | thiS | thiG1 | sulfur carrier protein |
| 4373895 | 4374020 | + | b4410 | CDS | ecnA | | entericidin A (antidote to entericidin B); component of addiction module |
| 4374131 | 4374277 | + | b4411 | CDS | ecnB | yjeU | bacteriolytic lipoprotein entericidin B; component of addiction module |
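To make the coordinate columns concrete, the Python sketch below treats each table row as a feature record and derives its length and, for CDS rows, the number of encoded amino acids. It assumes (this page does not state it) that lend and rend are the left and right ends of the feature in 1-based, inclusive genome coordinates and that dir gives the strand; the `Feature` class and its methods are hypothetical illustrations, not part of ASAP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One row of the annotation tables (hypothetical representation).

    ASSUMPTION: lend/rend are 1-based, inclusive left/right ends on the
    genome; strand holds the table's dir column ('+' or '-').
    """
    lend: int
    rend: int
    strand: str
    bnum: str
    gene: str

    def length(self) -> int:
        # Inclusive coordinates: both end positions belong to the feature.
        return self.rend - self.lend + 1

    def protein_length(self) -> int:
        # For a CDS, each codon is 3 nt and the final codon is the stop,
        # which encodes no amino acid.
        nt = self.length()
        if nt % 3 != 0:
            raise ValueError(f"{self.bnum}: length {nt} is not a multiple of 3")
        return nt // 3 - 1


# Three rows copied from the table of added genes.
added = [
    Feature(16751, 16903, "-", "b4412", "hokC"),
    Feature(1268391, 1268498, "-", "b4419", "ldrA"),
    Feature(4190215, 4190415, "-", "b4407", "thiS"),
]

for f in added:
    print(f.gene, f.strand, f.length(), "nt,", f.protein_length(), "aa")
```

Under this reading every CDS row in the added-genes table spans a whole number of codons (hokC, for example, covers 153 nt: 50 amino-acid codons plus a stop codon), which is consistent with the inclusive-coordinate assumption.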
These previously annotated genes have been deaccessioned:
| lend | rend | dir | bnum | type | gene |
|---|---|---|---|---|---|
| 317526 | 317795 | + | b0302 | CDS | |
| 338993 | 339313 | + | b0322 | CDS | yahH |
| 348742 | 349188 | - | b0332 | CDS | |
| 410255 | 410497 | + | b0395 | CDS | |
| 695581 | 695916 | + | b0663 | CDS | |
| 695931 | 696068 | + | b0667 | CDS | |
| 696065 | 696184 | + | b0669 | CDS | |
| 696185 | 696337 | + | b0671 | CDS | |
| 1096171 | 1096422 | + | b1030 | CDS | |
| 1096190 | 1096603 | + | b1031 | CDS | ycdV |
| 2509085 | 2509429 | + | b2391 | CDS | |
| 3267304 | 3267468 | + | b3122 | CDS | |
| 4019665 | 4020006 | + | b3837 | CDS | |
| 4139437 | 4139766 | + | b3948 | CDS | yijI |
| 4311389 | 4311796 | - | b4091 | CDS | phnQ |
RNA genes, including small regulatory RNAs
Although over 85% of the genome consists of protein-encoding genes, other
genes encode RNAs that function without being translated into proteins.
These "RNA genes" are often referred to as noncoding or non-coding RNAs
(ncRNA); other designations include small RNA (sRNA), non-messenger RNA
(nmRNA), small non-messenger RNA (snmRNA), functional RNA (fRNA), and
the generic miscellaneous RNA (misc_RNA) used in GenBank. The best-known
RNA genes encode transfer RNAs (tRNA) and ribosomal RNAs (rRNA), but since
the late 1990s many new noncoding RNAs have been found to play significant
roles in the cell.
New annotations of RNA genes
In addition to 22 rRNAs and 86 tRNAs, a handful of misc_RNAs were already
annotated in our GenBank entry. In consultation with Susan Gottesman,
Gisela Storz, and Karen Wassarman, we have begun an effort to add a number
of other RNA genes to the annotations, initially in ASAP and eventually
in GenBank as well. The following table lists these RNA genes with their
assigned b-numbers, together with the RNA genes that were previously
annotated (updated October 30, 2003).
| lend | rend | dir | bnum | name (synonyms) | notes | Gene Expression Profile* | reference(s) |
|---|---|---|---|---|---|---|---|
| 16952 | 17006 | + | b4413 | sokC (sof) | antisense RNA blocking mokC and hokC translation; component of addiction module | | 22 |
| 189712 | 189847 | + | b4414 | t44 | identified in a large-scale screen; function unknown | b4414 | 03 |
| 475672 | 475785 | + | b0455 | ffs | 4.5S RNA, component of Signal Recognition Particle (SRP) with the Ffh protein; involved in co-translational targeting of proteins to membranes | | 08 |
| 852175 | 852263 | - | b4416 | rybA | identified in a large-scale screen; function unknown | b4416 | 02 |
| 887199 | 887277 | - | b4417 | rybB (p25) | identified in a large-scale screen; function unknown | b4417 | 02 |
| 1145812 | 1145980 | + | b4418 | sraB (pke20) | identified in a large-scale screen; function unknown | b4418 | 01 |
| 1268546 | 1268612 | + | b4420 | rdlA | antisense RNA, trans-acting regulator of ldrA translation; component of addiction module | | 20 |
| 1269081 | 1269146 | + | b4422 | rdlB | antisense RNA, trans-acting regulator of ldrB translation; component of addiction module | | 20 |
| 1269616 | 1269683 | + | b4424 | rdlC | antisense RNA, trans-acting regulator of ldrC translation; component of addiction module | | 20 |
| 1286289 | 1286459 | - | b4425 | rtT (rttR) (rtV1) | released from primary tyrT transcript during tRNA processing; encodes putative Tpr protein; the RNA itself may modulate the stringent response | b4425 | 21 |
| 1403676 | 1403833 | - | b4426 | IS061 | identified in a large-scale screen; function unknown | b4426 | 04 |
| 1435145 | 1435252 | + | b4427 | tke8 (IS063) | identified in a large-scale screen; function unknown | b4427 | 04 |
| 1490143 | 1490195 | + | b4429 | sokB | antisense RNA blocking mokB and hokB translation; component of addiction module | | 22 |
| 1647406 | 1647458 | + | b1574 | dicF | DicF antisense RNA; inhibits ftsZ translation | | 09 |
| 1762737 | 1762804 | - | b4430 | rydB (tpe7) (IS082) | identified in a large-scale screen; function unknown | | 02 |
| 1768396 | 1768500 | + | b4431 | rprA (IS083) | positive regulatory RNA for RpoS translation | | 01,07 |
| 1921090 | 1921338 | + | b4432 | ryeA (sraC) (tpke79) (IS091) | identified in a large-scale screen; function unknown | b4432 | 01,02 |
| 1921188 | 1921308 | - | b4433 | ryeB (tpke79) | identified in a large-scale screen; function unknown | | 02 |
| 1985862 | 1986021 | - | b4434 | IS092 | identified in a large-scale screen; function unknown | b4434 | 04 |
| 2023249 | 2023335 | - | b1954 | dsrA (IS095) | regulatory RNA; regulates transcriptional silencing by H-NS protein, and enhances translation of RpoS | | 10 |
| 2069337 | 2069540 | + | b4435 | IS102 | identified in a large-scale screen; function unknown | b4435 | 04 |
| 2151297 | 2151445 | + | b4436 | ryeC (tp11) (QUAD1a) | identified in a large-scale screen; function unknown | | 02 |
| 2151632 | 2151774 | + | b4437 | ryeD (tpe60) (QUAD1b) | identified in a large-scale screen; function unknown | | 02 |
| 2165134 | 2165219 | + | b4438 | ryeE | identified in a large-scale screen; function unknown | b4438 | 02 |
| 2311104 | 2311196 | + | b4439 | micF (IS113) | regulatory antisense RNA affecting ompF expression | | 11 |
| 2651875 | 2652178 | + | b4440 | ryfA (tp1) (PAIR3) | identified in a large-scale screen; function unknown | b4440 | 02,03 |
| 2689212 | 2689360 | - | b4441 | tke1 | identified in a large-scale screen; function unknown | b4441 | 03 |
| 2753614 | 2753976 | + | b2621 | ssrA (sipB) | 10Sa RNA; tmRNA, acts as both tRNA-Ala and mRNA template for tagging proteins resulting from premature transcription termination | | 12 |
| 2812822 | 2812897 | + | b4442 | sraD | identified in a large-scale screen; function unknown | b4442 | 01 |
| 2922178 | 2922537 | - | b4408 | csrB | CsrA-binding RNA, antagonizes CsrA regulation | | 13 |
| 2940718 | 2940922 | + | b4443 | gcvB (IS145) | small RNA gene divergent from gcvA; represses oppA, dppA, gltI and livJ expression | b4443 | 01,06 |
| 2974124 | 2974211 | - | b4444 | rygA (sraE) (t59) (PAIR2) | identified in a large-scale screen; function unknown | | 01,02 |
| 2974332 | 2974407 | - | b4445 | rygB (t59) (PAIR2) | identified in a large-scale screen; function unknown | | 02 |
| 3054003 | 3054185 | + | b2911 | ssrS (ssr) | 6S RNA; modulates promoter use | | 14 |
| 3054835 | 3054985 | + | b4446 | rygC (t27) (QUAD1c) | identified in a large-scale screen; function unknown | | 02 |
| 3192767 | 3192916 | - | b4447 | rygD (tp8) (C0730) (QUAD1d) (IS156) | identified in a large-scale screen; function unknown | | 03,05 |
| 3236015 | 3236203 | + | b4448 | sraF (tpk1) (IS160) | identified in a large-scale screen; function unknown | b4448 | 01,03 |
| 3267857 | 3268233 | - | b3123 | rnpB | M1 RNA; RNA component of RNase P, involved in tRNA and 4.5S RNA processing | | 15 |
| 3308866 | 3309039 | + | b4449 | sraG (p3) | identified in a large-scale screen; function unknown | b4449 | 01 |
| 3348218 | 3348325 | + | b4450 | ryhA (sraH) | identified in a large-scale screen; function unknown | b4450 | 01,02 |
| 3578554 | 3578647 | - | b4451 | ryhB (sraI) (IS176) | regulatory RNA mediating Fur regulon response; Fur represses this inhibitory RNA, relieving sdhABCD, sodB, ftnA, bfr, and fumA from RyhB-mediated repression | | 01,02,16 |
| 3662494 | 3662598 | + | b4452 | IS183 | identified in a large-scale screen; function unknown | b4452 | 04 |
| 3697765 | 3697828 | + | b4454 | rdlD | antisense RNA, trans-acting regulator of ldrD translation; component of addiction module | | 20 |
| 3984045 | 3984216 | + | b4456 | ryiA (sraJ) (k19) | identified in a large-scale screen; function unknown | b4456 | 01,02 |
| 4047479 | 4047587 | + | b3864 | spf (IS197) | Spot 42 RNA; antisense regulator of galK translation | | 17 |
| 4048616 | 4048860 | + | b4457 | csrC (sraK) (ryiB) (tpk2) (IS198) | CsrA-binding RNA, antagonizes CsrA regulation | b4457 | 01,02,03,19 |
| 4155864 | 4155973 | - | b4458 | oxyS | global regulatory RNA, induced in response to oxidative stress; activates or represses expression of many genes | b4458 | 18 |
| 4275506 | 4275645 | - | b4459 | ryjA (sraL) | identified in a large-scale screen; function unknown | | 01,02 |
*The gene expression profiles of selected sRNAs were based on data from
Affymetrix E. coli antisense genome arrays. The chip design file
was modified to fit the newest annotation, and data were extracted with dChip
software. For more information, please contact ecoli@genome.wisc.edu.
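The name (synonyms) column packs a primary name and any number of parenthesized synonyms into one cell, and the reference(s) column holds comma-separated reference codes. The Python sketch below splits such cells; it assumes every synonym sits in its own pair of parentheses, as in the rows of the RNA table, and the function names are hypothetical, not an ASAP or GenBank interface.

```python
import re

# Matches each "(synonym)" group; ASSUMPTION: synonyms contain no parentheses.
_SYNONYM = re.compile(r"\(([^()]+)\)")


def parse_name_cell(cell: str) -> tuple[str, list[str]]:
    """Split e.g. 'ryeA (sraC) (tpke79)' into ('ryeA', ['sraC', 'tpke79'])."""
    primary = cell.split("(", 1)[0].strip()
    synonyms = [s.strip() for s in _SYNONYM.findall(cell)]
    return primary, synonyms


def parse_refs_cell(cell: str) -> list[str]:
    """Split e.g. '01,02,16' into ['01', '02', '16'], keeping leading zeros."""
    return [r.strip() for r in cell.split(",") if r.strip()]


# Build a synonym -> primary-name index from two rows of the RNA table.
rows = ["ryeA (sraC) (tpke79) (IS091)", "ryeB (tpke79)"]
index: dict[str, list[str]] = {}
for cell in rows:
    primary, syns = parse_name_cell(cell)
    for s in syns:
        index.setdefault(s, []).append(primary)

print(parse_name_cell("csrC (sraK) (ryiB) (tpk2) (IS198)"))
print(parse_refs_cell("01,02,16"))
print(index["tpke79"])  # ['ryeA', 'ryeB']
```

An index of this kind makes shared synonyms easy to spot: ryeA and ryeB both carry tpke79, and rygA and rygB both carry t59 and PAIR2.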
A note on nomenclature: b-numbers, y-names, and all that
Beginning with our publication of the complete genome sequence (Blattner
et al. 1997), we have assigned each gene (protein- or RNA-encoding)
a unique numeric identifier beginning with a "b", the so-called b-numbers
or Blattner numbers. These designations remain constant through subsequent
updates, gene identifications, and so on. It has come to our attention that
others have assigned b-numbers without consulting us; for example, yaeP
(b4406) has been designated B0189.1 in Swiss-Prot, and b4502 in the RefSeq
version of the genome sequence. In general, we will not track those designations,
just as we do not invent our own GenBank accession numbers.
The provisional y-names for uncharacterized ORFs are based on a systematic
nomenclature described by Kenn Rudd (Rudd 1998). Briefly, each such name
begins with "y", and its second and third letters encode the map position
of the ORF (in minutes) at the time the name was assigned, in a manner
analogous to the "z" naming system for transposon insertions. As with
b-numbers, y-names are not reused if an ORF is given a new gene name
or becomes defunct. Under the original scheme, once a function was
established for an E. coli gene, the provisional y-name would be abandoned
and a new gene name chosen. Because y-names are used in the literature,
ASAP retains them as synonyms when a gene is renamed.
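The map-position part of a y-name can be illustrated with a short Python sketch. It assumes Rudd's convention as we understand it (second letter a through j for the 10-minute interval, third letter a through j for the minute within that interval, on a 100-minute map) and approximates map minutes by scaling a genome coordinate linearly against an assumed MG1655 length of 4,639,675 bp. Because names were assigned from the map in use at the time, real y-names may not always agree with this calculation; the function name is hypothetical.

```python
# ASSUMPTION: MG1655 genome length in bp; map minutes are approximated
# by scaling coordinates linearly onto a 100-minute map.
GENOME_LENGTH_BP = 4_639_675
MAP_LETTERS = "abcdefghij"  # a=0, b=1, ..., j=9


def y_name_prefix(coordinate_bp: int, genome_length_bp: int = GENOME_LENGTH_BP) -> str:
    """Return the 3-letter y-name prefix implied by a 1-based coordinate."""
    if not 1 <= coordinate_bp <= genome_length_bp:
        raise ValueError("coordinate outside the genome")
    # Whole map minute in 0..99; integer arithmetic avoids float rounding.
    minute = (coordinate_bp - 1) * 100 // genome_length_bp
    tens, units = divmod(minute, 10)
    return "y" + MAP_LETTERS[tens] + MAP_LETTERS[units]


# Left ends (lend) of ORFs that carry y-names in the tables on this page.
examples = {"yaeP": 213925, "yahH": 338993, "ydcB": 1489946, "yijI": 4139437}
for name, lend in examples.items():
    print(name, y_name_prefix(lend))
```

For the ORFs above, the computed prefix matches the first three letters of the name. The fourth letter (the P in yaeP, for instance) only distinguishes ORFs within the same minute, so it cannot be derived from position.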
The standard genetic nomenclature for E. coli is that of Demerec
et al. 1966, as subsequently amended through use, and as described
in the Instructions to Authors for the Journal of Bacteriology. To
avoid chaos, we tend to defer to the E. coli Genetic Stock Center
(CGSC) database at Yale University as the final authority on gene names.