E. coli K-12 sequence and annotations

Update released June 10, 2004: The Escherichia coli K-12 strain MG1655 sequence and annotations have been updated; see this announcement for further information. An Excel spreadsheet is also available that summarizes the nucleotide sequence corrections in the MG1655 update and the resulting protein sequence changes.

Contribute to annotation updates!

A wealth of new information has recently become available for the annotation of Escherichia coli K-12 strain MG1655. Annotation of the genome is an ongoing task that benefits from the work of all end-users of the sequence. To this end, we have adopted the ASAP relational database as the venue for maintaining and updating the annotations and for gathering community input toward that goal. You are invited to become a registered annotator and contribute to the information within ASAP, but registration is not required to view the current MG1655 annotations; simply log on as a guest. Furthermore, the "Add a note for the curator" function allows even guest users to suggest additional annotation updates and corrections. Finally, we are working with other groups to correlate and reconcile the various lists and databases containing E. coli genomic information (see, for example, the sites listed at the E. coli Database Portal).

Some annotations have already been updated within ASAP, including a number of revised gene boundaries, gene names, and known or predicted gene products. In addition, several new genes have been added to the annotations and, perhaps inevitably, several previously annotated genes have been deaccessioned. Some of these changes have been previously reported as personal communications (see Serres et al. 2001). ASAP remains the source for the current annotations, but we will provide summary information on this page from time to time (last updated December 8, 2003).

   The following genes have been added to the annotations (also see RNA genes, below):

lend     rend     dir  bnum   type  gene  syn    product
16751    16903    -    b4412  CDS   hokC  gef    small toxic membrane polypeptide; component of addiction module
213925   214125   -    b4406  CDS   yaeP         conserved hypothetical protein
607059   607211   +    b4415  CDS   hokE         small toxic membrane polypeptide; component of addiction module
1268391  1268498  -    b4419  CDS   ldrA         small toxic polypeptide; component of addiction module
1268926  1269033  -    b4421  CDS   ldrB         small toxic polypeptide; component of addiction module
1269461  1269568  -    b4423  CDS   ldrC         small toxic polypeptide; component of addiction module
1489946  1490095  -    b4428  CDS   hokB  ydcB   small toxic membrane polypeptide; component of addiction module
1702575  1702700  +    b4409  CDS   blr          beta-lactam resistance protein
3697609  3697716  -    b4453  CDS   ldrD         small toxic polypeptide; component of addiction module
3718077  3718229  -    b4455  CDS   hokA  yiaZ   small toxic membrane polypeptide; component of addiction module
4190215  4190415  -    b4407  CDS   thiS  thiG1  sulfur carrier protein
4373895  4374020  +    b4410  CDS   ecnA         entericidin A (antidote to entericidin B); component of addiction module
4374131  4374277  +    b4411  CDS   ecnB  yjeU   bacteriolytic lipoprotein entericidin B; component of addiction module
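
As a rough consistency check on the coordinates above, the following Python sketch computes the length of a few of the added CDS entries. It assumes that lend and rend are 1-based inclusive positions and that each CDS span includes its stop codon; neither convention is stated on this page, and the function names are illustrative only.

```python
# A few rows copied from the table of added genes:
# (lend, rend, strand, b-number, gene)
ADDED_GENES = [
    (16751, 16903, "-", "b4412", "hokC"),
    (213925, 214125, "-", "b4406", "yaeP"),
    (1268391, 1268498, "-", "b4419", "ldrA"),
    (4190215, 4190415, "-", "b4407", "thiS"),
]


def cds_length_bp(lend, rend):
    """Length in bp, ASSUMING both endpoints are inclusive."""
    return rend - lend + 1


def protein_length_aa(lend, rend):
    """Residue count, ASSUMING the span includes the stop codon."""
    length = cds_length_bp(lend, rend)
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3 - 1  # drop the stop codon


for lend, rend, strand, bnum, gene in ADDED_GENES:
    print(f"{bnum} {gene} ({strand}): {cds_length_bp(lend, rend)} bp, "
          f"{protein_length_aa(lend, rend)} aa")
```

Under these assumptions every listed span is a whole number of codons, which is what one would expect of a correctly bounded CDS.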

   These previously annotated genes have been deaccessioned:

lend     rend     dir  bnum   type  gene
317526   317795   +    b0302  CDS
338993   339313   +    b0322  CDS   yahH
348742   349188   -    b0332  CDS
410255   410497   +    b0395  CDS
695581   695916   +    b0663  CDS
695931   696068   +    b0667  CDS
696065   696184   +    b0669  CDS
696185   696337   +    b0671  CDS
1096171  1096422  +    b1030  CDS
1096190  1096603  +    b1031  CDS   ycdV
2509085  2509429  +    b2391  CDS
3267304  3267468  +    b3122  CDS
4019665  4020006  +    b3837  CDS
4139437  4139766  +    b3948  CDS   yijI
4311389  4311796  -    b4091  CDS   phnQ

RNA genes, including small regulatory RNAs

Although over 85% of the genome consists of protein-encoding genes, other genes encode RNAs that function without being translated into proteins. These "RNA genes" are often referred to as noncoding or non-coding RNAs (ncRNA); other designations include small RNA (sRNA), non-messenger RNA (nmRNA), small non-messenger RNA (snmRNA), functional RNA (fRNA), and the generic miscellaneous RNA (misc_RNA) used in GenBank. The best-known RNA genes encode transfer RNAs (tRNA) and ribosomal RNAs (rRNA), but since the late 1990s many new noncoding RNAs have been found to play significant roles in the cell.

New annotations of RNA genes

In addition to 22 rRNAs and 86 tRNAs, a handful of misc_RNAs were already annotated in our GenBank entry. In consultation with Susan Gottesman, Gisela Storz, and Karen Wassarman, we have begun an effort to add a number of other RNA genes to the annotations, initially in ASAP and eventually in GenBank as well. The following table lists these RNA genes with their assigned b-numbers, including the genes that were previously annotated (updated October 30, 2003).

lend rend dir bnum name (synonyms) notes Gene Expression Profile* reference(s)
16952 17006 + b4413 sokC (sof) antisense RNA blocking mokC and hokC translation; component of addiction module   22
189712 189847 + b4414 t44 identified in a large scale screen; function unknown b4414 03
475672 475785 + b0455 ffs 4.5S RNA, component of Signal Recognition Particle (SRP) with the Ffh protein; involved in co-translational targeting of proteins to membranes   08
852175 852263 - b4416 rybA identified in a large scale screen; function unknown b4416 02
887199 887277 - b4417 rybB (p25) identified in a large scale screen; function unknown b4417 02
1145812 1145980 + b4418 sraB (pke20) identified in a large scale screen; function unknown b4418 01
1268546 1268612 + b4420 rdlA antisense RNA, trans-acting regulator of ldrA translation; component of addiction module   20
1269081 1269146 + b4422 rdlB antisense RNA, trans-acting regulator of ldrB translation; component of addiction module   20
1269616 1269683 + b4424 rdlC antisense RNA, trans-acting regulator of ldrC translation; component of addiction module   20
1286289 1286459 - b4425 rtT (rttR) (rtV1) released from primary tyrT transcript during tRNA processing; encodes putative Tpr protein; the RNA itself may modulate the stringent response b4425 21
1403676 1403833 - b4426 IS061 identified in a large scale screen; function unknown b4426 04
1435145 1435252 + b4427 tke8 (IS063) identified in a large scale screen; function unknown b4427 04
1490143 1490195 + b4429 sokB antisense RNA blocking mokB and hokB translation; component of addiction module   22
1647406 1647458 + b1574 dicF DicF antisense RNA; inhibits ftsZ translation   09
1762737 1762804 - b4430 rydB (tpe7) (IS082) identified in a large scale screen; function unknown   02
1768396 1768500 + b4431 rprA (IS083) positive regulatory RNA for RpoS translation   01,07
1921090 1921338 + b4432 ryeA (sraC) (tpke79) (IS091) identified in a large scale screen; function unknown b4432 01,02
1921188 1921308 - b4433 ryeB (tpke79) identified in a large scale screen; function unknown   02
1985862 1986021 - b4434 IS092 identified in a large scale screen; function unknown b4434 04
2023249 2023335 - b1954 dsrA (IS095) regulatory RNA; regulates transcriptional silencing by H-NS protein, and enhances translation of RpoS   10
2069337 2069540 + b4435 IS102 identified in a large scale screen; function unknown b4435 04
2151297 2151445 + b4436 ryeC (tp11) (QUAD1a) identified in a large scale screen; function unknown   02
2151632 2151774 + b4437 ryeD (tpe60) (QUAD1b) identified in a large scale screen; function unknown   02
2165134 2165219 + b4438 ryeE identified in a large scale screen; function unknown b4438 02
2311104 2311196 + b4439 micF (IS113) regulatory antisense RNA affecting ompF expression   11
2651875 2652178 + b4440 ryfA (tp1) (PAIR3) identified in a large scale screen; function unknown b4440 02,03
2689212 2689360 - b4441 tke1 identified in a large scale screen; function unknown b4441 03
2753614 2753976 + b2621 ssrA (sipB) 10Sa RNA; tmRNA, acts as both tRNA-Ala and mRNA template for tagging proteins resulting from premature transcription termination   12
2812822 2812897 + b4442 sraD identified in a large scale screen; function unknown b4442 01
2922178 2922537 - b4408 csrB CsrA-binding RNA, antagonizes csrA regulation   13
2940718 2940922 + b4443 gcvB (IS145) small RNA gene divergent from gcvA; represses oppA, dppA, gltI and livJ expression b4443 01,06
2974124 2974211 - b4444 rygA (sraE) (t59) (PAIR2) identified in a large scale screen; function unknown   01,02
2974332 2974407 - b4445 rygB (t59) (PAIR2) identified in a large scale screen; function unknown   02
3054003 3054185 + b2911 ssrS (ssr) 6S RNA; modulates promoter use   14
3054835 3054985 + b4446 rygC (t27) (QUAD1c) identified in a large scale screen; function unknown   02
3192767 3192916 - b4447 rygD (tp8) (C0730) (QUAD1d) (IS156) identified in a large scale screen; function unknown   03,05
3236015 3236203 + b4448 sraF (tpk1) (IS160) identified in a large scale screen; function unknown b4448 01,03
3267857 3268233 - b3123 rnpB M1 RNA; RNA component of RNase P, involved in tRNA and 4.5S RNA processing   15
3308866 3309039 + b4449 sraG (p3) identified in a large scale screen; function unknown b4449 01
3348218 3348325 + b4450 ryhA (sraH) identified in a large scale screen; function unknown b4450 01,02
3578554 3578647 - b4451 ryhB (sraI) (IS176) regulatory RNA mediating Fur regulon response; Fur represses this inhibitory RNA, relieving sdhABCD, sodB, ftnA, bfr, and fumA from RyhB-mediated repression   01,02,16
3662494 3662598 + b4452 IS183 identified in a large scale screen; function unknown b4452 04
3697765 3697828 + b4454 rdlD antisense RNA, trans-acting regulator of ldrD translation; component of addiction module   20
3984045 3984216 + b4456 ryiA (sraJ) (k19) identified in a large scale screen; function unknown b4456 01,02
4047479 4047587 + b3864 spf (IS197) Spot 42 RNA; antisense regulator of galK translation   17
4048616 4048860 + b4457 csrC (sraK) (ryiB) (tpk2) (IS198) CsrA-binding RNA, antagonizes csrA regulation b4457 01,02,03,19
4155864 4155973 - b4458 oxyS global regulatory RNA, induced in response to oxidative stress; activates or represses expression of many genes b4458 18
4275506 4275645 - b4459 ryjA (sraL) identified in a large scale screen; function unknown   01,02

*The gene expression profiles of selected sRNAs are based on data from Affymetrix E. coli antisense genome arrays. The chip design file was modified to fit the newest annotation, and data were extracted with dChip software.
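
Several entries in the RNA gene table lie on opposite strands with overlapping coordinates (for example ryeA and ryeB). The Python sketch below shows one way to flag such pairs from the lend, rend, and dir columns. It assumes inclusive coordinates and uses only a handful of rows copied from the table; the function names are illustrative, and an overlap found this way says nothing by itself about biological function.

```python
from itertools import combinations

# A few rows copied from the RNA gene table: (name, lend, rend, strand)
RNA_GENES = [
    ("rdlA", 1268546, 1268612, "+"),
    ("ryeA", 1921090, 1921338, "+"),
    ("ryeB", 1921188, 1921308, "-"),
    ("ryeC", 2151297, 2151445, "+"),
    ("ryeD", 2151632, 2151774, "+"),
]


def overlap_bp(a, b):
    """Number of shared positions, ASSUMING inclusive endpoints."""
    start = max(a[1], b[1])
    end = min(a[2], b[2])
    return max(0, end - start + 1)


def opposite_strand_overlaps(genes):
    """Return (name1, name2, overlap) for overlapping opposite-strand pairs."""
    return [
        (a[0], b[0], overlap_bp(a, b))
        for a, b in combinations(genes, 2)
        if a[3] != b[3] and overlap_bp(a, b) > 0
    ]


print(opposite_strand_overlaps(RNA_GENES))
```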

A note on nomenclature: b-numbers, y-names, and all that

Beginning with our publication of the complete genome sequence (Blattner et al. 1997), we have assigned each gene (protein- or RNA-encoding) a unique numeric identifier beginning with a "b" -- the so-called b-numbers or Blattner numbers. These designations remain constant through further updates, gene identifications, and similar changes. It has come to our attention that others have assigned b-numbers without consulting us; for example, yaeP (b4406) has been designated B0189.1 in Swiss-Prot, and b4502 in the RefSeq version of the genome sequence. In general, we will not track those designations, just as we do not invent our own GenBank accession numbers.
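
For scripts that handle these identifiers, a simple format check can separate b-numbers of the form used on this page from variants such as the Swiss-Prot designation mentioned above. The Python sketch below assumes that a b-number is a lowercase "b" followed by exactly four digits, which matches every b-number listed here but is not a rule stated by the project; the pattern and function name are illustrative.

```python
import re

# ASSUMPTION: "b" followed by exactly four digits, as in b0302 or b4459.
B_NUMBER = re.compile(r"b\d{4}")


def is_b_number(identifier):
    """True if the identifier has the assumed b-number format."""
    return B_NUMBER.fullmatch(identifier) is not None


for candidate in ("b4406", "B0189.1", "b4502"):
    print(candidate, is_b_number(candidate))
```

A format check of this kind cannot tell who assigned an identifier: b4502 passes the test even though, as noted above, it was assigned in RefSeq rather than by this project.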

The provisional y-names for uncharacterized ORFs are based on a systematic nomenclature described by Kenn Rudd (Rudd 1998). Briefly, the first three letters of a y-name are based on the map position of an ORF at the time the name was assigned, in a manner analogous to the "z" naming system for transposon insertions. As with b-numbers, y-names are not reused if an ORF is given a new gene name or becomes defunct. According to the original scheme, once a function was established for an E. coli gene, the provisional y-name would be abandoned and a new gene name chosen. Since the y-names are used in the literature, ASAP retains them as synonyms when a gene is renamed.
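
One reading of this scheme, which is an assumption here rather than something stated on this page, is that the second and third letters of a y-name encode the tens and units digits of the map minute (a = 0 through j = 9). The Python sketch below decodes a y-name on that assumption and compares it with an approximate minute derived from the lend coordinate. The chromosome length used (4,639,221 bp) and the linear 100-minute scaling are also assumptions, so the comparison is only indicative.

```python
# ASSUMPTION: chromosome length in bp; adjust to the sequence release in use.
GENOME_LENGTH_BP = 4_639_221


def yname_minute(name):
    """Decode letters 2-3 of a y-name as a map minute (ASSUMED a-j = 0-9)."""
    if len(name) != 4 or name[0] != "y":
        raise ValueError(f"not a four-letter y-name: {name!r}")
    tens = ord(name[1]) - ord("a")
    units = ord(name[2]) - ord("a")
    if not (0 <= tens <= 9 and 0 <= units <= 9):
        raise ValueError(f"letters outside a-j in {name!r}")
    return 10 * tens + units


def approx_minute(lend):
    """Approximate map minute, ASSUMING a linear 100-minute map."""
    return 100 * lend / GENOME_LENGTH_BP


# y-names and lend coordinates taken from the gene tables on this page
for name, lend in (("yaeP", 213925), ("ydcB", 1489946),
                   ("yiaZ", 3718077), ("yjeU", 4374131)):
    print(f"{name}: decoded {yname_minute(name)} min, "
          f"coordinate gives about {approx_minute(lend):.1f} min")
```

For the four names shown, the decoded minute agrees with the whole-minute part of the coordinate-based estimate, which is consistent with this reading but does not confirm it.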

The standard genetic nomenclature for E. coli is that of Demerec et al. 1966, as subsequently amended through use, and as described in the Instructions to Authors for the Journal of Bacteriology. To avoid chaos, we tend to defer to the E. coli Genetic Stock Center (CGSC) database at Yale University as the final authority on gene names.

© 2002-2014 UW E. coli Genome Project