
Evaluation of a Selection Strategy Before Use of 16S rRNA Gene Sequencing for the Identification of Clinically Significant Gram-Negative Rods and Coccobacilli

Steven D. Mahlen PhD, Jill E. Clarridge III PhD
DOI: http://dx.doi.org/10.1309/AJCP61CGNXCXVSPR 381-388 First published online: 1 September 2011


Although 16S ribosomal RNA (rRNA) gene sequencing is well established for correctly identifying bacteria, its most efficient use in a routine clinical laboratory is not clear. We devised and evaluated a strategy to select gram-negative rods and coccobacilli (GNRCB) for which sequencing might be necessary before routine identification methods had been exhausted. The prospectively applied selection criteria were primarily based on the isolate’s display of unusual or discordant phenotypic results and/or disease correlation. By using this strategy, we selected a total of 120 GNRCB (representing only ∼2% of all identified). The strategy was demonstrated to be efficient because the preliminary phenotypic identification for 79.2% of those isolates needed revision (18.2% were novel and about a third would have required further extensive testing). The knowledge that 1.6% (ie, 79% of 2%) of isolated GNRCB might benefit from sequence identification could provide guidelines for routine clinical laboratories toward efficient use of sequence analysis.

Key Words:
  • 16S rRNA gene sequencing
  • Gram-negative rod
  • Bacteria selection for sequence identification

There have been many studies and reviews on the usefulness of 16S ribosomal RNA (rRNA) gene sequencing for bacterial identification in the clinical microbiology laboratory.1-7 Most studies have focused on the accuracy of sequencing vs specific phenotypic methods such as the Vitek 2 system (bioMérieux, Durham, NC) and manual identification or kit systems, such as the API 20 NE strip (bioMérieux).1,5-7 Other studies have focused on the usefulness of sequencing slow-growing organisms for which biochemical testing and phenotypic identification systems may not be valid.2,3,8 The literature most commonly documents the experiences of a reference laboratory that analyzes submitted isolates. However, published guidance for routine clinical microbiology laboratories in determining which organisms to select for sequencing is often not given or was derived after the study and not tested prospectively.1

This study was designed to prospectively evaluate a strategy for selecting isolates at the time the initial and often incomplete phenotypic results were available. If the strategy were successful, the selected isolates would subsequently prove to have been difficult and/or costly to identify by routine methods. Novel isolates, for which no prior description was available, would have been impossible to identify by those methods. Overall, our analysis was designed to give a quantitative measure of the need for sequence identification in a routine clinical laboratory and to provide guidance on how to select organisms for such analysis in a timely manner.

Materials and Methods

Bacterial Strains and Strain Selection

During an 18-month period at the Veterans Affairs Puget Sound Health Care System, Seattle, WA, we prospectively applied selection criteria (Table 1) aimed at performing 16S rRNA gene sequence analysis on only the isolates that actually needed this identification method. Selection of organisms was based on the isolate’s display of discordant or unusual morphologic or biochemical results, discordant disease correlation, and/or anticipation of failure of phenotypic testing from other studies. In addition, we selected only isolates that were deemed to be potential pathogens on the basis that they occurred as predominant organisms in superficial wound or respiratory specimens in the presence of appropriate inflammatory cells, as single isolates at 10⁴ or 10⁵ colony-forming units per milliliter from urine specimens, or as numerically significant isolates from sterile sites. The director or a designee trained in the selection criteria was responsible for all decisions of whether to sequence a given isolate. During this period, we selected 68 gram-negative rod and coccobacillus isolates for sequence analysis from about 3,500 total gram-negative rods and coccobacilli that were phenotypically identified. Thus, we chose to sequence only about 2% of all gram-negative rods and coccobacilli during this period.
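The selection logic described above can be sketched as a simple predicate. This is only an illustration under stated assumptions: the class name, field names, and function name are hypothetical, the full criteria are those in Table 1, and in the study every decision was made by the director or a trained designee rather than by software.

```python
from dataclasses import dataclass


@dataclass
class Isolate:
    """Hypothetical record of the preliminary findings for one isolate."""
    unusual_or_discordant_phenotype: bool   # morphologic or biochemical results
    discordant_disease_correlation: bool
    phenotypic_failure_anticipated: bool    # e.g. known from other studies
    potential_pathogen: bool                # judged from specimen type and quantity


def select_for_sequencing(isolate: Isolate) -> bool:
    """Return True if the isolate would be a candidate for 16S rRNA sequencing.

    Assumption: a candidate must be a potential pathogen AND show at least
    one of the three triggering findings named in the text.
    """
    has_trigger = (
        isolate.unusual_or_discordant_phenotype
        or isolate.discordant_disease_correlation
        or isolate.phenotypic_failure_anticipated
    )
    return isolate.potential_pathogen and has_trigger


# Example: an unusual phenotype in a clinically significant isolate.
candidate = Isolate(True, False, False, True)
print(select_for_sequencing(candidate))
```

Requiring clinical significance first mirrors the text's statement that only potential pathogens were considered, which keeps the sequenced fraction small.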

In addition, we studied 52 isolates for which the selection was the same but the organisms were first stocked and sequenced later. Only the first group was used in the time studies. Thus, 120 isolates met the criteria used to select clinical isolates for sequence analysis.

Phenotypic Identification

Isolates were phenotypically identified by using an appropriate combination of light microscopy, biochemical testing, and a Vitek 2 GN card (bioMérieux), a Vitek 2 NH card (bioMérieux), an API NH strip (bioMérieux), an API 20 NE strip (bioMérieux), and/or a RapID ANA II strip (Remel, Lenexa, KS).

16S rRNA Gene Sequencing and Strain Identification

Nucleic acid extraction and 16S rRNA polymerase chain reaction (PCR) were performed as previously described.9 In some cases, nucleic acid extraction and 16S rRNA gene sequencing were performed the same day as strains were selected; in other cases, extraction and sequencing were performed the next day. PCR amplicons were taken to a core facility for cycle sequencing. Raw electropherograms obtained from the core facility were analyzed by the microbiology laboratory director or designee by using the MicroSeq program (Applied Biosystems, Foster City, CA). The resulting 16S rDNA sequences averaged about 490 bases and were compared with several databases, including a database consisting of in-house strains, the MicroSeq database, and public databases such as the National Center for Biotechnology Information (NCBI) and Bio Informatic Bacteria Identification (BIBI) to determine strain relatedness as needed. Sequencing results were correlated with phenotypic data for all strains. Last, the director or designee signed off on all sequencing results and the final identity was added to the culture result in the patient record, if appropriate. A strain was considered novel if there was more than 1% difference in the sequence from that of any known organisms.10
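The novelty rule at the end of this paragraph (more than 1% sequence difference from any known organism) can be illustrated with a minimal sketch. It assumes the query and reference sequences have already been aligned to the same length; the function names and the toy sequences are hypothetical, and the study itself relied on the MicroSeq program and database searches rather than code like this.

```python
NOVELTY_THRESHOLD_PCT = 1.0  # from the text: "more than 1% difference"


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of mismatched positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return 100.0 * mismatches / len(seq_a)


def is_novel(query: str, references: dict[str, str]) -> bool:
    """True if the query differs by more than the threshold from every reference."""
    return all(
        percent_difference(query, ref) > NOVELTY_THRESHOLD_PCT
        for ref in references.values()
    )


# Toy 500-base example: 5 mismatches is exactly 1.0% (not novel),
# 6 mismatches is 1.2% (novel under the rule).
query = "A" * 500
print(is_novel(query, {"ref_5_mismatches": "C" * 5 + "A" * 495}))
print(is_novel(query, {"ref_6_mismatches": "C" * 6 + "A" * 494}))
```

With sequences averaging about 490 bases, the 1% threshold corresponds to roughly 5 differing positions, so a small number of base-calling errors could change the call; that is one reason the electropherograms were reviewed before comparison.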

Time of Selection

We usually selected strains during morning plate rounds about 24 to 48 hours after specimens were originally plated and after preliminary phenotypic testing results were available. The sequencing procedures were begun that day or the next working day.

Table 1

Discrepant Result Analysis

The sequence identification was compared with preliminary phenotypic identification with documentation of identification method and reason for sequencing. A correct identification occurs when the sequence result is clearly differentiated from all other strains. Strains for which this did not occur (in the Raoultella group) were noted: sequencing a site other than the 16S rRNA gene is recommended for these organisms.8 Assessing the difficulty of correct identification by phenotypic means within our laboratory was based on whether an identification might have required, for example, just rapid spot tests or a single kit test (≤2 days) or tests that were not easily available, such as a particular sugar fermentation or extensive incubation (>2 days). Poorly described or novel organisms are discussed subsequently.


The overall results are shown in Table 2. Of the 120 selected gram-negative rod or coccobacillus isolates, there were 25 isolates (20.8%) for which the sequence identification and the initial phenotypic identification were the same (Table 3) and 95 isolates (79.2%) for which the sequence identification and the initial phenotypic identification were different (Table 4). Because for 25 isolates the phenotypic and genotypic identifications agreed, one might think the selection criteria were misapplied. However, on examination, most of the reasons we chose to sequence each isolate listed in Table 3 seem valid. Nine of these were Enterobacteriaceae. We selected 2 (Citrobacter amalonaticus and Yersinia enterocolitica) because they were unusual isolates from the particular site and 6 because the genera (Raoultella and Enterobacter) have had a confusing taxonomic history with poor separation from other genera, leading to our concern that phenotypic databases would be inadequate. Two isolates had unusual colony morphologic features.

Table 2
Table 3

Most of the rest of the isolates in Table 3 were selected because they were an unusual organism for a particular anatomic site. The Haemophilus influenzae recovered from the blood is an exception; however, the colony morphologic features and the clinical manifestation (aortic aneurysm) of this isolate were unusual (Table 3). This selection was interesting in that the genotype identification showed it was H influenzae type f, which differs by more than 1% in sequence from H influenzae type b and others of the H influenzae group. H influenzae type f is difficult to distinguish phenotypically from other H influenzae serotypes, but it may be clinically important to do so.11,12 Thus, this isolate might have been listed in Table 3 or Table 4.

There were 95 isolates (79.2%) for which we correctly surmised that the sequence identification would be a useful or necessary augmentation to the phenotypic identification (Table 4). For 22 of these (18.2% of all sequenced isolates), even extensive phenotypic testing would not have been able to suggest a valid name because 17 strains were truly novel isolates; 1 strain matched the sequence of the type strain of Haemophilus quentini, but that name is not yet validly published or in databases; and 4 strains were near the sequence of the taxonomically confusing Centers for Disease Control and Prevention (CDC) Enteric Group 53 in the MicroSeq database (Table 4). Thus, for this 18.2%, additional phenotypic testing would have used resources without achieving an identification.

Also shown in Table 4 are 36 isolates (30.0% of all sequenced isolates) for which a correct phenotypic description exists but is not in most databases. Identification might have been achieved with more extensive and time-consuming testing, but with our preliminary testing methods, the phenotypic identification was wrong. For example, the Pasteurella species that are usually encountered in animals (Pasteurella dagmatis and Pasteurella canis) and the more rodlike Neisseria species (Neisseria weaveri and Neisseria canis) are not well identified. We also noted particular problems in the Pseudomonas genus: phenotypic identifications of the non–Pseudomonas aeruginosa species are often contradicted by sequence. Some species, like Pasteurella bettyae and Campylobacter showae, are so rarely isolated that there are few strains available for testing, and, thus, the taxon is absent from databases.
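The proportions quoted in the Results can be recomputed from the reported counts. The sketch below uses only counts stated in the text; the labels and the function name are illustrative. Note that 22/120 computes to about 18.3%, marginally different from the 18.2% quoted.

```python
TOTAL_SEQUENCED = 120  # all isolates selected for sequencing

# Counts as reported in the text. The first two partition the 120 isolates;
# the last two are subsets of the 95 that needed revision.
counts = {
    "phenotypic and sequence identification agreed": 25,
    "phenotypic identification needed revision": 95,
    "no valid name obtainable by phenotypic testing": 22,
    "described but absent from most databases": 36,
}


def percent_of_total(count: int, total: int = TOTAL_SEQUENCED) -> float:
    """Share of all sequenced isolates, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for label, count in counts.items():
    print(f"{label}: {count}/{TOTAL_SEQUENCED} = {percent_of_total(count)}%")
```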

For others in the group, it was simple to prove that the phenotypic identification was wrong (urease test on the Bordetella bronchiseptica isolates was negative) but difficult to obtain a correct identification. The most common overall error was that the phenotypic identification was too vague, with identification only to the genus level, identification as a non–lactose fermenter, or with equally probable identification to more than 1 species.

The time to an identification using sequencing will vary with availability of personnel and instruments. In the best circumstances in our laboratory, the isolate was selected for sequencing in the morning, the PCR was done in the afternoon, and the amplicons were sent to the core laboratory. The electrophoretograms and the rough contigs were electronically sent by about 3:00 pm the next day. For some isolates, the unedited contig was good enough so that a search through the BIBI or NCBI database could yield an excellent identification without editing within 28 hours after selection. If the isolate was brought to our attention and selected in the afternoon or if the sequence required extensive editing or use of internal or multiple databases, the result would be ready by 48 hours after selection. Thus, an identification could be generated in 28 to 48 hours (about half met this time frame) after the first phenotypic results were available.


The sequence of the 16S rRNA gene is frequently used to study bacterial taxonomy and to identify slow-growing or phenotypically challenging bacteria.3,10,13-15 However, 16S rRNA gene sequencing can be slower and more costly than phenotypic methods and does not always provide species-level identification.10,16 Thus, there is a rationale for selecting which strains to submit for 16S rRNA gene sequence analysis.

Previous studies convincingly demonstrate the value of 16S rRNA gene sequencing to generate an accurate identification and, further, to distinguish novel strains.1,2,5-10 However, there has been little evaluation of the process by which the original laboratory selects organisms to analyze by 16S rRNA gene sequencing other than after failure of identification by routine methods. To streamline and make this process more efficient, we developed criteria to help make timely decisions on whether to perform sequencing as part of the routine workup of isolates. We found the decisions were on target because about 79% of the presumptive identifications of the selected strains indeed needed revision; some isolates could not be identified for the valid reasons that they were really novel species (18.2%) or were so rare, poorly described, or incorrect in databases that they would have required extensive identification methods not readily available in the clinical laboratory.

In retrospect, because of the approximately 21% of isolates for which the sequence identification and the initial phenotypic identification were the same and the approximately 18% of isolates (n = 21) for which we could have achieved a timely identification by phenotypic methods and sequencing offered little benefit, we considered how we could change our selection criteria. We reminded ourselves that Enterobacteriaceae of probable biotypes, even with unusual colony morphologic features, are identified well by routine methods. We will also consider more carefully before subjecting to sequence analysis any isolates that seem to be Eikenella corrodens.

Adverse effects of misidentification of pathogens are self-evident. However, another type of confusion for physicians arises when an identical strain isolated at different times from the same patient is given different names. For example, a patient who had been performing self-catheterization for months had several positive urine cultures. Our initial identifications of these isolates were different from each other; on one occasion the isolate was identified as Enterobacter species, and on another, it was identified as Klebsiella species. When we sequenced these isolates, we found that both were identical and in a genogroup called CDC Enteric Group 53. Thus, sequencing showed that the patient either was becoming reinfected with the same organism or had an original infection that had never cleared, rather than acquiring a new infection each time.

Many of our straightforward incorrect identifications could have been corrected by more extensive testing. For example, sequencing identified an isolate as a biochemical and genetic variant Acinetobacter species, whereas the Vitek 2 identified it as a Bordetella bronchiseptica (Table 4). Although a negative urease test result led us to doubt the initial identification, the actual identification of the organism (it is novel, has not been described, and is not in databases) would not have been possible by phenotypic means.

We devised criteria for the more efficient use of 16S rRNA gene sequence analysis for bacterial identification in a routine clinical microbiology laboratory: use of a knowledge-based strategy to select gram-negative isolates for which identification by 16S rRNA sequencing would be necessary. By using our rigorous criteria, we selected only about 2% of all gram-negative rods and coccobacilli to sequence during this study. Thus, because about 79% of the selected isolates benefited from sequence identification, a rough quantitative estimation of the possible need for sequence identification in a routine clinical laboratory is about 1.6% (ie, 79% of 2%). This made 16S rRNA gene sequencing a more efficient (more timely and focused and less costly) method for determining the identity of selected gram-negative rods. This up-front decision-making strategy can be integrated into many different sequencing flow schemes, and, thus, we hope that it will find use in other laboratories. Similar studies for gram-positive organisms are in progress.
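The "79% of 2%" estimate is a product of two fractions, which the following minimal sketch makes explicit. The function name is hypothetical, and the inputs are the rounded figures reported in the text (about 2% selected, 79.2% revised), so the result is a rough estimate only.

```python
def estimated_need(selected_fraction: float, revised_fraction: float) -> float:
    """Fraction of all isolates expected to benefit from sequencing.

    selected_fraction: share of all isolates chosen for sequencing.
    revised_fraction: share of chosen isolates whose identification changed.
    """
    for value in (selected_fraction, revised_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return selected_fraction * revised_fraction


# Rounded values from the text: about 2% selected, 79.2% needing revision.
need = estimated_need(0.02, 0.792)
print(f"{need:.1%}")  # prints "1.6%"
```

A laboratory with a different case mix or different phenotypic methods could substitute its own two fractions to obtain a local estimate.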


Upon completion of this activity you will be able to:

  • list the advantages of using selection criteria for selection of isolates for sequence analysis.

  • describe the advantages and disadvantages to performing 16S rRNA gene sequencing in the clinical microbiology laboratory.

  • describe the gram-negative rods that rarely need sequencing for proper identification.


The authors of this article and the planning committee members and staff have no relevant financial relationships with commercial interests to disclose.



We thank the people of the Seattle VA Microbiology Laboratory for their excellent work.


  • * Dr Mahlen is currently with the Department of Pathology and Area Laboratory Services, Madigan Army Medical Center, Tacoma, WA.

  • This material is the result of work supported by resources from the VA Puget Sound Health Care System.

  • Disclaimer: The views expressed in this article are those of the authors and do not reflect the official policy or position of the Department of the Army, Department of Defense, or the US Government.

