Software for bacterial identification using 16S rRNA primer sequence data

Gisela Gonzalez1, Giri Narasimhan1, Melissa Doud2, Kalai Mathee3
1Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, Miami, FL
2Department of Biological Sciences, College of Arts and Sciences, Florida International University
3Department of Molecular Microbiology and Infectious Diseases, College of Medicine, Florida International University



Most microorganisms cannot be cultured and are difficult to identify. One of the most widely used markers for identifying bacteria is the ribosomal RNA gene. A component of the small ribosomal subunit, 16S rRNA is composed of alternating evolutionarily conserved and hypervariable regions. One strategy is to exploit the length heterogeneity of the hypervariable regions for microbial identification. Techniques based on the polymerase chain reaction (PCR) are well suited to studying this length heterogeneity. PCR primers were designed in the conserved regions flanking the V1 and V1_V2 hypervariable regions of the gene, and the lengths of the resulting amplicons were estimated in the laboratory. However, thousands of microbial organisms display exactly the same amplicon length for a given pair of primers. If two or more pairs of primers are used and the amplicon lengths are estimated using PCR, the chance of correctly identifying the bacterial organisms present in a sample is much better. AmpliQué is a BioPerl program that addresses this problem. AmpliQué takes two pairs of primers as input and uses the 16S rRNA sequence database from the Ribosomal Database Project to report the microbial organisms that would produce the observed amplicon lengths. Sputum samples from cystic fibrosis patients were obtained from a local hospital, and PCR was performed with primers for the V1 and V1_V2 regions. AmpliQué was then used to predict the putative bacteria present in the samples based on the observed lengths of the PCR amplicons. Each set of primers was run independently against the 16S rRNA database using BLAST, and the results from the BLAST hits were then merged into a single table.
Some of the bacterial taxa predicted by AmpliQué to be present in the samples were Acidobacteria, Bacillus, Clostridium, Haemophilus, Lactobacillus, Mycoplasmataceae, Pseudomonas, Ralstonia, Salmonella, Shewanella, Staphylococcus, Streptococcus, Vibrio, and Xanthomonas. Our results suggest that AmpliQué can help biologists identify the microbes present in unknown samples.
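
The idea of narrowing candidates with two primer pairs can be illustrated with a small Python sketch. This is not the AmpliQué code, which is written in BioPerl and locates primers with BLAST; the sketch substitutes exact string matching, and every name, primer, and sequence in it (TOY_DB, PAIR_A, PAIR_B, org_X, and so on) is hypothetical toy data chosen only to show how a second amplicon length removes ambiguity. It also assumes, purely for illustration, that the two primer pairs share a forward primer.

```python
from typing import Dict, Optional, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Predicted product length (primers included) from exact primer matches.

    Returns None if either primer site is missing. Exact matching is an
    assumption of this sketch; it stands in for the BLAST search.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so look for its
    # reverse complement downstream of the forward primer.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


def candidates(db: Dict[str, str], pair: Tuple[str, str],
               observed: int, tolerance: int = 0) -> Set[str]:
    """Names of sequences whose predicted amplicon length matches `observed`."""
    forward, reverse = pair
    hits = set()
    for name, seq in db.items():
        length = amplicon_length(seq, forward, reverse)
        if length is not None and abs(length - observed) <= tolerance:
            hits.add(name)
    return hits


def identify(db: Dict[str, str],
             pair_a: Tuple[str, str], len_a: int,
             pair_b: Tuple[str, str], len_b: int,
             tolerance: int = 0) -> Set[str]:
    """Keep only the sequences consistent with both observed lengths."""
    return (candidates(db, pair_a, len_a, tolerance)
            & candidates(db, pair_b, len_b, tolerance))


# Hypothetical toy data: not real primers and not real 16S rRNA sequences.
PAIR_A = ("AAGGTT", "CCTTGA")  # shorter amplicon
PAIR_B = ("AAGGTT", "GGCATT")  # longer amplicon
TOY_DB = {
    "org_X": "AAGGTTACGTTCAAGGACGTACGTAATGCC",
    "org_Y": "AAGGTTACGTTCAAGGACGTAATGCC",
    "org_Z": "AAGGTTACGTACTCAAGGACGTACAATGCC",
    "org_W": "CCCCCCACGTTCAAGG",  # no forward primer site
}

# One primer pair alone is ambiguous; the two together are not.
print(sorted(candidates(TOY_DB, PAIR_A, 16)))            # ['org_X', 'org_Y']
print(sorted(identify(TOY_DB, PAIR_A, 16, PAIR_B, 30)))  # ['org_X']
```

The `tolerance` parameter is also an assumption of the sketch: amplicon lengths estimated in the laboratory may be off by a few bases, so a practical lookup would probably accept a small window around each observed length.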


This work has been supported by NIGMS-RISE (R25 GM61347; GG).

The work of GN was supported by an NIH/NIGMS SCORE grant (S06 GM008205).


Contact Information:
Giri Narasimhan
Gisela Gonzalez
Bioinformatics Research Group (BioRG)
Florida International University
11200 SW 8th Street, Room ECS254
Modesto Maidique Campus, Miami, FL 33199


Date of Last Update: October 26, 2009.