TBC: A clustering algorithm based on prokaryotic taxonomy

Jae Hak Lee, Hana Yi, Yoon Seong Jeon, Sungho Won, Jongsik Chun

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

High-throughput DNA sequencing technologies have revolutionized the study of microbial ecology. Massive sequencing of PCR amplicons of the 16S rRNA gene has been widely used to understand the microbial community structure of a variety of environmental samples. The resulting sequencing reads are clustered into operational taxonomic units that are then used to calculate various statistical indices that represent the degree of species diversity in a given sample. Several algorithms have been developed to perform this task, but they tend to produce different outcomes. Herein, we propose a novel sequence clustering algorithm, namely Taxonomy-Based Clustering (TBC). This algorithm incorporates the basic concept of prokaryotic taxonomy in which only comparisons to the type strain are made and used to form species while omitting full-scale multiple sequence alignment. The clustering quality of the proposed method was compared with those of MOTHUR, BLASTClust, ESPRIT-Tree, CD-HIT, and UCLUST. A comprehensive comparison using three different experimental datasets produced by pyrosequencing demonstrated that the clustering obtained using TBC is comparable to those obtained using MOTHUR and ESPRIT-Tree and is computationally efficient. The program was written in JAVA and is available from http://sw.ezbiocloud.net/tbc.
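
The idea in the abstract of comparing each read only to a cluster representative (the "type strain"), rather than aligning all reads against each other, can be illustrated with a minimal sketch. This is not the published TBC procedure: the position-wise `identity` function is a toy stand-in for a real pairwise alignment score, the rule that the first unmatched read becomes a new representative is an assumption, and the 0.97 default threshold and all function names are hypothetical choices made for illustration. The `shannon_index` helper shows one diversity index that could be computed from the resulting cluster sizes.

```python
from math import log


def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions, relative to the longer read.

    Assumption: a stand-in for a proper pairwise alignment identity.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def cluster_by_representative(reads, threshold=0.97):
    """Greedy clustering in which each read is compared only to cluster representatives.

    Returns a list of (representative, members) tuples. A read joins the first
    cluster whose representative it matches at or above the threshold;
    otherwise it founds a new cluster and becomes its representative.
    """
    clusters = []
    for read in reads:
        for representative, members in clusters:
            if identity(read, representative) >= threshold:
                members.append(read)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append((read, [read]))
    return clusters


def shannon_index(clusters):
    """Shannon diversity index (natural log) computed from cluster sizes."""
    total = sum(len(members) for _, members in clusters)
    proportions = [len(members) / total for _, members in clusters]
    return -sum(p * log(p) for p in proportions)


if __name__ == "__main__":
    example_reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT"]
    otus = cluster_by_representative(example_reads, threshold=0.85)
    for representative, members in otus:
        print(representative, len(members))
    print(round(shannon_index(otus), 4))
```

Because every read is compared only against the current representatives, the number of comparisons grows with the number of clusters rather than with the square of the number of reads, which is the efficiency argument the abstract alludes to.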

Original language: English
Pages (from-to): 181-185
Number of pages: 5
Journal: Journal of Microbiology
Volume: 50
Issue number: 2
DOI: 10.1007/s12275-012-1214-6
Publication status: Published - 2012 Apr 1
Externally published: Yes

Fingerprint

Cluster Analysis
High-Throughput Nucleotide Sequencing
Sequence Alignment
Ecology
rRNA Genes
Technology
Polymerase Chain Reaction

Keywords

  • BLASTClust
  • CD-HIT
  • clustering algorithm
  • ESPRIT-Tree
  • metagenome
  • MOTHUR
  • OTU
  • pyrosequencing
  • TBC
  • UCLUST

ASJC Scopus subject areas

  • Microbiology
  • Applied Microbiology and Biotechnology

Cite this

TBC: A clustering algorithm based on prokaryotic taxonomy. / Lee, Jae Hak; Yi, Hana; Jeon, Yoon Seong; Won, Sungho; Chun, Jongsik.

In: Journal of Microbiology, Vol. 50, No. 2, 01.04.2012, p. 181-185.

Lee, Jae Hak; Yi, Hana; Jeon, Yoon Seong; Won, Sungho; Chun, Jongsik. / TBC: A clustering algorithm based on prokaryotic taxonomy. In: Journal of Microbiology. 2012; Vol. 50, No. 2. pp. 181-185.
@article{1a6f0518b4cc44799ce2d85b52b03b50,
title = "TBC: A clustering algorithm based on prokaryotic taxonomy",
abstract = "High-throughput DNA sequencing technologies have revolutionized the study of microbial ecology. Massive sequencing of PCR amplicons of the 16S rRNA gene has been widely used to understand the microbial community structure of a variety of environmental samples. The resulting sequencing reads are clustered into operational taxonomic units that are then used to calculate various statistical indices that represent the degree of species diversity in a given sample. Several algorithms have been developed to perform this task, but they tend to produce different outcomes. Herein, we propose a novel sequence clustering algorithm, namely Taxonomy-Based Clustering (TBC). This algorithm incorporates the basic concept of prokaryotic taxonomy in which only comparisons to the type strain are made and used to form species while omitting full-scale multiple sequence alignment. The clustering quality of the proposed method was compared with those of MOTHUR, BLASTClust, ESPRIT-Tree, CD-HIT, and UCLUST. A comprehensive comparison using three different experimental datasets produced by pyrosequencing demonstrated that the clustering obtained using TBC is comparable to those obtained using MOTHUR and ESPRIT-Tree and is computationally efficient. The program was written in JAVA and is available from http://sw.ezbiocloud.net/tbc.",
keywords = "BLASTClust, CD-HIT, clustering algorithm, ESPRIT-Tree, metagenome, MOTHUR, OTU, pyrosequencing, TBC, UCLUST",
author = "Lee, {Jae Hak} and Hana Yi and Jeon, {Yoon Seong} and Sungho Won and Jongsik Chun",
year = "2012",
month = "4",
day = "1",
doi = "10.1007/s12275-012-1214-6",
language = "English",
volume = "50",
pages = "181--185",
journal = "Journal of Microbiology",
issn = "1225-8873",
publisher = "Microbiological Society of Korea",
number = "2",

}

TY - JOUR

T1 - TBC

T2 - A clustering algorithm based on prokaryotic taxonomy

AU - Lee, Jae Hak

AU - Yi, Hana

AU - Jeon, Yoon Seong

AU - Won, Sungho

AU - Chun, Jongsik

PY - 2012/4/1

Y1 - 2012/4/1

AB - High-throughput DNA sequencing technologies have revolutionized the study of microbial ecology. Massive sequencing of PCR amplicons of the 16S rRNA gene has been widely used to understand the microbial community structure of a variety of environmental samples. The resulting sequencing reads are clustered into operational taxonomic units that are then used to calculate various statistical indices that represent the degree of species diversity in a given sample. Several algorithms have been developed to perform this task, but they tend to produce different outcomes. Herein, we propose a novel sequence clustering algorithm, namely Taxonomy-Based Clustering (TBC). This algorithm incorporates the basic concept of prokaryotic taxonomy in which only comparisons to the type strain are made and used to form species while omitting full-scale multiple sequence alignment. The clustering quality of the proposed method was compared with those of MOTHUR, BLASTClust, ESPRIT-Tree, CD-HIT, and UCLUST. A comprehensive comparison using three different experimental datasets produced by pyrosequencing demonstrated that the clustering obtained using TBC is comparable to those obtained using MOTHUR and ESPRIT-Tree and is computationally efficient. The program was written in JAVA and is available from http://sw.ezbiocloud.net/tbc.

KW - BLASTClust

KW - CD-HIT

KW - clustering algorithm

KW - ESPRIT-Tree

KW - metagenome

KW - MOTHUR

KW - OTU

KW - pyrosequencing

KW - TBC

KW - UCLUST

UR - http://www.scopus.com/inward/record.url?scp=84860356705&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860356705&partnerID=8YFLogxK

U2 - 10.1007/s12275-012-1214-6

DO - 10.1007/s12275-012-1214-6

M3 - Article

VL - 50

SP - 181

EP - 185

JO - Journal of Microbiology

JF - Journal of Microbiology

SN - 1225-8873

IS - 2

ER -