Published on Sep. 14, 2015
Operational Taxonomic Units (OTUs) are useful approximations for taxonomic species in groups where classification is difficult. As such, OTU classifications based on DNA sequences are commonly used in metagenomics studies to describe sample diversity. Because there is no a priori definition of what constitutes an OTU, a number of different methods have been applied to define them. We analyze 20,229 16S rDNA subunit sequences to explore the nature of several OTU classification approaches. To do so, we first perform all possible pairwise comparisons with the Needleman-Wunsch alignment algorithm. We then construct OTU clusters using several different sampling levels, clustering methods, and sequence identity thresholds, under two different representations of the network: tree-like and graph-like. We find not only that these varied approaches give differing OTU definitions, but also that the sequence data themselves give no reason to prefer a particular method or threshold.
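The pipeline described above (all-against-all global alignment, then clustering at an identity threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. The scoring scheme (match +1, mismatch -1, linear gap -1), the definition of identity as matching columns divided by alignment length, the choice of single-linkage clustering, the function names `nw_identity` and `single_linkage_otus`, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def nw_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of a and b.

    Returns identity as (matching columns) / (alignment columns).
    The scoring values and this identity definition are assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting columns and matches.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return matches / columns if columns else 1.0


def single_linkage_otus(seqs, threshold):
    """Group sequence indices into OTUs by single linkage.

    Two sequences share an OTU if a chain of pairwise identities,
    each >= threshold, connects them (union-find over all pairs).
    """
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(seqs)), 2):
        if nw_identity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data (hypothetical, not from the study).
seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TTTTGGGG"]
for t in (0.8, 0.9):
    print(t, single_linkage_otus(seqs, t))
```

In this toy example, sequences 0 and 2 are only 75% identical, yet at a threshold of 0.8 single linkage places them in the same OTU because each is 87.5% identical to sequence 1. A stricter linkage rule or a threshold of 0.9 would split them, which is the kind of method dependence the abstract describes.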