Theoretical and sequenced composition of the Zymo DNA standard
Zymo Research have a nice DNA standard for microbiomics, which can be used as a control in 16S sequencing experiments. I wanted to know how good the sequenced and theoretical composition correlate. Here is data from a recent MiSeq run (at the NCCT, where I work now), with the ZymoBIOMICS DNA standard used as a control.
Pretty good, right?
Next, I will post data from a metagenomics run with this standard mix.
Code
For reproducibility, the code below was used. In short, the fastq reads were mapped (minimap2
) against a database (available here) containing the 16S rDNA genes of the 8 organisms in the standard. The resulting PAF
file was used directly to count the number of reads mapping to each taxon (keeping only alignments > 250 bp, the read length was 300 bp). The theoretical composition of the Zymo DNA standard was taken from the manual.
seqtk seq -l 80 ssdb.fas > ssdb1.fas # prepare the fasta file so that it can be read by minimap2
minimap2 -x sr ssdb1.fas \
ZymoControl_S86_R1_001.fastq.gz \
ZymoControl_S86_R2_001.fastq.gz > zymo.paf # run the mapping
# in R:
library(data.table)
library(tidyverse)
df <- fread("zymo.paf") %>%
filter(V10 > 250, V12 == 0) %>% # take only good alignments
mutate(species = str_extract(V6, "[A-z]+_[a-z]+")) # clean species name
miseq <- df %>%
group_by(species) %>%
summarise(n = n(), miseq_percent = n/nrow(.)*100)
zymo <- fread("Zymo_control_table.txt") %>% mutate(species = str_c(V1, V2, sep = "_")) # read Zymo composition data
final <- zymo %>% # make final df
select(species, zymo_percent = V4) %>%
left_join(miseq, by = "species")
# ...plot
final %>%
#arrange(desc(percent)) %>% #bug!! if I do this, the values get mixed up!! the factor trick is also not working
apex(mapping = aes(source, round(percent,2), fill = species), type = "bar",
width = 800, height = 300) %>%
ax_chart(stacked = TRUE, stackType = "normal") %>%
ax_legend(horizontalAlign = "left", position = "right",
itemMargin = list(horizontal =5, vertical = 0)) %>%
ax_colors(scales::brewer_pal(palette = "Set2")(8)) %>%
ax_dataLabels(enabled = TRUE) %>%
ax_yaxis(max = 100) %>%
ax_labs(title = "Theoretical (zymo) and measured (MiSeq) composition of the Zymo standard",
subtitle = "MiSeq run (2 x 300) of amplified (20 PCR cycles) V3-V4 region, 359 191 read pairs")
Disclaimer
I am not affiliated with Zymo Research in any way.