Ampliseq

Overview

Flow provides the nf-core/ampliseq v2.7.1 pipeline for analyzing 16S rRNA, 18S rRNA, and ITS amplicon sequencing data. It supports demultiplexing, quality control, taxonomic classification, and diversity analysis for microbiome studies.

The pipeline infers amplicon sequence variants (ASVs) with DADA2 and uses QIIME2 for downstream taxonomic classification, diversity metrics, and visualizations for ecological analysis.


Pipeline Summary

The workflow includes:

  1. Quality Control

    • Read quality assessment (FastQC)
    • Primer trimming (Cutadapt)
    • Quality filtering
  2. Denoising & ASV Calling

    • DADA2 denoising
    • Chimera removal
    • ASV table generation
  3. Taxonomic Classification

    • Multiple classifier options:
      • DADA2 native classifier
      • QIIME2 feature-classifier
      • SINTAX
    • Reference database assignment
  4. Diversity Analysis

    • Alpha diversity metrics
    • Beta diversity analysis
    • Ordination (PCoA, NMDS)
    • Differential abundance testing
  5. Visualization

    • Interactive plots
    • Taxonomic bar plots
    • Diversity boxplots
    • Rarefaction curves

Input Requirements

Sequencing Data

  • Paired-end or single-end FASTQ files
  • Demultiplexed or multiplexed samples
  • Illumina, IonTorrent, or PacBio platforms

Metadata File

Tab-separated file with sample information:

sampleID	group	treatment	timepoint
sample1	control	none	T0
sample2	control	none	T0
sample3	treatment	antibiotics	T1
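
A malformed metadata file is a common cause of downstream failures, so a quick check before launching the pipeline can help. The sketch below is a minimal, hypothetical validator (the function name `validate_metadata` is not part of the pipeline). It assumes the first column holds the sample ID and only checks for consistent field counts, empty cells, and duplicate IDs; the pipeline itself may enforce further rules.

```python
import csv
import io


def validate_metadata(text):
    """Return a list of problems found in tab-separated metadata text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header, records = rows[0], rows[1:]
    problems = []
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")
    seen = set()
    for line_no, row in enumerate(records, start=2):
        # Every record must have exactly one value per header column.
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
            continue
        if any(cell.strip() == "" for cell in row):
            problems.append(f"line {line_no}: empty cell")
        # Assumption: the first column is the sample identifier.
        sample_id = row[0]
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample ID {sample_id!r}")
        seen.add(sample_id)
    return problems


example = (
    "sampleID\tgroup\ttreatment\ttimepoint\n"
    "sample1\tcontrol\tnone\tT0\n"
    "sample2\tcontrol\tnone\tT0\n"
    "sample3\ttreatment\tantibiotics\tT1\n"
)
print(validate_metadata(example))
```

An empty list means none of these basic checks failed.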

Primer Sequences

  • Forward primer sequence (required)
  • Reverse primer sequence (for paired-end)
  • Allow mismatches for degenerate primers
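
Degenerate primers use IUPAC ambiguity codes (for example N for any base, W for A or T). As an illustration of what those codes mean, the sketch below expands a primer into a regular expression and counts how many concrete sequences it represents. This is exact matching only and is not how Cutadapt works internally; Cutadapt additionally tolerates mismatches. The helper names are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Compile a regex that matches every sequence the primer can represent."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


def count_variants(primer):
    """Number of distinct non-degenerate sequences encoded by the primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


fw = "CCTACGGGNGGCWGCAG"
pattern = primer_to_regex(fw)
read = "CCTACGGGAGGCAGCAGTGGGGAATATTG"  # made-up read for illustration
print(pattern.pattern)
print(bool(pattern.match(read)))
print(count_variants(fw))
```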

Key Parameters

Basic Settings

  • --input: Path to samplesheet
  • --metadata: Sample metadata file
  • --FW_primer: Forward primer sequence
  • --RV_primer: Reverse primer sequence
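
For paired-end data, the samplesheet can be assembled from the FASTQ file names. The sketch below is a hypothetical helper, not part of the pipeline. It assumes files end in `_R1.fastq.gz` and `_R2.fastq.gz`, and it assumes the tab-separated column names `sampleID`, `forwardReads`, `reverseReads`, and `run`; check both against the nf-core/ampliseq documentation for the version in use.

```python
import csv
import io


def build_samplesheet(fastq_paths, run="1"):
    """Pair *_R1/_R2 FASTQ paths and return tab-separated samplesheet text."""
    pairs = {}
    for path in sorted(fastq_paths):
        name = path.rsplit("/", 1)[-1]
        # Assumed naming scheme: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
        for tag, key in (("_R1.fastq.gz", "forward"), ("_R2.fastq.gz", "reverse")):
            if name.endswith(tag):
                pairs.setdefault(name[: -len(tag)], {})[key] = path
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Assumed column names; verify against the pipeline documentation.
    writer.writerow(["sampleID", "forwardReads", "reverseReads", "run"])
    for sample_id in sorted(pairs):
        reads = pairs[sample_id]
        if "forward" not in reads:
            raise ValueError(f"{sample_id}: forward reads missing")
        writer.writerow([sample_id, reads["forward"], reads.get("reverse", ""), run])
    return out.getvalue()


files = [
    "data/s2_R1.fastq.gz",
    "data/s1_R2.fastq.gz",
    "data/s1_R1.fastq.gz",
    "data/s2_R2.fastq.gz",
]
print(build_samplesheet(files), end="")
```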

Amplicon Type

  • --amplicon_type: Target region
    • 16S: Bacterial 16S rRNA
    • 18S: Eukaryotic 18S rRNA
    • ITS: Fungal ITS region
    • custom: User-defined markers

Analysis Parameters

  • --trunclenf: Forward read truncation
  • --trunclenr: Reverse read truncation
  • --trunc_qmin: Quality truncation threshold
  • --max_ee: Maximum expected errors
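
To make the interaction between truncation length and maximum expected errors concrete: the expected number of errors in a read is the sum of the per-base error probabilities, 10^(-Q/10), over its Phred scores. The sketch below assumes Phred+33 encoding and mirrors the general DADA2-style logic (discard reads shorter than the truncation length, truncate, then compare expected errors to the threshold). It is an illustration, not the pipeline's code, and the function names are hypothetical.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(quality, trunc_len, max_ee, offset=33):
    """Apply fixed-length truncation, then the expected-error threshold.

    trunc_len == 0 is treated as "no truncation".
    """
    if trunc_len > 0:
        if len(quality) < trunc_len:
            return False  # read too short to be truncated to trunc_len
        quality = quality[:trunc_len]
    return expected_errors(quality, offset) <= max_ee


# Made-up read: eight high-quality bases (Q40) followed by two poor ones (Q2).
qual = "I" * 8 + "##"
print(round(expected_errors(qual), 4))
print(passes_filter(qual, trunc_len=8, max_ee=1.0))
print(passes_filter(qual, trunc_len=10, max_ee=1.0))
```

Truncating before the low-quality tail lets the read pass the same threshold that it fails at full length, which is why truncation lengths are best chosen from the quality profiles.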

Taxonomic Classification

  • --dada_ref_taxonomy: Reference database
    • silva: SILVA database
    • greengenes: Greengenes
    • unite: UNITE (for ITS)
    • pr2: PR2 (for protists)
  • --classifier: Classification method

Diversity Analysis

  • --metadata_category: Grouping variable
  • --min_samples: Minimum number of samples in which an ASV must occur to be retained (prevalence filter)
  • --diversity_alpha_metrics: Alpha metrics to calculate
  • --diversity_beta_metrics: Beta metrics to calculate

Pipeline Outputs

ASV Data

  1. ASV Table

    • ASV_table.tsv: Abundance matrix
    • ASV_sequences.fasta: Representative sequences
    • ASV_tax.tsv: Taxonomic assignments
  2. Quality Reports

    • Read quality profiles
    • Denoising statistics
    • Chimera removal stats

Diversity Results

  1. Alpha Diversity

    • Shannon index
    • Simpson index
    • Observed ASVs
    • Chao1 richness
    • Statistical comparisons
  2. Beta Diversity

    • Distance matrices
    • PCoA coordinates
    • PERMANOVA results
    • Ordination plots
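
For readers who want to see what the alpha diversity metrics measure, the sketch below computes them for a single sample from a list of ASV counts. It is a minimal illustration under stated assumptions: Shannon uses the natural logarithm, Simpson is reported as 1 minus the sum of squared proportions, and Chao1 uses the bias-corrected form. The pipeline's own implementation may use different conventions (for example a different log base), so values need not match its output exactly.

```python
import math


def alpha_diversity(counts):
    """Compute basic alpha diversity metrics from per-ASV counts of one sample."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        raise ValueError("sample has no reads")
    props = [c / total for c in present]
    # Shannon index with natural log (assumption; some tools use log2).
    shannon = -sum(p * math.log(p) for p in props)
    # Simpson reported as 1 - sum(p^2) (assumption; also called Gini-Simpson).
    simpson = 1.0 - sum(p * p for p in props)
    singletons = sum(1 for c in present if c == 1)
    doubletons = sum(1 for c in present if c == 2)
    # Bias-corrected Chao1, which stays defined when there are no doubletons.
    chao1 = len(present) + singletons * (singletons - 1) / (2 * (doubletons + 1))
    return {
        "observed": len(present),
        "shannon": shannon,
        "simpson": simpson,
        "chao1": chao1,
    }


print(alpha_diversity([10, 10, 10, 10]))
print(alpha_diversity([5, 1, 1, 2, 0]))
```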

Visualizations

  1. Taxonomic Plots

    • Relative abundance bar charts
    • Krona plots
    • Heatmaps
  2. Diversity Plots

    • Alpha diversity boxplots
    • Beta diversity ordinations
    • Rarefaction curves
  3. Quality Control

    • MultiQC report
    • DADA2 QC plots
    • Read tracking table

Example Usage

Standard 16S V3-V4 Analysis

nextflow run nf-core/ampliseq \
  --input samplesheet.tsv \
  --amplicon_type 16S \
  --FW_primer CCTACGGGNGGCWGCAG \
  --RV_primer GACTACHVGGGTATCTAATCC \
  --metadata metadata.tsv \
  --trunclenf 250 \
  --trunclenr 200 \
  --outdir results \
  -profile docker

ITS2 Fungal Analysis

nextflow run nf-core/ampliseq \
  --input samplesheet.tsv \
  --amplicon_type ITS \
  --FW_primer GTGARTCATCGAATCTTTG \
  --RV_primer TCCTCCGCTTATTGATATGC \
  --its_partial true \
  --outdir results \
  -profile docker

Long-Read 16S (PacBio)

nextflow run nf-core/ampliseq \
  --input samplesheet.tsv \
  --pacbio true \
  --amplicon_type 16S \
  --max_len 1600 \
  --min_len 1000 \
  --outdir results \
  -profile docker

Custom Database Analysis

nextflow run nf-core/ampliseq \
  --input samplesheet.tsv \
  --FW_primer GTGCCAGCMGCCGCGGTAA \
  --RV_primer GGACTACHVGGGTWTCTAAT \
  --dada_ref_taxonomy silva_taxonomy.txt \
  --dada_ref_tax_levels "Kingdom,Phylum,Class,Order,Family,Genus,Species" \
  --outdir results \
  -profile docker

Tips and Best Practices

Sample Preparation

  1. Include negative controls
  2. Add mock communities for validation
  3. Randomize samples during sequencing
  4. Maintain consistent amplification conditions

Parameter Selection

  1. Set truncation based on quality profiles
  2. Use appropriate error rates for platform
  3. Choose reference database for environment
  4. Include all relevant metadata

Quality Control

  1. Check rarefaction curves for saturation
  2. Verify mock community composition
  3. Examine negative control contamination
  4. Review taxonomic assignments

Troubleshooting

Common Issues

Issue: Low ASV recovery or many reads filtered out

  • Solution: Check primer trimming success in cutadapt logs
  • Adjust quality filtering parameters (--trunclenf, --trunclenr)
  • Review expected amplicon length for your primers
  • Verify correct --amplicon_type selection

Issue: Poor taxonomic assignment

  • Solution: Update classifier database to latest version
  • Verify primer specificity for target organisms
  • Check if amplicon region has good database coverage
  • Try different classifier (--skip_dada2 to use QIIME2)

Issue: Diversity analysis failures

  • Solution: Ensure adequate sequencing depth (>10,000 reads/sample)
  • Check for batch effects in PCoA plots
  • Use appropriate normalization method for your data
  • Verify all samples are included in metadata file
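
The depth and metadata checks above can be scripted. The sketch below sums counts per sample from an ASV table and reports samples below a depth threshold as well as samples that are absent from the metadata. It assumes a tab-separated table whose first column is the ASV identifier and whose remaining columns are samples; confirm this layout against your own ASV_table.tsv. The function names are hypothetical.

```python
import csv
import io


def sample_depths(table_text):
    """Return total read count per sample from a tab-separated ASV table."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]  # assumption: first column is the ASV identifier
    depths = dict.fromkeys(samples, 0)
    for row in reader:
        if not row:
            continue
        for sample, value in zip(samples, row[1:]):
            depths[sample] += int(float(value))
    return depths


def check_samples(table_text, metadata_ids, min_depth=10000):
    """Return (low-depth samples, samples missing from the metadata)."""
    depths = sample_depths(table_text)
    low = sorted(s for s, d in depths.items() if d < min_depth)
    missing = sorted(set(depths) - set(metadata_ids))
    return low, missing


# Made-up table for illustration.
table = (
    "ASV_ID\ts1\ts2\ts3\n"
    "ASV1\t9000\t20\t7000\n"
    "ASV2\t3000\t30\t6000\n"
)
low, missing = check_samples(table, metadata_ids=["s1", "s2"])
print("below threshold:", low)
print("not in metadata:", missing)
```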

Issue: Memory errors during DADA2

  • Solution: Reduce --max_cpus to limit parallelization
  • Process samples in smaller batches
  • Increase memory allocation with --max_memory
  • Consider --sample_inference pseudo for large datasets

Issue: Primer trimming failures

  • Solution: Check primer orientation (try reverse complement)
  • Allow more mismatches with --cutadapt_mismatches
  • Verify primer sequences including degenerate bases
  • Use --retain_untrimmed to diagnose issues
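
When checking primer orientation, the reverse complement has to account for degenerate bases as well (for example, the complement of H is D). The sketch below is a small, hypothetical helper that handles the full IUPAC alphabet; it is offered as a convenience for diagnosing orientation problems and is not part of the pipeline.

```python
# Complement table covering the IUPAC nucleotide ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


rv = "GACTACHVGGGTATCTAATCC"
print(reverse_complement(rv))
```

If primers are rarely found in the reads, searching a few reads for the reverse-complemented primer is a quick way to tell an orientation problem from a genuine primer mismatch.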
