
Nanoseq

Overview

Flow provides the nf-core/nanoseq v3.1.0 pipeline for processing and analyzing Oxford Nanopore Technologies (ONT) long-read sequencing data. It supports both DNA and RNA sequencing data, including direct RNA sequencing (dRNA-seq), and provides comprehensive quality control, alignment, and downstream analysis.

Nanoseq handles genomic DNA, cDNA, and direct RNA sequencing data, and accepts either basecalled reads (FASTQ) or raw signal data (FAST5).


Pipeline Summary

The pipeline performs these key steps:

  1. Basecalling (Optional)

    • Guppy basecaller for raw FAST5 files
    • Demultiplexing of barcoded samples
    • Modified base detection (5mC, 6mA)
  2. Quality Control

    • NanoPlot for read statistics
    • PycoQC for sequencing run metrics
    • FastQC for per-base quality checks
  3. Read Processing

    • Adapter trimming (Porechop)
    • Quality filtering
    • Length filtering
  4. Alignment

    • minimap2 for genome alignment
    • Optional transcriptome alignment
    • SAMtools for BAM processing
  5. Downstream Analysis

    • Transcript identification (cDNA/direct RNA)
    • Differential expression (RNA)
    • Variant calling (DNA)
    • Methylation analysis
  6. Visualization

    • MultiQC report
    • Coverage plots
    • Transcript abundance

Input Requirements

Sequencing Data

  • FASTQ files (basecalled)
  • FAST5 files (raw signal)
  • Multi-FAST5 format supported
  • Minimum read length: 200bp
  • Recommended coverage: 30X (genome), 10M reads (transcriptome)
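Average coverage is total sequenced bases divided by genome size, so the 30X recommendation translates directly into a yield target. A minimal Python sketch, assuming reads are spread evenly and ignoring reads lost to filtering or failed mapping (plan extra margin in practice); the 3.1 Gb genome size is only an illustrative, roughly human-sized value:

```python
def required_bases(genome_size_bp: int, target_coverage: float) -> float:
    """Total sequenced bases needed for a given average coverage (coverage = bases / genome size)."""
    return genome_size_bp * target_coverage


def achieved_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Average coverage obtained from a given number of sequenced bases."""
    return total_bases / genome_size_bp


# Illustrative genome size (roughly human-sized); replace with your own reference size.
GENOME_SIZE = 3_100_000_000

needed = required_bases(GENOME_SIZE, 30)
print(f"30X needs about {needed / 1e9:.0f} Gb of sequence")
```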

Sample Sheet Format

group,replicate,barcode,input_file
control,1,barcode01,sample1.fastq.gz
control,2,barcode02,sample2.fastq.gz
treated,1,barcode03,sample3.fastq.gz
treated,2,barcode04,sample4.fastq.gz
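Catching sample sheet mistakes before launching a run saves a failed job. A minimal pre-flight check in Python, assuming the four-column layout shown above; the accepted file suffixes and the rule that each group/replicate pair must be unique are assumptions for illustration, not the pipeline's own validation:

```python
import csv
import io
from typing import List

REQUIRED_COLUMNS = ["group", "replicate", "barcode", "input_file"]
# Hypothetical list of accepted file types; adjust to what your run actually produces.
ALLOWED_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz", ".fast5")


def validate_samplesheet(text: str) -> List[str]:
    """Return a list of problems found in a sample sheet; an empty list means it looks OK."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [f"header must be {','.join(REQUIRED_COLUMNS)}, got {reader.fieldnames}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Missing trailing fields come back as None, so fall back to "".
        replicate = (row["replicate"] or "").strip()
        input_file = (row["input_file"] or "").strip()
        if not replicate.isdecimal() or int(replicate) < 1:
            problems.append(f"line {line_no}: replicate must be a positive integer")
        if not input_file.endswith(ALLOWED_SUFFIXES):
            problems.append(f"line {line_no}: unexpected input file '{input_file}'")
        key = (row["group"], replicate)
        if key in seen:
            problems.append(f"line {line_no}: duplicate group/replicate {key}")
        seen.add(key)
    return problems


sheet = """group,replicate,barcode,input_file
control,1,barcode01,sample1.fastq.gz
control,1,barcode02,sample2.txt
"""
# Reports the unexpected file type and the duplicate replicate, both on line 3.
print(validate_samplesheet(sheet))
```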

Reference Files

  • Reference genome (FASTA)
  • Annotation file (GTF/GFF3)
  • Transcriptome FASTA (optional)

Key Parameters

Input Options

  • --input: Sample sheet path
  • --protocol: Sequencing protocol
    • DNA: Genomic DNA sequencing
    • cDNA: PCR-amplified cDNA
    • directRNA: Direct RNA sequencing

Basecalling (Raw Data)

  • --flowcell: Flow cell version
  • --kit: Sequencing kit
  • --guppy_config: Guppy configuration
  • --guppy_model: Custom basecalling model
  • --guppy_gpu: Enable GPU basecalling
  • --skip_basecalling: Skip basecalling when the input is already basecalled (FASTQ)

Quality Control

  • --skip_qc: Skip QC steps
  • --min_read_length: Minimum read length
  • --max_read_length: Maximum read length
  • --min_read_qual: Minimum mean quality
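The length and quality filters are applied per read. A minimal Python sketch of that logic, assuming Phred+33 FASTQ quality strings and using the 200 bp and Q7 values from this page as defaults; it averages quality in error-probability space, a common convention for ONT reads, and is not the pipeline's own implementation:

```python
import math
from typing import Optional


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality of a read, averaged in error-probability space.

    Each Phred score Q is converted to an error probability 10^(-Q/10); the
    probabilities are averaged and the mean is converted back to the Phred scale.
    """
    if not qual:
        return 0.0
    errors = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(sum(errors) / len(errors))


def passes_filters(seq: str, qual: str, min_len: int = 200,
                   max_len: Optional[int] = None, min_qual: float = 7.0) -> bool:
    """Apply length and mean-quality cut-offs to one FASTQ record."""
    if len(seq) < min_len:
        return False
    if max_len is not None and len(seq) > max_len:
        return False
    return mean_read_quality(qual) >= min_qual


# Half Q10 and half Q20 bases: the arithmetic mean of the scores would be 15,
# but the probability-space mean is about 12.6 because Q10 bases dominate the errors.
print(round(mean_read_quality("+" * 50 + "5" * 50), 1))
```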

Alignment

  • --aligner: Alignment tool (minimap2)
  • --minimap2_opts: Additional minimap2 options
  • --save_align_intermeds: Save intermediate files

Analysis Options

  • --quantification_method:
    • bambu: Transcript discovery and quantification
    • stringtie2: Transcript assembly with StringTie2 (alternative to bambu)
  • --skip_quantification: Skip transcript analysis
  • --skip_differential_analysis: Skip DE analysis

Pipeline Outputs

Quality Control

  1. Sequencing Metrics

    • nanoplot/: Read length and quality distributions
    • pycoqc/: Comprehensive run statistics
    • fastqc/: Per-base quality scores
  2. Processing Reports

    • Adapter trimming statistics
    • Filtering summary
    • Demultiplexing report

Alignment Results

  1. BAM Files

    • Sorted, indexed alignments
    • Alignment statistics
    • Coverage tracks (BigWig)
  2. Alignment Metrics

    • Mapping rates
    • Error profiles
    • Coverage statistics

Downstream Analysis

  1. Transcriptomics (RNA/cDNA)

    • Gene/transcript counts
    • Novel transcript annotations
    • Differential expression results
    • Isoform switching analysis
  2. Genomics (DNA)

    • Variant calls (VCF)
    • Structural variants
    • Methylation calls (if applicable)
  3. Visualizations

    • IGV-ready tracks
    • Expression heatmaps
    • PCA plots

Protocol-Specific Workflows

Direct RNA Sequencing

--protocol directRNA
--skip_alignment false
--quantification_method bambu
--skip_differential_analysis false

Genomic DNA Analysis

--protocol DNA
--call_variants true
--structural_variants true
--skip_quantification true

cDNA Isoform Analysis

--protocol cDNA
--quantification_method stringtie2
--skip_alignment false
--save_reference_annotation true

Targeted Sequencing

--protocol DNA
--targeted_alignment true
--bed_file targets.bed
--skip_quantification true
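Each preset above is a set of `--name value` overrides. If you launch nf-core/nanoseq directly with Nextflow rather than through Flow, the same settings can be kept in a JSON file and passed with Nextflow's `-params-file` option. A minimal sketch using the direct RNA preset as the example; `samplesheet.csv` is a placeholder path:

```python
import json


def flags_to_params(flag_lines: str) -> dict:
    """Turn '--name value' lines into a Nextflow params dictionary.

    The literal strings 'true' and 'false' become JSON booleans; every other
    value is kept as a string.
    """
    params = {}
    for line in flag_lines.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines between flags
        name, _, value = line.strip().partition(" ")
        value = value.strip()
        key = name.lstrip("-")
        params[key] = (value == "true") if value in ("true", "false") else value
    return params


DIRECT_RNA = """
--protocol directRNA
--skip_alignment false
--quantification_method bambu
--skip_differential_analysis false
"""

params = flags_to_params(DIRECT_RNA)
params["input"] = "samplesheet.csv"  # placeholder path to your sample sheet
print(json.dumps(params, indent=2))
```

Saved as `params.json`, this could be used as, for example, `nextflow run nf-core/nanoseq -r 3.1.0 -profile docker -params-file params.json`. Numeric values stay strings in this sketch, so convert them if a parameter expects a number.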

Best Practices

Sample Preparation

  1. Use high molecular weight DNA or intact, high-integrity RNA
  2. Avoid PCR amplification when possible
  3. Include spike-in controls for quantification
  4. Sequence sufficient depth for your application

Basecalling

  1. Use the latest Guppy version
  2. Select the configuration that matches your flow cell and kit chemistry
  3. Enable GPU acceleration for speed
  4. Consider live basecalling for large runs

Quality Control

  1. Set length filters based on expected sizes
  2. Remove low-quality reads (Q7 minimum)
  3. Check for adapter contamination
  4. Verify barcode assignments

Analysis

  1. Use the minimap2 preset that matches your data (e.g. map-ont for genomic DNA)
  2. Enable splice-aware alignment for RNA (cDNA and direct RNA)
  3. Consider multi-mapped reads for repeats
  4. Validate novel transcripts
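When checking alignments outside the pipeline, the presets documented by minimap2 map onto the three protocols: `map-ont` for genomic DNA, `splice` for cDNA, and `splice -uf -k14` for direct RNA (forward transcript strand only, smaller k-mer for the noisier reads). A minimal Python sketch that builds the command; nanoseq applies its own defaults internally, so treat this as a reference point rather than what the pipeline runs. File names are placeholders:

```python
from typing import List

# Presets recommended in the minimap2 documentation for ONT reads.
PRESET_ARGS = {
    "DNA": ["-ax", "map-ont"],
    "cDNA": ["-ax", "splice"],
    "directRNA": ["-ax", "splice", "-uf", "-k14"],
}


def minimap2_command(protocol: str, reference: str, reads: str, threads: int = 4) -> List[str]:
    """Build a minimap2 command line (SAM written to stdout) for the given protocol."""
    if protocol not in PRESET_ARGS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    return ["minimap2", *PRESET_ARGS[protocol], "-t", str(threads), reference, reads]


print(" ".join(minimap2_command("directRNA", "genome.fa", "reads.fastq.gz")))
```

The returned list can be passed to `subprocess.run(cmd, stdout=fh, check=True)` to write the SAM output to an open file handle.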

Troubleshooting

Common Issues

Low Yield

  • Check RNA/DNA quality (degradation)
  • Verify library preparation
  • Review pore occupancy
  • Examine adapter ligation efficiency

Poor Alignment

  • Confirm correct reference genome
  • Check for contamination
  • Adjust minimap2 parameters
  • Consider reference quality

Basecalling Problems

  • Update to the latest Guppy version
  • Verify flow cell/kit selection
  • Check GPU memory (if applicable)
  • Monitor system resources

Advanced Features

Modified Base Detection

--guppy_model template_r9.4.1_450bps_modbases_5mc_cg_hac.cfg
--skip_demethylation false
--methylation_threshold 0.8
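This page does not spell out how the threshold is applied. One common approach is to treat it as a per-read probability cut-off when aggregating calls at a site. A minimal sketch under that assumption, which also assumes a symmetric band: reads with a modification probability at or above the threshold count as methylated, reads at or below 1 - threshold as unmethylated, and the rest are discarded as ambiguous:

```python
from typing import Optional, Sequence


def methylation_fraction(mod_probs: Sequence[float], threshold: float = 0.8) -> Optional[float]:
    """Fraction of confidently called reads at one site that are methylated.

    Returns None when no read passes either cut-off.
    """
    if not 0.5 < threshold <= 1:
        raise ValueError("threshold must be in (0.5, 1] so the two bands do not overlap")
    methylated = sum(1 for p in mod_probs if p >= threshold)
    unmethylated = sum(1 for p in mod_probs if p <= 1 - threshold)
    called = methylated + unmethylated
    if called == 0:
        return None
    return methylated / called


# Five reads covering one CpG site: the 0.5 call is ambiguous and ignored.
print(methylation_fraction([0.95, 0.9, 0.1, 0.5, 0.05]))
```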

Fusion Detection

--protocol cDNA
--fusion_detection true
--fusion_tool arriba

Allele-Specific Expression

--protocol directRNA
--phased_vcf sample.vcf.gz
--quantification_method bambu

Real-Time Analysis

--watch_path /path/to/sequencing/run
--real_time true
--min_batch_size 4000
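In real-time mode reads arrive continuously, and a minimum batch size trades latency against per-job overhead. The sketch below assumes `--min_batch_size` means "wait until this many new reads have accumulated before processing them"; directory watching is left out, and the function simply groups any iterable of reads:

```python
from typing import Iterable, Iterator, List


def batch_reads(reads: Iterable[str], min_batch_size: int = 4000) -> Iterator[List[str]]:
    """Yield batches of `min_batch_size` reads.

    Any remainder is yielded as a final, smaller batch when the input ends,
    so reads from the tail of a run are not dropped.
    """
    if min_batch_size < 1:
        raise ValueError("min_batch_size must be positive")
    batch: List[str] = []
    for read in reads:
        batch.append(read)
        if len(batch) >= min_batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Ten reads with a batch size of 4 give batches of 4, 4 and 2 reads.
print([len(b) for b in batch_reads((f"read{i}" for i in range(10)), min_batch_size=4)])
```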

Output Interpretation

Key Metrics

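  1. Read N50: The read length L such that reads of length L or longer contain half of all sequenced bases
  2. Mean Quality: Average Phred score (above Q10, i.e. under 10% error, is good)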
  3. Mapping Rate: >80% expected for good data
  4. Transcript Detection: Number of expressed genes
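N50 weights reads by their length, so it is typically larger than the median read length. A minimal Python sketch of the standard definition (sort lengths in descending order and return the length at which the running total first reaches half of all bases), alongside the Phred-to-accuracy conversion behind the Q10 guideline; the read lengths are made-up illustration values:

```python
from statistics import median
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred score: 1 - 10^(-Q/10)."""
    return 1 - 10 ** (-q / 10)


lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # illustrative read lengths (kb)
print(n50(lengths), median(lengths))      # N50 is 8, median is 6
print(f"Q10 = {phred_to_accuracy(10):.0%} accuracy")
```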

RNA-Seq Specific

  • Full-length Reads: Percentage of reads spanning both the 5' and 3' ends of a transcript
  • Isoform Diversity: Novel vs known transcripts
  • Poly(A) Tail Length: Direct RNA only
  • RNA Modifications: If detection enabled

DNA-Seq Specific

  • Coverage Uniformity: Evenness across genome
  • Variant Quality: QUAL scores in VCF
  • SV Validation: Split-read support for each structural variant call
  • Methylation Patterns: CpG island coverage
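Coverage uniformity can be summarised in several ways; one common measure is the fraction of positions covered at 20% or more of the mean depth. A minimal sketch, assuming you already have per-position depths (for example the third column of `samtools depth -a` output); the 0.2 cut-off is a convention, not a nanoseq setting:

```python
from statistics import mean
from typing import Sequence


def coverage_uniformity(depths: Sequence[int], fraction_of_mean: float = 0.2) -> float:
    """Fraction of positions whose depth is at least `fraction_of_mean` times the mean depth."""
    if not depths:
        raise ValueError("empty depth profile")
    cutoff = fraction_of_mean * mean(depths)
    return sum(1 for d in depths if d >= cutoff) / len(depths)


# Toy profile with one uncovered position: mean depth 24, cut-off 4.8, so 4 of 5 positions pass.
print(coverage_uniformity([30, 28, 32, 30, 0]))
```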
