scDownstream

Overview

Flow provides the nf-core/scdownstream v1.0.0 pipeline for downstream analysis of single-cell RNA sequencing (scRNA-seq) data. It takes filtered count matrices as input and performs cell type annotation, trajectory inference, differential expression, and integrative analysis across multiple samples or conditions.

This pipeline complements upstream processing pipelines (such as Cell Ranger or STARsolo) by focusing on biological interpretation and advanced computational analysis of single-cell data.


Pipeline Summary

The workflow includes:

  1. Data Integration

    • Multiple sample integration
    • Batch correction (Harmony, Seurat, scVI)
    • Dataset merging and normalization
  2. Quality Control

    • Cell filtering metrics
    • Gene filtering
    • Doublet detection
    • Ambient RNA removal
  3. Dimensionality Reduction

    • PCA analysis
    • UMAP/tSNE visualization
    • Feature selection
  4. Clustering & Annotation

    • Graph-based clustering
    • Automated cell type annotation
    • Reference-based mapping
    • Manual marker validation
  5. Differential Analysis

    • Differential expression
    • Gene set enrichment
    • Pathway analysis
    • Cell-cell communication
  6. Trajectory Analysis

    • Pseudotime inference
    • Lineage reconstruction
    • RNA velocity
    • Cell fate prediction

Input Requirements

Count Matrices

  • Filtered feature-barcode matrices
  • H5AD format (preferred)
  • H5, MTX, or CSV formats
  • Multiple samples supported

Metadata Requirements

Sample sheet with experimental design:

sample,path,condition,batch,species
sample1,/path/to/sample1.h5ad,control,batch1,human
sample2,/path/to/sample2.h5ad,control,batch1,human
sample3,/path/to/sample3.h5ad,treated,batch2,human
sample4,/path/to/sample4.h5ad,treated,batch2,human
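
A quick pre-flight check can catch a malformed sample sheet before a run is launched. The sketch below is illustrative only: `validate_sample_sheet` is a hypothetical helper, not part of the pipeline, and it assumes that all five columns in the example above are required and that sample names must be unique. Neither rule is stated explicitly here.

```python
import csv
import io

# Column names copied from the example sheet. Treating every column as
# required is an assumption made for this sketch.
EXPECTED_COLUMNS = ["sample", "path", "condition", "batch", "species"]


def validate_sample_sheet(text):
    """Parse sample sheet text and return its rows as dictionaries.

    Raises ValueError on an unexpected header, a row with too many or
    too few fields, an empty field, or a duplicate sample name.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows, seen = [], set()
    for line_no, row in enumerate(reader, start=2):
        # DictReader stores surplus fields under the key None and fills
        # missing trailing fields with the value None.
        if None in row:
            raise ValueError(f"line {line_no}: too many fields")
        if any(value is None or not value.strip() for value in row.values()):
            raise ValueError(f"line {line_no}: missing or empty field")
        if row["sample"] in seen:
            raise ValueError(f"line {line_no}: duplicate sample {row['sample']!r}")
        seen.add(row["sample"])
        rows.append(row)
    return rows


SHEET = """sample,path,condition,batch,species
sample1,/path/to/sample1.h5ad,control,batch1,human
sample3,/path/to/sample3.h5ad,treated,batch2,human
"""

rows = validate_sample_sheet(SHEET)
print(len(rows), "samples:", [r["sample"] for r in rows])
```

The check is purely structural. It does not confirm that the files exist or that they are readable H5AD files.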

Reference Data

  • Cell type reference datasets
  • Gene signatures for annotation
  • Pathway databases
  • Ligand-receptor databases

Key Parameters

Input Configuration

  • --input: Sample sheet path
  • --matrix_format: Input format (h5ad, h5, mtx)
  • --genome: Species (human, mouse)
  • --transcript_type: mRNA or total RNA

Quality Control

  • --min_cells: Minimum cells per gene
  • --min_features: Minimum genes per cell
  • --max_features: Maximum genes per cell
  • --max_mito: Maximum mitochondrial percentage
  • --doublet_detection: Method (scrublet, doubletfinder)
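
To make the per-cell thresholds concrete, the sketch below shows how limits like `--min_features`, `--max_features`, and `--max_mito` could decide whether a cell is kept. It is not the pipeline's implementation: the `CellQC` record, the inclusive comparisons, and the threshold values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell summary statistics (hypothetical record for this sketch)."""
    barcode: str
    n_features: int    # number of genes detected in the cell
    total_counts: int  # total counts across all genes
    mito_counts: int   # counts assigned to mitochondrial genes


def passes_qc(cell, min_features, max_features, max_mito):
    """Return True if the cell satisfies all three per-cell thresholds.

    max_mito is a percentage (0-100). Bounds are treated as inclusive,
    which is an assumption.
    """
    if cell.total_counts == 0:
        return False
    mito_pct = 100.0 * cell.mito_counts / cell.total_counts
    return (min_features <= cell.n_features <= max_features
            and mito_pct <= max_mito)


cells = [
    CellQC("AAAC", n_features=1500, total_counts=5000, mito_counts=200),   # 4% mito
    CellQC("AAAG", n_features=150, total_counts=300, mito_counts=10),      # too few genes
    CellQC("AACT", n_features=2200, total_counts=8000, mito_counts=2400),  # 30% mito
]

# Illustrative thresholds only; suitable values depend on the dataset.
kept = [c.barcode for c in cells
        if passes_qc(c, min_features=200, max_features=6000, max_mito=20.0)]
print(kept)  # ['AAAC']
```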

Integration Methods

  • --integration_method:
    • harmony: Fast batch correction
    • seurat: CCA/RPCA integration
    • scvi: Deep learning integration
    • scanvi: Semi-supervised integration
  • --integration_features: Number of features used for integration

Clustering Parameters

  • --clustering_resolution: Granularity (0.1-2.0); higher values produce more, smaller clusters
  • --clustering_algorithm: leiden or louvain
  • --n_neighbors: Number of neighbors used for KNN graph construction
  • --min_dist: UMAP minimum distance
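
Graph-based clustering starts from a k-nearest-neighbor (KNN) graph, and `--n_neighbors` controls how many edges each cell contributes. The brute-force sketch below shows the idea on a handful of 2-D points. It is a toy illustration under stated assumptions (Euclidean distance, exact search); the pipeline's actual neighbor search is not described here and is likely approximate and run on a PCA or integrated embedding.

```python
import math


def knn_graph(points, n_neighbors):
    """Map each point index to the indices of its nearest neighbors.

    Uses exact Euclidean distances, which is only practical for small inputs.
    """
    graph = {}
    for i, p in enumerate(points):
        distances = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in distances[:n_neighbors]]
    return graph


# Two well-separated groups: three points near the origin, two near (5, 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_graph(points, n_neighbors=2)
for index, neighbors in graph.items():
    print(index, "->", neighbors)
```

With `n_neighbors=2`, the two points near (5, 5) each have only one close neighbor, so their second edge reaches into the other group. This is why a larger neighbor count tends to blur small populations while a smaller one keeps them distinct.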

Cell Type Annotation

  • --annotation_method:
    • celltypist: Automated annotation
    • singleR: Reference-based
    • manual: Marker genes only
  • --reference_dataset: Built-in or custom
  • --confidence_threshold: Annotation confidence
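
The effect of `--confidence_threshold` can be pictured as a simple cutoff on per-cell prediction scores. The sketch below is an assumption about how such a cutoff typically behaves, not the pipeline's code: the function name, the `"Unassigned"` fallback label, and the scores are hypothetical.

```python
def apply_confidence_threshold(predictions, threshold, fallback="Unassigned"):
    """Keep a predicted label only when its score reaches the threshold.

    predictions: list of (label, confidence) pairs, one per cell.
    """
    return [label if score >= threshold else fallback
            for label, score in predictions]


predictions = [("T cell", 0.95), ("B cell", 0.55), ("NK cell", 0.30)]

strict = apply_confidence_threshold(predictions, threshold=0.8)
lenient = apply_confidence_threshold(predictions, threshold=0.5)
print(strict)   # ['T cell', 'Unassigned', 'Unassigned']
print(lenient)  # ['T cell', 'B cell', 'Unassigned']
```

Lowering the threshold labels more cells but admits less certain calls, so any labels gained this way are worth checking against marker genes.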

Differential Expression

  • --de_method: wilcox, t-test, MAST, DESeq2
  • --min_logfc: Log fold change threshold
  • --min_pct: Minimum cell percentage
  • --comparison_groups: Conditions to compare
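
The two filtering parameters, `--min_logfc` and `--min_pct`, are easier to reason about with a worked example. The sketch below computes a log2 fold change and the fraction of expressing cells for one gene, then applies both filters. The definitions used (a pseudocount of 1, and `min_pct` applied to whichever group expresses the gene more often) follow a convention common in single-cell toolkits; whether the pipeline uses the same definitions is an assumption.

```python
import math


def de_summary(group_a, group_b, pseudocount=1.0):
    """Return (log2 fold change, fraction expressing in A, fraction in B).

    group_a, group_b: per-cell expression values of one gene.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    log2fc = math.log2((mean_a + pseudocount) / (mean_b + pseudocount))
    pct_a = sum(v > 0 for v in group_a) / len(group_a)
    pct_b = sum(v > 0 for v in group_b) / len(group_b)
    return log2fc, pct_a, pct_b


def passes_filters(group_a, group_b, min_logfc, min_pct):
    """Apply the fold-change and expression-fraction filters to one gene."""
    log2fc, pct_a, pct_b = de_summary(group_a, group_b)
    return abs(log2fc) >= min_logfc and max(pct_a, pct_b) >= min_pct


# Broadly expressed gene: log2((7 + 1) / (1 + 1)) = log2(4) = 2.0
broad = passes_filters([7, 7, 7, 7], [1, 1, 1, 1], min_logfc=1.0, min_pct=0.5)

# Same fold change, but driven by a single cell (25% expressing).
sparse = passes_filters([0, 0, 0, 12], [0, 0, 0, 0], min_logfc=1.0, min_pct=0.5)

print(broad, sparse)  # True False
```

The second gene shows why `--min_pct` matters: a large fold change driven by one cell is filtered out even though it clears the fold-change threshold.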

Pipeline Outputs

Processed Data

  1. Integrated Object

    • integrated.h5ad: Merged, batch-corrected data
    • metadata.csv: Complete cell metadata
    • features.csv: Selected features
  2. Quality Reports

    • QC metrics summary
    • Filtering statistics
    • Integration diagnostics

Clustering Results

  1. Cell Clusters

    • Cluster assignments
    • Cluster markers
    • Cluster statistics
    • Dendrograms
  2. Visualizations

    • UMAP/tSNE plots
    • Feature plots
    • Violin plots
    • Dot plots

Cell Type Analysis

  1. Annotations

    • Cell type labels
    • Confidence scores
    • Marker expression
    • Reference mapping
  2. Composition

    • Cell type proportions
    • Condition comparisons
    • Statistical tests
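
Cell type proportions per condition can be recomputed from the cell metadata when a custom comparison is needed. The sketch below assumes the metadata has been reduced to `(condition, cell_type)` pairs; the actual column names in the pipeline's output are not specified here, and the counts are made up.

```python
from collections import Counter, defaultdict


def proportions_by_condition(cells):
    """Return {condition: {cell_type: fraction of that condition's cells}}.

    cells: iterable of (condition, cell_type) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for condition, cell_type in cells:
        counts[condition][cell_type] += 1
    result = {}
    for condition, counter in counts.items():
        total = sum(counter.values())
        result[condition] = {ct: n / total for ct, n in counter.items()}
    return result


# Made-up example: 10 cells per condition.
cells = (
    [("control", "T cell")] * 6 + [("control", "B cell")] * 4
    + [("treated", "T cell")] * 3 + [("treated", "B cell")] * 7
)
props = proportions_by_condition(cells)
for condition, fractions in props.items():
    print(condition, fractions)
```

Proportions are compositional: an increase in one cell type forces a decrease in the others, so shifts should be tested across replicate samples rather than read off pooled cell counts.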

Differential Expression

  1. DE Results

    • Gene lists per comparison
    • Volcano plots
    • MA plots
    • Heatmaps
  2. Functional Analysis

    • GO enrichment
    • KEGG pathways
    • Gene set scores
    • Network analysis

Trajectory Analysis

  1. Pseudotime

    • Cell ordering
    • Branch points
    • Gene dynamics
    • Fate probabilities
  2. RNA Velocity

    • Velocity vectors
    • Stream plots
    • Driver genes
    • Terminal states

Interactive Reports

  • HTML Report: Comprehensive analysis summary
  • CellxGene: Interactive data browser
  • UCSC Cell Browser: Web visualization

Analysis Workflows

Standard Single-Sample

--integration_method none
--clustering_resolution 0.6
--annotation_method celltypist
--de_method wilcox

Multi-Sample Integration

--integration_method harmony
--batch_key batch
--clustering_resolution 0.8
--comparison_groups condition

Disease vs Control

--integration_method scvi
--de_groups "disease,control"
--pathway_analysis true
--cell_communication true

Developmental Analysis

--trajectory_analysis true
--rna_velocity true
--diffusion_maps true
--annotation_method manual
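
When the same presets are reused across runs, it can help to keep them as data and render the flags programmatically. The sketch below does only that: it turns a parameter dictionary into the flag form used in the presets above. The helper `to_flags` is hypothetical, and how the resulting flags are passed to the pipeline (command line, parameter file, or the Flow interface) is left open.

```python
import shlex

# Parameter names and values copied from the "Multi-Sample Integration" preset.
MULTI_SAMPLE = {
    "integration_method": "harmony",
    "batch_key": "batch",
    "clustering_resolution": 0.8,
    "comparison_groups": "condition",
}


def to_flags(params):
    """Render a parameter dictionary as a flat list of '--name value' tokens.

    Booleans are written as 'true'/'false', matching the presets above.
    """
    args = []
    for name, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += [f"--{name}", str(value)]
    return args


flags = to_flags(MULTI_SAMPLE)
print(shlex.join(flags))
# --integration_method harmony --batch_key batch --clustering_resolution 0.8 --comparison_groups condition
```

`shlex.join` quotes any value that contains spaces or shell metacharacters, so the printed line is safe to paste into a shell.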

Best Practices

Data Preparation

  1. Use properly filtered matrices
  2. Include all relevant metadata
  3. Plan batch structure carefully
  4. Consider sequencing depth

Quality Control

  1. Set QC thresholds based on the observed QC metric distributions
  2. Examine QC plots before filtering
  3. Remove low-quality batches
  4. Document filtering decisions

Integration Strategy

  1. Choose method based on batch effects
  2. Evaluate integration success
  3. Preserve biological variation
  4. Check marker preservation

Biological Interpretation

  1. Validate cell types with markers
  2. Compare multiple DE methods
  3. Focus on robust findings
  4. Consider technical limitations

Troubleshooting

Common Issues

Over-clustering

  • Reduce clustering resolution
  • Increase minimum cluster size
  • Check for technical effects
  • Merge similar clusters

Poor Integration

  • Try different methods
  • Adjust integration parameters
  • Remove problematic batches
  • Use more integration features

Annotation Problems

  • Update reference datasets
  • Lower confidence threshold
  • Use multiple methods
  • Curate annotations manually

Trajectory Artifacts

  • Verify starting cells
  • Check expression dynamics
  • Remove cell cycle effects
  • Validate with known markers

Advanced Features

Custom References

--annotation_reference /path/to/reference.h5ad
--reference_markers custom_markers.csv
--transfer_labels true

Multi-Modal Integration

--modality_weights "RNA:0.7,ADT:0.3"
--wnn_integration true
--cross_modality true

Spatial Mapping

--spatial_reference spatial_data.h5ad
--mapping_method tangram
--spatial_plots true

Perturbation Analysis

--perturbation_key treatment
--pseudobulk true
--mixscape_analysis true
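
Pseudobulk analysis sums the counts of all cells that share a sample and cell type, producing one profile per group that bulk-style DE methods can use. The sketch below shows that aggregation step alone, on made-up counts. The input layout is an assumption for illustration, and the pipeline's own grouping keys are not specified here.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell gene counts within each (sample, cell_type) group.

    cells: iterable of (sample, cell_type, {gene: count}) tuples.
    Returns {(sample, cell_type): {gene: summed count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample, cell_type, counts in cells:
        for gene, count in counts.items():
            totals[(sample, cell_type)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}


# Made-up counts for three cells from two samples.
cells = [
    ("sample1", "T cell", {"CD3E": 4, "MS4A1": 0}),
    ("sample1", "T cell", {"CD3E": 6, "MS4A1": 1}),
    ("sample3", "T cell", {"CD3E": 2, "MS4A1": 0}),
]
profiles = pseudobulk(cells)
print(profiles[("sample1", "T cell")])  # {'CD3E': 10, 'MS4A1': 1}
```

Aggregating per sample makes the sample, not the cell, the unit of replication, which is the main reason to use pseudobulk for condition comparisons.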

Output Interpretation

Key Metrics

  1. nCells: Total cells passing QC
  2. nClusters: Number of cell populations
  3. Integration Score: Batch mixing metric
  4. Annotation Accuracy: Agreement with ground-truth labels, if available

Biological Insights

  • Cell Type Composition: Population changes
  • Marker Genes: Cluster-defining features
  • DE Genes: Condition-specific changes
  • Pathways: Enriched biological processes

Quality Indicators

  • Silhouette Score: Cluster separation
  • ARI (Adjusted Rand Index): Clustering stability
  • LISI (Local Inverse Simpson's Index): Integration quality
  • Velocity Coherence: Trajectory confidence
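
As an example of how one of these indicators is computed, the sketch below implements the Adjusted Rand Index from its standard pair-counting definition. It compares two labelings of the same cells, for instance clusterings from two runs or resolutions. This is a reference sketch for understanding the metric, not the pipeline's code; in practice a library implementation such as scikit-learn's `adjusted_rand_score` is the usual choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items.

    1.0 means identical partitions (label names may differ); values
    near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both labelings, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case: both partitions are trivial and identical.
        return 1.0
    return (together_both - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, round(crossed, 3))  # 1.0 -0.5
```

The first comparison scores 1.0 even though the label names differ, because ARI depends only on which cells are grouped together.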
