scDownstream
Overview
Flow provides the nf-core/scdownstream v1.0.0 pipeline for comprehensive downstream analysis of single-cell RNA sequencing (scRNA-seq) data. It takes filtered count matrices as input and provides cell type annotation, trajectory inference, differential expression, and integrative analysis across multiple samples or conditions.
This pipeline complements upstream processing pipelines (such as Cell Ranger or STARsolo) by focusing on biological interpretation and advanced computational analyses of single-cell data.
Pipeline Summary
The workflow includes:
Data Integration
- Multiple sample integration
- Batch correction (Harmony, Seurat, scVI)
- Dataset merging and normalization
Quality Control
- Cell filtering metrics
- Gene filtering
- Doublet detection
- Ambient RNA removal
Dimensionality Reduction
- PCA analysis
- UMAP/tSNE visualization
- Feature selection
Clustering & Annotation
- Graph-based clustering
- Automated cell type annotation
- Reference-based mapping
- Manual marker validation
Differential Analysis
- Differential expression
- Gene set enrichment
- Pathway analysis
- Cell-cell communication
Trajectory Analysis
- Pseudotime inference
- Lineage reconstruction
- RNA velocity
- Cell fate prediction
Input Requirements
Count Matrices
- Filtered feature-barcode matrices
- H5AD format (preferred)
- H5, MTX, and CSV formats also accepted
- Multiple samples supported
Metadata Requirements
A CSV sample sheet describing the experimental design, with one row per sample:
sample,path,condition,batch,species
sample1,/path/to/sample1.h5ad,control,batch1,human
sample2,/path/to/sample2.h5ad,control,batch1,human
sample3,/path/to/sample3.h5ad,treated,batch2,human
sample4,/path/to/sample4.h5ad,treated,batch2,human
Reference Data
- Cell type reference datasets
- Gene signatures for annotation
- Pathway databases
- Ligand-receptor databases
Key Parameters
Input Configuration
--input
: Sample sheet path
--matrix_format
: Input format (h5ad, h5, mtx)
--genome
: Species (human, mouse)
--transcript_type
: mRNA or total RNA
Quality Control
--min_cells
: Minimum number of cells in which a gene must be detected
--min_features
: Minimum number of genes detected per cell
--max_features
: Maximum number of genes detected per cell
--max_mito
: Maximum percentage of a cell's counts from mitochondrial genes
--doublet_detection
: Method (scrublet, doubletfinder)
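The thresholds above act on simple per-cell and per-gene statistics. The following is a minimal NumPy sketch of that filtering logic under several assumptions. It expects a dense cells x genes count matrix. It recognizes mitochondrial genes by an "MT-" name prefix, matched case-insensitively so that mouse "mt-" genes are also caught. It filters cells first and then genes. The function `qc_filter` is hypothetical and is not the pipeline's implementation. Doublet detection (Scrublet, DoubletFinder) is not reproduced here.

```python
import numpy as np


def qc_filter(counts, gene_names, *, min_cells, min_features, max_features,
              max_mito, mito_prefix="MT-"):
    """Return boolean masks (cells_kept, genes_kept) for a cells x genes count matrix."""
    counts = np.asarray(counts)
    detected = counts > 0
    n_features = detected.sum(axis=1)  # genes detected per cell
    total = counts.sum(axis=1)         # total counts per cell
    is_mito = np.array([g.upper().startswith(mito_prefix.upper()) for g in gene_names])
    mito_counts = counts[:, is_mito].sum(axis=1)
    # Percentage of counts from mitochondrial genes (0 for empty cells).
    pct_mito = np.divide(100.0 * mito_counts, total,
                         out=np.zeros(len(total)), where=total > 0)
    cells_kept = ((n_features >= min_features)
                  & (n_features <= max_features)
                  & (pct_mito <= max_mito))
    # Gene filter is evaluated on the cells that survived the cell filter.
    genes_kept = detected[cells_kept].sum(axis=0) >= min_cells
    return cells_kept, genes_kept
```

Apply the masks as `counts[cells_kept][:, genes_kept]`. Look at the distributions of `n_features` and `pct_mito` before you fix the thresholds, because suitable values depend on the tissue and the chemistry.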
Integration Methods
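--integration_method
: Integration method, one of:
- harmony: Fast batch correction
- seurat: CCA/RPCA integration
- scvi: Deep learning integration (variational autoencoder)
- scanvi: Semi-supervised integration that also uses existing cell type labels
- none: No integration (single-sample analysis)
--integration_features
: Number of features used for integration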
Clustering Parameters
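--clustering_resolution
: Granularity (0.1-2.0); higher values give more, smaller clusters
--clustering_algorithm
: leiden or louvain
--n_neighbors
: Number of neighbors used to build the KNN graph
--min_dist
: UMAP minimum distance (affects the embedding only, not the clusters)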
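Leiden and Louvain partition a k-nearest-neighbor graph of cells, and `--n_neighbors` sets the k. The following is a minimal brute-force sketch of building that graph, assuming Euclidean distance on a reduced representation such as PCA coordinates. Production tools typically use approximate neighbor search and weighted edges instead. The function `knn_graph` is hypothetical.

```python
import numpy as np


def knn_graph(X, n_neighbors):
    """Return an (n_cells, n_neighbors) array of neighbor indices (brute force, Euclidean)."""
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                     # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :n_neighbors]
```

Smaller `n_neighbors` values keep more local structure and tend to split off small populations. Larger values give a smoother graph. Because this sketch stores all pairwise distances, its memory use is quadratic in the number of cells, so it only illustrates the idea.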
Cell Type Annotation
--annotation_method
: Annotation method, one of:
- celltypist: Automated annotation
- singleR: Reference-based annotation
- manual: Marker genes only
--reference_dataset
: Built-in or custom reference
--confidence_threshold
: Minimum annotation confidence
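One common way to apply a confidence threshold is to keep a cell's top-scoring label only when its score reaches the cutoff, and otherwise mark the cell as unassigned. The sketch below assumes per-cell scores in [0, 1]. The function `assign_labels` is hypothetical. CellTypist and SingleR each have their own scoring and label-refinement schemes, and this does not reproduce either of them.

```python
def assign_labels(scores_per_cell, confidence_threshold, unassigned="Unassigned"):
    """Pick the top-scoring cell type per cell; fall back to `unassigned` below the threshold."""
    labels = []
    for scores in scores_per_cell:
        best = max(scores, key=scores.get)  # cell type with the highest score
        labels.append(best if scores[best] >= confidence_threshold else unassigned)
    return labels
```

A higher threshold leaves more cells unassigned but makes the remaining labels more reliable. The "Lower confidence threshold" advice under Troubleshooting moves that trade-off the other way.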
Differential Expression
--de_method
: wilcox, t-test, MAST, or DESeq2
--min_logfc
: Log fold change threshold
--min_pct
: Minimum percentage of cells expressing the gene
--comparison_groups
: Conditions to compare
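`--min_logfc` and `--min_pct` act as pre-filters on the genes that are reported or tested. The following is a minimal NumPy sketch of those two filters, with several assumptions. The input is normalized, non-log expression. Fold change is computed on the log2 scale with a pseudocount of 1. As in Seurat's `min.pct`, a gene passes if the required fraction of cells (0-1) expresses it in either group. The function `de_filter` is hypothetical, and the statistical test itself (wilcox, MAST, and so on) is not included.

```python
import numpy as np


def de_filter(expr, in_group, min_logfc, min_pct, pseudocount=1.0):
    """Return (indices of genes passing both filters, log2 fold changes for all genes)."""
    expr = np.asarray(expr, dtype=float)   # cells x genes, normalized, not log-transformed
    in_group = np.asarray(in_group, dtype=bool)
    a, b = expr[in_group], expr[~in_group]
    logfc = np.log2((a.mean(axis=0) + pseudocount) / (b.mean(axis=0) + pseudocount))
    pct_a = (a > 0).mean(axis=0)           # fraction of group-A cells expressing each gene
    pct_b = (b > 0).mean(axis=0)
    keep = (np.abs(logfc) >= min_logfc) & (np.maximum(pct_a, pct_b) >= min_pct)
    return np.flatnonzero(keep), logfc
```

Filtering before testing reduces the number of tests and removes sparsely detected genes, whose fold changes are noisy.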
Pipeline Outputs
Processed Data
Integrated Object
integrated.h5ad
: Merged, batch-corrected data
metadata.csv
: Complete cell metadata
features.csv
: Selected features
Quality Reports
- QC metrics summary
- Filtering statistics
- Integration diagnostics
Clustering Results
Cell Clusters
- Cluster assignments
- Cluster markers
- Cluster statistics
- Dendrograms
Visualizations
- UMAP/tSNE plots
- Feature plots
- Violin plots
- Dot plots
Cell Type Analysis
Annotations
- Cell type labels
- Confidence scores
- Marker expression
- Reference mapping
Composition
- Cell type proportions
- Condition comparisons
- Statistical tests
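Cell type proportions are per-condition counts of each label divided by that condition's total number of cells. The following is a minimal sketch that pools all cells within each condition. The function `composition` is hypothetical. Statistical tests of composition changes generally work from per-sample proportions, so that biological replicates are taken into account, rather than from pooled counts like these.

```python
from collections import Counter, defaultdict


def composition(cell_types, conditions):
    """Return {condition: {cell_type: proportion}} pooled over all cells."""
    if len(cell_types) != len(conditions):
        raise ValueError("one condition label is needed per cell")
    counts = defaultdict(Counter)
    for cell_type, condition in zip(cell_types, conditions):
        counts[condition][cell_type] += 1
    return {
        condition: {ct: n / sum(c.values()) for ct, n in c.items()}
        for condition, c in counts.items()
    }
```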
Differential Expression
DE Results
- Gene lists per comparison
- Volcano plots
- MA plots
- Heatmaps
Functional Analysis
- GO enrichment
- KEGG pathways
- Gene set scores
- Network analysis
Trajectory Analysis
Pseudotime
- Cell ordering
- Branch points
- Gene dynamics
- Fate probabilities
RNA Velocity
- Velocity vectors
- Stream plots
- Driver genes
- Terminal states
Interactive Reports
- HTML Report: Comprehensive analysis summary
- CellxGene: Interactive data browser
- UCSC Cell Browser: Web visualization
Analysis Workflows
Standard Single-Sample
--integration_method none
--clustering_resolution 0.6
--annotation_method celltypist
--de_method wilcox
Multi-Sample Integration
--integration_method harmony
--batch_key batch
--clustering_resolution 0.8
--comparison_groups condition
Disease vs Control
--integration_method scvi
--de_groups "disease,control"
--pathway_analysis true
--cell_communication true
Developmental Analysis
--trajectory_analysis true
--rna_velocity true
--diffusion_maps true
--annotation_method manual
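Any of these parameter sets can also be stored in a Nextflow params file and passed with `-params-file`, which keeps a record of the settings used. The following sketch writes the Multi-Sample Integration settings to `params.json`. The parameter names are copied from this page. `outdir` is the standard nf-core output-directory parameter. The file names `samplesheet.csv` and `results` are placeholders. Check every name against the pipeline's parameter documentation before use.

```python
import json

# Multi-Sample Integration settings from above (names as listed on this page).
params = {
    "input": "samplesheet.csv",     # placeholder path to the sample sheet
    "outdir": "results",            # standard nf-core output directory parameter
    "integration_method": "harmony",
    "batch_key": "batch",
    "clustering_resolution": 0.8,
    "comparison_groups": "condition",
}

with open("params.json", "w") as fh:
    json.dump(params, fh, indent=2)

# Then, for example:
#   nextflow run nf-core/scdownstream -r 1.0.0 -profile docker -params-file params.json
```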
Best Practices
Data Preparation
- Use properly filtered matrices
- Include all relevant metadata
- Plan batch structure carefully
- Consider sequencing depth
Quality Control
- Set QC thresholds based on data
- Examine QC plots before filtering
- Remove low-quality batches
- Document filtering decisions
Integration Strategy
- Choose method based on batch effects
- Evaluate integration success
- Preserve biological variation
- Check marker preservation
Biological Interpretation
- Validate cell types with markers
- Compare multiple DE methods
- Focus on robust findings
- Consider technical limitations
Troubleshooting
Common Issues
Over-clustering
- Reduce clustering resolution
- Increase minimum cluster size
- Check for technical effects
- Merge similar clusters
Poor Integration
- Try different methods
- Adjust integration parameters
- Remove problematic batches
- Use more integration features
Annotation Problems
- Update reference datasets
- Lower confidence threshold
- Use multiple methods
- Curate ambiguous annotations manually
Trajectory Artifacts
- Verify starting cells
- Check expression dynamics
- Remove cell cycle effects
- Validate with known markers
Advanced Features
Custom References
--annotation_reference /path/to/reference.h5ad
--reference_markers custom_markers.csv
--transfer_labels true
Multi-Modal Integration
--modality_weights "RNA:0.7,ADT:0.3"
--wnn_integration true
--cross_modality true
Spatial Mapping
--spatial_reference spatial_data.h5ad
--mapping_method tangram
--spatial_plots true
Perturbation Analysis
--perturbation_key treatment
--pseudobulk true
--mixscape_analysis true
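Pseudobulk analysis sums raw counts over all cells that share a sample and a cell type, which produces one profile per sample per cell type. Bulk methods such as DESeq2 can then be applied with samples treated as replicates. The following is a minimal NumPy sketch of that aggregation. The function `pseudobulk` is hypothetical and does not reflect how the pipeline implements `--pseudobulk`.

```python
import numpy as np


def pseudobulk(counts, samples, cell_types):
    """Sum raw counts over cells sharing (sample, cell type); return (keys, matrix)."""
    counts = np.asarray(counts)  # cells x genes raw counts
    pairs = list(zip(samples, cell_types))
    keys = sorted(set(pairs))
    index = {key: i for i, key in enumerate(keys)}
    out = np.zeros((len(keys), counts.shape[1]), dtype=counts.dtype)
    # Add each cell's counts to the row for its (sample, cell type) group.
    np.add.at(out, [index[p] for p in pairs], counts)
    return keys, out
```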
Output Interpretation
Key Metrics
- nCells: Total cells passing QC
- nClusters: Number of cell populations
- Integration Score: Batch mixing metric
- Annotation Accuracy: Agreement with ground-truth labels, when available
Biological Insights
- Cell Type Composition: Population changes
- Marker Genes: Cluster-defining features
- DE Genes: Condition-specific changes
- Pathways: Enriched biological processes
Quality Indicators
- Silhouette Score: Cluster separation
- ARI (Adjusted Rand Index): Clustering stability
- LISI (Local Inverse Simpson's Index): Integration quality
- Velocity Coherence: Trajectory confidence
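To measure clustering stability, the ARI compares two clusterings of the same cells, for example runs at neighboring resolutions or on subsampled data. It equals 1 for identical partitions, whatever the label names are, and is about 0 for chance-level agreement. The following is a minimal pure-Python sketch of the standard pair-counting formula. The function name is hypothetical. Library implementations such as scikit-learn's `adjusted_rand_score` compute the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    # Pairs of cells placed together in both clusterings (contingency-table cells).
    together = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both clusterings trivial
        return 1.0
    return (together - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 because only the label names differ. `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` is 4/7, roughly 0.571.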
Additional Resources
- Full documentation: nf-core/scdownstream documentation
- Pipeline source code: GitHub - nf-core/scdownstream
- Scanpy documentation: scanpy.readthedocs.io
- Seurat documentation: satijalab.org/seurat
- Support: Join the #scdownstream channel on nf-core Slack
- Citation (Scanpy): Wolf et al. (2018), Genome Biology, doi.org/10.1186/s13059-017-1382-0