Airrflow
Overview
Flow provides the nf-core/airrflow v4.0.0 pipeline for processing bulk B cell receptor (BCR) and T cell receptor (TCR) repertoire sequencing data. It performs V(D)J assignment, clonotyping, lineage reconstruction, and repertoire analysis using the Immcantation framework.
Airrflow can process data from various experimental protocols including multiplex PCR and 5' RACE, supporting both Illumina and 454 sequencing platforms.
Pipeline Summary
The pipeline performs these key steps:
Quality Control
- FastQC for read quality assessment
- Optional quality filtering with fastp
V(D)J Assignment
- Sequence annotation using IgBLAST or IMGT
- Functional gene assignment
- Junction analysis
Quality Filtering
- Removal of non-functional sequences
- Primer match validation
- Sequence quality thresholds
Clonal Analysis
- Clonal clustering and assignment
- Lineage tree construction
- Diversity analysis
Repertoire Analysis
- Clonal abundance
- V/J gene usage
- CDR3 properties
- Diversity metrics
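
As a rough illustration of what diversity metrics summarize, the sketch below computes clonal richness, Shannon entropy, and the Simpson index from a list of clone labels. It is not the pipeline's implementation: the diversity analysis in the pipeline comes from the Immcantation framework and may use different indices, and the function name `diversity_indices` is hypothetical.

```python
import math
from collections import Counter


def diversity_indices(clone_ids):
    """Return (richness, Shannon entropy, Simpson index) for clone labels.

    clone_ids holds one label per sequence, e.g. the clone identifier
    assigned to each annotated read.
    """
    counts = Counter(clone_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences supplied")
    freqs = [n / total for n in counts.values()]
    # Shannon entropy in nats: higher means a more even repertoire.
    shannon = -sum(p * math.log(p) for p in freqs)
    # Simpson index: probability that two random sequences share a clone.
    simpson = sum(p * p for p in freqs)
    return len(counts), shannon, simpson
```

A repertoire dominated by one expanded clone gives a Simpson index close to 1 and a Shannon entropy close to 0; an evenly spread repertoire does the opposite.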
Report Generation
- Comprehensive HTML report
- Repertoire statistics
- Quality metrics
Input Requirements
Sequencing Data
- Paired-end or single-end FASTQ files
- UMI barcodes (optional but recommended)
- Minimum read length: 300 bp recommended
Metadata
Required metadata in samplesheet:
- Sample ID
- Subject/patient ID
- Sample type (e.g., PBMC, tissue)
- Treatment/timepoint
- PCR target (e.g., IGH, TRA)
- Species (human, mouse)
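
Before submitting a run, it can help to check that every required metadata field is present and filled in. The sketch below is one way to do that for a tab-separated samplesheet. The column names in `REQUIRED_COLUMNS` are hypothetical stand-ins for the fields listed above, not the pipeline's actual header, so adjust them to match the real samplesheet.

```python
import csv

# Hypothetical column names mirroring the metadata fields listed above;
# check the real samplesheet header before reusing them.
REQUIRED_COLUMNS = [
    "sample_id", "subject_id", "sample_type",
    "timepoint", "pcr_target", "species",
]


def check_samplesheet(handle, delimiter="\t"):
    """Return a list of human-readable problems found in a samplesheet."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        if row["sample_id"] in seen:
            problems.append(f"line {line_no}: duplicate sample_id {row['sample_id']}")
        seen.add(row["sample_id"])
    return problems
```

An empty list means no problems were found; otherwise each entry names the line and field to fix.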
Primers
- Forward and reverse primer sequences
- C-region primer sequences for 5' RACE
Key Parameters
Basic Parameters
--protocol
: Experimental protocol (pcr_umi, race_5prime)
--library_generation_method
: Library prep method
--cprimers
: Path to C-region primers (5' RACE)
--race_linker
: Linker sequence (5' RACE)
Analysis Parameters
--umi_length
: UMI barcode length
--umi_position
: UMI position (R1/R2)
--igblast_base
: Path to IgBLAST database
--imgt_base
: Path to IMGT database
Filtering Parameters
--filterseq_q
: Quality score threshold
--primer_maxerror
: Maximum primer matching error
--primer_mask_mode
: Primer masking strategy
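
To make the idea of a quality score threshold such as --filterseq_q concrete, here is a minimal sketch of mean-quality filtering on FASTQ records. It assumes Phred+33 encoded quality strings and a mean-quality criterion. The pipeline does its own filtering with its own tools, and the threshold of 20 and the function names here are illustrative only.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 by default)."""
    if not quality_string:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_by_quality(records, min_mean_q=20):
    """Keep (header, sequence, quality) records whose mean quality passes.

    min_mean_q=20 is an illustrative value, not a documented default.
    """
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_q]
```

Raising the threshold discards more reads, which lowers depth but also removes sequences whose base-calling errors could otherwise be mistaken for somatic mutations.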
Clonal Analysis
--clonal_threshold
: Distance threshold for clonal grouping
--clonal_method
: Clustering method
--lineage_reconstruction
: Enable lineage trees
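
The sketch below shows one common way a distance threshold can drive clonal grouping: single-linkage clustering of junction sequences by length-normalized Hamming distance. It is an assumption-laden illustration, not the pipeline's method. Real clonal assignment typically also requires shared V and J gene calls, which this sketch omits, and the threshold of 0.1 and the function names are hypothetical.

```python
def normalized_hamming(a, b):
    """Fraction of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_junctions(junctions, threshold=0.1):
    """Single-linkage clustering; returns one clone label per junction.

    Junctions of different lengths are never grouped together.
    """
    parent = list(range(len(junctions)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(junctions)):
        for j in range(i + 1, len(junctions)):
            same_length = len(junctions[i]) == len(junctions[j])
            if same_length and normalized_hamming(junctions[i], junctions[j]) <= threshold:
                parent[find(j)] = find(i)

    labels = {}
    # Number clones 0, 1, 2, ... in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(len(junctions))]
```

Because linkage is single, two junctions that differ by more than the threshold can still land in the same clone if an intermediate sequence connects them. A threshold that is too loose merges unrelated clones, and one that is too strict splits true clones apart.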
Pipeline Outputs
Sequence Data
Annotated sequences (AIRR format)
- V(D)J assignments
- CDR/FWR boundaries
- Functionality calls
Clonal data
- Clonal assignments
- Germline sequences
- Lineage relationships
Analysis Results
Repertoire metrics
- Clonal diversity indices
- V/J gene usage frequencies
- CDR3 length distributions
- SHM frequency analysis
Quality reports
- Read quality statistics
- Annotation success rates
- Primer match statistics
Visualizations
- Clonal abundance plots
- V/J gene usage heatmaps
- Lineage trees
- Diversity rarefaction curves
Report Files
repertoire_report.html
: Comprehensive analysis report
airr_table.tsv
: AIRR-compliant sequence table
clones.tsv
: Clonal assignment table
lineages/
: Lineage tree files
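
For downstream work, an AIRR-compliant table such as airr_table.tsv can be read with any TSV parser. The sketch below tallies V gene usage frequencies and junction length counts. It assumes the table carries the `v_call` and `junction_aa` columns from the AIRR Rearrangement schema, and the function name `summarize_airr` is hypothetical.

```python
import csv
from collections import Counter


def summarize_airr(handle):
    """Return (V gene usage frequencies, junction amino-acid length counts)."""
    reader = csv.DictReader(handle, delimiter="\t")
    v_usage = Counter()
    junction_lengths = Counter()
    for row in reader:
        # Take the first call when several are listed, and drop the allele suffix.
        v_gene = (row.get("v_call") or "").split(",")[0].split("*")[0].strip()
        if v_gene:
            v_usage[v_gene] += 1
        junction_aa = (row.get("junction_aa") or "").strip()
        if junction_aa:
            # The junction includes the two conserved anchor residues,
            # so the IMGT CDR3 is two residues shorter.
            junction_lengths[len(junction_aa)] += 1
    total = sum(v_usage.values())
    v_freq = {gene: n / total for gene, n in v_usage.items()} if total else {}
    return v_freq, dict(junction_lengths)
```

Call it with an open file handle, for example `with open("airr_table.tsv", newline="") as fh: summarize_airr(fh)`.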
Common Use Cases
Basic BCR Analysis
--protocol pcr_umi
--library_generation_method ig_library
--umi_length 12
--cprimers false
5' RACE TCR Analysis
--protocol race_5prime
--library_generation_method race_library
--cprimers cprimers.fasta
--race_linker AAGCAGTGGTATCAACGCAGAGTACATGGG
Bulk RNA-seq with BCR/TCR
--protocol pcr_umi
--library_generation_method rna_library
--skip_lineage true
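
When the same settings are reused across runs, it can be convenient to keep them in a file rather than retyping flags. The sketch below serializes the three parameter sets above to JSON, a format that Nextflow's `-params-file` option accepts. Whether the Flow interface accepts such a file is an assumption, and the dictionary keys `basic_bcr`, `race_tcr`, and `bulk_rnaseq` are hypothetical labels.

```python
import json

# Parameter sets copied from the use cases above; the keys naming each
# use case are hypothetical labels chosen for this sketch.
USE_CASES = {
    "basic_bcr": {
        "protocol": "pcr_umi",
        "library_generation_method": "ig_library",
        "umi_length": 12,
        "cprimers": False,
    },
    "race_tcr": {
        "protocol": "race_5prime",
        "library_generation_method": "race_library",
        "cprimers": "cprimers.fasta",
        "race_linker": "AAGCAGTGGTATCAACGCAGAGTACATGGG",
    },
    "bulk_rnaseq": {
        "protocol": "pcr_umi",
        "library_generation_method": "rna_library",
        "skip_lineage": True,
    },
}


def params_json(use_case, **overrides):
    """Return a JSON document for one use case, with optional overrides."""
    params = dict(USE_CASES[use_case])
    params.update(overrides)
    return json.dumps(params, indent=2, sort_keys=True)
```

For example, `params_json("basic_bcr", umi_length=8)` returns the Basic BCR Analysis settings with a shorter UMI, ready to be written to a file.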
Tips and Best Practices
- UMI Usage: Use UMIs whenever possible; collapsing reads that share a UMI corrects for PCR duplicates and gives more accurate clonal abundance estimates
- Read Length: Ensure reads cover the entire V(D)J region
- Primer Design: Include constant region in amplicons for better annotation
- Sample Size: Include 3+ replicates per condition for robust analysis
- Quality Control: Review the HTML report carefully before downstream analysis
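
To see why UMIs matter for abundance, the sketch below compares raw read counts with distinct-UMI counts per sequence. It is a simplification: real UMI processing builds a consensus sequence per UMI and tolerates sequencing errors in both the UMI and the read, which this sketch does not attempt. The function name `abundance_by_umi` is hypothetical.

```python
from collections import defaultdict


def abundance_by_umi(reads):
    """Map each sequence to (read_count, distinct_umi_count).

    reads is an iterable of (umi, sequence) pairs.
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for umi, sequence in reads:
        read_counts[sequence] += 1
        umis[sequence].add(umi)
    return {seq: (read_counts[seq], len(umis[seq])) for seq in read_counts}
```

A sequence seen in 1,000 reads that all share one UMI most likely came from a single original molecule amplified by PCR, so the UMI count is the better estimate of its true abundance.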
Troubleshooting
Low Assignment Rate
- Check primer sequences and orientation
- Verify species and locus match
- Ensure sufficient read length
- Review quality scores
Clonal Inflation
- Verify UMI handling
- Check PCR duplicate removal
- Adjust clonal threshold
- Review primer bias
Memory Issues
- Reduce --max_memory per process
- Process samples individually
- Use --skip_lineage for large datasets
Additional Resources
- Full documentation: nf-core/airrflow documentation
- Pipeline source code: GitHub - nf-core/airrflow
- Immcantation framework: immcantation.org
- AIRR Community Standards: airr-community.org
- Support: Join the #airrflow channel on nf-core Slack
- Citation: The AIRR Community (2017) doi.org/10.1038/s41592-019-0687-1