Airrflow - Docs

Overview

Flow provides the nf-core/airrflow v4.0.0 pipeline for processing bulk B cell receptor (BCR) and T cell receptor (TCR) repertoire sequencing data. It performs V(D)J assignment, clonotyping, lineage reconstruction, and repertoire analysis using the Immcantation framework.

Airrflow can process data from several experimental protocols, including multiplex PCR and 5' RACE, and supports reads from both Illumina and 454 sequencing platforms.

Pipeline Summary

The pipeline performs these key steps:

Quality Control
- FastQC for read quality assessment
- Optional quality filtering with fastp
V(D)J Assignment
- Sequence annotation with IgBLAST against IMGT reference germlines
- Functional gene assignment
- Junction analysis
Quality Filtering
- Removal of non-functional sequences
- Primer match validation
- Sequence quality thresholds
Clonal Analysis
- Clonal clustering and assignment
- Lineage tree construction
- Diversity analysis
Repertoire Analysis
- Clonal abundance
- V/J gene usage
- CDR3 properties
- Diversity metrics
Report Generation
- Comprehensive HTML report
- Repertoire statistics
- Quality metrics

Input Requirements

Sequencing Data

Paired-end or single-end FASTQ files
UMI barcodes (optional but recommended)
Read length: 300 bp or longer recommended

Metadata

Required metadata columns in the samplesheet:

Sample ID
Subject/patient ID
Sample type (e.g., PBMC, tissue)
Treatment/timepoint
PCR target (e.g., IGH, TRA)
Species (human, mouse)
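Catching samplesheet mistakes before launching a run saves a failed job. The sketch below checks a tab-separated samplesheet for the fields listed above, empty values, duplicate sample IDs and unsupported species. The column names are assumptions chosen to map onto the list above; confirm the exact header names against the samplesheet specification for the pipeline version you run.

```python
import csv
import io

# Assumed header names: check them against the samplesheet
# specification for the pipeline version you run.
REQUIRED_COLUMNS = (
    "sample_id",                       # Sample ID
    "subject_id",                      # Subject/patient ID
    "tissue",                          # Sample type, e.g. PBMC
    "collection_time_point_relative",  # Treatment/timepoint
    "pcr_target_locus",                # PCR target, e.g. IGH, TRA
    "species",                         # human or mouse
)
ALLOWED_SPECIES = {"human", "mouse"}


def validate_samplesheet(handle):
    """Return a list of problems found in a tab-separated samplesheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen_ids = set()
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            # Short rows yield None for absent fields.
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        sample_id = row["sample_id"]
        if sample_id in seen_ids:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id}")
        seen_ids.add(sample_id)
        species = (row["species"] or "").strip()
        if species and species not in ALLOWED_SPECIES:
            problems.append(f"line {line_no}: unsupported species {species}")
    return problems


sheet = io.StringIO(
    "sample_id\tsubject_id\ttissue\tcollection_time_point_relative\tpcr_target_locus\tspecies\n"
    "S1\tP1\tPBMC\td0\tIGH\thuman\n"
    "S1\tP1\tPBMC\td7\tIGH\tdog\n"
)
print(validate_samplesheet(sheet))
# ['line 3: duplicate sample_id S1', 'line 3: unsupported species dog']
```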

Primers

Forward and reverse primer sequences
C-region primer sequences for 5' RACE
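Primer sequences are usually supplied as FASTA files (the 5' RACE example below passes cprimers.fasta). A minimal parser, assuming one primer per record with possibly wrapped sequence lines, that rejects empty records, duplicate names and characters outside the IUPAC nucleotide alphabet. The function name and the example records are hypothetical placeholders, not real primers.

```python
import io

# IUPAC nucleotide codes, which primer sequences may legitimately contain.
IUPAC = set("ACGTRYSWKMBDHVN")


def read_primer_fasta(handle):
    """Parse a primer FASTA into {name: sequence}, joining wrapped sequence lines."""
    primers = {}
    name = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            name = fields[0]
            if name in primers:
                raise ValueError(f"duplicate primer name {name!r}")
            primers[name] = ""
        elif name is None:
            raise ValueError("sequence found before the first '>' header")
        else:
            primers[name] += line.upper()
    for name, seq in primers.items():
        invalid = sorted(set(seq) - IUPAC)
        if not seq or invalid:
            raise ValueError(f"primer {name!r} is empty or contains {invalid}")
    return primers


# Placeholder names and sequences, not real primers.
fasta = io.StringIO(">primer_A\nACGTACGTAC\ngg\n>primer_B\nTTGGCCAA\n")
print(read_primer_fasta(fasta))
# {'primer_A': 'ACGTACGTACGG', 'primer_B': 'TTGGCCAA'}
```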

Key Parameters

Basic Parameters

--protocol: Experimental protocol (pcr_umi, race_5prime)
--library_generation_method: Library preparation method (e.g., ig_library, race_library, rna_library)
--cprimers: Path to a FASTA file of C-region primers (5' RACE)
--race_linker: Linker sequence (5' RACE)
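Because --cprimers and --race_linker only apply to 5' RACE runs, a quick pre-flight check can catch a race_5prime run that is missing either one. The sketch below encodes only that rule, plus a check that the linker is plain DNA; the dictionary keys mirror the flags above without the leading dashes. It is an illustration under those assumptions, not the pipeline's own parameter validation.

```python
VALID_PROTOCOLS = {"pcr_umi", "race_5prime"}


def check_params(params):
    """Return a list of problems in a dict of run parameters keyed by flag name (without --)."""
    problems = []
    protocol = params.get("protocol")
    if protocol not in VALID_PROTOCOLS:
        problems.append(f"unknown protocol: {protocol!r}")
    if protocol == "race_5prime":
        # 'not' catches a missing key, an empty string and False alike.
        if not params.get("cprimers"):
            problems.append("race_5prime requires cprimers (C-region primer FASTA)")
        linker = params.get("race_linker") or ""
        if not linker:
            problems.append("race_5prime requires race_linker")
        elif set(linker.upper()) - set("ACGT"):
            problems.append("race_linker must contain only A, C, G and T")
    return problems


race = {
    "protocol": "race_5prime",
    "library_generation_method": "race_library",
    "cprimers": "cprimers.fasta",
    "race_linker": "AAGCAGTGGTATCAACGCAGAGTACATGGG",
}
print(check_params(race))  # []
print(check_params({"protocol": "race_5prime", "cprimers": False}))
# ['race_5prime requires cprimers (C-region primer FASTA)', 'race_5prime requires race_linker']
```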

Analysis Parameters

--umi_length: UMI barcode length in nucleotides
--umi_position: Read that carries the UMI (R1 or R2)
--igblast_base: Path to IgBLAST database
--imgt_base: Path to IMGT database
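To make --umi_length and --umi_position concrete: if the UMI occupies the first umi_length bases of the read named by --umi_position (an assumption; where the barcode sits depends on the protocol), extracting it is a simple split of the sequence and its quality string. The function name is hypothetical.

```python
def split_umi(read_seq, read_qual, umi_length):
    """Split a leading UMI off a read; return (umi, remaining sequence, remaining quality)."""
    if umi_length <= 0:
        raise ValueError("umi_length must be positive")
    if len(read_seq) != len(read_qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(read_seq) <= umi_length:
        raise ValueError("read is not longer than the UMI")
    return read_seq[:umi_length], read_seq[umi_length:], read_qual[umi_length:]


# Toy 20-nt read with a 12-nt UMI, as in the basic BCR example (--umi_length 12).
print(split_umi("ACGTACGTACGTGGCCTTAA", "I" * 20, 12))
# ('ACGTACGTACGT', 'GGCCTTAA', 'IIIIIIII')
```

The quality string is trimmed together with the sequence so that downstream quality filtering still sees aligned scores.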

Filtering Parameters

--filterseq_q: Minimum Phred quality score for a read to pass filtering
--primer_maxerror: Maximum error rate allowed when matching primers
--primer_mask_mode: Primer masking strategy

Clonal Analysis

--clonal_threshold: Distance threshold for clonal grouping
--clonal_method: Clustering method
--lineage_reconstruction: Enable lineage tree reconstruction
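To show what a distance threshold for clonal grouping means, here is a simplified sketch in the style of Change-O's DefineClones: sequences are partitioned by V gene, J gene and junction length, then joined by single linkage whenever the length-normalized Hamming distance between their junctions does not exceed the threshold. Field names follow the AIRR standard (v_call, j_call, junction); the records are toy data, and the pipeline's actual clonal method and threshold selection may differ.

```python
from collections import defaultdict


def hamming_fraction(a, b):
    """Length-normalized Hamming distance between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clones(records, threshold):
    """Return a clone id per record, using single linkage within V/J/junction-length groups."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[(rec["v_call"], rec["j_call"], len(rec["junction"]))].append(i)

    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in groups.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if hamming_fraction(records[i]["junction"], records[j]["junction"]) <= threshold:
                    parent[find(i)] = find(j)

    # Relabel union-find roots as consecutive clone ids 1, 2, 3, ...
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(len(records))]


records = [  # toy data
    {"v_call": "IGHV1-2*02", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTGG"},
    {"v_call": "IGHV1-2*02", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTCG"},
    {"v_call": "IGHV3-23*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTGG"},
]
print(assign_clones(records, threshold=0.1))  # [1, 1, 2]
```

Single linkage means two sequences can share a clone through a chain of close neighbours even if they are far apart themselves, which is why the threshold choice strongly affects clone sizes.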

Pipeline Outputs

Sequence Data

Annotated sequences (AIRR format)
- V(D)J assignments
- CDR/FWR boundaries
- Functionality calls
Clonal data
- Clonal assignments
- Germline sequences
- Lineage relationships

Analysis Results

Repertoire metrics
- Clonal diversity indices
- V/J gene usage frequencies
- CDR3 length distributions
- SHM frequency analysis
Quality reports
- Read quality statistics
- Annotation success rates
- Primer match statistics
Visualizations
- Clonal abundance plots
- V/J gene usage heatmaps
- Lineage trees
- Diversity rarefaction curves
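Repertoire diversity indices are commonly expressed as Hill numbers of order q, D_q = (sum_i p_i^q)^(1/(1-q)), where p_i is the fraction of sequences in clone i: q = 0 gives clone richness, q = 1 (taken as a limit) gives the exponential of Shannon entropy, and q = 2 gives inverse Simpson. A minimal sketch from clone sizes; it does not perform the resampling to a common depth that rarefaction curves rely on, so values from samples of different sequencing depth are not directly comparable.

```python
import math
from collections import Counter


def hill_number(clone_sizes, q):
    """Hill diversity of order q from a list of clone sizes (sequence counts)."""
    total = sum(clone_sizes)
    if total <= 0:
        raise ValueError("need at least one sequence")
    props = [n / total for n in clone_sizes if n > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1 / (1 - q))


# Toy clone assignments for four sequences.
clone_ids = ["c1", "c1", "c2", "c3"]
sizes = list(Counter(clone_ids).values())
for q in (0, 1, 2):
    print(q, hill_number(sizes, q))
```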

Report Files

repertoire_report.html: Comprehensive analysis report
airr_table.tsv: AIRR-compliant sequence table
clones.tsv: Clonal assignment table
lineages/: Lineage tree files
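Quick summaries can also be computed directly from the sequence table. The sketch below assumes airr_table.tsv follows the AIRR Rearrangement schema, with v_call, productive and cdr3_aa columns and T/F booleans. It computes V gene usage frequencies and CDR3 amino-acid length counts over productive rows, keeping only the first call when v_call lists several and stripping the allele suffix. The toy table is made up for illustration.

```python
import csv
import io
from collections import Counter


def v_gene(v_call):
    """First call in a comma-separated v_call, with the allele suffix (*NN) removed."""
    return v_call.split(",")[0].split("*")[0]


def summarize(handle):
    """V gene usage frequencies and CDR3 amino-acid length counts for productive rows."""
    v_counts = Counter()
    cdr3_lengths = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        productive = (row.get("productive") or "").strip().upper()
        if productive not in ("T", "TRUE"):
            continue  # keep productive rearrangements only
        v_call = row.get("v_call") or ""
        if v_call:
            v_counts[v_gene(v_call)] += 1
        cdr3 = row.get("cdr3_aa") or ""
        if cdr3:
            cdr3_lengths[len(cdr3)] += 1
    total = sum(v_counts.values())
    v_freq = {gene: n / total for gene, n in v_counts.items()} if total else {}
    return v_freq, cdr3_lengths


table = io.StringIO(
    "sequence_id\tv_call\tproductive\tcdr3_aa\n"
    "r1\tIGHV1-2*02\tT\tARDW\n"
    "r2\tIGHV1-2*04,IGHV1-2*02\tT\tARDYW\n"
    "r3\tIGHV3-23*01\tT\tARDW\n"
    "r4\tIGHV3-23*01\tF\t\n"
)
v_freq, cdr3_lengths = summarize(table)
print(v_freq)
print(cdr3_lengths)
```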

Common Use Cases

Basic BCR Analysis

--protocol pcr_umi
--library_generation_method ig_library
--umi_length 12
--cprimers false

5' RACE TCR Analysis

--protocol race_5prime
--library_generation_method race_library
--cprimers cprimers.fasta
--race_linker AAGCAGTGGTATCAACGCAGAGTACATGGG

Bulk RNA-seq with BCR/TCR

--protocol pcr_umi
--library_generation_method rna_library
--skip_lineage true
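When running the pipeline with Nextflow directly, the flags for a use case can be kept in a JSON file and passed with Nextflow's -params-file option instead of on the command line. A minimal sketch for the 5' RACE TCR example above. The input and outdir entries are the standard nf-core options with placeholder file names, and the pipeline name and revision in the generated command are assumptions: check the release tag and adapt the command to your setup.

```python
import json
import shlex

# Parameters from the 5' RACE TCR example, plus the standard nf-core
# input/outdir options (file names are placeholders).
race_tcr = {
    "input": "samplesheet.tsv",
    "outdir": "results",
    "protocol": "race_5prime",
    "library_generation_method": "race_library",
    "cprimers": "cprimers.fasta",
    "race_linker": "AAGCAGTGGTATCAACGCAGAGTACATGGG",
}


def to_params_json(params):
    """Serialize parameters for Nextflow's -params-file option."""
    return json.dumps(params, indent=2)


def nextflow_command(params_path, revision="4.0.0"):
    """Build a nextflow command line; pipeline name and revision tag are assumptions."""
    cmd = ["nextflow", "run", "nf-core/airrflow", "-r", revision, "-params-file", params_path]
    return shlex.join(cmd)


if __name__ == "__main__":
    with open("params.json", "w") as fh:
        fh.write(to_params_json(race_tcr))
    print(nextflow_command("params.json"))
    # nextflow run nf-core/airrflow -r 4.0.0 -params-file params.json
```

Keeping one params file per use case makes runs reproducible and easy to compare.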

Tips and Best Practices

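UMI Usage: Use UMIs whenever possible; collapsing reads that share a UMI removes PCR duplicates, so clonal abundance reflects input molecules rather than amplification
Read Length: Ensure reads cover the entire V(D)J region
Primer Design: Include part of the constant region in amplicons so that isotype can be annotated
Sample Size: Include at least three replicates per condition for robust comparisons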
Quality Control: Review the HTML report carefully before downstream analysis
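The UMI tip can be illustrated by comparing raw read counts with unique-UMI counts per clone: PCR amplifies some molecules far more than others, while the number of distinct UMIs approximates the number of original molecules. A minimal sketch on toy (clone_id, umi) pairs; real UMI processing must also handle sequencing errors within UMIs, which this ignores.

```python
from collections import Counter, defaultdict


def clone_abundance(reads):
    """Return (raw read counts, unique-UMI counts) per clone from (clone_id, umi) pairs."""
    raw = Counter(clone for clone, _ in reads)
    umis = defaultdict(set)
    for clone, umi in reads:
        umis[clone].add(umi)
    molecules = {clone: len(s) for clone, s in umis.items()}
    return raw, molecules


# Toy data: clone A has one heavily amplified molecule.
reads = [("A", "AAAT")] * 8 + [("A", "CCCG"), ("B", "GGTA"), ("B", "TTAC")]
raw, molecules = clone_abundance(reads)
print(raw)        # Counter({'A': 9, 'B': 2})
print(molecules)  # {'A': 2, 'B': 2}
```

By reads, clone A looks 4.5 times larger than clone B; by molecules, the two clones are the same size.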

Troubleshooting

Low Assignment Rate

Check primer sequences and orientation
Verify that the species and locus in the samplesheet match the sequenced sample
Ensure sufficient read length
Review quality scores
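For the primer orientation check, one quick diagnostic is to count how many reads in a subsample contain a primer as given versus its reverse complement; if most hits are reverse-complemented, the primer file or read assignment is probably flipped. A minimal sketch using exact substring matching (no mismatches allowed, and IUPAC ambiguity codes other than N are not handled); the sequences are toy examples.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, T and N."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orientation_counts(reads, primer):
    """Count reads containing the primer as given versus its reverse complement."""
    primer = primer.upper()
    rc = reverse_complement(primer)
    forward = sum(primer in r.upper() for r in reads)
    reverse = sum(rc in r.upper() for r in reads)
    return forward, reverse


reads = ["TTGTTCCAA", "AGTTCCTT", "GGAACTT"]  # toy reads
print(orientation_counts(reads, "GGAAC"))  # (1, 2)
```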

Clonal Inflation

Verify UMI handling
Check PCR duplicate removal
Adjust --clonal_threshold
Review primer bias

Memory Issues

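Set --max_memory no higher than the memory actually available; it caps what any single process can request
Process samples individually
Use --skip_lineage to skip lineage reconstruction on large datasets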

Additional Resources

Full documentation: nf-core/airrflow documentation
Pipeline source code: GitHub - nf-core/airrflow
Immcantation framework: immcantation.org
AIRR Community Standards: airr-community.org
Support: Join the #airrflow channel on nf-core Slack
Citation: The AIRR Community (2017) doi.org/10.1038/s41592-019-0687-1