Airrflow

Overview

Flow provides the nf-core/airrflow v4.0.0 pipeline for processing bulk B cell receptor (BCR) and T cell receptor (TCR) repertoire sequencing data. It performs V(D)J assignment, clonotyping, lineage reconstruction, and repertoire analysis using the Immcantation framework.

Airrflow can process data from various experimental protocols including multiplex PCR and 5' RACE, supporting both Illumina and 454 sequencing platforms.


Pipeline Summary

The pipeline performs these key steps:

  1. Quality Control

    • FastQC for read quality assessment
    • Optional read trimming and quality filtering with fastp
  2. V(D)J Assignment

    • Sequence annotation with IgBLAST against IMGT reference germline sequences
    • Functional gene assignment
    • Junction analysis
  3. Quality Filtering

    • Removal of non-functional sequences
    • Primer match validation
    • Sequence quality thresholds
  4. Clonal Analysis

    • Clonal clustering and assignment
    • Lineage tree construction
    • Diversity analysis
  5. Repertoire Analysis

    • Clonal abundance
    • V/J gene usage
    • CDR3 properties
    • Diversity metrics
  6. Report Generation

    • Comprehensive HTML report
    • Repertoire statistics
    • Quality metrics

Input Requirements

Sequencing Data

  • Paired-end or single-end FASTQ files
  • UMI barcodes (optional but recommended)
  • Minimum read length: 300 bp recommended, so that reads span the full V(D)J region
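
Before launching a run, it can help to check how many reads actually reach the recommended length. The sketch below assumes standard 4-line FASTQ records (plain or gzip-compressed) and treats 300 bp as a soft target taken from the recommendation above; it is a local pre-check, not something the pipeline runs itself.

```python
# Sketch: fraction of FASTQ reads that reach a minimum length.
# Assumes standard 4-line FASTQ records; the 300 bp default mirrors the
# recommendation above and is not a hard limit enforced by the pipeline.
import gzip
from typing import Iterable, Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())


def fraction_at_least(lengths: Iterable[int], min_len: int = 300) -> float:
    """Share of reads whose length is >= min_len (0.0 for an empty file)."""
    total = passing = 0
    for n in lengths:
        total += 1
        passing += n >= min_len
    return passing / total if total else 0.0


# Usage (file name is hypothetical):
# with open_fastq("sample_R1.fastq.gz") as fh:
#     print(f"{fraction_at_least(read_lengths(fh)):.1%} of reads >= 300 bp")
```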

Metadata

Required metadata in samplesheet:

  • Sample ID
  • Subject/patient ID
  • Sample type (e.g., PBMC, tissue)
  • Treatment/timepoint
  • PCR target (e.g., IGH, TRA)
  • Species (human, mouse)
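
One way to catch missing metadata early is to build the samplesheet programmatically and validate each row. In the sketch below the column names (`sample_id`, `subject_id`, `sample_type`, `timepoint`, `pcr_target`, `species`, `filename_R1`, `filename_R2`) are hypothetical stand-ins for the fields listed above; match them to the samplesheet schema your Flow or airrflow version expects.

```python
# Sketch: write a tab-separated samplesheet and check required metadata.
# Column names are hypothetical stand-ins for the fields listed above;
# adapt them to the exact samplesheet schema of your pipeline version.
import csv
import io

REQUIRED = ["sample_id", "subject_id", "sample_type", "timepoint",
            "pcr_target", "species"]
FILE_COLUMNS = ["filename_R1", "filename_R2"]  # hypothetical FASTQ path columns
SUPPORTED_SPECIES = {"human", "mouse"}


def write_samplesheet(rows: list[dict[str, str]]) -> str:
    """Validate rows and return them as TSV text (header first)."""
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED if not row.get(c)]
        if missing:
            raise ValueError(f"row {i}: missing {', '.join(missing)}")
        if row["species"] not in SUPPORTED_SPECIES:
            raise ValueError(f"row {i}: unsupported species {row['species']!r}")
    buf = io.StringIO()
    # Missing file columns (e.g. no R2 for single-end data) are written empty.
    writer = csv.DictWriter(buf, fieldnames=REQUIRED + FILE_COLUMNS,
                            delimiter="\t", extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```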

Primers

  • Forward and reverse primer sequences
  • C-region primer sequences for 5' RACE

Key Parameters

Basic Parameters

  • --protocol: Experimental protocol (pcr_umi, race_5prime)
  • --library_generation_method: Library preparation method (e.g., ig_library, race_library, rna_library)
  • --cprimers: Path to C-region primers (5' RACE)
  • --race_linker: Linker sequence (5' RACE)
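
The dependencies between these parameters can be checked before submitting a run. This sketch encodes only what this page states (the two protocol values, and that --cprimers and --race_linker apply to 5' RACE); it is a hypothetical pre-flight check, not the pipeline's own schema validation.

```python
# Sketch: pre-flight check of the basic parameters described above.
# The rules cover only what this page states; the pipeline's own
# parameter validation may be stricter or differ.
VALID_PROTOCOLS = {"pcr_umi", "race_5prime"}


def check_basic_params(params: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    protocol = params.get("protocol")
    if protocol not in VALID_PROTOCOLS:
        problems.append(f"--protocol must be one of {sorted(VALID_PROTOCOLS)}")
    if not params.get("library_generation_method"):
        problems.append("--library_generation_method is not set")
    if protocol == "race_5prime":
        # 5' RACE runs need C-region primers and the linker sequence.
        for flag in ("cprimers", "race_linker"):
            value = params.get(flag)
            if not value or value == "false":
                problems.append(f"--{flag} is required for 5' RACE")
    return problems
```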

Analysis Parameters

  • --umi_length: UMI barcode length
  • --umi_position: UMI position (R1/R2)
  • --igblast_base: Path to IgBLAST database
  • --imgt_base: Path to IMGT database
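
Conceptually, --umi_length and --umi_position tell the pipeline where to cut the UMI from each read pair so that reads from the same original molecule can be grouped. The sketch assumes the UMI occupies the first `umi_length` bases of the chosen read; real protocols may place it elsewhere or after a spacer, so treat this as an illustration only.

```python
# Sketch: UMI extraction and grouping driven by umi_length / umi_position.
# Assumes the UMI is the first `umi_length` bases of R1 or R2.
from collections import defaultdict


def extract_umi(r1: str, r2: str, umi_length: int,
                umi_position: str) -> tuple[str, str, str]:
    """Return (umi, trimmed_r1, trimmed_r2) with the UMI cut from its read."""
    if umi_position == "R1":
        return r1[:umi_length], r1[umi_length:], r2
    if umi_position == "R2":
        return r2[:umi_length], r1, r2[umi_length:]
    raise ValueError("umi_position must be 'R1' or 'R2'")


def group_by_umi(pairs, umi_length: int,
                 umi_position: str) -> dict[str, list[tuple[str, str]]]:
    """Bucket read pairs by UMI; each bucket should stem from one molecule."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for r1, r2 in pairs:
        umi, t1, t2 = extract_umi(r1, r2, umi_length, umi_position)
        groups[umi].append((t1, t2))
    return dict(groups)
```

Each UMI group would then typically be collapsed into a single consensus sequence, which is what makes abundance estimates robust to PCR duplication.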

Filtering Parameters

  • --filterseq_q: Minimum mean Phred quality score a read needs in order to be kept
  • --primer_maxerror: Maximum error rate allowed when matching a primer
  • --primer_mask_mode: How matched primers are handled (cut, mask, trim, or tag)
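
The first two filters can be pictured as follows: a read passes the quality filter if its mean Phred score reaches the threshold, and a primer match is accepted if its mismatch rate stays at or below the maximum error. This is a simplified sketch assuming Phred+33 quality encoding and ungapped, position-by-position primer comparison at a known start; the real primer matching in the Immcantation tools also searches offsets and is more involved.

```python
# Sketch of the filters controlled by --filterseq_q and --primer_maxerror.
# Simplifications: Phred+33 encoding assumed; primers are compared
# position by position at a fixed start, without gaps or IUPAC handling.


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (0.0 if empty)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def passes_quality(quality: str, filterseq_q: float) -> bool:
    return mean_phred(quality) >= filterseq_q


def primer_error(read: str, primer: str, start: int = 0) -> float:
    """Fraction of primer positions that mismatch the read at `start`."""
    window = read[start:start + len(primer)]
    if not primer or len(window) < len(primer):
        return 1.0  # read too short to contain the primer
    mismatches = sum(a != b for a, b in zip(window, primer))
    return mismatches / len(primer)


def primer_matches(read: str, primer: str, primer_maxerror: float,
                   start: int = 0) -> bool:
    return primer_error(read, primer, start) <= primer_maxerror
```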

Clonal Analysis

  • --clonal_threshold: Distance threshold for clonal grouping
  • --clonal_method: Clustering method
  • --lineage_reconstruction: Enable lineage trees
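
To make the role of the clonal threshold concrete, here is a simplified, hypothetical implementation of threshold-based clonal grouping: sequences are partitioned by V gene, J gene, and junction length, then linked by single linkage whenever the length-normalized Hamming distance between their junctions is at or below the threshold. This follows the general approach used in the Immcantation framework, but the pipeline's actual method and distance model may differ.

```python
# Sketch: threshold-based clonal grouping with single linkage.
# Partition by (V gene, J gene, junction length), then link pairs whose
# normalized Hamming distance is <= threshold (union-find).
from collections import defaultdict


def hamming_norm(a: str, b: str) -> float:
    """Hamming distance divided by length (inputs have equal length)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clones(seqs: list[dict[str, str]], threshold: float) -> list[int]:
    """Return a clone id per input sequence, in input order."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    partitions: dict[tuple, list[int]] = defaultdict(list)
    for i, s in enumerate(seqs):
        partitions[(s["v_call"], s["j_call"], len(s["junction"]))].append(i)

    for members in partitions.values():
        for x, i in enumerate(members):
            for j in members[x + 1:]:
                if hamming_norm(seqs[i]["junction"], seqs[j]["junction"]) <= threshold:
                    parent[find(i)] = find(j)

    # Relabel union-find roots as consecutive clone ids 0, 1, 2, ...
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(seqs))]
```

Single linkage means two sequences can share a clone through a chain of close neighbors even if they differ by more than the threshold themselves, which is why the threshold value matters.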

Pipeline Outputs

Sequence Data

  1. Annotated sequences (AIRR format)

    • V(D)J assignments
    • CDR/FWR boundaries
    • Functionality calls
  2. Clonal data

    • Clonal assignments
    • Germline sequences
    • Lineage relationships

Analysis Results

  1. Repertoire metrics

    • Clonal diversity indices
    • V/J gene usage frequencies
    • CDR3 length distributions
    • Somatic hypermutation (SHM) frequency analysis
  2. Quality reports

    • Read quality statistics
    • Annotation success rates
    • Primer match statistics
  3. Visualizations

    • Clonal abundance plots
    • V/J gene usage heatmaps
    • Lineage trees
    • Diversity rarefaction curves
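
Clonal diversity is often summarized with Hill numbers, which unify richness (q = 0), the exponential of Shannon entropy (q = 1), and inverse Simpson (q = 2). The sketch computes them from a list of clone sizes; whether the report uses exactly these indices or orders is an assumption here.

```python
# Sketch: Hill-number diversity of order q from clone sizes.
# q = 0 -> number of clones, q = 1 -> exp(Shannon entropy),
# q = 2 -> inverse Simpson index.
import math


def hill_number(clone_sizes: list[int], q: float) -> float:
    total = sum(clone_sizes)
    if total == 0:
        return 0.0
    props = [n / total for n in clone_sizes if n > 0]
    if q == 1:
        # The general formula is undefined at q = 1; use its limit.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1 / (1 - q))
```

Plotting D_q over a range of q values gives a diversity profile: a steep drop from q = 0 to q = 2 indicates a repertoire dominated by a few expanded clones.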

Report Files

  • repertoire_report.html: Comprehensive analysis report
  • airr_table.tsv: AIRR-compliant sequence table
  • clones.tsv: Clonal assignment table
  • lineages/: Lineage tree files
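
The AIRR-format table can be summarized directly for quick checks outside the report. The sketch uses standard AIRR Rearrangement columns (`v_call`, `cdr3`, `productive`); keeping only productive rows, taking the first V call, and collapsing alleles to the gene level are choices made for this example, not necessarily what the report does.

```python
# Sketch: V gene usage and CDR3 length distribution from an AIRR table
# (e.g. airr_table.tsv). CDR3 lengths are in nucleotides.
import csv
from collections import Counter
from typing import Iterable


def _is_true(value: str) -> bool:
    return value.strip().upper() in {"T", "TRUE"}


def summarize(lines: Iterable[str]) -> tuple[dict[str, float], Counter]:
    v_usage: Counter = Counter()
    cdr3_lengths: Counter = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        if not _is_true(row.get("productive") or ""):
            continue
        # "IGHV1-2*02,IGHV1-2*04" -> first call, allele stripped -> "IGHV1-2"
        gene = row["v_call"].split(",")[0].split("*")[0]
        v_usage[gene] += 1
        cdr3_lengths[len(row["cdr3"])] += 1
    total = sum(v_usage.values())
    freqs = {g: n / total for g, n in v_usage.items()} if total else {}
    return freqs, cdr3_lengths


# Usage:
# with open("airr_table.tsv") as fh:
#     v_freqs, cdr3_lens = summarize(fh)
```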

Common Use Cases

Basic BCR Analysis

--protocol pcr_umi
--library_generation_method ig_library
--umi_length 12
--cprimers false

5' RACE TCR Analysis

--protocol race_5prime
--library_generation_method race_library
--cprimers cprimers.fasta
--race_linker AAGCAGTGGTATCAACGCAGAGTACATGGG

Bulk RNA-seq with BCR/TCR

--protocol pcr_umi
--library_generation_method rna_library
--skip_lineage true
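
If you launch the pipeline directly with Nextflow rather than through Flow, the parameter sets above can be turned into a command line. This sketch assumes the usual nf-core launch options (-r, -profile, --input, --outdir); the release tag string and the container profile are assumptions, and the pipeline flags are copied from this page, so verify them against the parameter documentation of the version you run.

```python
# Sketch: build a `nextflow run` command from one of the use cases above.
# The revision tag and profile are caller-supplied assumptions.
import shlex

USE_CASES = {
    "basic_bcr": {"protocol": "pcr_umi", "library_generation_method": "ig_library",
                  "umi_length": 12, "cprimers": False},
    "race_tcr": {"protocol": "race_5prime", "library_generation_method": "race_library",
                 "cprimers": "cprimers.fasta",
                 "race_linker": "AAGCAGTGGTATCAACGCAGAGTACATGGG"},
    "bulk_rna": {"protocol": "pcr_umi", "library_generation_method": "rna_library",
                 "skip_lineage": True},
}


def build_command(params: dict, samplesheet: str, outdir: str,
                  revision: str, profile: str = "docker") -> str:
    args = ["nextflow", "run", "nf-core/airrflow", "-r", revision,
            "-profile", profile, "--input", samplesheet, "--outdir", outdir]
    for key, value in params.items():
        if isinstance(value, bool):
            value = str(value).lower()  # written as true/false on the CLI
        args += [f"--{key}", str(value)]
    return shlex.join(args)  # quotes any argument that needs it
```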

Tips and Best Practices

  1. UMI Usage: Use UMIs whenever possible; collapsing reads that share a UMI removes PCR duplicates and gives more accurate clonal abundance estimates
  2. Read Length: Ensure reads cover the entire V(D)J region
  3. Primer Design: Include part of the constant region in amplicons so that constant-region (isotype) calls can be made
  4. Sample Size: Include 3+ replicates per condition for robust analysis
  5. Quality Control: Review the HTML report carefully before downstream analysis

Troubleshooting

Low Assignment Rate

  • Check primer sequences and orientation
  • Verify species and locus match
  • Ensure sufficient read length
  • Review quality scores
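
A frequent cause of low assignment rates is a primer file in the wrong orientation. The sketch below counts, for a sample of reads, whether the primer matches in forward orientation, only as its reverse complement, or not at all. It uses ungapped matching and treats ambiguous IUPAC bases as mismatches, which is a simplification.

```python
# Sketch: diagnose primer orientation problems on a sample of reads.
# If most reads only match the reverse complement, the primer file or
# the read orientation is probably flipped.
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def best_error(read: str, primer: str) -> float:
    """Lowest mismatch rate of `primer` over all ungapped placements."""
    if not primer or len(read) < len(primer):
        return 1.0
    best = 1.0
    for start in range(len(read) - len(primer) + 1):
        window = read[start:start + len(primer)]
        err = sum(a != b for a, b in zip(window, primer)) / len(primer)
        best = min(best, err)
    return best


def orientation_report(reads, primer: str, max_error: float) -> Counter:
    counts: Counter = Counter()
    rc = revcomp(primer)
    for read in reads:
        if best_error(read, primer) <= max_error:
            counts["forward"] += 1
        elif best_error(read, rc) <= max_error:
            counts["reverse"] += 1
        else:
            counts["none"] += 1
    return counts
```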

Clonal Inflation

  • Verify UMI handling
  • Check PCR duplicate removal
  • Adjust clonal threshold
  • Review primer bias
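
To check UMI handling and duplicate removal, compare how many reads support each clone with how many distinct UMIs (molecules) do. The sketch assumes a hypothetical input of one `(clone_id, umi)` pair per read; clones with a high reads-per-UMI ratio are the ones most inflated by PCR amplification.

```python
# Sketch: per-clone read count vs distinct UMI count.
# Input: iterable of (clone_id, umi) pairs, one per read (hypothetical shape).
from collections import defaultdict


def inflation_table(records) -> dict[str, tuple[int, int, float]]:
    """Map clone_id -> (reads, distinct UMIs, reads per UMI)."""
    reads: dict[str, int] = defaultdict(int)
    umis: dict[str, set] = defaultdict(set)
    for clone_id, umi in records:
        reads[clone_id] += 1
        umis[clone_id].add(umi)
    return {c: (reads[c], len(umis[c]), reads[c] / len(umis[c])) for c in reads}
```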

Memory Issues

  • Set --max_memory no higher than the memory actually available; it caps the memory any single process can request
  • Process samples individually
  • Use --skip_lineage for large datasets
