Core Flow Concepts

Samples

Samples are the core data unit in Flow, representing individual biological specimens and their associated sequencing experiments. Each sample maintains a complete record of your experimental data, from raw sequencing files through every analysis performed.


What are Samples?

In Flow, a sample represents:

  1. A biological specimen - The physical material that was sequenced
  2. Its raw data - The FASTQ files from the sequencer
  3. Experimental metadata - All relevant details about how the data was generated
  4. Analysis history - Every pipeline run on this data
  5. Derived results - All files generated from analyzing this sample

Think of samples as self-contained experimental units that preserve the complete story of your data, from bench to analysis.


Sample Components

Raw Data Files

Every sample starts with sequencing data files:

  • Single-end reads: One FASTQ file per sample
  • Paired-end reads: Two FASTQ files (R1 and R2)
  • Multi-lane data: Multiple FASTQ files that get merged
  • Compressed formats: gzip (.gz) and bzip2 (.bz2) files are supported

Example file structure:

Sample_001/
├── Sample_001_R1.fastq.gz  (forward reads)
├── Sample_001_R2.fastq.gz  (reverse reads)
└── metadata.json            (experimental details)
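
As an illustration of how single-end, paired-end, and multi-lane files relate to one sample, the following sketch groups FASTQ filenames by sample name. The filename pattern (an optional `_L<lane>` part followed by an optional `_R1`/`_R2` part) and the helper names `group_fastq_files` and `layout` are assumptions made for this example; they are not part of Flow.

```python
import re
from collections import defaultdict

# Assumed pattern: <sample>[_L<lane>][_R1|_R2].fastq[.gz|.bz2]
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+?)(?:_L(?P<lane>\d+))?(?:_R(?P<read>[12]))?"
    r"\.fastq(?:\.gz|\.bz2)?$"
)


def group_fastq_files(filenames):
    """Group FASTQ filenames by sample name and read direction."""
    samples = defaultdict(lambda: {"R1": [], "R2": []})
    for name in sorted(filenames):
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # not a FASTQ file, for example metadata.json
        # Files without an _R1/_R2 suffix are treated as single-end (R1).
        read = "R" + (match.group("read") or "1")
        samples[match.group("sample")][read].append(name)
    return dict(samples)


def layout(reads):
    """Return 'paired-end' if any R2 file is present, else 'single-end'."""
    return "paired-end" if reads["R2"] else "single-end"


if __name__ == "__main__":
    files = ["Sample_001_R1.fastq.gz", "Sample_001_R2.fastq.gz", "metadata.json"]
    for sample, reads in group_fastq_files(files).items():
        print(sample, layout(reads), reads)
```

Multi-lane files end up as several entries in the same `R1` or `R2` list, which is the set of files that would be merged.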

Essential Metadata

Core information tracked for every sample:

Biological Information:

  • Organism (species, strain)
  • Tissue or cell type
  • Developmental stage
  • Sex (if applicable)
  • Age (if applicable)

Experimental Details:

  • Treatment conditions
  • Time points
  • Replicate number
  • Batch information
  • Experimental protocol

Technical Parameters:

  • Sequencing platform
  • Read length
  • Library preparation method
  • Sequencing depth
  • Run date

Quality Metrics:

  • Number of reads
  • Read quality scores
  • Contamination checks
  • Adapter content
  • Duplication rates

Relationships

Samples maintain connections to:

  • Parent project: The study this sample belongs to
  • Related samples: Replicates, time points, or paired samples
  • Executions: All analyses run on this sample
  • Derived data: Results generated from this sample
  • Groups/Users: Who has access to this sample

Types of Samples

By Experimental Design

Biological Replicates

  • Independent biological specimens
  • Same experimental conditions
  • Used to assess biological variability
  • Typically numbered (Rep1, Rep2, Rep3)

Technical Replicates

  • Same biological material
  • Multiple sequencing runs
  • Used to assess technical variability
  • Often merged before analysis

Time Course Samples

  • Same specimen/condition
  • Different time points
  • Track temporal changes
  • Ordered by collection time

Treatment vs Control

  • Paired experimental design
  • Matched conditions except treatment
  • Essential for differential analysis
  • Clear labeling critical

By Data Type

RNA-seq Samples

  • Gene expression profiling
  • Various library types (poly-A, total RNA, small RNA)
  • Strand-specific or unstranded
  • Single-cell or bulk

ChIP-seq Samples

  • Chromatin immunoprecipitation
  • Requires input control
  • Antibody information crucial
  • Peak calling analysis

ATAC-seq Samples

  • Chromatin accessibility
  • No antibody required
  • Nucleosome positioning
  • Open chromatin regions

Other Types

  • CLIP-seq (RNA-protein interactions)
  • Hi-C (3D genome organization)
  • Bisulfite-seq (DNA methylation)
  • Custom assay types

Sample Metadata Schema

Required Fields

Every sample must have:

  1. Name: Unique identifier within the project
  2. Organism: Species being studied
  3. Data files: At least one FASTQ file
  4. Ownership: User who created the sample

Recommended Fields

For better organization and analysis, also provide:

  • Tissue/Cell Type: Specific biological source
  • Treatment: Experimental conditions applied
  • Replicate Number: For grouped analyses
  • Collection Date: When sample was obtained
  • Experimenter: Who performed the wet lab work
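
A metadata record can be checked against the field lists above before upload. The sketch below assumes metadata is held in a plain dictionary; the key names (`name`, `organism`, `data_files`, `owner`, and so on) are hypothetical stand-ins for the fields listed, not Flow's actual schema.

```python
# Hypothetical key names standing in for the fields listed above.
REQUIRED_FIELDS = ("name", "organism", "data_files", "owner")
OPTIONAL_FIELDS = (
    "tissue", "treatment", "replicate", "collection_date", "experimenter",
)


def check_sample_metadata(metadata):
    """Return (errors, warnings) for one sample's metadata dictionary."""
    # A field counts as missing if it is absent or empty.
    errors = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not metadata.get(field)
    ]
    warnings = [
        f"missing optional field: {field}"
        for field in OPTIONAL_FIELDS
        if not metadata.get(field)
    ]
    return errors, warnings


if __name__ == "__main__":
    sample = {
        "name": "MouseLiver_WT_Rep1_D0",
        "organism": "Mus musculus",
        "data_files": ["MouseLiver_WT_Rep1_D0_R1.fastq.gz"],
        "owner": "alice",
        "tissue": "liver",
    }
    errors, warnings = check_sample_metadata(sample)
    print("errors:", errors)
    print("warnings:", warnings)
```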

Custom Fields

Flow supports flexible metadata:

  • Add any field relevant to your experiment
  • Create custom schemas for repeated experiments
  • Import metadata from spreadsheets
  • Export in standard formats

Metadata Standards

Follow community guidelines:

  • MIAME: Minimum Information About a Microarray Experiment
  • MINSEQE: Minimum Information about a high-throughput Nucleotide SeQuencing Experiment
  • FAANG: Functional annotation of animal genomes
  • Custom standards: Institution-specific requirements

Sample Organization

Naming Conventions

Good sample names are:

  • Unique: No duplicates within a project
  • Informative: Convey key information
  • Systematic: Follow a consistent pattern
  • Parseable: Can be processed programmatically

Examples of good naming:

✅ MouseLiver_WT_Rep1_D0
✅ PatientA_Tumor_PreTreatment
✅ CRISPR_Gene1_KO_Clone3
✅ H3K27ac_NeuralStemCells_48h

Examples to avoid:

❌ Sample1
❌ Test
❌ John's sample
❌ Data from Tuesday
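
The naming criteria above can be approximated with a simple check. This sketch assumes one possible convention, at least three alphanumeric tokens joined by underscores; that rule and the function names are illustrative choices, not a rule enforced by Flow.

```python
import re

# Assumed convention: three or more alphanumeric tokens joined by underscores.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+(?:_[A-Za-z0-9]+){2,}$")


def is_systematic_name(name):
    """Return True if the name follows the assumed token convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def find_duplicates(names):
    """Return the sorted names that occur more than once."""
    seen, duplicates = set(), set()
    for name in names:
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    return sorted(duplicates)


if __name__ == "__main__":
    for candidate in ["MouseLiver_WT_Rep1_D0", "Sample1", "John's sample"]:
        print(candidate, is_systematic_name(candidate))
```

Under this convention all four good examples pass and all four examples to avoid fail, since they are either a single token or contain spaces and punctuation.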

Grouping Strategies

By Biological Replicate:

Treatment_Rep1
Treatment_Rep2
Treatment_Rep3
Control_Rep1
Control_Rep2
Control_Rep3

By Time Point:

Differentiation_Day0
Differentiation_Day1
Differentiation_Day3
Differentiation_Day7

By Condition Matrix:

CellTypeA_Treatment1_Rep1
CellTypeA_Treatment2_Rep1
CellTypeB_Treatment1_Rep1
CellTypeB_Treatment2_Rep1
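
Names for a condition matrix like the one above can be generated rather than typed by hand, which keeps them consistent. A minimal sketch, assuming the `<cell type>_<treatment>_Rep<n>` pattern shown; the function name is hypothetical.

```python
from itertools import product


def condition_matrix(cell_types, treatments, replicates):
    """Build one name per cell type, treatment, and replicate combination."""
    return [
        f"{cell_type}_{treatment}_Rep{rep}"
        for cell_type, treatment, rep in product(
            cell_types, treatments, range(1, replicates + 1)
        )
    ]


if __name__ == "__main__":
    names = condition_matrix(
        ["CellTypeA", "CellTypeB"], ["Treatment1", "Treatment2"], replicates=1
    )
    print("\n".join(names))
```

With one replicate this reproduces the four names listed above; raising `replicates` adds `Rep2`, `Rep3`, and so on for every combination.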

Batch Management

When dealing with multiple batches:

  1. Track batch information: Record the batch for every sample in its metadata
  2. Process together: Analyze all batches together and apply batch correction where needed
  3. Visualize batches: Check for batch effects
  4. Document clearly: Note any batch-specific issues
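
One quick check that tracked batch information makes possible is whether a batch contains only a single condition, in which case batch and condition effects cannot be separated for that batch. The sketch below assumes each sample is a dictionary with hypothetical `batch` and `condition` keys; it illustrates the idea and is not a Flow feature.

```python
from collections import defaultdict


def single_condition_batches(samples):
    """Return the sorted batch labels that contain only one condition."""
    conditions_by_batch = defaultdict(set)
    for sample in samples:
        conditions_by_batch[sample["batch"]].add(sample["condition"])
    return sorted(
        batch
        for batch, conditions in conditions_by_batch.items()
        if len(conditions) < 2
    )


if __name__ == "__main__":
    samples = [
        {"name": "Treatment_Rep1", "batch": "B1", "condition": "Treatment"},
        {"name": "Control_Rep1", "batch": "B1", "condition": "Control"},
        {"name": "Treatment_Rep2", "batch": "B2", "condition": "Treatment"},
    ]
    print(single_condition_batches(samples))
```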

Working with Samples

Creating Samples

Methods to create samples in Flow:

  1. Individual Upload

    • Upload FASTQ files
    • Enter metadata manually
    • Best for small studies

  2. Bulk Upload

    • Upload multiple files
    • Import metadata spreadsheet
    • Efficient for large studies

  3. Direct Transfer

    • From sequencing facility
    • Automated metadata capture
    • Reduces manual errors

  4. API Creation

    • Programmatic sample creation
    • Integration with LIMS
    • Fully automated workflows

Quality Control

Flow automatically performs QC on upload:

  • File validation: Ensures FASTQ format is correct
  • Read counting: Determines sequencing depth
  • Quality metrics: Calculates per-base quality scores
  • Contamination screening: Checks for common contaminants
  • Adapter detection: Identifies sequencing adapters
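
To give a sense of what file validation, read counting, and quality scoring involve, here is a simplified local pre-check. It is a sketch, not Flow's QC implementation: it assumes four-line FASTQ records and Phred+33 quality encoding, and the function names are made up for this example.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a plain, gzip-compressed, or bzip2-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def fastq_stats(path):
    """Check four-line record structure; return (read_count, mean_quality)."""
    reads = 0
    quality_sum = 0
    base_count = 0
    with open_fastq(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            sequence = handle.readline().rstrip("\r\n")
            separator = handle.readline()
            quality = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"record {reads + 1}: malformed header or separator")
            if len(sequence) != len(quality):
                raise ValueError(f"record {reads + 1}: sequence/quality length mismatch")
            reads += 1
            # Phred+33 encoding is assumed: score = ASCII code - 33.
            quality_sum += sum(ord(char) - 33 for char in quality)
            base_count += len(quality)
    mean_quality = quality_sum / base_count if base_count else 0.0
    return reads, mean_quality
```

Calling `fastq_stats("Sample_001_R1.fastq.gz")` returns the number of reads and the mean base quality, or raises `ValueError` at the first malformed record.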

Sample Updates

After creation, you can:

  • Add metadata: Enrich with additional information
  • Fix errors: Correct metadata mistakes
  • Link samples: Establish relationships
  • Add files: Include supplementary data
  • Update permissions: Change access control

Sample States

Samples progress through states:

  1. Uploading: Files being transferred
  2. Processing: QC being performed
  3. Ready: Available for analysis
  4. Analyzing: Currently in a pipeline
  5. Complete: Has analysis results
  6. Archived: Moved to long-term storage
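
The progression above can be modeled as a small state machine. In this sketch the state names come from the list, but the set of allowed transitions is an assumption made for illustration (for example, that a completed sample can be analyzed again); Flow's actual rules may differ.

```python
from enum import Enum


class SampleState(Enum):
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    READY = "Ready"
    ANALYZING = "Analyzing"
    COMPLETE = "Complete"
    ARCHIVED = "Archived"


# Assumed transitions, for illustration only.
ALLOWED_TRANSITIONS = {
    SampleState.UPLOADING: {SampleState.PROCESSING},
    SampleState.PROCESSING: {SampleState.READY},
    SampleState.READY: {SampleState.ANALYZING, SampleState.ARCHIVED},
    SampleState.ANALYZING: {SampleState.COMPLETE},
    SampleState.COMPLETE: {SampleState.ANALYZING, SampleState.ARCHIVED},
    SampleState.ARCHIVED: set(),
}


def can_transition(current, target):
    """Return True if moving from current to target is allowed."""
    return target in ALLOWED_TRANSITIONS[current]


if __name__ == "__main__":
    print(can_transition(SampleState.READY, SampleState.ANALYZING))
    print(can_transition(SampleState.UPLOADING, SampleState.COMPLETE))
```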

Best Practices

Before You Start

  1. Plan your naming scheme: Decide on conventions early
  2. Prepare metadata: Gather all information beforehand
  3. Check file integrity: Verify files before upload
  4. Organize locally: Structure files logically
  5. Document protocols: Record experimental methods
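
One common way to check file integrity before upload is to compare a checksum against the value supplied by the sequencing facility. The sketch below uses MD5 from the Python standard library; it assumes the facility provides MD5 sums, and the function names are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_md5):
    """Return True if the file's checksum matches the expected value."""
    return file_md5(path) == expected_md5.strip().lower()
```

For example, `verify_file("Sample_001_R1.fastq.gz", expected)` returns `False` if the file was truncated or altered in transit.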

During Upload

  1. Use a stable connection: Avoid interruptions
  2. Verify file selection: Double-check before upload
  3. Enter complete metadata: Don't skip fields
  4. Group related samples: Upload batches together
  5. Monitor progress: Watch for errors

After Upload

  1. Review QC results: Check for quality issues
  2. Verify metadata: Ensure accuracy
  3. Test with a pipeline: Run a quick analysis
  4. Share appropriately: Set permissions
  5. Document issues: Note any problems

Advanced Features

Sample Templates

Create reusable templates for:

  • Common experiment types
  • Standard metadata fields
  • Repeated study designs
  • Lab-specific protocols

Metadata Import/Export

  • Import from CSV/Excel: Bulk metadata entry
  • Export to standard formats: GEO, SRA
  • API access: Programmatic metadata management
  • Validation rules: Ensure data consistency
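
Bulk metadata import with a basic validation rule could look like the sketch below. It assumes a CSV with one row per sample and a `name` column; the column name, the function, and the two rules it enforces (no blank names, no duplicate names) are illustrative and do not describe Flow's importer.

```python
import csv
import io


def load_metadata_csv(text, name_column="name"):
    """Parse CSV text into {sample_name: fields}; reject blank or duplicate names."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or name_column not in reader.fieldnames:
        raise ValueError(f"CSV has no '{name_column}' column")
    samples = {}
    for row_number, row in enumerate(reader, start=1):
        name = (row[name_column] or "").strip()
        if not name:
            raise ValueError(f"row {row_number}: empty sample name")
        if name in samples:
            raise ValueError(f"row {row_number}: duplicate sample name {name!r}")
        # Skip the name column and any unnamed overflow columns.
        samples[name] = {
            key: (value or "").strip()
            for key, value in row.items()
            if key is not None and key != name_column
        }
    return samples


if __name__ == "__main__":
    text = (
        "name,organism,tissue\n"
        "MouseLiver_WT_Rep1_D0,Mus musculus,liver\n"
        "MouseLiver_WT_Rep2_D0,Mus musculus,liver\n"
    )
    for name, fields in load_metadata_csv(text).items():
        print(name, fields)
```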

Sample Relationships

Define connections between samples:

  • Paired samples: Tumor/normal pairs
  • Time series: Sequential time points
  • Replicates: Biological/technical groups
  • Multi-omics: Same specimen, different assays

Automated Processing

Set up workflows for:

  • Auto-QC: Run quality checks on upload
  • Pipeline triggers: Start analysis automatically
  • Metadata extraction: Parse from filenames
  • Notification rules: Alert on completion
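
Metadata extraction from filenames depends on systematic naming. As a sketch of the idea, the following pulls a replicate number and a day from names that use `Rep<n>` and `D<n>` or `Day<n>` tokens, as in the naming examples earlier on this page. The token formats and the function name are assumptions; how Flow parses names is not specified here.

```python
import re

# Assumed token formats: Rep<n> for replicates, D<n> or Day<n> for time points.
REPLICATE = re.compile(r"(?:^|_)Rep(\d+)(?=_|$)")
DAY = re.compile(r"(?:^|_)D(?:ay)?(\d+)(?=_|$)")


def extract_metadata(sample_name):
    """Return replicate and day values found in an underscore-separated name."""
    metadata = {}
    replicate = REPLICATE.search(sample_name)
    if replicate:
        metadata["replicate"] = int(replicate.group(1))
    day = DAY.search(sample_name)
    if day:
        metadata["day"] = int(day.group(1))
    return metadata


if __name__ == "__main__":
    for name in ["MouseLiver_WT_Rep1_D0", "Differentiation_Day7", "Control_Rep3"]:
        print(name, extract_metadata(name))
```

Names that carry neither token, such as `Sample1`, yield an empty dictionary, which is one practical reason to settle on a naming scheme before upload.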

Troubleshooting

Common Issues

Upload failures?

  • Check file format and integrity
  • Verify sufficient storage space
  • Ensure stable network connection
  • Try smaller batch sizes

Metadata errors?

  • Review required fields
  • Check for special characters
  • Verify controlled vocabularies
  • Use templates for consistency

Can't find samples?

  • Check project selection
  • Verify permissions
  • Use search filters
  • Check sample state

Quality issues?

  • Review QC reports
  • Check sequencing metrics
  • Verify correct organism
  • Consider resequencing

Next Steps

For a broader understanding of how samples fit into Flow's architecture, see the Core Concepts guide.
