Samples

Samples are the core data unit in Flow, representing individual biological specimens and their associated sequencing experiments. Each sample maintains a complete record of your experimental data, from raw sequencing files through every analysis performed.

What are Samples?

In Flow, a sample represents:

A biological specimen - The physical material that was sequenced
Its raw data - The FASTQ files from the sequencer
Experimental metadata - All relevant details about how the data was generated
Analysis history - Every pipeline run on this data
Derived results - All files generated from analyzing this sample

Think of samples as self-contained experimental units that preserve the complete story of your data, from bench to analysis.

Sample Components

Raw Data Files

Every sample starts with sequencing data files:

Single-end reads: One FASTQ file per sample
Paired-end reads: Two FASTQ files (R1 and R2)
Multi-lane data: Multiple FASTQ files that are merged into one sample
Compressed formats: .gz and .bz2 compressed files are supported

Example file structure:

Sample_001/
├── Sample_001_R1.fastq.gz  (forward reads)
├── Sample_001_R2.fastq.gz  (reverse reads)
└── metadata.json            (experimental details)
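
As an illustration of how the file layouts above map onto samples, here is a minimal sketch that groups FASTQ file names by sample and read direction. The naming pattern (`<sample>[_L<lane>]_R<1|2>.fastq[.gz|.bz2]`) and the helper names `group_fastq_files` and `layout` are assumptions made for this example; they are not part of Flow.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) naming pattern:
#   <sample>[_L<lane>]_R<1|2>.fastq[.gz|.bz2]
FASTQ_RE = re.compile(
    r"^(?P<sample>.+?)(?:_L(?P<lane>\d+))?_R(?P<read>[12])"
    r"\.(?:fastq|fq)(?:\.(?:gz|bz2))?$"
)


def group_fastq_files(filenames):
    """Group FASTQ file names by sample, then by read (R1/R2)."""
    samples = defaultdict(lambda: {"R1": [], "R2": []})
    for name in sorted(filenames):
        match = FASTQ_RE.match(name)
        if match is None:
            continue  # not a FASTQ file under the assumed pattern
        samples[match["sample"]]["R" + match["read"]].append(name)
    return dict(samples)


def layout(reads):
    """Classify a sample as paired-end if it has any R2 files."""
    return "paired-end" if reads["R2"] else "single-end"
```

Under these assumptions, `Sample_001_R1.fastq.gz` and `Sample_001_R2.fastq.gz` end up under one `Sample_001` entry classified as paired-end, multi-lane files collect in lane order under the same read key, and `metadata.json` is ignored.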

Essential Metadata

Core information tracked for every sample:

Biological Information:

Organism (species, strain)
Tissue or cell type
Developmental stage
Sex (if applicable)
Age (if applicable)

Experimental Details:

Treatment conditions
Time points
Replicate number
Batch information
Experimental protocol

Technical Parameters:

Sequencing platform
Read length
Library preparation method
Sequencing depth
Run date

Quality Metrics:

Number of reads
Read quality scores
Contamination checks
Adapter content
Duplication rates

Relationships

Samples maintain connections to:

Parent project: The study this sample belongs to
Related samples: Replicates, time points, or paired samples
Executions: All analyses run on this sample
Derived data: Results generated from this sample
Groups/Users: Who has access to this sample

Types of Samples

By Experimental Design

Biological Replicates

Independent biological specimens
Same experimental conditions
Used to assess biological variability
Typically numbered (Rep1, Rep2, Rep3)

Technical Replicates

Same biological material
Multiple sequencing runs
Used to assess technical variability
Often merged before analysis

Time Course Samples

Same specimen/condition
Different time points
Track temporal changes
Ordered by collection time

Treatment vs Control

Paired experimental design
Matched conditions except treatment
Essential for differential analysis
Clear labeling is critical

By Data Type

RNA-seq Samples

Gene expression profiling
Various library types (poly-A, total RNA, small RNA)
Strand-specific or unstranded
Single-cell or bulk

ChIP-seq Samples

Chromatin immunoprecipitation
Requires input control
Antibody information is crucial
Peak calling analysis

ATAC-seq Samples

Chromatin accessibility
No antibody required
Nucleosome positioning
Open chromatin regions

Other Types

CLIP-seq (RNA-protein interactions)
Hi-C (3D genome organization)
Bisulfite-seq (DNA methylation)
Custom assay types

Sample Metadata Schema

Required Fields

Every sample must have:

Name: Unique identifier within the project
Organism: Species being studied
Data files: At least one FASTQ file
Ownership: User who created the sample
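
The required fields can be checked before upload. The sketch below assumes a sample is represented as a plain dictionary with the hypothetical keys `name`, `organism`, `files`, and `owner`; Flow's actual field names may differ.

```python
# Hypothetical keys standing in for the four required fields listed above.
REQUIRED_FIELDS = ("name", "organism", "files", "owner")


def missing_required_fields(sample):
    """Return the required fields that are absent or empty in a sample dict."""
    return [field for field in REQUIRED_FIELDS if not sample.get(field)]


def duplicate_names(samples):
    """Return sample names that occur more than once within one project."""
    seen, duplicates = set(), set()
    for sample in samples:
        name = sample.get("name")
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    return sorted(duplicates)
```

A sample with an empty `files` list is reported as missing `files`, which mirrors the rule that every sample needs at least one FASTQ file.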

Recommended Fields

For better organization and analysis:

Tissue/Cell Type: Specific biological source
Treatment: Experimental conditions applied
Replicate Number: For grouped analyses
Collection Date: When sample was obtained
Experimenter: Who performed the wet lab work

Custom Fields

Flow supports flexible metadata:

Add any field relevant to your experiment
Create custom schemas for repeated experiments
Import metadata from spreadsheets
Export in standard formats
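
To make the spreadsheet import concrete, here is a minimal sketch that reads a metadata sheet exported as CSV. It assumes one row per sample and a `name` column; any other column is treated as a custom field. The function name `read_metadata` and the column layout are illustrative, not Flow's import format.

```python
import csv
import io


def read_metadata(csv_text):
    """Parse CSV text into {sample name: {field: value}}.

    Assumes a header row containing a 'name' column. Empty cells are
    dropped so optional and custom fields stay optional.
    """
    metadata = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"]
        if name in metadata:
            raise ValueError(f"duplicate sample name: {name}")
        metadata[name] = {
            key: value
            for key, value in row.items()
            if key is not None and key != "name" and value
        }
    return metadata
```

Rejecting duplicate names at import time keeps the sheet consistent with the rule that names are unique within a project.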

Metadata Standards

Follow community guidelines:

MIAME: Minimum Information About a Microarray Experiment
MINSEQE: Minimum Information about a high-throughput Nucleotide SEQuencing Experiment
FAANG: Functional annotation of animal genomes
Custom standards: Institution-specific requirements

Sample Organization

Naming Conventions

Good sample names are:

Unique: No duplicates within a project
Informative: Convey key information
Systematic: Follow a consistent pattern
Parseable: Can be processed programmatically

Examples of good naming:

✅ MouseLiver_WT_Rep1_D0
✅ PatientA_Tumor_PreTreatment
✅ CRISPR_Gene1_KO_Clone3
✅ H3K27ac_NeuralStemCells_48h

Examples to avoid:

❌ Sample1
❌ Test
❌ John's sample
❌ Data from Tuesday
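
One way to enforce a systematic, parseable pattern is a simple check like the sketch below. The rule it applies (at least two alphanumeric tokens joined by underscores, with no spaces or punctuation) is an assumption chosen to match the examples above, not a rule that Flow enforces.

```python
import re

# Assumed convention: two or more alphanumeric tokens separated by underscores.
NAME_RE = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+")


def is_parseable_name(name):
    """Return True if the name follows the assumed token_token[_token...] pattern."""
    return NAME_RE.fullmatch(name) is not None
```

All four recommended names pass this check, while `Sample1`, `Test`, `John's sample`, and `Data from Tuesday` fail it. The check cannot tell whether a name is informative; that still needs human judgment.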

Grouping Strategies

By Biological Replicate:

Treatment_Rep1
Treatment_Rep2
Treatment_Rep3
Control_Rep1
Control_Rep2
Control_Rep3

By Time Point:

Differentiation_Day0
Differentiation_Day1
Differentiation_Day3
Differentiation_Day7

By Condition Matrix:

CellTypeA_Treatment1_Rep1
CellTypeA_Treatment2_Rep1
CellTypeB_Treatment1_Rep1
CellTypeB_Treatment2_Rep1
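
Consistent names like these can be split back into metadata and used to group replicates. The following sketch assumes the three-part `CellType_Treatment_Replicate` layout of the condition matrix; the field names are hypothetical.

```python
from collections import defaultdict


def parse_name(name, fields=("cell_type", "treatment", "replicate")):
    """Split an underscore-delimited sample name into named fields."""
    parts = name.split("_")
    if len(parts) != len(fields):
        raise ValueError(f"{name!r} does not have {len(fields)} parts")
    return dict(zip(fields, parts))


def group_samples(names, key_fields=("cell_type", "treatment")):
    """Group sample names that share the same values for key_fields."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_name(name)
        groups[tuple(parsed[field] for field in key_fields)].append(name)
    return dict(groups)
```

Grouping on `cell_type` and `treatment` puts all replicates of one condition in the same list, which is the shape most differential analyses expect.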

Batch Management

When dealing with multiple batches:

Track batch information: Include in metadata
Process together: Analyze batches jointly and apply batch correction where needed
Visualize batches: Check for batch effects
Document clearly: Note any batch-specific issues
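
A quick check on batch metadata can reveal designs where batch and condition are confounded, which batch correction cannot untangle. The sketch below assumes each sample is a dictionary with hypothetical `batch` and `condition` keys.

```python
from collections import defaultdict


def batch_condition_table(samples):
    """Count samples per (batch, condition) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        table[sample["batch"]][sample["condition"]] += 1
    return {batch: dict(counts) for batch, counts in table.items()}


def confounded_batches(samples):
    """Return batches that contain only a single condition."""
    table = batch_condition_table(samples)
    return sorted(batch for batch, counts in table.items() if len(counts) < 2)
```

If every batch appears in the result, the conditions were processed in separate batches, and any difference between them may be a batch effect rather than biology.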

Working with Samples

Creating Samples

Methods to create samples in Flow:

Individual Upload: Upload FASTQ files and enter metadata manually. Best for small studies.
Bulk Upload: Upload multiple files and import a metadata spreadsheet. Efficient for large studies.
Direct Transfer: Receive files directly from the sequencing facility with automated metadata capture. Reduces manual errors.
API Creation: Create samples programmatically and integrate with a LIMS. Supports fully automated workflows.

Quality Control

Flow automatically performs QC on upload:

File validation: Ensures FASTQ format is correct
Read counting: Determines sequencing depth
Quality metrics: Calculates per-base quality scores
Contamination screening: Checks for common contaminants
Adapter detection: Identifies sequencing adapters
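
To show what file validation, read counting, and quality metrics involve, here is a minimal local sketch. It is not Flow's QC implementation: it assumes four-line FASTQ records and Phred+33 quality encoding, and the function names are illustrative.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a plain, .gz, or .bz2 FASTQ file in text mode."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def fastq_stats(lines):
    """Validate four-line FASTQ records and summarize them.

    Returns the read count, base count, and mean quality score,
    assuming Phred+33 encoding.
    """
    reads = bases = quality_sum = 0
    position = 0  # index of the current line within its record (0-3)
    for line in lines:
        line = line.rstrip("\r\n")
        if position == 0:
            if not line.startswith("@"):
                raise ValueError("record header must start with '@'")
            reads += 1
        elif position == 2:
            if not line.startswith("+"):
                raise ValueError("separator line must start with '+'")
        elif position == 3:
            quality_sum += sum(ord(char) - 33 for char in line)
            bases += len(line)
        position = (position + 1) % 4
    if position != 0:
        raise ValueError("file ends in the middle of a record")
    mean_quality = quality_sum / bases if bases else 0.0
    return {"reads": reads, "bases": bases, "mean_quality": mean_quality}
```

Typical use would be `with open_fastq(path) as handle: stats = fastq_stats(handle)`. Contamination screening and adapter detection need reference sequences and are out of scope for this sketch.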

Sample Updates

After creation, you can:

Add metadata: Enrich with additional information
Fix errors: Correct metadata mistakes
Link samples: Establish relationships
Add files: Include supplementary data
Update permissions: Change access control

Sample States

Samples progress through states:

Uploading: Files being transferred
Processing: QC being performed
Ready: Available for analysis
Analyzing: Currently in a pipeline
Complete: Has analysis results
Archived: Moved to long-term storage
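
The lifecycle above can be modeled as a small state machine. The state names come from the list; the allowed transitions are an assumption made for illustration (for example, that a Complete sample can be analyzed again or archived), since the exact rules are not specified here.

```python
from enum import Enum


class SampleState(Enum):
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    READY = "Ready"
    ANALYZING = "Analyzing"
    COMPLETE = "Complete"
    ARCHIVED = "Archived"


# Assumed transitions; Flow's actual rules may differ.
ALLOWED_TRANSITIONS = {
    SampleState.UPLOADING: {SampleState.PROCESSING},
    SampleState.PROCESSING: {SampleState.READY},
    SampleState.READY: {SampleState.ANALYZING, SampleState.ARCHIVED},
    SampleState.ANALYZING: {SampleState.COMPLETE},
    SampleState.COMPLETE: {SampleState.ANALYZING, SampleState.ARCHIVED},
    SampleState.ARCHIVED: set(),
}


def can_transition(current, target):
    """Return True if moving from current to target is allowed."""
    return target in ALLOWED_TRANSITIONS[current]
```

Keeping the transitions in one table makes it easy to adjust the model if the real lifecycle allows other moves, such as restoring an archived sample.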

Best Practices

Before You Start

Plan your naming scheme: Decide on conventions early
Prepare metadata: Gather all information beforehand
Check file integrity: Verify files before upload
Organize locally: Structure files logically
Document protocols: Record experimental methods
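
File integrity is commonly checked by comparing a checksum against the value supplied with the data. A minimal sketch, assuming the sequencing facility provides an MD5 checksum per file (the helper names are illustrative):

```python
import hashlib


def file_digest_hex(path, algorithm="md5", chunk_size=1 << 20):
    """Compute a file's hex digest, reading in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex, algorithm="md5"):
    """Compare a file's digest with the expected value, ignoring case."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()
```

Reading in chunks matters for FASTQ files, which are often too large to load into memory at once. Pass `algorithm="sha256"` if the checksums were generated with SHA-256 instead.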

During Upload

Use a stable connection: Avoid interruptions
Verify file selection: Double-check before upload
Enter complete metadata: Don't skip fields
Group related samples: Upload batches together
Monitor progress: Watch for errors

After Upload

Review QC results: Check for quality issues
Verify metadata: Ensure accuracy
Test with a pipeline: Run a quick analysis
Share appropriately: Set permissions
Document issues: Note any problems

Advanced Features

Sample Templates

Create reusable templates for:

Common experiment types
Standard metadata fields
Repeated study designs
Lab-specific protocols

Metadata Import/Export

Import from CSV/Excel: Bulk metadata entry
Export to standard formats: GEO, SRA
API access: Programmatic metadata management
Validation rules: Ensure data consistency

Sample Relationships

Define connections between samples:

Paired samples: Tumor/normal pairs
Time series: Sequential time points
Replicates: Biological/technical groups
Multi-omics: Same specimen, different assays

Automated Processing

Set up workflows for:

Auto-QC: Run quality checks on upload
Pipeline triggers: Start analysis automatically
Metadata extraction: Parse from filenames
Notification rules: Alert on completion

Troubleshooting

Common Issues

Upload failures?

Check file format and integrity
Verify sufficient storage space
Ensure stable network connection
Try smaller batch sizes

Metadata errors?

Review required fields
Check for special characters
Verify controlled vocabularies
Use templates for consistency

Can't find samples?

Check project selection
Verify permissions
Use search filters
Check sample state

Quality issues?

Review QC reports
Check sequencing metrics
Verify correct organism
Consider resequencing

Next Steps

Upload your first sample: Step-by-step guide
Understand data types: Learn about file formats
Run quality control: Analyze your samples
Manage permissions: Control access

For a broader understanding of how samples fit into Flow's architecture, see the Core Concepts guide.