Samples
Samples are the core data unit in Flow, representing individual biological specimens and their associated sequencing experiments. Each sample maintains a complete record of your experimental data, from raw sequencing files through every analysis performed.
What are Samples?
In Flow, a sample represents:
- A biological specimen - The physical material that was sequenced
- Its raw data - The FASTQ files from the sequencer
- Experimental metadata - All relevant details about how the data was generated
- Analysis history - Every pipeline run on this data
- Derived results - All files generated from analyzing this sample
Think of samples as self-contained experimental units that preserve the complete story of your data, from bench to analysis.
Sample Components
Raw Data Files
Every sample starts with sequencing data files:
- Single-end reads: One FASTQ file per sample
- Paired-end reads: Two FASTQ files (R1 and R2)
- Multi-lane data: Multiple FASTQ files that get merged
- Compressed formats: .gz and .bz2 compressed files are supported
Example file structure:
Sample_001/
├── Sample_001_R1.fastq.gz (forward reads)
├── Sample_001_R2.fastq.gz (reverse reads)
└── metadata.json (experimental details)
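As a rough illustration of how the file layouts above can be told apart, the following sketch groups FASTQ filenames by sample and reports whether each sample looks single-end or paired-end, and how many lanes it has. The filename pattern (`<sample>[_L<lane>][_R1|_R2].fastq[.gz|.bz2]`) and the function name `classify_layout` are assumptions made for this example, not Flow's actual detection rules.

```python
import re
from collections import defaultdict

# Hypothetical filename pattern: <sample>[_L<lane>][_R<1|2>].fastq[.gz|.bz2]
# Files without an _R1/_R2 tag are treated as single-end reads.
FASTQ_RE = re.compile(
    r"^(?P<sample>.+?)(?:_L(?P<lane>\d+))?(?:_R(?P<read>[12]))?"
    r"\.(?:fastq|fq)(?:\.(?:gz|bz2))?$"
)


def classify_layout(filenames):
    """Group FASTQ filenames by sample and report the read layout."""
    by_sample = defaultdict(lambda: {"R1": [], "R2": []})
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            raise ValueError(f"not a recognised FASTQ filename: {name}")
        read = "R" + (match["read"] or "1")
        by_sample[match["sample"]][read].append(name)

    layouts = {}
    for sample, reads in by_sample.items():
        # Paired-end data needs one R2 file for every R1 file.
        if reads["R2"] and len(reads["R1"]) != len(reads["R2"]):
            raise ValueError(f"{sample}: unequal numbers of R1 and R2 files")
        layouts[sample] = {
            "layout": "paired" if reads["R2"] else "single",
            "lanes": len(reads["R1"]),
            "files": sorted(reads["R1"] + reads["R2"]),
        }
    return layouts


if __name__ == "__main__":
    print(classify_layout(["Sample_001_R1.fastq.gz", "Sample_001_R2.fastq.gz"]))
```

Real sequencer output often carries extra suffixes, so the pattern would need adjusting to match your facility's naming.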
Essential Metadata
Core information tracked for every sample:
Biological Information:
- Organism (species, strain)
- Tissue or cell type
- Developmental stage
- Sex (if applicable)
- Age (if applicable)
Experimental Details:
- Treatment conditions
- Time points
- Replicate number
- Batch information
- Experimental protocol
Technical Parameters:
- Sequencing platform
- Read length
- Library preparation method
- Sequencing depth
- Run date
Quality Metrics:
- Number of reads
- Read quality scores
- Contamination checks
- Adapter content
- Duplication rates
Relationships
Samples maintain connections to:
- Parent project: The study this sample belongs to
- Related samples: Replicates, time points, or paired samples
- Executions: All analyses run on this sample
- Derived data: Results generated from this sample
- Groups/Users: Who has access to this sample
Types of Samples
By Experimental Design
Biological Replicates
- Independent biological specimens
- Same experimental conditions
- Used to assess biological variability
- Typically numbered (Rep1, Rep2, Rep3)
Technical Replicates
- Same biological material
- Multiple sequencing runs
- Used to assess technical variability
- Often merged before analysis
Time Course Samples
- Same specimen/condition
- Different time points
- Track temporal changes
- Ordered by collection time
Treatment vs Control
- Paired experimental design
- Matched conditions except treatment
- Essential for differential analysis
- Clear labeling critical
By Data Type
RNA-seq Samples
- Gene expression profiling
- Various library types (poly-A, total RNA, small RNA)
- Strand-specific or unstranded
- Single-cell or bulk
ChIP-seq Samples
- Chromatin immunoprecipitation
- Requires input control
- Antibody information crucial
- Peak calling analysis
ATAC-seq Samples
- Chromatin accessibility
- No antibody required
- Nucleosome positioning
- Open chromatin regions
Other Types
- CLIP-seq (RNA-protein interactions)
- Hi-C (3D genome organization)
- Bisulfite-seq (DNA methylation)
- Custom assay types
Sample Metadata Schema
Required Fields
Every sample must have:
- Name: Unique identifier within the project
- Organism: Species being studied
- Data files: At least one FASTQ file
- Ownership: User who created the sample
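The required fields can be checked before upload with a small local validator. The sketch below is only an illustration: the `Sample` class, its attribute names, and `validate_samples` are hypothetical and do not correspond to Flow's API.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical local record mirroring the required fields listed above."""
    name: str
    organism: str
    data_files: list
    owner: str
    metadata: dict = field(default_factory=dict)  # recommended and custom fields


def validate_samples(samples):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    seen = set()
    for s in samples:
        if not s.name.strip():
            problems.append("sample with empty name")
        elif s.name in seen:
            problems.append(f"{s.name}: duplicate name within the project")
        seen.add(s.name)
        if not s.organism.strip():
            problems.append(f"{s.name}: organism is missing")
        if not s.data_files:
            problems.append(f"{s.name}: at least one FASTQ file is required")
        if not s.owner.strip():
            problems.append(f"{s.name}: owner is missing")
    return problems
```

Collecting all problems in one pass, rather than stopping at the first, makes it easier to fix a whole batch before uploading.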
Recommended Fields
For better organization and analysis:
- Tissue/Cell Type: Specific biological source
- Treatment: Experimental conditions applied
- Replicate Number: For grouped analyses
- Collection Date: When sample was obtained
- Experimenter: Who performed the wet lab work
Custom Fields
Flow supports flexible metadata:
- Add any field relevant to your experiment
- Create custom schemas for repeated experiments
- Import metadata from spreadsheets
- Export in standard formats
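Importing metadata from a spreadsheet could look roughly like the sketch below, which reads a CSV export and keys each row by sample name. The column names (`name`, `organism`) and the function `load_metadata` are assumptions for illustration; the columns Flow expects may differ.

```python
import csv
import io

# Hypothetical column names; adjust to match your own spreadsheet.
REQUIRED_COLUMNS = ("name", "organism")


def load_metadata(csv_text):
    """Parse CSV text into {sample_name: {field: value}}, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")

    records = {}
    for row in reader:
        name = (row["name"] or "").strip()
        if not name:
            raise ValueError("row with empty sample name")
        if name in records:
            raise ValueError(f"duplicate sample name: {name}")
        # Any extra columns are kept as custom fields.
        records[name] = {
            key: value.strip()
            for key, value in row.items()
            if key not in (None, "name") and value and value.strip()
        }
    return records
```

For Excel files, exporting the sheet to CSV first keeps the example dependency-free.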
Metadata Standards
Follow community guidelines:
- MIAME: Minimum Information About a Microarray Experiment
- MINSEQE: Minimum Information about a high-throughput Nucleotide Sequencing Experiment
- FAANG: Functional Annotation of Animal Genomes
- Custom standards: Institution-specific requirements
Sample Organization
Naming Conventions
Good sample names are:
- Unique: No duplicates within a project
- Informative: Convey key information
- Systematic: Follow a consistent pattern
- Parseable: Can be processed programmatically
Examples of good naming:
✅ MouseLiver_WT_Rep1_D0
✅ PatientA_Tumor_PreTreatment
✅ CRISPR_Gene1_KO_Clone3
✅ H3K27ac_NeuralStemCells_48h
Examples to avoid:
❌ Sample1
❌ Test
❌ John's sample
❌ Data from Tuesday
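The "parseable" and "systematic" criteria can be approximated with a simple pattern check. The rule used here (letters and digits only, at least two underscore-separated fields) is a heuristic chosen for this example, not a rule enforced by Flow, and it cannot judge whether a name is informative.

```python
import re

# Heuristic: alphanumeric fields joined by underscores, at least two fields.
NAME_RE = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+")


def is_parseable_name(name):
    """Return True if the name consists of two or more underscore-separated fields."""
    return NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["MouseLiver_WT_Rep1_D0", "Sample1", "John's sample"]:
        status = "ok" if is_parseable_name(candidate) else "avoid"
        print(f"{candidate}: {status}")
```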
Grouping Strategies
By Biological Replicate:
Treatment_Rep1
Treatment_Rep2
Treatment_Rep3
Control_Rep1
Control_Rep2
Control_Rep3
By Time Point:
Differentiation_Day0
Differentiation_Day1
Differentiation_Day3
Differentiation_Day7
By Condition Matrix:
CellTypeA_Treatment1_Rep1
CellTypeA_Treatment2_Rep1
CellTypeB_Treatment1_Rep1
CellTypeB_Treatment2_Rep1
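When names follow one of the patterns above, replicates can be grouped programmatically. The following sketch assumes every name ends in `_Rep<N>`; the function `group_replicates` is hypothetical and shown only to illustrate why a consistent scheme pays off.

```python
import re
from collections import defaultdict

# Assumes names of the form <condition>_Rep<N>, as in the examples above.
REP_RE = re.compile(r"^(?P<condition>.+)_Rep(?P<rep>\d+)$")


def group_replicates(names):
    """Map each condition to the sorted replicate numbers found for it."""
    groups = defaultdict(list)
    for name in names:
        match = REP_RE.match(name)
        if match is None:
            raise ValueError(f"name does not end in _Rep<N>: {name}")
        groups[match["condition"]].append(int(match["rep"]))
    return {condition: sorted(reps) for condition, reps in groups.items()}
```

For example, `group_replicates(["Treatment_Rep2", "Treatment_Rep1", "Control_Rep1"])` yields one entry for `Treatment` with replicates 1 and 2, and one for `Control` with replicate 1.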
Batch Management
When dealing with multiple batches:
- Track batch information: Include in metadata
- Process together: Analyze all batches jointly and apply batch correction where needed
- Visualize batches: Check for batch effects
- Document clearly: Note any batch-specific issues
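Before relying on batch correction, it helps to check whether batch and condition are confounded, since a batch that contains only some conditions cannot be cleanly separated from the biological effect. The sketch below is one simple way to flag this from metadata alone; the dictionary keys `batch` and `condition` are assumed field names, not part of Flow.

```python
from collections import defaultdict


def batches_missing_conditions(samples):
    """Return {batch: [conditions absent from that batch]} for unbalanced batches."""
    conditions_by_batch = defaultdict(set)
    for sample in samples:
        conditions_by_batch[sample["batch"]].add(sample["condition"])

    # Every condition seen anywhere in the study.
    all_conditions = set().union(*conditions_by_batch.values())
    return {
        batch: sorted(all_conditions - seen)
        for batch, seen in conditions_by_batch.items()
        if seen != all_conditions
    }
```

An empty result means every batch contains every condition, which is the design that makes batch effects easiest to estimate.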
Working with Samples
Creating Samples
Methods to create samples in Flow:
Individual Upload
- Upload FASTQ files
- Enter metadata manually
- Best for small studies
Bulk Upload
- Upload multiple files
- Import metadata spreadsheet
- Efficient for large studies
Direct Transfer
- From sequencing facility
- Automated metadata capture
- Reduces manual errors
API Creation
- Programmatic sample creation
- Integration with LIMS
- Fully automated workflows
Quality Control
Flow automatically performs QC on upload:
- File validation: Ensures FASTQ format is correct
- Read counting: Determines sequencing depth
- Quality metrics: Calculates per-base quality scores
- Contamination screening: Checks for common contaminants
- Adapter detection: Identifies sequencing adapters
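To give a sense of what file validation, read counting, and quality metrics involve, here is a minimal local sketch. It assumes four-line FASTQ records and Phred+33 quality encoding, and it is not Flow's QC implementation; `open_fastq` and `fastq_stats` are names invented for this example.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a plain, .gz, or .bz2 FASTQ file in text mode."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def fastq_stats(path):
    """Validate record structure and return read count, base count, mean quality."""
    reads = bases = quality_sum = 0
    with open_fastq(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"record {reads + 1}: malformed header or separator")
            if len(seq) != len(qual):
                raise ValueError(f"record {reads + 1}: sequence/quality length mismatch")
            reads += 1
            bases += len(seq)
            # Phred+33 encoding assumed: quality = ASCII code - 33.
            quality_sum += sum(ord(char) - 33 for char in qual)
    mean_quality = quality_sum / bases if bases else 0.0
    return {"reads": reads, "bases": bases, "mean_quality": mean_quality}
```

Contamination screening and adapter detection need reference sequences and are out of scope for a sketch of this size.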
Sample Updates
After creation, you can:
- Add metadata: Enrich with additional information
- Fix errors: Correct metadata mistakes
- Link samples: Establish relationships
- Add files: Include supplementary data
- Update permissions: Change access control
Sample States
Samples progress through states:
- Uploading: Files being transferred
- Processing: QC being performed
- Ready: Available for analysis
- Analyzing: Currently in a pipeline
- Complete: Has analysis results
- Archived: Moved to long-term storage
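The states above can be modelled as a small state machine. The state names come from the list; the allowed transitions in this sketch are assumptions made for illustration (for example, that a completed sample can be analyzed again), and Flow's actual rules may differ.

```python
from enum import Enum


class SampleState(Enum):
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    READY = "Ready"
    ANALYZING = "Analyzing"
    COMPLETE = "Complete"
    ARCHIVED = "Archived"


# Assumed transitions, not taken from Flow's documentation.
ALLOWED_TRANSITIONS = {
    SampleState.UPLOADING: {SampleState.PROCESSING},
    SampleState.PROCESSING: {SampleState.READY},
    SampleState.READY: {SampleState.ANALYZING, SampleState.ARCHIVED},
    SampleState.ANALYZING: {SampleState.COMPLETE},
    SampleState.COMPLETE: {SampleState.ANALYZING, SampleState.ARCHIVED},
    SampleState.ARCHIVED: set(),
}


def advance(current, target):
    """Return the target state if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```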
Best Practices
Before You Start
- Plan your naming scheme: Decide on conventions early
- Prepare metadata: Gather all information beforehand
- Check file integrity: Verify files before upload
- Organize locally: Structure files logically
- Document protocols: Record experimental methods
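File integrity is commonly verified by comparing a checksum against the value supplied by the sequencing facility. A minimal sketch using Python's standard `hashlib` follows; the function name `file_checksum` is our own, and MD5 is used only because facilities often distribute `.md5` files alongside FASTQ data.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Compute a hex digest by reading the file in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the facility's published checksum before uploading; a mismatch usually indicates an incomplete or corrupted transfer.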
During Upload
- Use stable connection: Avoid interruptions
- Verify file selection: Double-check before upload
- Enter complete metadata: Don't skip fields
- Group related samples: Upload batches together
- Monitor progress: Watch for errors
After Upload
- Review QC results: Check for quality issues
- Verify metadata: Ensure accuracy
- Test with pipeline: Run a quick analysis
- Share appropriately: Set permissions
- Document issues: Note any problems
Advanced Features
Sample Templates
Create reusable templates for:
- Common experiment types
- Standard metadata fields
- Repeated study designs
- Lab-specific protocols
Metadata Import/Export
- Import from CSV/Excel: Bulk metadata entry
- Export to standard formats: GEO and SRA submission formats
- API access: Programmatic metadata management
- Validation rules: Ensure data consistency
Sample Relationships
Define connections between samples:
- Paired samples: Tumor/normal pairs
- Time series: Sequential time points
- Replicates: Biological/technical groups
- Multi-omics: Same specimen, different assays
Automated Processing
Set up workflows for:
- Auto-QC: Run quality checks on upload
- Pipeline triggers: Start analysis automatically
- Metadata extraction: Parse from filenames
- Notification rules: Alert on completion
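Metadata extraction from filenames works only when names follow a fixed field order. As a rough sketch, assuming underscore-separated names and a field order you define yourself (the field names below are illustrative, not a Flow convention):

```python
def extract_metadata(sample_name, fields, sep="_"):
    """Split a systematic sample name into named metadata fields."""
    parts = sample_name.split(sep)
    if len(parts) != len(fields):
        raise ValueError(
            f"{sample_name}: expected {len(fields)} fields, found {len(parts)}"
        )
    return dict(zip(fields, parts))


if __name__ == "__main__":
    # Hypothetical field order for a name such as MouseLiver_WT_Rep1_D0.
    order = ["tissue", "genotype", "replicate", "timepoint"]
    print(extract_metadata("MouseLiver_WT_Rep1_D0", order))
```

Rejecting names with the wrong number of fields, rather than guessing, surfaces naming mistakes early.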
Troubleshooting
Common Issues
Upload failures?
- Check file format and integrity
- Verify sufficient storage space
- Ensure stable network connection
- Try smaller batch sizes
Metadata errors?
- Review required fields
- Check for special characters
- Verify controlled vocabularies
- Use templates for consistency
Can't find samples?
- Check project selection
- Verify permissions
- Use search filters
- Check sample state
Quality issues?
- Review QC reports
- Check sequencing metrics
- Verify correct organism
- Consider resequencing
Next Steps
- Upload your first sample: Step-by-step guide
- Understand data types: Learn about file formats
- Run quality control: Analyze your samples
- Manage permissions: Control access
For a broader understanding of how samples fit into Flow's architecture, see the Core Concepts guide.