Administration
Managing Pipelines (Admin Guide)
This guide covers how Flow administrators can add, configure, and manage bioinformatics pipelines through the admin interface. It explains the pipeline hierarchy, version management, and best practices for maintaining a pipeline catalog.
Overview
Flow organizes pipelines in a hierarchical structure that provides flexibility while maintaining organization:
Pipeline Repository (Git)
└── Pipeline Category (e.g., "Sequencing Analysis")
    └── Pipeline Subcategory (e.g., "RNA Analysis")
        └── Pipeline (e.g., "RNA-seq")
            └── Pipeline Version (e.g., "v3.14.0")
Each level serves a specific purpose:
- Repositories contain the actual pipeline code
- Categories organize pipelines by broad analysis type
- Subcategories provide further organization
- Pipelines represent specific analysis workflows
- Versions are specific, runnable implementations
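The hierarchy can be pictured as nested records. The sketch below models it with Python dataclasses; the class and field names are our own choosing for illustration, not Flow's actual data model:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the pipeline hierarchy; names are ours,
# not Flow's internal schema.

@dataclass
class PipelineVersion:
    version: str          # e.g. "3.14.0"
    git_ref: str          # tag or branch, e.g. "v3.14.0"
    active: bool = True

@dataclass
class Pipeline:
    name: str             # e.g. "RNA-seq"
    versions: list = field(default_factory=list)

@dataclass
class Subcategory:
    name: str             # e.g. "RNA Analysis"
    pipelines: list = field(default_factory=list)

@dataclass
class Category:
    name: str             # e.g. "Sequencing Analysis"
    subcategories: list = field(default_factory=list)

# Build one branch of the example tree above
rnaseq = Pipeline("RNA-seq", versions=[PipelineVersion("3.14.0", "v3.14.0")])
rna = Subcategory("RNA Analysis", pipelines=[rnaseq])
seq = Category("Sequencing Analysis", subcategories=[rna])
```

Each level only points downward, which is why the admin interface asks you to create parents (repo, category, subcategory) before their children.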
Accessing the Admin Interface
- Navigate to your Flow instance
- Log in with an admin account
- Click your username in the top-right corner
- Select "Admin" from the dropdown menu
The pipeline management sections are:
- Pipeline Repos - Git repositories
- Pipeline Categories - Top-level organization
- Pipeline Subcategories - Second-level organization
- Pipelines - Individual workflows
- Pipeline Versions - Specific implementations
- Pipeline Config - Global configuration files
Managing Pipeline Repositories
Adding a Repository
- Navigate to Admin → Pipeline Repos
- Click "Add Pipeline Repo"
- Fill in the repository details:
URL: https://github.com/nf-core/rnaseq
Path: / # Leave as "/" for root, or specify subdirectory
Private Key: (leave blank for public repos)
- Click "Save"
Flow will automatically:
- Clone the repository
- Validate the URL
- Set up authentication if a private key is provided
Private Repositories
For private repositories:
Generate an SSH key pair:
ssh-keygen -t ed25519 -f flow_pipeline_key
Add the public key to your Git provider (GitHub, GitLab, etc.)
In Flow admin:
- Paste the private key contents into the "Private Key" field
- Use the SSH URL format:
git@github.com:org/repo.git
Repository Best Practices
- Use specific repos - One repository per pipeline or a collection of related pipelines
- Keep repos focused - Avoid mixing pipelines with different purposes
- Document requirements - Include clear README files
- Tag releases - Use semantic versioning for stability
Creating Pipeline Categories
Categories organize pipelines into logical groups for users.
Adding a Category
- Go to Admin → Pipeline Categories
- Click "Add Pipeline Category"
- Enter:
- Name: Display name (e.g., "Sequencing Analysis")
- Slug: URL-friendly identifier (auto-generated)
- Order: Display order (lower numbers appear first)
- Description: What types of analyses this includes
Adding Subcategories
- Go to Admin → Pipeline Subcategories
- Click "Add Pipeline Subcategory"
- Enter:
- Category: Parent category
- Name: Subcategory name (e.g., "RNA Analysis")
- Description: More specific description
- Order: Display order within the category
Organization Strategy
Recommended category structure:
Sequencing Analysis/
├── RNA Analysis/
│   ├── RNA-seq
│   ├── scRNA-seq
│   └── Small RNA-seq
├── DNA Analysis/
│   ├── Whole Genome
│   ├── Exome
│   └── Amplicon
└── Epigenomics/
    ├── ChIP-seq
    ├── ATAC-seq
    └── Bisulfite-seq
Adding Pipelines
Creating a Pipeline
- Navigate to Admin → Pipelines
- Click "Add Pipeline"
- Configure the pipeline:
Basic Information:
- Pipeline Repo: Select the repository containing this pipeline
- Name: User-friendly name (e.g., "RNA-seq")
- Slug: URL identifier (auto-generated)
- Category/Subcategory: Where to place in the catalog
Features (Checkboxes):
- Imports Samples: Can process Flow sample data
- Imports Data: Can process other Flow data types
- Imports Multiplexed: Can process multiplexed sequencing data
- Can Fail: Whether partial failures are acceptable
- Deleted: Soft delete (hide from users)
Additional Settings:
- Description: What the pipeline does (Markdown supported)
- Documentation Link: External documentation URL
- Order: Display order within subcategory
Pipeline Configuration
Each pipeline should specify:
- What types of input it accepts
- Key parameters users might want to modify
- Expected outputs
- Computational requirements
Managing Pipeline Versions
Pipeline versions are the actual runnable instances of a pipeline. Each version is tied to a specific Git reference (branch or tag).
Understanding Version References
Flow supports two types of Git references:
1. Git Tags (Recommended for Production)
Reference: v3.14.0
Type: Immutable, specific release
Use for: Stable, tested versions
2. Git Branches
Reference: main, dev, feature/new-analysis
Type: Mutable, can change
Use for: Development, testing
Adding a Version
- Go to Admin → Pipeline Versions
- Click "Add Pipeline Version"
- Configure:
Version Information:
- Pipeline: Select the parent pipeline
- Version: Version identifier (e.g., "3.14.0" or "dev")
- Git Branch or Tag: Git reference (e.g., "v3.14.0" or "main")
- Downgrade: Previous version to suggest if this fails
File Paths:
- Path: Path to main.nf file (e.g., "main.nf" or "workflows/rnaseq/main.nf")
- Schema Path: Path to schema JSON (e.g., "nextflow_schema.json")
- Config Paths: Pipeline-specific configs (comma-separated)
Settings:
- Active: Whether users can run this version
- Prerelease: Mark beta/testing versions
- Deleted: Soft delete
Dependencies:
- Minimum Nextflow Version: Required Nextflow version
- Upstream Pipeline Versions: Dependencies on other pipelines
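A wrong Path or Schema Path is the most common reason a new version fails to run. Before saving, it can help to confirm the configured paths actually exist in a checkout of the repository. A minimal sketch, using a hypothetical helper and a throwaway directory standing in for a real clone:

```python
from pathlib import Path
import tempfile

def check_version_paths(clone_dir, path, schema_path):
    """Return a list of configured paths missing from a local clone.

    Hypothetical helper: Flow performs its own validation; this simply
    mirrors the check an admin can run by hand before saving a version.
    """
    root = Path(clone_dir)
    return [p for p in (path, schema_path) if not (root / p).is_file()]

# Demo against a temporary directory standing in for a cloned repo
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "main.nf").write_text("// workflow entry point\n")
    missing = check_version_paths(repo, "main.nf", "nextflow_schema.json")
    print(missing)  # only the schema file is absent
```

Run the same check against a real clone of the repository at the exact tag or branch you are registering, since files can move between releases.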
Version Naming Conventions
Semantic Versioning (Recommended):
3.14.0 - Stable release
3.14.1 - Patch release
3.15.0-beta.1 - Beta release
4.0.0-rc.1 - Release candidate
Development Versions:
dev - Latest development
staging - Pre-production testing
hotfix-issue123 - Specific fixes
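The precedence implied above (patch after release, prereleases before their final release) can be made concrete with a small sorting helper. This is an illustrative sketch only; for real version comparison, a library such as 'packaging' handles the full SemVer rules:

```python
import re

def parse_semver(version):
    """Parse 'MAJOR.MINOR.PATCH[-prerelease]' into a sortable tuple.

    Minimal sketch of the naming scheme above; prerelease identifiers
    are compared as plain strings, which the full SemVer spec refines.
    """
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-(.+))?", version)
    if not m:
        raise ValueError(f"not a semantic version: {version}")
    major, minor, patch, pre = m.groups()
    # A final release (no prerelease part) sorts after any prerelease
    # of the same MAJOR.MINOR.PATCH, per SemVer precedence.
    return (int(major), int(minor), int(patch), pre is None, pre or "")

versions = ["3.14.0", "3.15.0-beta.1", "3.14.1", "4.0.0-rc.1"]
print(sorted(versions, key=parse_semver))
# -> ['3.14.0', '3.14.1', '3.15.0-beta.1', '4.0.0-rc.1']
```

Development names like "dev" or "staging" intentionally do not parse here; they identify moving branches rather than comparable releases.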
Handling Schema Files
The schema file defines pipeline parameters and is crucial for the UI:
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "RNA-seq pipeline",
  "description": "Nextflow RNA-seq analysis pipeline",
  "type": "object",
  "definitions": {
    "input_output_options": {
      "title": "Input/output options",
      "type": "object",
      "properties": {
        "input": {
          "type": "string",
          "format": "file-path",
          "description": "Sample sheet CSV file"
        }
      }
    }
  }
}
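A malformed schema file is a common reason a version's parameter form fails to render. A quick pre-flight check using only the Python standard library might look like the sketch below (full draft-07 validation needs the third-party jsonschema package; the key list checked here is an assumption based on the example above):

```python
import json

def check_schema(text):
    """Return a list of problems found in a nextflow_schema.json string.

    Minimal sanity check only: verifies the JSON parses and that the
    top-level keys from the example schema are present.
    """
    problems = []
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    for key in ("$schema", "title", "type"):
        if key not in schema:
            problems.append(f"missing top-level key: {key}")
    if schema.get("type") != "object":
        problems.append("top-level 'type' should be 'object'")
    return problems

print(check_schema('{"title": "RNA-seq pipeline"}'))
# -> flags the missing "$schema" and "type" keys
```

Running this against the schema file at the exact Git reference you registered catches copy-paste and path mix-ups before users hit them.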
Pipeline Configuration Files
Global configuration files apply to all pipeline executions.
Adding Global Config
- Go to Admin → Pipeline Config
- Click "Add Pipeline Config"
- Create configuration:
Example Development Config:
// dev.config
docker.enabled = true
docker.runOptions = '-u $(id -u):$(id -g)'

process {
    maxRetries = 2
    errorStrategy = 'retry'

    withLabel: 'process_low' {
        cpus = 2
        memory = 4.GB
        time = 6.h
    }

    withLabel: 'process_medium' {
        cpus = 6
        memory = 16.GB
        time = 12.h
    }
}

params {
    max_cpus = 16
    max_memory = '128.GB'
    max_time = '48.h'
}
Example Production Config:
// production.config
singularity.enabled = true
singularity.autoMounts = true

executor {
    name = 'slurm'
    queueSize = 100
    pollInterval = '30 sec'
}

process {
    executor = 'slurm'
    queue = 'bioinformatics'
    errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
    maxRetries = 3
}
Environment-Specific Configs
Create different configs for different environments:
- local.config - Local development
- hpc.config - HPC cluster execution
- cloud.config - Cloud computing
- test.config - Minimal test runs
Updating Pipelines
Updating Repository Code
- Navigate to Admin → Pipeline Repos
- Select the repository
- No manual pull is needed: Flow automatically fetches the latest code when a version is next accessed
Updating Versions
For Branch-based Versions:
- Updates are automatic when the branch is updated
- Users get the latest code on next execution
For Tag-based Versions:
- Create a new version entry for each tag
- Mark old versions as inactive or deleted
- Use the "Downgrade" field to suggest alternatives
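This rollover can be scripted against the REST API shown later in this guide. The helper below only builds the two request bodies (one POST to create the new version, one PATCH to retire the old one); the field names mirror this guide's API examples, but the "downgrade" field name is an assumption to verify against your instance's API reference:

```python
def rollover_payloads(pipeline_id, new_tag, old_version):
    """Build request bodies for a tag rollover.

    Hypothetical helper: 'create' is the body for POSTing a new
    pipeline version, 'retire' the body for PATCHing the old one.
    The 'downgrade' key is an assumed field name.
    """
    create = {
        "pipeline_id": pipeline_id,
        "version": new_tag.lstrip("v"),   # "v3.15.0" -> "3.15.0"
        "git_branch_or_tag": new_tag,
        "path": "main.nf",
        "schema_path": "nextflow_schema.json",
        "active": True,
        "downgrade": old_version,         # suggest the previous release on failure
    }
    retire = {"active": False}
    return create, retire

create, retire = rollover_payloads("pipeline-id", "v3.15.0", "3.14.0")
print(create["version"])  # -> 3.15.0
```

Keeping the old version inactive rather than deleted preserves the execution history tied to it.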
Version Migration Strategy
When updating major versions:
- Create new version alongside the old one
- Test thoroughly with test data
- Mark as prerelease during testing
- Communicate changes to users
- Deprecate old version gradually
- Remove old version after migration period
Access Control
Pipeline Permissions
Control who can run pipelines:
- Public Pipelines - Any authenticated user
- Group Restricted - Specific user groups
- Private Pipelines - Selected users only
Set through pipeline version "Active" status and user permissions.
Execution Limits
Configure per-user or per-group limits:
- Maximum concurrent executions
- Resource quotas
- Priority levels
Monitoring and Maintenance
Health Checks
Regular maintenance tasks:
- Check failed executions - Identify pipeline issues
- Monitor resource usage - Adjust resource allocations
- Review version usage - Deprecate unused versions
- Update dependencies - Keep Nextflow versions current
Common Issues and Solutions
Pipeline won't appear:
- Check all hierarchy levels exist (repo → category → subcategory → pipeline → version)
- Verify version is marked "Active"
- Ensure file paths are correct
Schema validation errors:
- Validate JSON syntax
- Check schema file path
- Ensure schema matches pipeline parameters
Git authentication failures:
- Verify repository URL
- Check private key format (if used)
- Test SSH connectivity
Execution failures:
- Check pipeline logs
- Verify Nextflow version compatibility
- Review resource allocations
Using the API
Manage pipelines programmatically using the REST API:
Create Pipeline Version
import requests

response = requests.post(
    'https://api.flow.bio/api/pipelines/versions/',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'pipeline_id': 'pipeline-id',
        'version': '3.15.0',
        'git_branch_or_tag': 'v3.15.0',
        'path': 'main.nf',
        'schema_path': 'nextflow_schema.json',
        'active': True
    }
)
version_data = response.json()
print(f"Created version {version_data['version']} with ID {version_data['id']}")
Update Pipeline
response = requests.patch(
    f'https://api.flow.bio/api/pipelines/{pipeline_id}/',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'description': 'Updated description with new features'
    }
)
pipeline = response.json()
print(f"Updated pipeline: {pipeline['name']}")
Query Pipeline Status
# Get pipeline with versions
response = requests.get(
    f'https://api.flow.bio/api/pipelines/{pipeline_id}/',
    headers={'Authorization': f'Bearer {token}'},
    params={'expand': 'versions,executions'}
)
pipeline = response.json()

# List versions and execution counts
for version in pipeline['versions']:
    execution_count = len(version.get('executions', []))
    status = 'Active' if version['active'] else 'Inactive'
    print(f"Version {version['version']}: {status} - {execution_count} executions")
Best Practices
Repository Management
- Use dedicated repositories for production pipelines
- Tag releases with semantic versioning
- Include comprehensive documentation
- Test pipelines before adding to Flow
Version Control
- Use tags for production versions
- Reserve branches for development
- Document breaking changes
- Maintain backward compatibility when possible
User Communication
- Announce new pipeline versions
- Document parameter changes
- Provide migration guides
- Set appropriate downgrade paths
Performance Optimization
- Set realistic resource requirements
- Use pipeline profiling data
- Implement efficient configs
- Monitor execution metrics
Troubleshooting
Pipeline Not Visible
Check in order:
- Repository successfully cloned?
- Category and subcategory exist?
- Pipeline created and not deleted?
- Version created and active?
- User has permission to view?
Execution Failures
- Check execution logs in Flow
- Verify input data format
- Test with minimal data
- Check Nextflow compatibility
- Review resource allocation
Git Issues
# Test repository access
git ls-remote https://github.com/org/repo
# Test SSH key
ssh -T git@github.com
# Verify branch/tag exists
git ls-remote --tags --heads https://github.com/org/repo
Database Integrity
-- Check pipeline hierarchy
SELECT
    pr.url AS repo,
    pc.name AS category,
    ps.name AS subcategory,
    p.name AS pipeline,
    pv.version
FROM analysis_pipelineversion pv
JOIN analysis_pipeline p ON pv.pipeline_id = p.id
JOIN analysis_pipelinesubcategory ps ON p.subcategory_id = ps.id
JOIN analysis_pipelinecategory pc ON ps.category_id = pc.id
JOIN analysis_pipelinerepo pr ON p.pipeline_repo_id = pr.id
WHERE pv.active = true
ORDER BY pc."order", ps."order", p."order";
This guide provides Flow administrators with the knowledge needed to effectively manage pipelines. For pipeline development, see Writing Pipelines. For end-user documentation, see Running Pipelines.