Managing Pipelines (Admin Guide)

This guide covers how Flow administrators can add, configure, and manage bioinformatics pipelines through the admin interface. It explains the pipeline hierarchy, version management, and best practices for maintaining a pipeline catalog.

Overview

Flow organizes pipelines in a five-level hierarchy. The repository supplies the code, while categories and subcategories determine where each pipeline appears in the catalog:

Pipeline Repository (Git)
└── Pipeline Category (e.g., "Sequencing Analysis")
    └── Pipeline Subcategory (e.g., "RNA Analysis")
        └── Pipeline (e.g., "RNA-seq")
            └── Pipeline Version (e.g., "v3.14.0")

Each level serves a specific purpose:

Repositories contain the actual pipeline code
Categories organize pipelines by broad analysis type
Subcategories provide further organization
Pipelines represent specific analysis workflows
Versions are specific, runnable implementations
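
To make the relationships concrete, the levels can be sketched as plain Python dataclasses. This is only an illustration: the class and field names are hypothetical and are not Flow's actual data model. It assumes each pipeline points at both a repository and a subcategory, and each version points at one pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str


@dataclass(frozen=True)
class Subcategory:
    name: str
    category: Category


@dataclass(frozen=True)
class Pipeline:
    name: str
    repo_url: str            # Git repository that holds the pipeline code
    subcategory: Subcategory


@dataclass(frozen=True)
class PipelineVersion:
    version: str
    pipeline: Pipeline


def catalog_path(version: PipelineVersion) -> str:
    """Return the catalog location of a version, from category down to version."""
    pipeline = version.pipeline
    parts = [
        pipeline.subcategory.category.name,
        pipeline.subcategory.name,
        pipeline.name,
        version.version,
    ]
    return " / ".join(parts)


rnaseq = Pipeline(
    name="RNA-seq",
    repo_url="https://github.com/nf-core/rnaseq",
    subcategory=Subcategory("RNA Analysis", Category("Sequencing Analysis")),
)
release = PipelineVersion("v3.14.0", rnaseq)
print(catalog_path(release))
```

A version is only reachable by users when every object above it in this chain exists.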

Accessing the Admin Interface

Navigate to your Flow instance
Log in with an admin account
Click your username in the top-right corner
Select "Admin" from the dropdown menu

The pipeline management sections are:

Pipeline Repos - Git repositories
Pipeline Categories - Top-level organization
Pipeline Subcategories - Second-level organization
Pipelines - Individual workflows
Pipeline Versions - Specific implementations
Pipeline Config - Global configuration files

Managing Pipeline Repositories

Adding a Repository

Navigate to Admin → Pipeline Repos
Click "Add Pipeline Repo"
Fill in the repository details:

URL: https://github.com/nf-core/rnaseq
Path: /          # Leave as "/" for root, or specify subdirectory
Private Key: (leave blank for public repos)

Click "Save"

Flow will automatically:

Validate the URL
Set up authentication if a private key is provided
Clone the repository

Private Repositories

For private repositories:

Generate an SSH key pair:

ssh-keygen -t ed25519 -f flow_pipeline_key

Add the public key (flow_pipeline_key.pub) to your Git provider (GitHub, GitLab, etc.)
In Flow admin:
- Paste the contents of the private key file (flow_pipeline_key) into the "Private Key" field
- Use the SSH URL format: git@github.com:org/repo.git
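
If you only have the HTTPS address of a repository, the SSH form can be derived from it. The helper below is a small sketch, not part of Flow: the function name is hypothetical, and it assumes the provider accepts the conventional `git@host:path.git` form used by GitHub and GitLab.

```python
from urllib.parse import urlparse


def to_ssh_url(https_url: str) -> str:
    """Convert https://host/org/repo(.git) into git@host:org/repo.git."""
    parsed = urlparse(https_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"not an HTTPS URL: {https_url!r}")

    # Drop surrounding slashes and an optional ".git" suffix.
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    if not path:
        raise ValueError(f"URL has no repository path: {https_url!r}")

    return f"git@{parsed.hostname}:{path}.git"


print(to_ssh_url("https://github.com/org/repo"))
```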

Repository Best Practices

Use specific repos - One repository per pipeline or a collection of related pipelines
Keep repos focused - Avoid mixing pipelines with different purposes
Document requirements - Include clear README files
Tag releases - Use semantic versioning for stability

Creating Pipeline Categories

Categories organize pipelines into logical groups for users.

Adding a Category

Go to Admin → Pipeline Categories
Click "Add Pipeline Category"
Enter:
- Name: Display name (e.g., "Sequencing Analysis")
- Slug: URL-friendly identifier (auto-generated)
- Order: Display order (lower numbers appear first)
- Description: What types of analyses this includes
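
The slug is generated for you, but it helps to know roughly what to expect when a URL looks unfamiliar. The function below is a typical slug algorithm written for illustration. Flow's actual implementation is not documented here and may differ in details.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase the name and join alphanumeric runs with hyphens."""
    # Strip accents and any characters that cannot be represented in ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    return slug.strip("-")


print(slugify("Sequencing Analysis"))
```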

Adding Subcategories

Go to Admin → Pipeline Subcategories
Click "Add Pipeline Subcategory"
Enter:
- Category: Parent category
- Name: Subcategory name (e.g., "RNA Analysis")
- Description: More specific description
- Order: Display order within the category
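
The effect of the Order field can be pictured as an ascending sort. The sketch below uses hypothetical records, and the tie-break by name is an assumption, since this guide does not say how Flow resolves equal order values.

```python
# Hypothetical records; the field names are illustrative, not Flow's schema.
subcategories = [
    {"name": "Epigenomics", "order": 3},
    {"name": "RNA Analysis", "order": 1},
    {"name": "DNA Analysis", "order": 2},
    {"name": "Proteomics", "order": 2},
]


def display_order(items):
    """Sort ascending by order; ties fall back to name (an assumption)."""
    return sorted(items, key=lambda item: (item["order"], item["name"]))


names = [item["name"] for item in display_order(subcategories)]
print(names)
```

Leaving gaps between order values (10, 20, 30) makes it easier to slot in a new entry later without renumbering the rest.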

Organization Strategy

Recommended category structure:

Sequencing Analysis/
├── RNA Analysis/
│   ├── RNA-seq
│   ├── scRNA-seq
│   └── Small RNA-seq
├── DNA Analysis/
│   ├── Whole Genome
│   ├── Exome
│   └── Amplicon
└── Epigenomics/
    ├── ChIP-seq
    ├── ATAC-seq
    └── Bisulfite-seq

Adding Pipelines

Creating a Pipeline

Navigate to Admin → Pipelines
Click "Add Pipeline"
Configure the pipeline:

Basic Information:

Pipeline Repo: Select the repository containing this pipeline
Name: User-friendly name (e.g., "RNA-seq")
Slug: URL identifier (auto-generated)
Category/Subcategory: Where to place in the catalog

Features (Checkboxes):

Imports Samples: Can process Flow sample data
Imports Data: Can process other Flow data types
Imports Multiplexed: Can process multiplexed sequencing data
Can Fail: Whether partial failures are acceptable
Deleted: Soft delete (hide from users)

Additional Settings:

Description: What the pipeline does (Markdown supported)
Documentation Link: External documentation URL
Order: Display order within subcategory

Pipeline Configuration

Each pipeline should specify:

What types of input it accepts
Key parameters users might want to modify
Expected outputs
Computational requirements

Managing Pipeline Versions

Pipeline versions are the actual runnable instances of a pipeline. Each version is tied to a specific Git reference (branch or tag).

Understanding Version References

Flow supports two types of Git references:

1. Git Tags (Recommended for Production)

Reference: v3.14.0
Type: Immutable, specific release
Use for: Stable, tested versions

2. Git Branches

Reference: main, dev, feature/new-analysis
Type: Mutable, can change
Use for: Development, testing
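
Before entering a reference, it can be useful to confirm whether it is a tag or a branch in the remote. The sketch below parses the text printed by `git ls-remote --tags --heads <url>`. The function name and sample output are illustrative (the hashes are made up), and the parsing assumes the usual tab-separated `<hash>\t<ref>` lines.

```python
def classify_ref(ls_remote_output: str, ref: str) -> str:
    """Return 'tag', 'branch', 'ambiguous', or 'missing' for a short ref name."""
    kinds = set()
    for line in ls_remote_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue
        full_ref = parts[1].strip()
        # Annotated tags are listed a second time with a peeled "^{}" suffix.
        if full_ref.endswith("^{}"):
            full_ref = full_ref[:-3]
        if full_ref == f"refs/tags/{ref}":
            kinds.add("tag")
        elif full_ref == f"refs/heads/{ref}":
            kinds.add("branch")

    if not kinds:
        return "missing"
    if len(kinds) > 1:
        return "ambiguous"
    return kinds.pop()


# Made-up sample of `git ls-remote --tags --heads` output.
SAMPLE = (
    "1111111111111111111111111111111111111111\trefs/heads/main\n"
    "2222222222222222222222222222222222222222\trefs/heads/feature/new-analysis\n"
    "3333333333333333333333333333333333333333\trefs/tags/v3.14.0\n"
    "4444444444444444444444444444444444444444\trefs/tags/v3.14.0^{}\n"
)

print(classify_ref(SAMPLE, "v3.14.0"))
```

A result of "ambiguous" means a branch and a tag share the same name, which is worth resolving in the repository before creating a version.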

Adding a Version

Go to Admin → Pipeline Versions
Click "Add Pipeline Version"
Configure:

Version Information:

Pipeline: Select the parent pipeline
Version: Version identifier (e.g., "3.14.0" or "dev")
Git Branch or Tag: Git reference (e.g., "v3.14.0" or "main")
Downgrade: Previous version to suggest if this fails

File Paths:

Path: Path to main.nf file (e.g., "main.nf" or "workflows/rnaseq/main.nf")
Schema Path: Path to schema JSON (e.g., "nextflow_schema.json")
Config Paths: Pipeline-specific configs (comma-separated)

Settings:

Active: Whether users can run this version
Prerelease: Mark beta/testing versions
Deleted: Soft delete

Dependencies:

Minimum Nextflow Version: Required Nextflow version
Upstream Pipeline Versions: Dependencies on other pipelines
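
Incorrect file paths are a common reason a version fails to load, so it can help to check them against a local checkout of the chosen Git reference first. The helper below is a hypothetical pre-flight check, not a Flow feature. It assumes the paths are relative to the repository root and that Config Paths is a comma-separated string as described above.

```python
import tempfile
from pathlib import Path


def missing_files(checkout: Path, path: str, schema_path: str,
                  config_paths: str = "") -> list:
    """Return the configured paths that do not exist in the checkout."""
    candidates = [path, schema_path]
    candidates += [c.strip() for c in config_paths.split(",") if c.strip()]
    return [c for c in candidates if not (checkout / c).is_file()]


# Demonstration against a throwaway directory standing in for a checkout.
with tempfile.TemporaryDirectory() as tmp:
    checkout = Path(tmp)
    (checkout / "conf").mkdir()
    for name in ("main.nf", "nextflow_schema.json", "conf/base.config"):
        (checkout / name).write_text("")

    result = missing_files(
        checkout,
        path="main.nf",
        schema_path="nextflow_schema.json",
        config_paths="conf/base.config, conf/missing.config",
    )

print(result)
```

An empty list means every configured file was found at the expected location.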

Version Naming Conventions

Semantic Versioning (Recommended):

3.14.0 - Stable release
3.14.1 - Patch release
3.15.0-beta.1 - Beta release
4.0.0-rc.1 - Release candidate

Development Versions:

dev - Latest development
staging - Pre-production testing
hotfix-issue123 - Specific fixes
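
These conventions can be checked mechanically, for example to decide whether the Prerelease box should be ticked. The sketch below uses a simplified semantic-versioning pattern (no build metadata) and treats any label that is not a plain `MAJOR.MINOR.PATCH` as a prerelease. Both choices are assumptions made for illustration.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH with an optional -prerelease suffix.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse_version(label):
    """Return (major, minor, patch, prerelease) or None for non-semver labels."""
    match = SEMVER.match(label)
    if not match:
        return None
    major, minor, patch, pre = match.groups()
    return int(major), int(minor), int(patch), pre


def is_prerelease(label) -> bool:
    """Labels such as 'dev', and semver labels with a suffix, count as prerelease."""
    parsed = parse_version(label)
    return parsed is None or parsed[3] is not None


def latest_stable(labels):
    """Return the highest stable version label, or None if there is none."""
    stable = [p for p in map(parse_version, labels) if p and p[3] is None]
    if not stable:
        return None
    major, minor, patch, _ = max(stable, key=lambda p: p[:3])
    return f"{major}.{minor}.{patch}"


labels = ["3.14.0", "3.14.1", "3.15.0-beta.1", "4.0.0-rc.1", "dev"]
print(latest_stable(labels))
```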

Handling Schema Files

The schema file defines the pipeline's parameters, and the UI relies on it to present them:

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "RNA-seq pipeline",
  "description": "Nextflow RNA-seq analysis pipeline",
  "type": "object",
  "definitions": {
    "input_output_options": {
      "title": "Input/output options",
      "type": "object",
      "properties": {
        "input": {
          "type": "string",
          "format": "file-path",
          "description": "Sample sheet CSV file"
        }
      }
    }
  }
}
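
A quick way to review a schema before registering a version is to load it and list the parameters it declares. The sketch below assumes the layout shown above, with parameter groups under "definitions". It also looks under "$defs", the name used by JSON Schema drafts from 2019-09 onward, in case a pipeline uses that form. The function name is hypothetical.

```python
import json

SCHEMA_TEXT = """
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "RNA-seq pipeline",
  "type": "object",
  "definitions": {
    "input_output_options": {
      "title": "Input/output options",
      "type": "object",
      "properties": {
        "input": {
          "type": "string",
          "format": "file-path",
          "description": "Sample sheet CSV file"
        }
      }
    }
  }
}
"""


def list_parameters(schema: dict) -> list:
    """Return (group title, parameter name, type) for each declared parameter."""
    groups = schema.get("definitions") or schema.get("$defs") or {}
    rows = []
    for group in groups.values():
        for name, spec in group.get("properties", {}).items():
            rows.append((group.get("title", ""), name, spec.get("type", "")))
    return rows


# json.loads raises json.JSONDecodeError if the file is not valid JSON.
parameters = list_parameters(json.loads(SCHEMA_TEXT))
print(parameters)
```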

Pipeline Configuration Files

Global configuration files apply to all pipeline executions.

Adding Global Config

Go to Admin → Pipeline Config
Click "Add Pipeline Config"
Create configuration:

Example Development Config:

// dev.config
docker.enabled = true
docker.runOptions = '-u $(id -u):$(id -g)'

process {
    maxRetries = 2
    errorStrategy = 'retry'
    
    withLabel: 'process_low' {
        cpus = 2
        memory = '4.GB'
        time = '6.h'
    }
    
    withLabel: 'process_medium' {
        cpus = 6
        memory = '16.GB'
        time = '12.h'
    }
}

params {
    max_cpus = 16
    max_memory = '128.GB'
    max_time = '48.h'
}

Example Production Config:

// production.config
singularity.enabled = true
singularity.autoMounts = true

executor {
    name = 'slurm'
    queueSize = 100
    pollInterval = '30 sec'
}

process {
    executor = 'slurm'
    queue = 'bioinformatics'
    
    errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
    maxRetries = 3
}
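
The exit codes in the errorStrategy closure are easier to reason about once decoded. By shell convention, an exit status above 128 means the process was terminated by a signal whose number is the status minus 128. The Python sketch below mirrors the retry decision and names the signal where one applies. It is an explanatory aid only, not something Nextflow or Flow runs, and it assumes a POSIX system.

```python
import signal

# Same list as in the errorStrategy closure above.
RETRY_CODES = [143, 137, 104, 134, 139]


def describe_exit_status(code: int) -> str:
    """Name the terminating signal for statuses above 128, if known."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return "unknown signal"
    return "plain exit code"


def strategy(code: int) -> str:
    """Mirror the closure: retry on listed codes, otherwise finish."""
    return "retry" if code in RETRY_CODES else "finish"


for code in RETRY_CODES + [1]:
    print(code, describe_exit_status(code), strategy(code))
```

Status 137 (SIGKILL) is often, though not always, the result of a job exceeding its memory limit, which is why retrying with more resources is a common pattern.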

Environment-Specific Configs

Create different configs for different environments:

local.config - Local development
hpc.config - HPC cluster execution
cloud.config - Cloud computing
test.config - Minimal test runs

Updating Pipelines

Updating Repository Code

Navigate to Admin → Pipeline Repos
Select the repository
No manual pull is required: Flow pulls updates automatically when a version is accessed

Updating Versions

For Branch-based Versions:

Updates are automatic when the branch is updated
Users get the latest code on next execution

For Tag-based Versions:

Create a new version entry for each tag
Mark old versions as inactive or deleted
Use the "Downgrade" field to suggest alternatives

Version Migration Strategy

When updating major versions:

Create new version alongside the old one
Test thoroughly with test data
Mark as prerelease during testing
Communicate changes to users
Deprecate old version gradually
Remove old version after migration period

Access Control

Pipeline Permissions

Control who can run pipelines:

Public Pipelines - Any authenticated user
Group Restricted - Specific user groups
Private Pipelines - Selected users only

These access levels are set through the pipeline version's "Active" status in combination with user permissions.

Execution Limits

Configure per-user or per-group limits:

Maximum concurrent executions
Resource quotas
Priority levels

Monitoring and Maintenance

Health Checks

Regular maintenance tasks:

Check failed executions - Identify pipeline issues
Monitor resource usage - Adjust resource allocations
Review version usage - Deprecate unused versions
Update dependencies - Keep Nextflow versions current

Common Issues and Solutions

Pipeline won't appear:

Check all hierarchy levels exist (repo → category → subcategory → pipeline → version)
Verify version is marked "Active"
Ensure file paths are correct

Schema validation errors:

Validate JSON syntax
Check schema file path
Ensure schema matches pipeline parameters

Git authentication failures:

Verify repository URL
Check private key format (if used)
Test SSH connectivity

Execution failures:

Check pipeline logs
Verify Nextflow version compatibility
Review resource allocations

Using the API

Manage pipelines programmatically using the REST API:

Create Pipeline Version

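import requests

token = 'your-api-token'  # Placeholder: replace with a real admin API token

# Create a new version entry for an existing pipeline
response = requests.post(
    'https://api.flow.bio/api/pipelines/versions/',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'pipeline_id': 'pipeline-id',  # Placeholder: ID of the parent pipeline
        'version': '3.15.0',
        'git_branch_or_tag': 'v3.15.0',
        'path': 'main.nf',
        'schema_path': 'nextflow_schema.json',
        'active': True
    },
    timeout=30,
)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body

version_data = response.json()
print(f"Created version {version_data['version']} with ID {version_data['id']}")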

Update Pipeline

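import requests

token = 'your-api-token'     # Placeholder: admin API token
pipeline_id = 'pipeline-id'  # Placeholder: ID of the pipeline to update

# PATCH sends only the fields that should change
response = requests.patch(
    f'https://api.flow.bio/api/pipelines/{pipeline_id}/',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'description': 'Updated description with new features'
    },
    timeout=30,
)
response.raise_for_status()

pipeline = response.json()
print(f"Updated pipeline: {pipeline['name']}")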

Query Pipeline Status

import requests

token = 'your-api-token'     # Placeholder: admin API token
pipeline_id = 'pipeline-id'  # Placeholder: ID of the pipeline to inspect

# Get pipeline with versions
response = requests.get(
    f'https://api.flow.bio/api/pipelines/{pipeline_id}/',
    headers={'Authorization': f'Bearer {token}'},
    params={'expand': 'versions,executions'},
    timeout=30,
)
response.raise_for_status()

pipeline = response.json()

# List versions and execution counts
for version in pipeline['versions']:
    execution_count = len(version.get('executions', []))
    status = 'Active' if version['active'] else 'Inactive'
    print(f"Version {version['version']}: {status} - {execution_count} executions")

Best Practices

Repository Management

Use dedicated repositories for production pipelines
Tag releases with semantic versioning
Include comprehensive documentation
Test pipelines before adding to Flow

Version Control

Use tags for production versions
Reserve branches for development
Document breaking changes
Maintain backward compatibility when possible

User Communication

Announce new pipeline versions
Document parameter changes
Provide migration guides
Set appropriate downgrade paths

Performance Optimization

Set realistic resource requirements
Use pipeline profiling data
Implement efficient configs
Monitor execution metrics

Troubleshooting

Pipeline Not Visible

Check in order:

Repository successfully cloned?
Category and subcategory exist?
Pipeline created and not deleted?
Version created and active?
User has permission to view?
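
The order of these checks matters because each one depends on the one before it. As an illustration, the walk-through can be written as a function that stops at the first failing check. The state keys below are hypothetical and would need to be filled in by hand or from your own tooling.

```python
from typing import Optional

# Checks in the same order as the list above; keys are illustrative names.
CHECKS = [
    ("repo_cloned", "Repository successfully cloned?"),
    ("category_exists", "Category and subcategory exist?"),
    ("pipeline_visible", "Pipeline created and not deleted?"),
    ("version_active", "Version created and active?"),
    ("user_permitted", "User has permission to view?"),
]


def first_failed_check(state: dict) -> Optional[str]:
    """Return the first question whose answer is not True, or None if all pass."""
    for key, question in CHECKS:
        if not state.get(key, False):
            return question
    return None


state = {
    "repo_cloned": True,
    "category_exists": True,
    "pipeline_visible": True,
    "version_active": False,   # e.g. the version was left inactive
    "user_permitted": True,
}
print(first_failed_check(state))
```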

Execution Failures

Check execution logs in Flow
Verify input data format
Test with minimal data
Check Nextflow compatibility
Review resource allocation

Git Issues

# Test repository access
git ls-remote https://github.com/org/repo

# Test SSH access with the key generated for Flow
ssh -i flow_pipeline_key -T git@github.com

# Verify branch/tag exists
git ls-remote --tags --heads https://github.com/org/repo

Database Integrity

-- Check pipeline hierarchy
SELECT 
  pr.url as repo,
  pc.name as category,
  ps.name as subcategory,
  p.name as pipeline,
  pv.version
FROM analysis_pipelineversion pv
JOIN analysis_pipeline p ON pv.pipeline_id = p.id
JOIN analysis_pipelinesubcategory ps ON p.subcategory_id = ps.id
JOIN analysis_pipelinecategory pc ON ps.category_id = pc.id
JOIN analysis_pipelinerepo pr ON p.pipeline_repo_id = pr.id
WHERE pv.active = true
ORDER BY pc.order, ps.order, p.order;
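
Because the query above uses inner joins, a pipeline with no active version simply does not appear in its output. A complementary check is to count active versions per pipeline and look for zeros. The sketch below runs against an in-memory SQLite database with a toy schema that reuses the table and column names from the query. The real Flow schema may have more columns, and the `order` column is quoted here because ORDER is an SQL keyword.

```python
import sqlite3

SCHEMA = """
CREATE TABLE analysis_pipelinerepo (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE analysis_pipelinecategory (
    id INTEGER PRIMARY KEY, name TEXT, "order" INTEGER);
CREATE TABLE analysis_pipelinesubcategory (
    id INTEGER PRIMARY KEY, name TEXT, "order" INTEGER, category_id INTEGER);
CREATE TABLE analysis_pipeline (
    id INTEGER PRIMARY KEY, name TEXT, "order" INTEGER,
    subcategory_id INTEGER, pipeline_repo_id INTEGER);
CREATE TABLE analysis_pipelineversion (
    id INTEGER PRIMARY KEY, version TEXT, active INTEGER, pipeline_id INTEGER);
"""

# LEFT JOIN keeps pipelines that have no active version (count of 0).
QUERY = """
SELECT p.name, COUNT(pv.id) AS active_versions
FROM analysis_pipeline p
LEFT JOIN analysis_pipelineversion pv
  ON pv.pipeline_id = p.id AND pv.active = 1
GROUP BY p.id
ORDER BY p."order";
"""


def active_version_counts(conn):
    """Return (pipeline name, number of active versions) for every pipeline."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO analysis_pipelinerepo VALUES (1, 'https://github.com/nf-core/rnaseq')")
conn.execute("INSERT INTO analysis_pipelinecategory VALUES (1, 'Sequencing Analysis', 1)")
conn.execute("INSERT INTO analysis_pipelinesubcategory VALUES (1, 'RNA Analysis', 1, 1)")
conn.executemany(
    "INSERT INTO analysis_pipeline VALUES (?, ?, ?, 1, 1)",
    [(1, "RNA-seq", 1), (2, "scRNA-seq", 2)],
)
conn.executemany(
    "INSERT INTO analysis_pipelineversion VALUES (?, ?, ?, ?)",
    [(1, "3.14.0", 1, 1), (2, "dev", 0, 2)],
)

counts = active_version_counts(conn)
print(counts)
```

Any pipeline reported with a count of 0 is hidden from users until one of its versions is marked active.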

This guide provides Flow administrators with the knowledge needed to effectively manage pipelines. For pipeline development, see Writing Pipelines. For end-user documentation, see Running Pipelines.