Managing Pipelines (Admin Guide)

This guide covers how Flow administrators can add, configure, and manage bioinformatics pipelines through the admin interface. It explains the pipeline hierarchy, version management, and best practices for maintaining a pipeline catalog.


Overview

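Flow organizes pipelines in a hierarchy. Categories and subcategories control where a pipeline appears in the catalog, each pipeline is linked to the Git repository that holds its code, and each version pins that code to a Git reference: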

Pipeline Repository (Git)
└── Pipeline Category (e.g., "Sequencing Analysis")
    └── Pipeline Subcategory (e.g., "RNA Analysis")
        └── Pipeline (e.g., "RNA-seq")
            └── Pipeline Version (e.g., "v3.14.0")

Each level serves a specific purpose:

  • Repositories contain the actual pipeline code
  • Categories organize pipelines by broad analysis type
  • Subcategories provide further organization
  • Pipelines represent specific analysis workflows
  • Versions are specific, runnable implementations
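
To make the relationships concrete, the hierarchy can be modeled with a few data classes. This is only an illustrative sketch: the class and field names below are hypothetical and are not Flow's actual data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class PipelineVersion:
    version: str         # display name, e.g. "3.14.0"
    git_ref: str         # branch or tag, e.g. "v3.14.0"
    active: bool = True  # whether users can run this version


@dataclass
class Pipeline:
    name: str
    repo_url: str        # repository that holds the pipeline code
    versions: List[PipelineVersion] = field(default_factory=list)


@dataclass
class Subcategory:
    name: str
    pipelines: List[Pipeline] = field(default_factory=list)


@dataclass
class Category:
    name: str
    subcategories: List[Subcategory] = field(default_factory=list)


def runnable_versions(category: Category) -> Iterator[Tuple[str, str]]:
    """Yield (pipeline name, version) for every active version in a category."""
    for sub in category.subcategories:
        for pipeline in sub.pipelines:
            for v in pipeline.versions:
                if v.active:
                    yield pipeline.name, v.version


rnaseq = Pipeline(
    "RNA-seq",
    "https://github.com/nf-core/rnaseq",
    [PipelineVersion("3.14.0", "v3.14.0"),
     PipelineVersion("dev", "dev", active=False)],
)
catalog = Category("Sequencing Analysis", [Subcategory("RNA Analysis", [rnaseq])])
print(list(runnable_versions(catalog)))
```

Only versions are runnable, so walking the tree down to active versions gives the set of things a user could actually launch.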

Accessing the Admin Interface

  1. Navigate to your Flow instance
  2. Log in with an admin account
  3. Click your username in the top-right corner
  4. Select "Admin" from the dropdown menu

The pipeline management sections are:

  • Pipeline Repos - Git repositories
  • Pipeline Categories - Top-level organization
  • Pipeline Subcategories - Second-level organization
  • Pipelines - Individual workflows
  • Pipeline Versions - Specific implementations
  • Pipeline Config - Global configuration files

Managing Pipeline Repositories

Adding a Repository

  1. Navigate to Admin → Pipeline Repos
  2. Click "Add Pipeline Repo"
  3. Fill in the repository details:
URL: https://github.com/nf-core/rnaseq
Path: /          # Leave as "/" for root, or specify subdirectory
Private Key: (leave blank for public repos)
  4. Click "Save"

Flow will automatically:

  • Validate the URL
  • Set up authentication if a private key is provided
  • Clone the repository

Private Repositories

For private repositories:

  1. Generate an SSH key pair:

    ssh-keygen -t ed25519 -f flow_pipeline_key
    
  2. Add the public key to your Git provider (GitHub, GitLab, etc.)

  3. In Flow admin:

    • Paste the private key contents into the "Private Key" field
    • Use the SSH URL format: git@github.com:org/repo.git
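
If you already have the HTTPS URL of a repository, the SSH form can be derived mechanically. The helper below is a small sketch; `to_ssh_url` is a hypothetical name, and it assumes the common `git@host:org/repo.git` layout used by GitHub and GitLab.

```python
from urllib.parse import urlparse


def to_ssh_url(https_url: str) -> str:
    """Convert https://host/org/repo[.git] into git@host:org/repo.git."""
    parsed = urlparse(https_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"not an HTTPS repository URL: {https_url!r}")
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    if not path:
        raise ValueError(f"URL has no repository path: {https_url!r}")
    return f"git@{parsed.netloc}:{path}.git"


print(to_ssh_url("https://github.com/nf-core/rnaseq"))
```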

Repository Best Practices

  • Use specific repos - One repository per pipeline or a collection of related pipelines
  • Keep repos focused - Avoid mixing pipelines with different purposes
  • Document requirements - Include clear README files
  • Tag releases - Use semantic versioning for stability

Creating Pipeline Categories

Categories organize pipelines into logical groups for users.

Adding a Category

  1. Go to Admin → Pipeline Categories
  2. Click "Add Pipeline Category"
  3. Enter:
    • Name: Display name (e.g., "Sequencing Analysis")
    • Slug: URL-friendly identifier (auto-generated)
    • Order: Display order (lower numbers appear first)
    • Description: What types of analyses this includes
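
The sketch below illustrates the two mechanical fields, Slug and Order. It is an assumption about typical behavior, not Flow's actual slug rules: `slugify` and `display_order` are hypothetical helpers, and "Proteomics" is an invented category used only to show the ordering.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse each run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def display_order(entries):
    """Sort by the numeric order field (lower first); ties fall back to the name."""
    return sorted(entries, key=lambda e: (e["order"], e["name"]))


categories = [
    {"name": "Proteomics", "order": 20},           # invented example entry
    {"name": "Sequencing Analysis", "order": 10},
]
for entry in display_order(categories):
    print(entry["order"], slugify(entry["name"]), entry["name"])
```

Leaving gaps between order values (10, 20, 30) makes it easier to slot a new category in later without renumbering the rest.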

Adding Subcategories

  1. Go to Admin → Pipeline Subcategories
  2. Click "Add Pipeline Subcategory"
  3. Enter:
    • Category: Parent category
    • Name: Subcategory name (e.g., "RNA Analysis")
    • Description: More specific description
    • Order: Display order within the category

Organization Strategy

Recommended category structure:

Sequencing Analysis/
├── RNA Analysis/
│   ├── RNA-seq
│   ├── scRNA-seq
│   └── Small RNA-seq
├── DNA Analysis/
│   ├── Whole Genome
│   ├── Exome
│   └── Amplicon
└── Epigenomics/
    ├── ChIP-seq
    ├── ATAC-seq
    └── Bisulfite-seq

Adding Pipelines

Creating a Pipeline

  1. Navigate to Admin → Pipelines
  2. Click "Add Pipeline"
  3. Configure the pipeline:

Basic Information:

  • Pipeline Repo: Select the repository containing this pipeline
  • Name: User-friendly name (e.g., "RNA-seq")
  • Slug: URL identifier (auto-generated)
  • Category/Subcategory: Where to place in the catalog

Features (Checkboxes):

  • Imports Samples: Can process Flow sample data
  • Imports Data: Can process other Flow data types
  • Imports Multiplexed: Can process multiplexed sequencing data
  • Can Fail: Whether partial failures are acceptable
  • Deleted: Soft delete (hide from users)

Additional Settings:

  • Description: What the pipeline does (Markdown supported)
  • Documentation Link: External documentation URL
  • Order: Display order within subcategory

Pipeline Configuration

Each pipeline should specify:

  1. What types of input it accepts
  2. Key parameters users might want to modify
  3. Expected outputs
  4. Computational requirements

Managing Pipeline Versions

Pipeline versions are the actual runnable instances of a pipeline. Each version is tied to a specific Git reference (branch or tag).

Understanding Version References

Flow supports two types of Git references:

1. Git Tags (Recommended for Production)

Reference: v3.14.0
Type: Specific release, treated as immutable (tags are not normally moved)
Use for: Stable, tested versions

2. Git Branches

Reference: main, dev, feature/new-analysis
Type: Mutable, can change
Use for: Development, testing

Adding a Version

  1. Go to Admin → Pipeline Versions
  2. Click "Add Pipeline Version"
  3. Configure:

Version Information:

  • Pipeline: Select the parent pipeline
  • Version: Version identifier (e.g., "3.14.0" or "dev")
  • Git Branch or Tag: Git reference (e.g., "v3.14.0" or "main")
  • Downgrade: Previous version to suggest if this fails

File Paths:

  • Path: Path to main.nf file (e.g., "main.nf" or "workflows/rnaseq/main.nf")
  • Schema Path: Path to schema JSON (e.g., "nextflow_schema.json")
  • Config Paths: Pipeline-specific configs (comma-separated)

Settings:

  • Active: Whether users can run this version
  • Prerelease: Mark beta/testing versions
  • Deleted: Soft delete

Dependencies:

  • Minimum Nextflow Version: Required Nextflow version
  • Upstream Pipeline Versions: Dependencies on other pipelines
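
Wrong file paths are a common reason a version fails to load, so it can help to check them against a local checkout of the Git reference before saving. The function below is a sketch under that assumption; `missing_version_files` is a hypothetical helper and is not part of Flow.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_version_files(checkout: Path, path: str, schema_path: str,
                          config_paths: str = "") -> List[str]:
    """Return the configured paths that do not exist as files in the checkout.

    config_paths is the comma-separated string entered in the admin form.
    """
    wanted = [path, schema_path]
    wanted += [p.strip() for p in config_paths.split(",") if p.strip()]
    return [p for p in wanted if not (checkout / p).is_file()]


# Demonstration against a throwaway directory that contains only main.nf.
with tempfile.TemporaryDirectory() as tmp:
    checkout = Path(tmp)
    (checkout / "main.nf").write_text("// workflow entry point\n")
    missing = missing_version_files(
        checkout, "main.nf", "nextflow_schema.json",
        "conf/base.config, conf/test.config",
    )
print(missing)
```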

Version Naming Conventions

Semantic Versioning (Recommended):

3.14.0 - Stable release
3.14.1 - Patch release
3.15.0-beta.1 - Beta release
4.0.0-rc.1 - Release candidate

Development Versions:

dev - Latest development
staging - Pre-production testing
hotfix-issue123 - Specific fixes
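
These conventions can be checked mechanically, for example to decide whether the "Prerelease" flag should be set. The sketch below uses a simplified semantic-version pattern (build metadata is not handled), and treating every non-semver name as a development version is an assumption of this example.

```python
import re

# Simplified semantic version: MAJOR.MINOR.PATCH with an optional -prerelease part.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def describe_version(name: str) -> dict:
    """Classify a version name as a semver release or a development version."""
    match = SEMVER.match(name)
    if match is None:
        # Names such as "dev" or "staging" track moving branches.
        return {"kind": "development", "prerelease": True}
    major, minor, patch, pre = match.groups()
    return {
        "kind": "semver",
        "release": (int(major), int(minor), int(patch)),
        "prerelease": pre is not None,
    }


for name in ["3.14.0", "3.15.0-beta.1", "4.0.0-rc.1", "dev"]:
    print(name, describe_version(name))
```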

Handling Schema Files

The schema file defines pipeline parameters and is crucial for the UI:

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "RNA-seq pipeline",
  "description": "Nextflow RNA-seq analysis pipeline",
  "type": "object",
  "definitions": {
    "input_output_options": {
      "title": "Input/output options",
      "type": "object",
      "properties": {
        "input": {
          "type": "string",
          "format": "file-path",
          "description": "Sample sheet CSV file"
        }
      }
    }
  }
}
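
To see which parameters a schema would expose, you can walk its `definitions` groups. The sketch below embeds the example schema above as a string; `list_parameters` is a hypothetical helper, and how Flow itself reads the schema is not shown here.

```python
import json

SCHEMA_TEXT = """
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "RNA-seq pipeline",
  "type": "object",
  "definitions": {
    "input_output_options": {
      "title": "Input/output options",
      "type": "object",
      "properties": {
        "input": {
          "type": "string",
          "format": "file-path",
          "description": "Sample sheet CSV file"
        }
      }
    }
  }
}
"""


def list_parameters(schema: dict):
    """Return (group title, parameter name, type) for each grouped parameter."""
    params = []
    for group in schema.get("definitions", {}).values():
        for name, spec in group.get("properties", {}).items():
            params.append((group.get("title", ""), name, spec.get("type", "any")))
    return params


parameters = list_parameters(json.loads(SCHEMA_TEXT))
print(parameters)
```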

Pipeline Configuration Files

Global configuration files apply to all pipeline executions.

Adding Global Config

  1. Go to Admin → Pipeline Config
  2. Click "Add Pipeline Config"
  3. Create configuration:

Example Development Config:

// dev.config
docker.enabled = true
docker.runOptions = '-u $(id -u):$(id -g)'

process {
    maxRetries = 2
    errorStrategy = 'retry'
    
    withLabel: 'process_low' {
        cpus = 2
        memory = '4.GB'
        time = '6.h'
    }
    
    withLabel: 'process_medium' {
        cpus = 6
        memory = '16.GB'
        time = '12.h'
    }
}

params {
    max_cpus = 16
    max_memory = '128.GB'
    max_time = '48.h'
}

Example Production Config:

// production.config
singularity.enabled = true
singularity.autoMounts = true

executor {
    name = 'slurm'
    queueSize = 100
    pollInterval = '30 sec'
}

process {
    executor = 'slurm'
    queue = 'bioinformatics'
    
    errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
    maxRetries = 3
}
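
The `errorStrategy` closure in the production config retries only for a fixed set of exit codes. The Python sketch below mirrors that decision so the rule is easier to read. The signal names follow the usual Linux convention that an exit code above 128 means "killed by signal (code - 128)"; the helper names are hypothetical.

```python
# Exit codes copied from the production config's errorStrategy closure.
RETRYABLE_EXIT_CODES = {143, 137, 104, 134, 139}

# Common Linux signal numbers; 137 (SIGKILL) is often an out-of-memory kill.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def error_strategy(exit_status: int) -> str:
    """Mirror the closure: 'retry' for listed exit codes, otherwise 'finish'."""
    return "retry" if exit_status in RETRYABLE_EXIT_CODES else "finish"


def explain(exit_status: int) -> str:
    """Describe an exit code using the 128 + signal-number convention."""
    if exit_status > 128:
        number = exit_status - 128
        return SIGNAL_NAMES.get(number, f"signal {number}")
    return "ordinary exit code"


for code in sorted(RETRYABLE_EXIT_CODES) + [1]:
    print(code, error_strategy(code), explain(code))
```

Retrying only on these codes avoids re-running tasks that failed because of a genuine pipeline or input error, which would fail the same way again.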

Environment-Specific Configs

Create different configs for different environments:

  • local.config - Local development
  • hpc.config - HPC cluster execution
  • cloud.config - Cloud computing
  • test.config - Minimal test runs

Updating Pipelines

Updating Repository Code

  1. Navigate to Admin → Pipeline Repos
  2. Select the repository
  3. No manual pull is needed: Flow automatically pulls updates when versions are accessed

Updating Versions

For Branch-based Versions:

  • Updates are automatic when the branch is updated
  • Users get the latest code on next execution

For Tag-based Versions:

  • Create a new version entry for each tag
  • Mark old versions as inactive or deleted
  • Use the "Downgrade" field to suggest alternatives
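
Because each tag needs its own version entry, it is useful to compare the tags in the repository with the Git references already registered in Flow. The sketch below assumes you have both lists to hand; `tags_needing_versions` is a hypothetical helper, and tags that are not plain `MAJOR.MINOR.PATCH` releases are ignored.

```python
import re
from typing import Iterable, List, Optional, Tuple

RELEASE_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def version_key(tag: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) for a release tag, or None for other tags."""
    match = RELEASE_TAG.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def tags_needing_versions(remote_tags: Iterable[str],
                          registered_refs: Iterable[str]) -> List[str]:
    """List release tags with no version entry yet, oldest release first."""
    registered = set(registered_refs)
    candidates = [
        tag for tag in remote_tags
        if tag not in registered and version_key(tag) is not None
    ]
    return sorted(candidates, key=version_key)


todo = tags_needing_versions(
    ["v3.9.0", "v3.14.0", "v3.10.1", "nightly"],  # tags found in the repository
    ["v3.9.0"],                                   # references already registered
)
print(todo)
```

Sorting on the numeric tuple rather than the string keeps `v3.10.1` ahead of `v3.14.0` and after `v3.9.0`.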

Version Migration Strategy

When updating major versions:

  1. Create new version alongside the old one
  2. Test thoroughly with test data
  3. Mark as prerelease during testing
  4. Communicate changes to users
  5. Deprecate old version gradually
  6. Remove old version after migration period

Access Control

Pipeline Permissions

Control who can run pipelines:

  1. Public Pipelines - Any authenticated user
  2. Group Restricted - Specific user groups
  3. Private Pipelines - Selected users only

These levels are set through the pipeline version's "Active" status together with user permissions.

Execution Limits

Configure per-user or per-group limits:

  • Maximum concurrent executions
  • Resource quotas
  • Priority levels

Monitoring and Maintenance

Health Checks

Regular maintenance tasks:

  1. Check failed executions - Identify pipeline issues
  2. Monitor resource usage - Adjust resource allocations
  3. Review version usage - Deprecate unused versions
  4. Update dependencies - Keep Nextflow versions current

Common Issues and Solutions

Pipeline won't appear:

  • Check all hierarchy levels exist (repo → category → subcategory → pipeline → version)
  • Verify version is marked "Active"
  • Ensure file paths are correct

Schema validation errors:

  • Validate JSON syntax
  • Check schema file path
  • Ensure schema matches pipeline parameters
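
A quick way to rule out a plain syntax problem is to parse the schema with Python's standard `json` module, which reports where parsing failed. `check_json_syntax` below is a hypothetical helper name.

```python
import json
from typing import Optional


def check_json_syntax(text: str) -> Optional[str]:
    """Return None if text is valid JSON, otherwise a short error location."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


print(check_json_syntax('{"type": "object"}'))   # valid JSON
print(check_json_syntax('{"type": "object",}'))  # trailing comma is not allowed
```

This only checks syntax. A file can be valid JSON and still not match the parameters the pipeline expects.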

Git authentication failures:

  • Verify repository URL
  • Check private key format (if used)
  • Test SSH connectivity

Execution failures:

  • Check pipeline logs
  • Verify Nextflow version compatibility
  • Review resource allocations

Using the API

Manage pipelines programmatically using the REST API:

Create Pipeline Version

import requests

token = 'YOUR_API_TOKEN'  # placeholder: substitute an admin API token

response = requests.post(
    'https://api.flow.bio/api/pipelines/versions/',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'pipeline_id': 'pipeline-id',
        'version': '3.15.0',
        'git_branch_or_tag': 'v3.15.0',
        'path': 'main.nf',
        'schema_path': 'nextflow_schema.json',
        'active': True
    },
    timeout=30,
)
response.raise_for_status()  # fail on a 4xx/5xx response instead of parsing an error body

version_data = response.json()
print(f"Created version {version_data['version']} with ID {version_data['id']}")

Update Pipeline

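import requests

token = 'YOUR_API_TOKEN'     # placeholder: substitute an admin API token
pipeline_id = 'pipeline-id'  # placeholder: ID of the pipeline to update

response = requests.patch(
    f'https://api.flow.bio/api/pipelines/{pipeline_id}/',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'description': 'Updated description with new features'
    },
    timeout=30,
)
response.raise_for_status()  # fail on a 4xx/5xx response

pipeline = response.json()
print(f"Updated pipeline: {pipeline['name']}")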

Query Pipeline Status

import requests

token = 'YOUR_API_TOKEN'     # placeholder: substitute an admin API token
pipeline_id = 'pipeline-id'  # placeholder: ID of the pipeline to inspect

# Get pipeline with versions
response = requests.get(
    f'https://api.flow.bio/api/pipelines/{pipeline_id}/',
    headers={'Authorization': f'Bearer {token}'},
    params={'expand': 'versions,executions'},
    timeout=30,
)
response.raise_for_status()  # fail on a 4xx/5xx response

pipeline = response.json()

# List versions and execution counts
for version in pipeline['versions']:
    execution_count = len(version.get('executions', []))
    status = 'Active' if version['active'] else 'Inactive'
    print(f"Version {version['version']}: {status} - {execution_count} executions")

Best Practices

Repository Management

  • Use dedicated repositories for production pipelines
  • Tag releases with semantic versioning
  • Include comprehensive documentation
  • Test pipelines before adding to Flow

Version Control

  • Use tags for production versions
  • Reserve branches for development
  • Document breaking changes
  • Maintain backward compatibility when possible

User Communication

  • Announce new pipeline versions
  • Document parameter changes
  • Provide migration guides
  • Set appropriate downgrade paths

Performance Optimization

  • Set realistic resource requirements
  • Use pipeline profiling data
  • Implement efficient configs
  • Monitor execution metrics

Troubleshooting

Pipeline Not Visible

Check in order:

  1. Repository successfully cloned?
  2. Category and subcategory exist?
  3. Pipeline created and not deleted?
  4. Version created and active?
  5. User has permission to view?
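
The checklist can be expressed as an ordered series of checks that stops at the first failure, which is usually the one to fix first. This is a sketch only: the keys in `CHECKS` are hypothetical flags you would fill in by hand from the admin interface, not fields that Flow exposes.

```python
from typing import Dict, Optional

# Ordered as in the checklist above; earlier problems hide later ones.
CHECKS = [
    ("repo_cloned", "Repository was not cloned successfully"),
    ("category_exists", "Category or subcategory is missing"),
    ("pipeline_not_deleted", "Pipeline is missing or marked as deleted"),
    ("version_active", "No version exists or none is active"),
    ("user_can_view", "User lacks permission to view the pipeline"),
]


def first_visibility_problem(state: Dict[str, bool]) -> Optional[str]:
    """Return the message for the first failing check, or None if all pass."""
    for key, message in CHECKS:
        if not state.get(key, False):
            return message
    return None


state = {
    "repo_cloned": True,
    "category_exists": True,
    "pipeline_not_deleted": True,
    "version_active": False,
    "user_can_view": True,
}
print(first_visibility_problem(state))
```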

Execution Failures

  1. Check execution logs in Flow
  2. Verify input data format
  3. Test with minimal data
  4. Check Nextflow compatibility
  5. Review resource allocation

Git Issues

# Test repository access
git ls-remote https://github.com/org/repo

# Test SSH key (use the private key generated for Flow, not your default key)
ssh -i flow_pipeline_key -T git@github.com

# Verify branch/tag exists
git ls-remote --tags --heads https://github.com/org/repo
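
The output of `git ls-remote` is one `<sha><TAB><ref>` pair per line, so checking that a branch or tag exists is easy to script. The parser below is a sketch; the sample output uses made-up commit hashes, and `refs_from_ls_remote` is a hypothetical helper name.

```python
from typing import Dict, Set


def refs_from_ls_remote(output: str) -> Dict[str, Set[str]]:
    """Split `git ls-remote --tags --heads` output into branch and tag names."""
    refs: Dict[str, Set[str]] = {"heads": set(), "tags": set()}
    for line in output.splitlines():
        if not line.strip():
            continue
        _sha, _, ref = line.partition("\t")
        if ref.endswith("^{}"):
            continue  # peeled entry that duplicates an annotated tag
        if ref.startswith("refs/heads/"):
            refs["heads"].add(ref[len("refs/heads/"):])
        elif ref.startswith("refs/tags/"):
            refs["tags"].add(ref[len("refs/tags/"):])
    return refs


# Made-up hashes, for illustration only.
SAMPLE = (
    "1111111111111111111111111111111111111111\trefs/heads/main\n"
    "2222222222222222222222222222222222222222\trefs/heads/dev\n"
    "3333333333333333333333333333333333333333\trefs/tags/v3.14.0\n"
    "4444444444444444444444444444444444444444\trefs/tags/v3.14.0^{}\n"
)
refs = refs_from_ls_remote(SAMPLE)
print("v3.14.0" in refs["tags"], "v3.15.0" in refs["tags"])
```

To use it against a real repository, pass in the captured standard output of the `git ls-remote` command shown above.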

Database Integrity

-- Check pipeline hierarchy
SELECT 
  pr.url as repo,
  pc.name as category,
  ps.name as subcategory,
  p.name as pipeline,
  pv.version
FROM analysis_pipelineversion pv
JOIN analysis_pipeline p ON pv.pipeline_id = p.id
JOIN analysis_pipelinesubcategory ps ON p.subcategory_id = ps.id
JOIN analysis_pipelinecategory pc ON ps.category_id = pc.id
JOIN analysis_pipelinerepo pr ON p.pipeline_repo_id = pr.id
WHERE pv.active = true
ORDER BY pc.order, ps.order, p.order;

For pipeline development, see Writing Pipelines. For end-user documentation, see Running Pipelines.
