nf-core
Introduction to nf-core
nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow. As such nf-core is:
A community of users and developers.
A curated set of analysis pipelines built using Nextflow.
A set of guidelines (standard).
A set of helper tools.
Community
The nf-core community is a collaborative effort that has been growing since its creation in early 2018, as you can check on the nf-core stats site.
Pipelines
At the time of writing, there are 72 pipelines available as part of nf-core (41 released, 25 under development and 6 archived). You can browse all of them at this link.
Guidelines
All nf-core pipelines must meet a series of requirements or guidelines. These guidelines ensure that all nf-core pipelines follow the same standard and stick to current computational standards to achieve reproducibility, interoperability and portability. The guidelines are available at this link.
Helper tools
To ease the use and development of nf-core pipelines, the community makes available a set of helper tools that we will introduce in this tutorial.
Paper
The main nf-core paper was published in 2020 in Nature Biotechnology and describes the community and the nf-core framework.
Installation
You can use Conda to install nf-core tools. The commands below create a new named environment that includes nf-core and then activate it.
conda create -n nf-core python=3.8 nf-core -c bioconda -c conda-forge -y
conda activate nf-core
Tip
You can find alternative installation methods in the nf-core documentation.
We can now check the available nf-core commands:
$ nf-core -h
,--./,-.
___ __ __ __ ___ /,-._.--~\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/tools version 2.6 - https://nf-co.re
Usage: nf-core [OPTIONS] COMMAND [ARGS]...
nf-core/tools provides a set of helper tools for use with nf-core Nextflow pipelines.
It is designed for both end-users running pipelines and also developers creating new pipelines.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --version Show the version and exit. │
│ --verbose -v Print verbose output to the console. │
│ --log-file -l <filename> Save a verbose log to a file. │
│ --help -h Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands for users ─────────────────────────────────────────────────────────────────────────────╮
│ list List available nf-core pipelines with local info. │
│ launch Launch a pipeline using a web GUI or command line prompts. │
│ download Download a pipeline, nf-core/configs and pipeline singularity images. │
│ licences List software licences for a given workflow. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands for developers ────────────────────────────────────────────────────────────────────────╮
│ create Create a new pipeline using the nf-core template. │
│ lint Check pipeline code against nf-core guidelines. │
│ modules Commands to manage Nextflow DSL2 modules (tool wrappers). │
│ schema Suite of tools for developers to manage pipeline schema. │
│ bump-version Update nf-core pipeline version number. │
│ sync Sync a pipeline TEMPLATE branch with the nf-core template. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────╮
│ subworkflows Commands to manage Nextflow DSL2 subworkflows (tool wrappers). │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
As shown in the output above, nf-core tools provides some commands meant for users and others meant for developers. Here, we will discuss how nf-core can be used from a user's point of view.
Tip
If you are interested in learning how nf-core could help you develop your own pipelines, please refer to the tools page on the nf-core site or follow this tutorial.
Commands for users
Listing pipelines
To show all the available nf-core pipelines, we can use the nf-core list command. This command also provides some other information, such as the latest version of each nf-core pipeline, when it was released and when you last pulled the pipeline to your local system.
$ nf-core list
,--./,-.
___ __ __ __ ___ /,-._.--~\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/tools version 2.6 - https://nf-co.re
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pipeline Name ┃ Stars ┃ Latest Release ┃ Released ┃ Last Pulled ┃ Have latest release? ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ coproid │ 7 │ 1.1.1 │ yesterday │ - │ - │
│ cutandrun │ 31 │ 3.0 │ 1 weeks ago │ - │ - │
│ smrnaseq │ 44 │ 2.1.0 │ 2 weeks ago │ 16 minutes ago │ Yes (v2.1.0) │
│ nascent │ 5 │ 2.0.0 │ 2 weeks ago │ - │ - │
│ hgtseq │ 12 │ 1.0.0 │ 2 weeks ago │ - │ - │
│ hlatyping │ 35 │ 2.0.0 │ 3 weeks ago │ - │ - │
│ demultiplex │ 14 │ 1.0.0 │ 4 weeks ago │ - │ - │
│ scrnaseq │ 64 │ 2.1.0 │ 4 weeks ago │ - │ - │
│ chipseq │ 129 │ 2.0.0 │ 1 months ago │ 18 minutes ago │ Yes (v2.0.0) │
│ rnaseq │ 529 │ 3.9 │ 1 months ago │ 19 minutes ago │ Yes (v3.9) │
│ isoseq │ 5 │ 1.1.1 │ 1 months ago │ - │ - │
│ sarek │ 205 │ 3.0.2 │ 1 months ago │ 7 months ago │ No (v2.7.1) │
│ airrflow │ 22 │ 2.3.0 │ 2 months ago │ - │ - │
│ ampliseq │ 99 │ 2.4.0 │ 2 months ago │ - │ - │
│ mag │ 103 │ 2.2.1 │ 2 months ago │ - │ - │
│ epitopeprediction │ 22 │ 2.1.0 │ 3 months ago │ - │ - │
│ eager │ 76 │ 2.4.5 │ 3 months ago │ - │ - │
│ viralrecon │ 79 │ 2.5 │ 4 months ago │ 12 months ago │ No (v2.2) │
│ rnafusion │ 81 │ 2.1.0 │ 4 months ago │ - │ - │
│ fetchngs │ 64 │ 1.7 │ 4 months ago │ 5 months ago │ No (v1.5) │
│ circdna │ 10 │ 1.0.1 │ 5 months ago │ - │ - │
│ nanoseq │ 88 │ 3.0.0 │ 5 months ago │ - │ - │
│ rnavar │ 13 │ 1.0.0 │ 5 months ago │ - │ - │
│ mnaseseq │ 7 │ 1.0.0 │ 5 months ago │ - │ - │
│ atacseq │ 119 │ 1.2.2 │ 6 months ago │ 20 hours ago │ No (dev - 88d4e6d) │
[..truncated..]
Tip
The pipelines can be sorted by latest release (-s release, default), by the last time you pulled a local copy (-s pulled), alphabetically (-s name) or by the number of GitHub stars (-s stars).
Filtering available nf-core pipelines
It is also possible to add keywords after the list command so that the list of pipelines is shortened to those matching the keywords or including them in the description. We can use the command below to filter on the atac keyword:
$ nf-core list atac
,--./,-.
___ __ __ __ ___ /,-._.--~\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/tools version 2.6 - https://nf-co.re
┏━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pipeline Name ┃ Stars ┃ Latest Release ┃ Released ┃ Last Pulled ┃ Have latest release? ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ atacseq │ 119 │ 1.2.2 │ 6 months ago │ 17 hours ago │ No (dev - 88d4e6d) │
│ hicar │ 2 │ 1.0.0 │ 6 months ago │ - │ - │
└───────────────┴───────┴────────────────┴──────────────┴──────────────┴──────────────────────┘
Launching pipelines
The launch command lets you launch nf-core pipelines, and also other Nextflow pipelines, via a web-based graphical interface or an interactive command-line wizard. This command is handy for pipelines with a considerable number of parameters, since it displays the documentation alongside each parameter and validates your inputs.
We can now launch an nf-core pipeline:
nf-core launch
To render the description of the parameters, their grouping and their defaults, the tool uses the nextflow_schema.json file. This JSON file is bundled with the pipeline and includes all the information mentioned above; see an example here.
The chosen non-default parameters are dumped into a JSON file called nf-params.json. This file can be provided to new executions using the --params-in flag. See below an example of a params JSON file:
{
    "outdir": "results",
    "input": "https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/samplesheet/v2.0/samplesheet_test.csv"
}
For reproducibility, it is good practice to explicitly indicate the version (revision) of the pipeline that you want to use. This is done using the -r flag, e.g. nf-core launch atacseq --params-in nf-params.json -r dev.
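As a minimal sketch of how such a parameters file might be prepared by hand rather than through the launch wizard, the snippet below writes the same nf-params.json shown above and checks that it is well-formed JSON. Using python3 for the check is an assumption about what is installed on your system; any JSON validator would do.

```shell
# Write the parameters file by hand (same content as the example above).
cat > nf-params.json <<'EOF'
{
    "outdir": "results",
    "input": "https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/samplesheet/v2.0/samplesheet_test.csv"
}
EOF

# Optional sanity check: only runs if python3 happens to be available.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool nf-params.json >/dev/null && echo "nf-params.json is valid JSON"
fi
```

The file could then be passed to nf-core launch with --params-in, or to a plain Nextflow run through Nextflow's -params-file option, e.g. nextflow run nf-core/atacseq -r dev -profile docker -params-file nf-params.json (the docker profile here is just an assumed choice of container engine).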
nf-core configs and profiles
nf-core configs
We have already introduced Nextflow configuration files and profiles during the course. Config files are used by nf-core pipelines to specify the computational requirements of the pipeline, define custom parameters and set which software management system is used (Docker, Singularity or Conda). As an example, take a look at the base.config file, which sets sensible defaults for the computational resources needed by the pipeline.
nf-core core profiles
nf-core pipelines use profiles to bundle a set of configuration attributes. We can then activate these attributes with the -profile Nextflow command-line option. All nf-core pipelines come with a set of common "Core profiles": the conda, docker and singularity profiles, which define which software manager to use, and the test profile, which specifies a minimal test dataset to check that the pipeline works properly.
Note
Each configuration file can include one or several profiles.
Institutional profiles
Institutional profiles are profiles in which you can specify the configuration attributes for your institution's system. They are hosted in https://github.com/nf-core/configs, and all pipelines pull this repository when they are run. The idea is that these profiles set the custom config attributes needed to run nf-core pipelines at your institution (scheduler, container technology, resources, etc.). This way, all the users of a cluster can make use of the profile simply by setting the profile of their institution (-profile institution).
Tip
You can use more than one profile at a time by separating them with a comma and no space, e.g. -profile test,docker
Custom config
If you need to provide any custom parameter or setting when running an nf-core pipeline, you can do so by creating a local custom config file and adding it to your command with the -c flag.
*Image from https://carpentries-incubator.github.io/workflows-nextflow.
Note
Profiles will be prioritized from left to right in case conflicting settings are found.
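The custom config approach described above could look roughly like the sketch below. The file name custom.config, the resource values, and the slurm executor with a queue called short are all hypothetical placeholders, not settings taken from any real cluster. The max_cpus, max_memory and max_time parameters are the same ones that appear in the nf-core/atacseq test configuration shown later in this tutorial.

```shell
# Write a local config file; all values below are hypothetical placeholders.
cat > custom.config <<'EOF'
params {
    // Cap the resources that any single process may request
    max_cpus   = 4
    max_memory = 16.GB
    max_time   = 24.h
}

process {
    // Hypothetical scheduler settings for an HPC cluster
    executor = 'slurm'
    queue    = 'short'
}
EOF

# Show what was written.
cat custom.config
```

The file is then added to a run with -c, e.g. nextflow run nf-core/atacseq -r dev -profile test,docker --outdir results -c custom.config.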
Running pipelines with test data
All nf-core pipelines include a special configuration named test. This configuration defines all the files and parameters needed to test all pipeline functionality with a minimal dataset. Thus, although the functionality of the pipeline is maintained, the results are often not meaningful. As an example, the snippet below shows the test configuration of the nf-core/atacseq pipeline.
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.
Use as follows:
nextflow run nf-core/atacseq -profile test,<docker/singularity> --outdir <OUTDIR>
----------------------------------------------------------------------------------------
*/
params {
    config_profile_name        = 'Test profile'
    config_profile_description = 'Minimal test dataset to check pipeline function'

    // Limit resources so that this can run on GitHub Actions
    max_cpus   = 2
    max_memory = 6.GB
    max_time   = 12.h

    // Input data
    input       = 'https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/samplesheet/v2.0/samplesheet_test.csv'
    read_length = 50

    // Genome references
    mito_name = 'MT'
    fasta     = 'https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/reference/genome.fa'
    gtf       = 'https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/reference/genes.gtf'

    // For speed to avoid CI time-out
    fingerprint_bins = 100

    // Avoid preseq errors with test data
    skip_preseq = true
}
Tip
You can find the current version of the nf-core/atacseq test.config file here.
Downloading pipelines
If your HPC system or server does not have an internet connection, you can still run nf-core pipelines by fetching the pipeline files first and then manually transferring them to your system.
The nf-core download command simplifies this process and ensures the correct versioning of all the code and containers needed to run the pipeline. By default, the command will download the pipeline code and the institutional nf-core/configs files. Again, the -r flag allows you to fetch a given revision of the pipeline.
Finally, you can also download any Singularity image files required by the pipeline if you specify the --singularity flag.
Tip
If you don't provide any option to nf-core download, an interactive prompt will ask you for the required options.
We can now try to download the atacseq pipeline using the command below:
nf-core download atacseq -r dev
Now we can inspect the structure of the downloaded directory:
$ tree -L 2 nf-core-atacseq-dev/
nf-core-atacseq-dev/
├── configs
│ ├── ..truncated..
│ ├── nextflow.config
│ ├── nfcore_custom.config
│ ├── pipeline
│ └── README.md
├── singularity-images
│ ├── depot.galaxyproject.org-singularity-ataqv-1.3.0--py39hccc85d7_2.img
│ ├── ..truncated..
│ └── depot.galaxyproject.org-singularity-ucsc-bedgraphtobigwig-377--h446ed27_1.img
└── workflow
├── assets
├── bin
├── ..truncated..
├── main.nf
├── modules
├── ..truncated..
└── workflows
Pipeline output
nf-core pipelines produce a MultiQC report at the end of the execution, which summarises the results along with the software versions of the different tools used, the nf-core pipeline version and the Nextflow version itself.
Each pipeline provides an example of a MultiQC report from a real execution on the nf-core website. For instance, you can find the report corresponding to the current version of nf-core/atacseq here.
Acknowledgements
This nf-core tutorial has been built taking inspiration from the official nf-core tools documentation and the Carpentries materials "Introduction to Bioinformatics workflows with Nextflow and nf-core", which can be found here.