A simple example

How do I create my own PEP?

To use any PEP-compatible tool, you first need a PEP. A PEP describes a collection of data together with its metadata. To create a PEP that represents your dataset, you create two files:

  1. Project config file - a yaml file describing file paths and optional project settings
  2. Sample annotation sheet - a csv file with 1 row per sample

In the simplest case, project_config.yaml is just a few lines of yaml. Here’s a minimal example project_config.yaml:

 output_dir: /path/to/output/folder
 sample_annotation: /path/to/sample_annotation.csv

If you’re not already familiar with yaml, it’s a simple, widely used, human-readable data serialization format that stores hierarchical key-value pairs; the official yaml specification describes it in full.

The output_dir key specifies where to save results. The sample_annotation key points to the second component of a PEP: a comma-separated values (csv) file describing the samples in the project, with a header row followed by one row per sample. Here’s a small example of sample_annotation.csv:

"sample_name","protocol","file"
"frog_1","RNA-seq","frog1.fq.gz"
"frog_2","RNA-seq","frog2.fq.gz"
"frog_3","RNA-seq","frog3.fq.gz"
"frog_4","RNA-seq","frog4.fq.gz"

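To make the one-row-per-sample structure concrete, here is a minimal Python sketch that reads the example sheet with the standard-library csv module. It only illustrates the shape of the file and is not how peppy or pepr load a PEP; the load_samples helper is a hypothetical name, and the sheet is embedded as a string so that the sketch runs on its own.

```python
import csv
import io

# The example sheet, embedded as a string so the sketch is self-contained.
SAMPLE_ANNOTATION = '''"sample_name","protocol","file"
"frog_1","RNA-seq","frog1.fq.gz"
"frog_2","RNA-seq","frog2.fq.gz"
"frog_3","RNA-seq","frog3.fq.gz"
"frog_4","RNA-seq","frog4.fq.gz"
'''


def load_samples(text):
    """Return one dict per sample row, keyed by the header columns.

    Hypothetical helper for illustration only.
    """
    # skipinitialspace=True tolerates a space after each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


samples = load_samples(SAMPLE_ANNOTATION)
for sample in samples:
    print(sample["sample_name"], sample["protocol"], sample["file"])
```
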
With those two simple files, you are ready to use the pepkit tools! With a single line of code, you can load this project into R using pepr or into Python using peppy, or run each sample through an arbitrary command-line pipeline using looper.
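
For a sense of the per-sample work that a tool like looper automates, the following hand-written Python sketch builds one command line per sample from the two files' contents. It is not looper's implementation: the my_pipeline command, its --input and --output flags, and the build_commands helper are all hypothetical, and the config and sample values are copied inline rather than read from disk.

```python
import posixpath
import shlex

# Values mirroring the minimal project_config.yaml and the first two
# rows of sample_annotation.csv.
output_dir = "/path/to/output/folder"
samples = [
    {"sample_name": "frog_1", "protocol": "RNA-seq", "file": "frog1.fq.gz"},
    {"sample_name": "frog_2", "protocol": "RNA-seq", "file": "frog2.fq.gz"},
]


def build_commands(samples, output_dir, pipeline="my_pipeline"):
    """Build one shell command per sample.

    `my_pipeline` and its flags are hypothetical placeholders.
    """
    commands = []
    for sample in samples:
        # Give each sample its own output subfolder under output_dir.
        sample_out = posixpath.join(output_dir, sample["sample_name"])
        argv = [pipeline, "--input", sample["file"], "--output", sample_out]
        # shlex.join quotes arguments safely for a POSIX shell.
        commands.append(shlex.join(argv))
    return commands


for command in build_commands(samples, output_dir):
    print(command)
```

The sketch only builds and prints the commands. Submitting them to a shell or a cluster scheduler is left out.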

If you make a habit of describing all your projects like this, you’ll never need to hand-parse another sample annotation sheet or write another pipeline submission loop.

This simple example presents a minimal functioning PEP. In practice, there are many advanced features of PEP structure. For instance, you can add additional sections to tailor your project for specific tools. But at its core, PEP is simple and generic; this way, you can start with the basics, and only add more complexity as you need it.

More advanced features are described in the docs for sample annotations and docs for project config.
