A simple example
How do I create my own PEP? To use any PEP-compatible tool, you first need a PEP. A PEP describes a collection of data along with its metadata. To create a PEP representing your dataset, you create two files:
- Project config file: a `yaml` file describing file paths and optional project settings
- Sample annotation sheet: a `csv` file with one row per sample
In the simplest case, `project_config.yaml` is just a few lines of `yaml`. Here's a minimal example:
```yaml
metadata:
  output_dir: /path/to/output/folder
  sample_annotation: /path/to/sample_annotation.csv
```
If you're not already familiar with `yaml`, it's a simple, widely used, human-readable format for storing hierarchical key-value pairs; you can read more about `yaml` here.
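Because `yaml` is hierarchical, the `metadata` key in the config above holds a nested mapping of settings. A YAML loader (for example PyYAML's `yaml.safe_load`) returns this as nested Python dictionaries. The sketch below writes that structure out by hand as a plain dict, so it runs without third-party packages; it is an illustration of the data shape, not how any pepkit tool reads the file internally.

```python
from pathlib import PurePosixPath

# The nested mapping a YAML loader would produce for the minimal
# project_config.yaml shown above: "metadata" maps to a dict of settings.
project_config = {
    "metadata": {
        "output_dir": "/path/to/output/folder",
        "sample_annotation": "/path/to/sample_annotation.csv",
    }
}

# Look up each setting by walking down the hierarchy one key at a time.
metadata = project_config["metadata"]
output_dir = PurePosixPath(metadata["output_dir"])
sample_annotation = PurePosixPath(metadata["sample_annotation"])

print(output_dir)              # /path/to/output/folder
print(sample_annotation.name)  # sample_annotation.csv
```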
The `output_dir` key specifies where to save results. The `sample_annotation` key points to the second key part of a PEP: a comma-separated values (`csv`) file describing the samples in the project. Here's a small example of such a file:
```csv
"sample_name","protocol","file"
"frog_1","RNA-seq","frog1.fq.gz"
"frog_2","RNA-seq","frog2.fq.gz"
"frog_3","RNA-seq","frog3.fq.gz"
"frog_4","RNA-seq","frog4.fq.gz"
```
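Each row of the sheet is one sample, and the header row names the attributes every sample has. As a minimal sketch using only Python's standard `csv` module (not peppy's API), the sheet above can be read into one dictionary per sample. The sheet is inlined as a string so the example is self-contained; in practice you would open the `.csv` file named by `sample_annotation`.

```python
import csv
import io

# The sample annotation sheet from above, inlined for a self-contained example.
SAMPLE_SHEET = '''"sample_name","protocol","file"
"frog_1","RNA-seq","frog1.fq.gz"
"frog_2","RNA-seq","frog2.fq.gz"
"frog_3","RNA-seq","frog3.fq.gz"
"frog_4","RNA-seq","frog4.fq.gz"
'''


def read_samples(text):
    """Return one dict per sample row, keyed by the column headers."""
    return list(csv.DictReader(io.StringIO(text)))


samples = read_samples(SAMPLE_SHEET)
for sample in samples:
    # Quotes are stripped by the csv parser, leaving plain field values.
    print(sample["sample_name"], sample["protocol"], sample["file"])
```

Note that the fields are separated by bare commas: with the `csv` module's defaults, a space after a comma would become part of the next field (and its quotes would be kept), unless you pass `skipinitialspace=True`.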
With those two simple files, you're ready to use the pepkit tools! With a single line of code, you could load this project into R using pepr, into Python using peppy, or run each sample through an arbitrary command-line pipeline using looper.
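To make concrete what looper automates, here is a minimal sketch of the kind of hand-written per-sample submission loop it is meant to replace. This is not looper's implementation or its configuration format: the `my_pipeline.sh` command template and the per-sample output subfolder under `output_dir` are hypothetical, and the loop only prints each command (a dry run) rather than executing it.

```python
import csv
import io
import shlex
from pathlib import PurePosixPath

SAMPLE_SHEET = '''"sample_name","protocol","file"
"frog_1","RNA-seq","frog1.fq.gz"
"frog_2","RNA-seq","frog2.fq.gz"
"frog_3","RNA-seq","frog3.fq.gz"
"frog_4","RNA-seq","frog4.fq.gz"
'''

# Value of output_dir from project_config.yaml.
OUTPUT_DIR = PurePosixPath("/path/to/output/folder")

# Hypothetical pipeline command; {file} and {outdir} are filled per sample.
COMMAND_TEMPLATE = "my_pipeline.sh --input {file} --output {outdir}"


def build_commands(sheet_text, output_dir):
    """Build one shell command per sample row (hypothetical layout)."""
    commands = []
    for sample in csv.DictReader(io.StringIO(sheet_text)):
        # Hypothetical choice: one results subfolder per sample.
        outdir = output_dir / sample["sample_name"]
        commands.append(
            COMMAND_TEMPLATE.format(
                file=shlex.quote(sample["file"]),
                outdir=shlex.quote(str(outdir)),
            )
        )
    return commands


for cmd in build_commands(SAMPLE_SHEET, OUTPUT_DIR):
    print(cmd)  # dry run: print instead of submitting
```

Quoting each value with `shlex.quote` keeps the commands safe even if a file name contains spaces or shell metacharacters.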
If you make a habit of describing all your projects like this, you'll never have to parse another sample annotation sheet or write another pipeline submission loop.
This simple example presents a minimal functioning PEP. In practice, PEP offers many advanced features; for instance, you can add sections that tailor your project to specific tools. But at its core, PEP is simple and generic, so you can start with the basics and add complexity only as you need it.