Pipeline Input Configuration
Configuration of input parameters of the full pipeline mode¶
Paired End Input¶
The input should be a path to a tsv file containing a sample id, as well as a path to the left and right read.
Example:
input:
paired:
path: "test_data/fullPipeline/reads_split.tsv"
Nanopore Input¶
For Nanopore data a seperate input file should be specified.
input:
ont:
path: "test_data/fullPipeline/ont.tsv"
Generic SRA¶
The toolkit is able to fetch fastq files based on SRA run accession ids from the NCBI or from a mirror based on S3:
input:
SRA:
pattern:
ont: ".+[^(_1|_2)].+$"
illumina: ".+(_1|_2).+$"
S3:
path: test_data/SRA/samples.tsv
bucket: "s3://ftp.era.ebi.ac.uk"
prefix: "/vol1/fastq/"
watch: false
patternONT: ".+[^(_1|_2)].+$"
patternIllumina: ".+(_1|_2).+$"
where:
* path
is the path to a file containing a column with ACCESSION
as header. The ACCESSION
column contains either SRA run or study accessions.
-
bucket
is the S3 Bucket hosting the data. -
prefix
is the path to the actual SRA datasets. -
watch
if true, the file specified with thepath
attribute is watched and every time a new SRA run id is appended, the pipeline is triggered. The pipeline will never finish in this mode. Please note that watch currently only works if only one input type is specified (e.g "ont" or "paired" ...) -
patternONT
andpatternIllumina
are patterns that are applied on the specified mirror in order to select the correct input files.
NCBI SRA¶
With the following mode SRA datasets can directly be fetched from SRA.
input:
SRA:
pattern:
ont: ".+[^(_1|_2)].+$"
illumina: ".+(_1|_2).+$"
NCBI:
path: test_data/SRA/samples.tsv