# Quick guide to running ResFinder with Cromwell
### Disclaimer
Support is not offered for running Cromwell, and no files in this directory are
guaranteed to work. These files were uploaded as inspiration. Please do not
report issues relating to this directory.
## Prepare input files
Two input files are needed:
1. input_data.tsv
2. input.json
Templates can be found in the ResFinder directory scripts/wdl.
### input_data.tsv
Tab-separated file. It should contain the following columns, in this order:
1. Absolute path to fasta/fastq file 1
2. Absolute path to fastq file 2 (the value can be empty, but the column must exist)
3. Species
4. Type of data, must be one of: assembly, paired
Each row should contain a single sample.
#### Species
If the species cannot be provided, put "other" (case sensitive).
#### Type of data
* assembly: Fasta file containing contigs from a de novo assembly.
* paired: Pair of fastq files containing read data for forward and reverse
reads.
* single: **Not implemented** Read data from single-end sequencing.
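
The column rules above can be checked before starting a run. The following is a minimal sketch, not part of ResFinder: `check_rows` is a hypothetical helper, and it only checks the column count, the data type, and that paths look absolute (it does not check that the files exist).

```python
import csv
import io
import posixpath

# Data types this guide lists as usable; "single" is not implemented,
# so this sketch rejects it.
VALID_TYPES = {"assembly", "paired"}


def check_rows(tsv_text):
    """Return a list of problems found in the content of input_data.tsv."""
    problems = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for number, row in enumerate(reader, start=1):
        if len(row) != 4:
            problems.append(f"row {number}: expected 4 columns, got {len(row)}")
            continue
        file1, file2, species, data_type = row
        if data_type not in VALID_TYPES:
            problems.append(f"row {number}: unsupported data type {data_type!r}")
            continue
        if not posixpath.isabs(file1):
            problems.append(f"row {number}: file 1 is not an absolute path")
        # Only paired data needs a second file; for assemblies the column is empty.
        if data_type == "paired" and not posixpath.isabs(file2):
            problems.append(f"row {number}: file 2 is not an absolute path")
        if not species:
            problems.append(f'row {number}: species is empty, use "other"')
    return problems
```

Calling `check_rows(open("input_data.tsv").read())` returns an empty list when no problems were found.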
#### Example
```
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01_1.fq /home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01_2.fq Escherichia coli paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05_1.fq /home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05_2.fq Escherichia coli paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a_1.fq /home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a_2.fq Escherichia coli paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b_1.fq /home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b_2.fq Escherichia coli paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01.fa Escherichia coli assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_02.fa Escherichia coli assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_03.fa Escherichia coli assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05.fa Escherichia coli assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a.fa Escherichia coli assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b.fa Escherichia coli assembly
```
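
Because the species name can contain a space and the second column is empty for assemblies, it is easy to lose a tab when editing the file by hand. As one possible approach, the file can be generated with Python's `csv` module. The sample paths and the `format_samples` helper below are hypothetical and only illustrate the layout.

```python
import csv
import io

# Hypothetical samples: (file 1, file 2, species, data type).
# Assemblies keep an empty string in the second column.
samples = [
    ("/data/isolate_01_1.fq", "/data/isolate_01_2.fq", "Escherichia coli", "paired"),
    ("/data/isolate_01.fa", "", "Escherichia coli", "assembly"),
]


def format_samples(rows):
    """Return the rows as tab-separated text, one sample per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Redirect this output to input_data.tsv.
    print(format_samples(samples), end="")
```

Every line produced this way contains exactly three tab characters, including the assembly rows where the second field is empty.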
### input.json
JSON formatted file containing input and output information.
The file should consist of a single dict/hash/map with the following keys:
* Resistance.inputSamplesFile: Absolute path to input_data.tsv
* Resistance.outputDir: Absolute path to output directory.
* Resistance.geneCov: Fraction of gene coverage needed for resistance gene hits.
* Resistance.geneID: Fraction of nucleotide identity needed in resistance gene
hits.
* Resistance.pointCov: Fraction of gene coverage needed for point mutation gene
hits.
* Resistance.pointID: Fraction of nucleotide identity needed in point mutation gene
hits.
If you are running on Computerome and using the input.json template, you
probably won't need to change the following:
* Resistance.python: Path to python3 interpreter.
* Resistance.kma: Path to kma application.
* Resistance.blastn: Path to blastn application.
* Resistance.resfinder: Path to run_resfinder.py.
* Resistance.resDB: Path to ResFinder database.
* Resistance.pointDB: Path to PointFinder database.
The values of Resistance.inputSamplesFile and Resistance.outputDir should be
the absolute path to input_data.tsv and the desired output directory,
respectively.
#### Example
```json
{
"Resistance.inputSamplesFile": "/home/projects/cge/people/rkmo/delme/res_input.tsv",
"Resistance.outputDir": "/home/projects/cge/people/rkmo/delme/",
"Resistance.geneCov": 0.6,
"Resistance.geneID": 0.8,
"Resistance.pointCov": 0.6,
"Resistance.pointID": 0.8,
"Resistance.python": "python3",
"Resistance.kma": "/home/projects/cge/apps/resfinder/resfinder/cge/kma/kma",
"Resistance.blastn": "blastn",
"Resistance.resfinder": "/home/projects/cge/apps/resfinder/resfinder/run_resfinder.py",
"Resistance.resDB": "/home/projects/cge/apps/resfinder/resfinder/db_resfinder",
"Resistance.pointDB": "/home/projects/cge/apps/resfinder/resfinder/db_pointfinder"
}
```
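
If many runs are needed, input.json can also be written by a script instead of edited by hand. The sketch below is only an illustration: `build_inputs` is a hypothetical helper, the `/path/to/...` values are placeholders, the tool and database paths are copied from the example above, and the check that thresholds lie between 0 and 1 is an assumption based on the keys being described as fractions.

```python
import json

# Tool and database locations copied from the example above (Computerome paths).
TOOL_PATHS = {
    "Resistance.python": "python3",
    "Resistance.kma": "/home/projects/cge/apps/resfinder/resfinder/cge/kma/kma",
    "Resistance.blastn": "blastn",
    "Resistance.resfinder": "/home/projects/cge/apps/resfinder/resfinder/run_resfinder.py",
    "Resistance.resDB": "/home/projects/cge/apps/resfinder/resfinder/db_resfinder",
    "Resistance.pointDB": "/home/projects/cge/apps/resfinder/resfinder/db_pointfinder",
}


def build_inputs(samples_file, output_dir, thresholds):
    """Return the dict to be stored in input.json."""
    for name, value in thresholds.items():
        # Assumption: thresholds are fractions, so they must lie in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    inputs = {
        "Resistance.inputSamplesFile": samples_file,
        "Resistance.outputDir": output_dir,
    }
    inputs.update({f"Resistance.{name}": value for name, value in thresholds.items()})
    inputs.update(TOOL_PATHS)
    return inputs


# Placeholder paths; replace them with your own absolute paths.
inputs = build_inputs(
    "/path/to/input_data.tsv",
    "/path/to/output/",
    {"geneCov": 0.6, "geneID": 0.8, "pointCov": 0.6, "pointID": 0.8},
)

if __name__ == "__main__":
    print(json.dumps(inputs, indent=2))
```

Redirecting the printed output to a file gives an input.json with the same twelve keys as the example.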
## Run Cromwell
Cromwell needs Java to run. Load a valid Java module, for example:
```bash
module load openjdk/16
```
A Cromwell call looks like this:
```bash
java -Dconfig.file=<CONF> -jar <CROMWELL> run <WDL> --inputs <JSON>
```
### <CONF> and <CROMWELL>
Computerome specific.
* <CONF>: Path to Computerome configuration for Cromwell. You need to change
this if you are not running Cromwell on Computerome. Computerome path:
/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/computerome.conf
* <CROMWELL>: Path to the Cromwell jar file on Computerome:
/services/tools/cromwell/50/cromwell-50.jar
### <WDL>
ResFinder specific.
* <WDL>: Path to wdl file that specifies how to run ResFinder. Path to
resfinder.wdl on Computerome:
/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/resfinder.wdl
### <JSON>
User/run specific.
* <JSON>: Path to input.json. Specifies all the parameters for ResFinder (see above).
### Run example
```bash
java -Dconfig.file=/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/computerome.conf -jar /services/tools/cromwell/50/cromwell-50.jar run /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/resfinder.wdl --inputs /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/input.json
```
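
When the call is launched from a script, the four placeholders can be filled in programmatically. This is a minimal sketch assuming the Computerome paths from the run example; `cromwell_command` is a hypothetical helper that only assembles the argument list and does not start Cromwell.

```python
import shlex


def cromwell_command(conf, cromwell_jar, wdl, inputs_json):
    """Return the Cromwell call as an argument list."""
    return [
        "java",
        f"-Dconfig.file={conf}",
        "-jar", cromwell_jar,
        "run", wdl,
        "--inputs", inputs_json,
    ]


# Paths taken from the run example above.
cmd = cromwell_command(
    conf="/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/computerome.conf",
    cromwell_jar="/services/tools/cromwell/50/cromwell-50.jar",
    wdl="/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/resfinder.wdl",
    inputs_json="/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/input.json",
)

if __name__ == "__main__":
    # Print the command for inspection. To launch it from Python instead,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems if a path ever contains spaces.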
### Post run
All ResFinder output will be located in the provided output directory.
In the directory where you execute Cromwell the following two directories will
also be created:
* cromwell-executions
* cromwell-workflow-logs
They contain logging information and cached results.