---
name: Picard
url: http://broadinstitute.github.io/picard/
description: >
  Picard is a set of Java command line tools for manipulating high-throughput
  sequencing data.
---
The Picard module parses results generated by
[Picard](http://broadinstitute.github.io/picard/),
a set of Java command line tools for manipulating high-throughput
sequencing data.
Supported commands:
- `AlignmentSummaryMetrics`
- `BaseDistributionByCycle`
- `CollectIlluminaBasecallingMetrics`
- `CollectIlluminaLaneMetrics`
- `CrosscheckFingerprints`
- `ExtractIlluminaBarcodes`
- `GcBiasMetrics`
- `HsMetrics`
- `InsertSizeMetrics`
- `MarkDuplicates`
- `MarkIlluminaAdapters`
- `OxoGMetrics`
- `QualityByCycleMetrics`
- `QualityScoreDistributionMetrics`
- `QualityYieldMetrics`
- `RnaSeqMetrics`
- `RrbsSummaryMetrics`
- `ValidateSamFile`
- `VariantCallingMetrics`
- `WgsMetrics`
### Coverage Levels
It's possible to customise the coverage level used for the HsMetrics _"Target Bases 30X"_ and
WgsMetrics _"Fraction of Bases over 30X"_ columns that are
shown in the General Statistics table. The level must correspond to field names in the
Picard report, such as `PCT_TARGET_BASES_2X` / `PCT_10X`. Any numbers not found in the
reports will be ignored.
The coverage levels available for HsMetrics are
[typically](http://broadinstitute.github.io/picard/picard-metric-definitions.html#HsMetrics)
1, 2, 10, 20, 30, 40, 50 and 100X.
The coverage levels available for WgsMetrics are
[typically](http://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics)
1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90 and 100X.
To customise this, add the following to your MultiQC config:
```yaml
picard_config:
  general_stats_target_coverage:
    - 10
    - 50
```
### CrosscheckFingerprints
In addition to adding a table of results, a `Crosschecks All Expected` column will be added to the General Statistics table. If all comparisons for a sample were `Expected`, the value of this field will be `True` and shown in green. If not, it will be `False` and shown in red.
You can customise the columns shown in the CrosscheckFingerprints table with the config keys `CrosscheckFingerprints_table_cols` and `CrosscheckFingerprints_table_cols_hidden`. For example:
```yaml
picard_config:
  CrosscheckFingerprints_table_cols:
    - RESULT
    - LOD_SCORE
  CrosscheckFingerprints_table_cols_hidden:
    - LEFT_LANE
    - RIGHT_LANE
```
The column names will be normalised, e.g. `LOD_SCORE -> Lod score`.
Note that if `CALCULATE_TUMOR_AWARE_RESULTS` was set to true on the CLI for any of the CrosscheckFingerprints result files, then the `LOD_SCORE_TUMOR_NORMAL` and `LOD_SCORE_NORMAL_TUMOR` columns will also be displayed.
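If you would rather not show these tumour-aware columns, a minimal sketch of hiding them with the same config keys described above (assuming the column names match the field names written by Picard) could be:

```yaml
picard_config:
  # Hypothetical example: hide the tumour-aware LOD columns by default
  CrosscheckFingerprints_table_cols_hidden:
    - LOD_SCORE_TUMOR_NORMAL
    - LOD_SCORE_NORMAL_TUMOR
```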
### HsMetrics
Note that the _Target Region Coverage_ plot is generated using the `PCT_TARGET_BASES_` table columns from the HsMetrics output (not immediately obvious when looking at the log files).
You can customise the columns shown in the HsMetrics table with the config keys `HsMetrics_table_cols` and `HsMetrics_table_cols_hidden`. For example:
```yaml
picard_config:
  HsMetrics_table_cols:
    - NEAR_BAIT_BASES
    - OFF_BAIT_BASES
    - ON_BAIT_BASES
  HsMetrics_table_cols_hidden:
    - MAX_TARGET_COVERAGE
    - MEAN_BAIT_COVERAGE
    - MEAN_TARGET_COVERAGE
```
Only values listed in `HsMetrics_table_cols` will be included in the table.
Anything listed in `HsMetrics_table_cols_hidden` will be hidden by default.
A similar config is available for customising the HsMetrics columns in the General Stats table:
```yaml
picard_config:
  HsMetrics_genstats_table_cols:
    - NEAR_BAIT_BASES
  HsMetrics_genstats_table_cols_hidden:
    - MAX_TARGET_COVERAGE
```
### InsertSizeMetrics
By default, the insert size plot is smoothed to contain a maximum of 500 data points per sample.
This is to prevent the MultiQC report from being very large with big datasets.
If you would like to customise this value to get better resolution, you can set the following
MultiQC config value to the new maximum number of points:
```yaml
picard_config:
  insertsize_smooth_points: 10000
```
The plotted maximum insert size can be set with:
```yaml
picard_config:
  insertsize_xmax: 10000
```
### MarkDuplicates
If a `BAM` file contains multiple read groups, Picard MarkDuplicates generates a report
with multiple metric lines, one for each "library".
By default, MultiQC will sum the values for every library it finds and recompute the
`PERCENT_DUPLICATION` and `ESTIMATED_LIBRARY_SIZE` fields, giving a single set of results
for each `BAM` file.
If instead you would prefer each library to be treated as a separate sample, you can do so
by setting the following MultiQC config:
```yaml
picard_config:
  markdups_merge_multiple_libraries: False
```
This prevents the merge and recalculation and appends the library name to the sample name.
This behaviour has been present in MultiQC since version 1.9. Before that, only the metrics from the
first library were taken and all others were ignored.
### ValidateSamFile Search Pattern
Generally, Picard adds identifiable content to the output of its tools, but this is not the case for ValidateSamFile. To identify these logs, the MultiQC Picard submodule `ValidateSamFile` searches for filenames that contain 'validatesamfile' or 'ValidateSamFile'. You can customise the search pattern by overriding the `picard/sam_file_validation` pattern in your MultiQC config. For example:
```yaml
sp:
  picard/sam_file_validation:
    fn: "*[Vv]alidate[Ss]am[Ff]ile*"
```
### WgsMetrics
The coverage histogram from Picard typically shows a normal distribution with a very long tail.
To make the plot easier to view, by default the module plots the line up to 99% of the data.
This typically removes the long tail and gives a more useful graph.
If you would like, you can set a specific maximum coverage at which to cut the graph.
Setting this to a very large value disables the cut-off (the graph will automatically
limit the axis at the maximum data point). You can do this as follows:
```yaml
picard_config:
  wgsmetrics_histogram_max_cov: 500
```
If you are running with very high-coverage samples or using the Picard `CAP_COVERAGE` option,
the coverage histogram can become very large indeed. For example, if reporting coverages of 1 million,
it will have 1 million data points per sample. That can crash the browser and make MultiQC take a long time to run.
There are two customisation MultiQC options to help with this.
Firstly, MultiQC will automatically "smooth" the histogram to a maximum of `1000` data points by binning.
This should stop the browser from crashing. You can tweak how many bins are used with the following:
```yaml
picard_config:
  wgsmetrics_histogram_smooth: 1000
```
Change `1000` to whatever number you want. If you don't want any smoothing, set it to a number
larger than the number of data points you have.
Secondly, if you would prefer to instead simply skip the histogram, you can set the following:
```yaml
picard_config:
  wgsmetrics_skip_histogram: True
```
This will omit that section from the report entirely and also skip parsing the histogram data.
Specifying this option may significantly speed up the MultiQC run time for these types of files.
### Sample names
MultiQC supports outputs from multiple runs of a Picard tool merged together into one
file. In order to handle data from multiple samples in one file correctly, MultiQC needs
to take the sample name from somewhere other than the file name. For this reason, MultiQC
attempts to parse the command line recorded in the output header. For example, an
output from the `GcBias` tool contains a header line like this:
```
# net.sf.picard.analysis.CollectGcBiasMetrics REFERENCE_SEQUENCE=/reference/genome.fa
INPUT=/alignments/P0001_101/P0001_101.bam OUTPUT=P0001_101.collectGcBias.txt ...
```
MultiQC will extract the BAM file name that follows `INPUT=` and use `P0001_101`
as the sample name. If MultiQC fails to parse the command line for any reason, it will
fall back to using the file name. It is also possible to force using the file names
as sample names by enabling the following config option:
```yaml
picard_config:
  s_name_filenames: true
```