# Introduction
Bioinformatics projects often include non-standardised analyses, with results from custom
scripts or in-house packages. It can be frustrating to have a MultiQC report describing
results from 90% of your pipeline but missing the final key plot. To help with this,
MultiQC has a special _"custom content"_ module.
Custom content parsing is a little more restricted than standard modules. Specifically:
- Only one plot per section is possible
- Plot customisation is more limited
All plot types can be generated using custom content - see the
[test files](https://github.com/ewels/MultiQC_TestData/tree/master/data/custom_content)
for examples of how data should be structured.
> **Note**: Use the name `custom_content` to refer to this module within configuration
> settings that require a module name, such as [`module_order`](#order-of-modules) or
> [`run_modules`](#removing-modules-or-sections).
## Data from a released tool
If your data comes from a released bioinformatics tool, you shouldn't be using this
feature of MultiQC! Sure, you can probably get it to work, but it's better if a
fully-fledged core MultiQC module is written instead. That way, other users of MultiQC
can also benefit from results parsing.
Note that proper MultiQC modules are more robust and powerful than this custom-content
feature. You can also [write modules](http://multiqc.info/docs/#writing-new-modules)
in [MultiQC plugins](http://multiqc.info/docs/#multiqc-plugins) if they're not suitable for
general release.
## Images
As of MultiQC v1.7, you can import custom images into your MultiQC reports.
Simply add `_mqc` to the end of the filename for `.png`, `.jpg` or `.jpeg` files, for example:
`my_image_file_mqc.png` or `summary_diagram_mqc.jpeg`.
Images are embedded within the HTML file, so the report remains self-contained.
Note that this makes it easy to produce a very large HTML file if abused!
The report section name and description will be automatically based on the filename.
Note that if you are using `sp:` to take in images with a custom filename you need to also set `ignore_images: false` in your config. For example:
```yaml
custom_data:
  my_custom_content_image:
    section_name: "My nice image"
sp:
  my_custom_content_image:
    fn: "*.png"
ignore_images: false
```
## MultiQC-specific data file
If you can choose exactly how your data output looks, then the easiest way to parse it
is to use a MultiQC-specific format. If the filename ends in `*_mqc.(yaml|yml|json|txt|csv|tsv|log|out|png|jpg|jpeg|html)`
then it will be found by any standard MultiQC installation with no additional customisation
required (v0.9 onwards).
These files contain configuration information specifying how the data should be parsed,
alongside the data. If you want to use YAML, this is an example of how it should look:
```yaml
id: "my_pca_section"
section_name: "PCA Analysis"
description: "This plot shows the first two components from a principal component analysis."
plot_type: "scatter"
pconfig:
  id: "pca_scatter_plot"
  title: "PCA Plot"
  xlab: "PC1"
  ylab: "PC2"
data:
  sample_1: { x: 12, y: 14 }
  sample_2: { x: 8, y: 6 }
  sample_3: { x: 5, y: 11 }
  sample_4: { x: 9, y: 12 }
```
The file format can also be JSON:
```json
{
  "id": "custom_data_lineplot",
  "section_name": "Custom JSON File",
  "description": "This plot is a self-contained JSON file.",
  "plot_type": "linegraph",
  "pconfig": {
    "id": "custom_data_linegraph",
    "title": "Output from my JSON file",
    "ylab": "Number of things",
    "xDecimals": false
  },
  "data": {
    "sample_1": { "1": 12, "2": 14, "3": 10, "4": 7, "5": 16 },
    "sample_2": { "1": 9, "2": 11, "3": 15, "4": 18, "5": 21 }
  }
}
```
Note that if you're using `plot_type: html` then `data` just takes a string, with no sample keys.
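If the results come from your own script, the script can write such a file directly. The following is a minimal Python sketch using only the standard library. The helper names `build_linegraph_section` and `write_mqc_json` and the section id `my_script_counts` are illustrative assumptions, not part of MultiQC.

```python
import json
from pathlib import Path


def build_linegraph_section(section_id, title, data):
    """Return a dict laid out like the JSON example above."""
    return {
        "id": section_id,
        "section_name": title,
        "plot_type": "linegraph",
        "pconfig": {"id": f"{section_id}_plot", "title": title},
        "data": data,
    }


def write_mqc_json(section, out_dir="."):
    """Write the section to <id>_mqc.json so the filename ends in _mqc.json."""
    path = Path(out_dir) / f"{section['id']}_mqc.json"
    path.write_text(json.dumps(section, indent=2))
    return path


# Hypothetical data: one dict of x -> y values per sample
section = build_linegraph_section(
    "my_script_counts",
    "Counts from my script",
    {"sample_1": {"1": 12, "2": 14}, "sample_2": {"1": 9, "2": 11}},
)
```

Calling `write_mqc_json(section, "results")` would then write `results/my_script_counts_mqc.json`, assuming the `results` directory already exists.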
For maximum compatibility with other tools, you can also use comma-separated or tab-separated files.
Include commented header lines with plot configuration in YAML format:
```bash
# id: 'Output from my script'
# section_name: 'Custom data file'
# description: 'This output is described in the file header. Any MultiQC installation will understand it without prior configuration.'
# format: 'tsv'
# plot_type: 'bargraph'
# pconfig:
#     id: 'custom_bargraph_w_header'
#     ylab: 'Number of things'
Category_1 374
Category_2 229
Category_3 39
Category_4 253
```
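A file with a commented header like the one above can be produced with a few lines of Python. This is a sketch under assumptions: `render_mqc_tsv` is a hypothetical helper, the id `custom_bargraph_tsv` is made up for the example, and values are wrapped in single quotes without escaping, so they must not contain single quotes themselves.

```python
def render_mqc_tsv(header, rows):
    """Return file text: '# key: value' header lines, then tab-separated rows.

    `header` maps config keys to plain values, or to a nested dict (e.g. pconfig).
    Values are single-quoted as-is, so they must not contain single quotes.
    """
    lines = []
    for key, value in header.items():
        if isinstance(value, dict):
            lines.append(f"# {key}:")
            # Nested keys are indented so the header reads as nested YAML
            lines.extend(f"#     {k}: '{v}'" for k, v in value.items())
        else:
            lines.append(f"# {key}: '{value}'")
    # Data rows: first column is the category / sample name
    lines.extend(f"{name}\t{count}" for name, count in rows)
    return "\n".join(lines) + "\n"


text = render_mqc_tsv(
    {
        "id": "custom_bargraph_tsv",
        "section_name": "Custom data file",
        "format": "tsv",
        "plot_type": "bargraph",
        "pconfig": {"id": "custom_bargraph_w_header", "ylab": "Number of things"},
    },
    [("Category_1", 374), ("Category_2", 229)],
)
```

Writing `text` to a file whose name ends in `_mqc.tsv` gives a file in the format described above.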
You can easily inject custom HTML snippets by ending the filename with `_mqc.html` - again the
embedded config works in a similar way, but with an HTML comment:
```html
<!--
id: 'custom-html'
section_name: 'Custom HTML'
description: 'This section is created using a custom HTML file'
-->
<p>Some custom HTML content here.</p>
```
If no configuration is given, MultiQC will do its best to guess how to visualise your data appropriately.
To see examples of typical file structures which are understood, see the
[test data](https://github.com/ewels/MultiQC_TestData/tree/master/data/custom_content/no_config)
used to develop this code. Something will probably be shown, but it may produce unexpected results.
> **Note:** Check [Tricky extras](#tricky-extras) for certain caveats about formatting headers for custom
> `tsv` or `csv` files, particularly for the first column.
## Data as part of MultiQC config
If you are already using a MultiQC config file to add data to your report (for example,
[titles / introductory text](http://multiqc.info/docs/#customising-reports)), you can
give data within this file too. This can be in any MultiQC config file (for example,
passed on the command line with `-c my_yaml_file.yaml`). This is useful as you can
keep everything contained within a single file (including stuff unrelated to this
specific _custom content_ feature of MultiQC).
To be understood by MultiQC, the config must contain a `custom_data` key.
This must contain a section with a unique id, specific to your new report section.
The contents of this nested dictionary look the same as the stand-alone
`YAML` files above. For example:
```yaml
custom_data:
  my_data_type:
    id: "mqc_config_file_section"
    section_name: "My Custom Section"
    description: "This data comes from a single multiqc_config.yaml file"
    plot_type: "bargraph"
    pconfig:
      id: "barplot_config_only"
      title: "MultiQC Config Data Plot"
      ylab: "Number of things"
    data:
      sample_a:
        first_thing: 12
        second_thing: 14
      sample_b:
        first_thing: 8
        second_thing: 6
      sample_c:
        first_thing: 11
        second_thing: 5
      sample_d:
        first_thing: 12
        second_thing: 9
```
Or to add data to the General Statistics table:
```yaml
custom_data:
  my_genstats:
    plot_type: "generalstats"
    pconfig:
      - col_1:
          max: 100
          min: 0
          scale: "RdYlGn"
          suffix: "%"
      - col_2:
          min: 0
    data:
      sample_a:
        col_1: 14.32
        col_2: 1.2
      sample_b:
        col_1: 84.84
        col_2: 1.9
```
> **Note:** Use a **list** of headers in `pconfig` (keys prepended with `-`) to specify the order
> of columns in the General Statistics table.
See the [general statistics docs](http://multiqc.info/docs/#step-3-adding-to-the-general-statistics-table)
for more information about configuring data for the General Statistics table.
## Separate configuration and data files
It's not always possible or desirable to include MultiQC configuration within a data file.
If this is the case, you can add to the MultiQC configuration to specify how input files
should be parsed.
As described in the above [_Data as part of MultiQC config_](#data-as-part-of-multiqc-config) section,
this configuration should be held within a section called `custom_data` with a section-specific id.
The only difference is that no `data` subsection is given and a search pattern for the given id must
be supplied.
Search patterns are added [as with any other module](http://multiqc.info/docs/#module-search-patterns).
Ensure that the search pattern key is the same as your `custom_data` section ID.
For example, a MultiQC config file could look as follows:
```yaml
# Other MultiQC config stuff here
custom_data:
  example_files:
    file_format: "tsv"
    section_name: "Coverage Decay"
    description: "This plot comes from files accompanied by a multiqc_config.yaml file for configuration"
    plot_type: "linegraph"
    pconfig:
      id: "example_coverage_lineplot"
      title: "Coverage Decay"
      ylab: "X Coverage"
      ymax: 100
      ymin: 0
sp:
  example_files:
    fn: "example_files_*"
```
And work with the following data file:
`example_files_Sample_1.txt`:
```bash
0 98.22076066
1 97.96764159
2 97.78227175
3 97.61262195
# [...]
```
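To check in advance which files a pattern such as `example_files_*` would pick up, you can try it against a list of filenames. The sketch below assumes that `fn` patterns behave like shell-style globs as implemented by Python's `fnmatch` module; `files_for_section` and the candidate filenames are hypothetical.

```python
from fnmatch import fnmatch


def files_for_section(pattern, filenames):
    """Return the filenames that match a glob-style `fn` search pattern."""
    return [name for name in filenames if fnmatch(name, pattern)]


# Hypothetical directory listing
candidates = [
    "example_files_Sample_1.txt",
    "example_files_Sample_2.txt",
    "other_output.txt",
]
matched = files_for_section("example_files_*", candidates)
```

Here `matched` holds the two `example_files_Sample_*.txt` names, while `other_output.txt` is left out.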
This kind of customisation should work with most Custom Content types.
For example, using an image called `some_science_mqc.jpeg` gives us a report section `some_science`,
which we can then add a nicer name and description to:
```yaml
custom_data:
  some_science:
    section_name: "Some real science"
    description: "This description comes from multiqc_config.yaml and helps to annotate the Custom Content image."
```
As mentioned above - if no configuration is given, MultiQC will do its best to guess how to visualise
your data appropriately. To see examples of typical file structures which are understood, see the
[test data](https://github.com/ewels/MultiQC_TestData/tree/master/data/custom_content/no_config)
used to develop this code.
# Configuration
## Grouping sections and subsections
If you have multiple content types that you would like to group together with MultiQC sub-sections,
you can do so using the following keys:
```yaml
parent_id: custom_section
parent_name: "Some grouped data"
parent_description: "This parent section contains one or more sub-sections below it"
```
Any custom-content files that share the same `parent_id` will be grouped.
Note that some things, such as `parent_name`, are taken from the first file that MultiQC finds
with this `parent_id`. So it's a good idea to specify this in every file.
`parent_description` and `extra` are taken from the first file where they are set.
> `parent_id` only works within Custom Content.
> It is not currently possible to add custom content output into a report section
> from a core MultiQC module.
## Order of sections
If you have multiple different Custom Content sections, their order will be random
and may vary between runs. To avoid this, you can specify an order in your MultiQC
config as follows:
```yaml
custom_content:
  order:
    - first_cc_section
    - second_cc_section
```
Each section name should be the ID assigned to that section. You can explicitly set
this (see below), or the Custom Content module will automatically assign an ID.
To find out what your custom content section ID is, generate a report and click
the side navigation to your section. The browser URL should update and show something
that looks like this:
```txt
multiqc_report.html#my_cc_section
```
The section ID is the part after the `#` (`my_cc_section` in the above example).
Note that any Custom Content sections found that are _not_ specified in the config
will be placed at the top of the report.
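The ordering rule can be pictured with a small Python model. This is only a simplified illustration of the behaviour described here, not MultiQC's actual implementation. `order_sections` is a hypothetical function, and keeping unlisted sections in discovery order is an assumption made for the sketch.

```python
def order_sections(found_ids, configured_order):
    """Unlisted sections come first, then listed ones in the configured order."""
    # Sections not mentioned in the config go to the top of the report
    unlisted = [sid for sid in found_ids if sid not in configured_order]
    # Configured sections follow, skipping any that were not found
    listed = [sid for sid in configured_order if sid in found_ids]
    return unlisted + listed


report_order = order_sections(
    ["second_cc_section", "unlisted_section", "first_cc_section"],
    ["first_cc_section", "second_cc_section"],
)
```

With these inputs, `report_order` starts with `unlisted_section`, followed by `first_cc_section` and `second_cc_section`.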
## Section configuration
See below for how these config options can be specified (either within the data file
or in a MultiQC config file). All of these configuration parameters
are optional, and MultiQC will do its best to guess sensible defaults if they are
not specified.
All possible configuration keys and their default values are shown below:
```yaml
id: null # Unique ID for report section.
section_anchor: <id> # Used in report section #soft-links
section_name: <id> # Nice name used for the report section header
section_href: null # External URL for the data, to find more information
description: null # Introductory text to be printed under the section header
section_extra: null # Custom HTML to add after the section description
file_format: null # File format of the data (eg. csv / tsv)
plot_type: null # The plot type to visualise the data with.
# generalstats | table | bargraph | linegraph | scatter | heatmap | beeswarm
pconfig: {} # Configuration for the plot.
```
> Data types `generalstats` and `beeswarm` are _only_ possible by setting the above
> configuration keys (these can't be guessed by data format).
Note that any _custom content_ data found with the same section `id` will be merged
into the same report section / plot. The other section configuration keys are merged
for each file, with identical keys overwriting what was previously parsed.
This approach means that it's possible to have a single file containing data for multiple
samples, but it's also possible to have one file per sample and still have all of them
summarised.
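As a rough mental model of this merging, consider the Python sketch below. It is not MultiQC's code: `merge_sections` and the input layout (one dict per parsed file with `id`, `config` and `data` keys) are assumptions chosen for illustration.

```python
def merge_sections(parsed_files):
    """Merge per-file dicts by section id.

    Each input dict has an 'id', plus optional 'config' and 'data' dicts.
    """
    sections = {}
    for parsed in parsed_files:
        section = sections.setdefault(parsed["id"], {"config": {}, "data": {}})
        # Identical config keys from later files overwrite earlier values
        section["config"].update(parsed.get("config", {}))
        # Samples from every file accumulate in the same section
        section["data"].update(parsed.get("data", {}))
    return sections


merged = merge_sections(
    [
        {"id": "cov", "config": {"description": "first"}, "data": {"sample_1": {"0": 98.2}}},
        {"id": "cov", "config": {"description": "second"}, "data": {"sample_2": {"0": 97.9}}},
    ]
)
```

Both samples end up under the single `cov` section, and the `description` from the second file replaces the first.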
> If you're using `plot_type: 'generalstats'` then a report section will not be created and
> most of the configuration keys above are ignored.
## Plot configuration
Configuration of specific plots follows the same syntax as used when writing modules.
To find out more, please see the later docs. Specifically, the plot config docs for
[bar graphs](#bar-graphs),
[line graphs](#line-graphs),
[scatter plots](#scatter-plots),
[tables](#creating-a-table),
[beeswarm plots](#beeswarm-plots-dot-plots) and
[heatmaps](#heatmaps).
Wherever you see `pconfig`, any key can be used within the above syntax.
## Tricky extras
Because of the way this module works, there are a few specifics that can trip you up.
Most of these should probably be fixed one day. Feel free to complain on gitter or submit a pull request!
I'll try to keep a list here to help the wary...
### Differences between Tables and General Stats
Although they're both tables, note that general stats configures columns with a list
in the `pconfig` scope (see above example). Files that are just tables use `headers` instead.
### First columns in tables are special
The first column in every table is reserved for the sample name. As such, it shouldn't contain data.
All header configuration will be ignored for the first column. The only exception is name:
this can be tweaked using the somewhat tricky `col1_header` field in the `pconfig` scope (see table docs).
Alternatively, you can customise the column name by including a 'header row' as the first line of the `tsv`
or `csv` file itself. Give the first column any name you like, and name the subsequent columns
with the key(s) defined in the header.
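A table file with such a header row can be generated with Python's `csv` module. In this sketch, `render_table_tsv`, the first-column name `Sample ID` and the column keys `col_1` and `col_2` are illustrative assumptions.

```python
import csv
import io


def render_table_tsv(first_col_name, columns, rows):
    """Return TSV text with a header row followed by one row per sample."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Header row: the first column gets a free-form name, the rest are column keys
    writer.writerow([first_col_name, *columns])
    for sample, values in rows.items():
        writer.writerow([sample, *(values[col] for col in columns)])
    return buf.getvalue()


table_text = render_table_tsv(
    "Sample ID",
    ["col_1", "col_2"],
    {"sample_a": {"col_1": 14.32, "col_2": 1.2}},
)
```

The first line of `table_text` is the header row, and every following line starts with the sample name.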
## Linting
MultiQC has been developed to be as forgiving as possible and will handle lots of
invalid or ignored configurations. This is useful for most users but can make life
difficult when getting MultiQC to work with a new custom content format.
To help with this, you can run with the `--lint` flag, which will give explicit
warnings about anything that is not optimally configured. For example:
```bash
multiqc --lint test_data
```
# Examples
Probably the best way to get to grips with Custom Content is to see some examples.
The MultiQC automated testing runs with a bunch of different files, and I try to add to
these all the time.
You can see these examples here: <https://github.com/ewels/MultiQC_TestData/tree/master/data/custom_content>
For example, to see a file which generates a table in a report by itself, you can
have a look at `embedded_config/table_headers_mqc.txt` ([link](https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/embedded_config/table_headers_mqc.txt)).