# pycoQC CLI Usage

The pycoQC CLI can generate a beautiful HTML report containing interactive D3.js plots. In addition, the CLI can dump summary information to a JSON-formatted file, allowing easy parsing with third-party tools.

The report is dynamically generated depending on the information available in the summary file.

## CLI Usage

### Activate virtual environment

```bash
# Using conda here, but this can also be done with other virtual environment managers
conda activate pycoQC
```
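Before running the examples below, it can be useful to confirm that the `pycoQC` executable is actually on the `PATH` of the activated environment. This is a minimal sketch, assuming a Bash shell. `require_cmd` is a hypothetical helper and is not part of pycoQC.

```bash
# Hypothetical helper (not part of pycoQC): fail early if a command is missing
require_cmd() {
    # command -v returns non-zero when the command cannot be resolved
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Error: '$1' not found in PATH. Is the right environment activated?" >&2
        return 1
    fi
}
```

For example, `require_cmd pycoQC && pycoQC --version` prints the installed version only if the executable can be found.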

### Getting help

```bash
pycoQC -h
```

```
usage: pycoQC [-h] [--version]
              [--summary_file [SUMMARY_FILE [SUMMARY_FILE ...]]]
              [--barcode_file [BARCODE_FILE [BARCODE_FILE ...]]]
              [--bam_file [BAM_FILE [BAM_FILE ...]]]
              [--html_outfile HTML_OUTFILE] [--json_outfile JSON_OUTFILE]
              [--min_pass_qual MIN_PASS_QUAL] [--min_pass_len MIN_PASS_LEN]
              [--filter_calibration] [--filter_duplicated]
              [--min_barcode_percent MIN_BARCODE_PERCENT]
              [--report_title REPORT_TITLE] [--template_file TEMPLATE_FILE]
              [--config_file CONFIG_FILE] [--skip_coverage_plot]
              [--sample SAMPLE] [--default_config] [-v | -q]

pycoQC computes metrics and generates interactive QC plots from the sequencing summary
report generated by Oxford Nanopore technologies basecallers

* Minimal usage
    pycoQC -f sequencing_summary.txt -o pycoQC_output.html
* Including Guppy barcoding file + html output + json output
    pycoQC -f sequencing_summar
```

### Usage examples

#### Basic usage (quiet mode)

```bash
pycoQC \
    -f ./data/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/Albacore-1.2.1_basecall-1D-DNA.html \
    --quiet
```

```
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface
```

[HTML OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Albacore-1.2.1_basecall-1D-DNA.html)
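To produce one report per run, the basic command can be wrapped in a loop. This is a sketch under a few assumptions: Bash, summary files named `<run>_sequencing_summary.txt.gz` as in `./data/`, and reports written to `./results/`. `report_path` and `run_all` are hypothetical helpers.

```bash
# Hypothetical helper: derive ./results/<run>.html from a summary file path
report_path() {
    local name
    name=$(basename "$1")                      # drop the directory part
    name=${name%_sequencing_summary.txt.gz}    # drop the assumed file suffix
    printf './results/%s.html\n' "$name"
}

# Hypothetical helper: generate one quiet-mode report per summary file argument
run_all() {
    local f
    for f in "$@"; do
        pycoQC -f "$f" -o "$(report_path "$f")" --quiet
    done
}
```

For example, `run_all ./data/*_sequencing_summary.txt.gz` would write `./results/Albacore-1.2.1_basecall-1D-DNA.html` for the file used above, matching the output name chosen by hand in the command.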

#### JSON output in addition to the HTML report

A JSON report can be generated in addition to (or instead of) the HTML report.

It contains a summarized version of the data collected by pycoQC in a structured, easy-to-parse format.

```bash
pycoQC \
    -f ./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz \
    -o ./results/Guppy-2.1.3_basecall-1D_RNA.html \
    -j ./results/Guppy-2.1.3_basecall-1D_RNA.json \
    --quiet
```

```
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface
```

[HTML OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Guppy-2.1.3_basecall-1D_RNA.html)

[JSON OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Guppy-2.1.3_basecall-1D_RNA.json)
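Because the JSON report is plain structured text, it can be explored with generic tools without knowing its layout in advance. The sketch below relies only on the Python standard library and assumes `python3` is on the `PATH`. `json_keys` is a hypothetical helper, and it makes no assumption about the key names pycoQC actually writes.

```bash
# Hypothetical helper: list the top-level keys of a JSON file, one per line
json_keys() {
    python3 -c '
import json, sys
with open(sys.argv[1]) as fh:
    data = json.load(fh)
for key in data:  # iterating over a JSON object (dict) yields its keys
    print(key)
' "$1"
}
```

For example, `json_keys ./results/Guppy-2.1.3_basecall-1D_RNA.json` lists the top-level sections. `python3 -m json.tool ./results/Guppy-2.1.3_basecall-1D_RNA.json` pretty-prints the whole file.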

#### Including Guppy barcoding information

```bash
pycoQC \
    -f ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz \
    -b ./data/Guppy-2.1.3_basecall-1D_DNA_barcoding_summary.txt.gz \
    -o ./results/Guppy-2.1.3_basecall-1D_DNA_barcode.html \
    --quiet
```

[HTML OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Guppy-2.1.3_basecall-1D_DNA_barcode.html)

#### Matching multiple files with a wildcard pattern and adding a title to the report

```bash
pycoQC \
    -f ./data/Albacore*RNA* \
    -o ./results/Albacore_all_RNA.html \
    --report_title "All RNA runs" \
    --quiet
```

[HTML OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Albacore_all_RNA.html)
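Left unquoted, `./data/Albacore*RNA*` is expanded by the shell into the list of matching files before pycoQC starts, and `--summary_file` accepts several values. Previewing the expansion first avoids feeding unexpected files into a report. This is a sketch assuming Bash. `preview_matches` is a hypothetical helper.

```bash
# Hypothetical helper: print the files a glob pattern expands to, one per line
preview_matches() {
    local restore
    local -a files
    restore=$(shopt -p nullglob || true)   # remember the current nullglob setting
    shopt -s nullglob                      # unmatched patterns expand to nothing
    files=( $1 )                           # deliberately unquoted so the glob expands (splits paths containing spaces)
    eval "$restore"                        # restore the previous setting
    if (( ${#files[@]} == 0 )); then
        echo "No file matches: $1" >&2
        return 1
    fi
    printf '%s\n' "${files[@]}"
}
```

Pass the pattern quoted so that the function, not the calling shell, expands it: `preview_matches './data/Albacore*RNA*'`.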

#### Tweak filtering parameters

* Define reads with a quality higher than 8 and a length higher than 200 bases as "pass"
* Discard reads aligned on the calibration standard
* Set the barcode label to `unclassified` for any barcode found in less than 10% of the reads

```bash
pycoQC \
    -f ./data/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/Albacore-2.1.10_basecall-1D-DNA.html \
    --min_pass_qual 8 \
    --min_pass_len 200 \
    --filter_calibration \
    --min_barcode_percent 10 \
    --quiet
```

```
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface
```

[HTML OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Albacore-2.1.10_basecall-1D-DNA.html)
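To get a rough idea of how a pair of thresholds splits a run before generating a report, the pass criteria can be approximated directly on the summary file. This sketch assumes a tab-separated summary that contains the `mean_qscore_template` and `sequence_length_template` columns written by Albacore and Guppy. It uses strict "higher than" comparisons as worded above. pycoQC's own filtering, including how it treats values exactly at the threshold, may differ, so treat the count as an approximation. `count_pass_reads` is a hypothetical helper.

```bash
# Hypothetical helper: count reads above both thresholds in a sequencing summary.
# Reads tab-separated text on stdin; columns are located by header name.
count_pass_reads() {
    local min_qual=$1 min_len=$2
    awk -F'\t' -v q="$min_qual" -v l="$min_len" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "mean_qscore_template")     qcol = i
                if ($i == "sequence_length_template") lcol = i
            }
            if (!qcol || !lcol) {
                print "missing required columns" > "/dev/stderr"
                err = 1
                exit 1
            }
            next
        }
        ($qcol + 0 > q + 0) && ($lcol + 0 > l + 0) { pass++ }
        END {
            if (err) exit 1
            print pass + 0
        }
    '
}
```

For example: `gzip -dc ./data/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz | count_pass_reads 8 200`.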

#### Including alignment information from a BAM file

```bash
pycoQC \
    -f ./large_data/sample_1_sequencing_summary.txt \
    -a ./large_data/sample_1.bam \
    -o ./results/Guppy-2.3_basecall-1D_alignment-DNA.html \
    -j ./results/Guppy-2.3_basecall-1D_alignment-DNA.json \
    --quiet
```

```
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface
```

[HTML OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Guppy-2.3_basecall-1D_alignment-DNA.html)

[JSON OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Guppy-2.3_basecall-1D_alignment-DNA.json)

#### Advanced configuration with a custom JSON file

Although we recommend sticking to the default parameters, you can provide a JSON-formatted configuration file to tweak the plots. A default configuration file can be generated with:

```bash
pycoQC --default_config
```

```
{
  "run_summary": {
    "plot_title": "General run summary"
  },
  "basecall_summary": {
    "plot_title": "Basecall summary"
  },
  "alignment_summary": {
    "plot_title": "Alignment summary"
  },
  "read_len_1D": {
    "plot_title": "Basecalled reads length",
    "color": "lightsteelblue",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "align_len_1D": {
    "plot_title": "Aligned reads length",
    "color": "mediumseagreen",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "read_qual_1D": {
    "plot_title": "Basecalled reads PHRED quality",
    "color": "salmon",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "identity_freq_1D": {
    "plot_title": "Aligned reads identity",
    "color": "sandybrown",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "read_len_read_qual_2D": {
    "plot_title": "Basecalled reads length vs reads PHRED quality",
    "x_nbins": 200,
    "y_nbins": 100,
    "smooth_sigma": 2
  },
  "read_len_align_len_2D": {
    "plot_title": "Basecalled reads length vs ali
```

To save and edit it, redirect the standard output to a file and make your changes with your favorite text editor.

To remove a plot from the report, delete (or comment out) its section in the configuration file. Standard JSON has no comment syntax, so deleting the section is the safest option.

The configuration file accepts all the arguments of the target plotting functions. For more information, refer to the API documentation.

```bash
pycoQC --default_config > data/pycoQC_config.json
```
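Edits to the saved configuration can also be scripted. This is a minimal sketch relying on the Python standard library and assuming `python3` is available. `drop_plot` prints a copy of the configuration without one plot section, for example `read_len_1D` from the default configuration above. `is_valid_json` checks that an edited file still parses. Both helper names are hypothetical.

```bash
# Hypothetical helper: print a copy of a JSON config without the given top-level section
drop_plot() {
    python3 -c '
import json, sys
with open(sys.argv[1]) as fh:
    config = json.load(fh)
config.pop(sys.argv[2], None)  # silently ignore sections that are absent
json.dump(config, sys.stdout, indent=2)
print()
' "$1" "$2"
}

# Hypothetical helper: succeed only if the file parses as JSON
is_valid_json() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}
```

For example: `drop_plot data/pycoQC_config.json read_len_1D > data/pycoQC_config_edited.json && is_valid_json data/pycoQC_config_edited.json`.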

Run pycoQC with the `--config_file` option (abbreviated to `--config` in the command below):

```bash
pycoQC \
    -f ./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/Albacore-1.7.0_basecall-1D-DNA.html \
    --config ./data/pycoQC_config.json \
    --quiet
```

```
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface
```

[HTML OUTPUT](https://a-slide.github.io/pycoQC/pycoQC/results/Albacore-1.7.0_basecall-1D-DNA.html)