# Fast5_to_seq_summary API Usage

## Running Jupyter notebook

To run pycoQC interactively in Jupyter, you need to install Jupyter manually. If you installed pycoQC in a virtual environment, install Jupyter in that same environment.

```bash
pip3 install notebook
```

Launch the notebook from a shell terminal:

```bash
jupyter notebook
```

If it does not start automatically, open the following URL in your favorite web browser: http://localhost:8888/tree

From the Jupyter homepage, you can navigate to the directory you want to work in and create a new Python 3 notebook.
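Because Jupyter and pycoQC must live in the same environment, it can help to check from the first notebook cell which interpreter the kernel is using and whether both packages are visible to it. The following is a minimal sketch that only uses the Python standard library; `missing_packages` is a hypothetical helper name, not part of pycoQC.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the package names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Path of the interpreter running this kernel (should point inside your virtual environment)
print("Interpreter:", sys.executable)

# An empty list means both packages are importable from this kernel
print("Missing packages:", missing_packages(["pycoQC", "notebook"]))
```

If `pycoQC` shows up as missing, the notebook was most likely started from a different environment than the one pycoQC was installed in.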

## Imports

```python
# Run cell with Ctrl + Enter

# Import main pycoQC module
from pycoQC.Fast5_to_seq_summary import Fast5_to_seq_summary

# Import helper functions from pycoQC
from pycoQC.common import jhelp, head
```

## Running Fast5_to_seq_summary

```python
jhelp(Fast5_to_seq_summary)
```

**Fast5_to_seq_summary** (fast5_dir, seq_summary_fn, max_fast5, threads, basecall_id, verbose_level, include_path, fields)

Create a summary file akin to the one generated by Albacore or Guppy from a directory containing multiple fast5 files. The script attempts to extract all the requested fields but does not raise an error if a field is not found.

---

* **fast5_dir** (required) [str]

Directory containing fast5 files. Can contain multiple subdirectories

* **seq_summary_fn** (required) [str]

Path of the sequencing summary file in which to write the data extracted from the fast5 files

* **max_fast5** (default: 0) [int]

Maximum number of files to try to parse. 0 to deactivate

* **threads** (default: 4) [int]

Total number of threads to use. One thread is used for the reader and one for the writer. Minimum 3 (default = 4)

* **basecall_id** (default: 0) [int]

ID of the basecalling group. Leave at 0 by default, but if you performed multiple basecalling runs on the same fast5 files, this can be used to select the corresponding group (1, 2 ...)

* **verbose_level** (default: 0) [int]

Level of verbosity, from 2 (Chatty) to 0 (Nothing)

* **include_path** (default: False) [bool]

If True the absolute path to the corresponding file is added in an extra column

* **fields** (default: ['read_id', 'run_id', 'channel', 'start_time', 'sequence_length_template', 'mean_qscore_template', 'calibration_strand_genome_template', 'barcode_arrangement']) [list]

List of field names corresponding to attributes to try to fetch from the fast5 files. List of valid field names: mean_qscore_template, sequence_length_template, called_events, skip_prob, stay_prob, step_prob, strand_score, read_id, start_time, duration, start_mux, read_number, channel, channel_digitisation, channel_offset, channel_range, channel_sampling, run_id, sample_id, device_id, protocol_run, flow_cell, calibration_strand, calibration_strand, calibration_strand, calibration_strand, barcode_arrangement, barcode_full, barcode_score
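The `threads` description above implies some simple arithmetic: one thread goes to the reader, one to the writer, and at least one more is needed, hence the minimum of 3. The sketch below spells this out. `split_threads` is a hypothetical helper, and the assumption that all remaining threads act as parsing workers is mine; the help text only states the reader, the writer and the minimum.

```python
def split_threads(threads=4):
    """Split a thread budget into reader, writer and (assumed) worker threads."""
    if threads < 3:
        # Mirrors the documented minimum of 3 threads
        raise ValueError("threads must be at least 3")
    # Assumption: every thread not used by the reader or writer is a worker
    return {"reader": 1, "writer": 1, "workers": threads - 2}


# Default value documented for the `threads` option
print(split_threads(4))
```

Under this assumption, raising `threads` only increases the number of workers, which is why values below 3 leave no thread to process the files.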



### Basic usage

This minimal call, which relies on the default fields, creates a minimal summary file compatible with pycoQC.

```python
Fast5_to_seq_summary(
    fast5_dir="./data/",
    seq_summary_fn="./results/summary_sequencing.tsv",
    verbose_level=1)

head("./results/summary_sequencing.tsv")
```

Check input data and options
Start processing fast5 files
22 reads [00:00, 391.50 reads/s]
Overall counts 	valid files: 22

fields found 	read_id: 22
	run_id: 22
	channel: 22
	start_time: 22
	sequence_length_template: 22
	mean_qscore_template: 22
	calibration_strand_genome_template: 22

fields not found 	barcode_arrangement: 22

Total reads: 22 / Average speed: 161.12 reads/s



read_id                              run_id                                   channel start_time sequence_length_template mean_qscore_template calibration_strand_genome_template 
5b7fadd0-c646-4c7b-9800-66ee658a5ca8 40ebe55356ada6c830fa793745ef4c498d896c73 150     37         468                      7.608                filtered_out                       
e6a8e4d0-7b3c-471a-be26-fa7857d12663 40ebe55356ada6c830fa793745ef4c498d896c73 318     15         392                      8.304                filtered_out                       
f8325de9-a77e-4616-a4a8-69ecf32e1688 40ebe55356ada6c830fa793745ef4c498d896c73 354     16         568                      8.206                filtered_out                       
2c32553e-62c6-4c7a-bf05-249771364f04 40ebe55356ada6c830fa793745ef4c498d896c73 237     11         1151                     8.544                filtered_out                       
3e81c32a-f2ee-4719-a88d-e0affe93d26f 40ebe55356ada6c830fa793745ef4c498d896c73 348     24         1137    
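Once the summary file is written, it can be inspected without pycoQC. The sketch below assumes the file is tab-separated with a header row, as the `.tsv` extension suggests; `read_summary` and `mean_of` are hypothetical helper names. To stay self-contained it runs on a small stand-in file built from two of the rows shown above rather than on the real output.

```python
import csv
import os
import tempfile


def read_summary(path):
    """Read a tab-separated summary file into a list of dicts, one per read."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def mean_of(rows, field):
    """Average a numeric column, skipping rows where the field is missing or empty."""
    values = [float(row[field]) for row in rows if row.get(field)]
    return sum(values) / len(values) if values else None


# Stand-in data (assumed layout: header row, tab-separated columns)
sample = (
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "5b7fadd0-c646-4c7b-9800-66ee658a5ca8\t468\t7.608\n"
    "e6a8e4d0-7b3c-471a-be26-fa7857d12663\t392\t8.304\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "summary_sequencing.tsv")
    with open(demo_path, "w") as handle:
        handle.write(sample)
    rows = read_summary(demo_path)

print(len(rows), "reads, mean qscore:", mean_of(rows, "mean_qscore_template"))
```

To use it on real data, point `read_summary` at the path passed as `seq_summary_fn`.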

### Multi-threading support

```python
Fast5_to_seq_summary(
    fast5_dir="./data/",
    seq_summary_fn="./results/summary_sequencing.tsv",
    verbose_level=1,
    threads=10)

head("./results/summary_sequencing.tsv")
```

Check input data and options
Start processing fast5 files
22 reads [00:00, 13120.25 reads/s]
Overall counts 	valid files: 22

fields found 	read_id: 22
	run_id: 22
	channel: 22
	start_time: 22
	sequence_length_template: 22
	mean_qscore_template: 22
	calibration_strand_genome_template: 22

fields not found 	barcode_arrangement: 22

Total reads: 22 / Average speed: 385.38 reads/s



read_id                              run_id                                   channel start_time sequence_length_template mean_qscore_template calibration_strand_genome_template 
2c32553e-62c6-4c7a-bf05-249771364f04 40ebe55356ada6c830fa793745ef4c498d896c73 237     11         1151                     8.544                filtered_out                       
5b7fadd0-c646-4c7b-9800-66ee658a5ca8 40ebe55356ada6c830fa793745ef4c498d896c73 150     37         468                      7.608                filtered_out                       
f8325de9-a77e-4616-a4a8-69ecf32e1688 40ebe55356ada6c830fa793745ef4c498d896c73 354     16         568                      8.206                filtered_out                       
151757ea-53de-44b1-b86e-f823511af02a 40ebe55356ada6c830fa793745ef4c498d896c73 191     13         568                      8.23                 filtered_out                       
97205d42-93ac-4c99-af78-e553f7d1ff83 40ebe55356ada6c830fa793745ef4c498d896c73 343     26         1584    
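The rows in this output appear in a different order than in the basic usage run, which suggests that row order is not guaranteed when several threads are used. If you need to compare two summary files, sorting them on a stable key first avoids spurious differences. This is a minimal sketch assuming tab-separated text with a header row; `sort_summary` is a hypothetical helper, not a pycoQC function.

```python
import csv
import io


def sort_summary(tsv_text, key="read_id"):
    """Return TSV text with the data rows sorted by one column and the header kept first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = sorted(reader, key=lambda row: row[key])
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Two stand-in runs containing the same reads in a different order
run_a = (
    "read_id\tchannel\n"
    "5b7fadd0-c646-4c7b-9800-66ee658a5ca8\t150\n"
    "2c32553e-62c6-4c7a-bf05-249771364f04\t237\n"
)
run_b = (
    "read_id\tchannel\n"
    "2c32553e-62c6-4c7a-bf05-249771364f04\t237\n"
    "5b7fadd0-c646-4c7b-9800-66ee658a5ca8\t150\n"
)

print(sort_summary(run_a) == sort_summary(run_b))
```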

### Customize fields of the summary file

```python
Fast5_to_seq_summary(
    fast5_dir="./data/",
    seq_summary_fn="./results/custom_summary_sequencing.tsv",
    threads=6,
    verbose_level=1,
    fields=["mean_qscore_template", "called_events", "duration", "strand_score"])

head("./results/custom_summary_sequencing.tsv")
```

Check input data and options
Start processing fast5 files
22 reads [00:00, 20876.63 reads/s]
Overall counts 	valid files: 22

fields found 	mean_qscore_template: 22
	called_events: 22
	duration: 22
	strand_score: 22

fields not found 
Total reads: 22 / Average speed: 386.96 reads/s



mean_qscore_template called_events duration strand_score 
8.544                3740          56107    -0.0003      
7.608                1615          24233    -0.0007      
8.234                1827          27409    -0.0011      
8.206                1649          24747    -0.0009      
8.124                2978          44675    -0.0005      
8.304                1547          23218    -0.0008      
8.325                3846          57697    -0.0004      
8.219                2080          31208    -0.0011      
8.23                 2778          51387    -0.0007      
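Since fields that cannot be found do not raise an error, a misspelled field name could go unnoticed until the output is inspected. The sketch below checks a custom list against the field names printed by `jhelp` before the call is made. `VALID_FIELDS` is copied by hand from that help text, plus `calibration_strand_genome_template` from the default fields, so it may be incomplete; `unknown_fields` is a hypothetical helper.

```python
# Field names copied from the help text above (may be incomplete)
VALID_FIELDS = {
    "mean_qscore_template", "sequence_length_template", "called_events",
    "skip_prob", "stay_prob", "step_prob", "strand_score", "read_id",
    "start_time", "duration", "start_mux", "read_number", "channel",
    "channel_digitisation", "channel_offset", "channel_range",
    "channel_sampling", "run_id", "sample_id", "device_id", "protocol_run",
    "flow_cell", "calibration_strand", "barcode_arrangement", "barcode_full",
    "barcode_score",
    # Taken from the default value of `fields`
    "calibration_strand_genome_template",
}


def unknown_fields(fields):
    """Return the requested names that are not in VALID_FIELDS, in their original order."""
    return [name for name in fields if name not in VALID_FIELDS]


requested = ["mean_qscore_template", "called_events", "duration", "strand_score"]
print("Unknown fields:", unknown_fields(requested))
```

An empty list means every requested name matches a documented field; anything else is worth double-checking before running the extraction.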



### Add file path

```python
Fast5_to_seq_summary(
    fast5_dir="./data/",
    seq_summary_fn="./results/fn_summary_sequencing.tsv",
    threads=6,
    verbose_level=1,
    include_path=True)

head("./results/fn_summary_sequencing.tsv")
```

Check input data and options
Start processing fast5 files
22 reads [00:00, 15978.30 reads/s]
Overall counts 	valid files: 22

fields found 	read_id: 22
	run_id: 22
	channel: 22
	start_time: 22
	sequence_length_template: 22
	mean_qscore_template: 22
	calibration_strand_genome_template: 22

fields not found 	barcode_arrangement: 22

Total reads: 22 / Average speed: 366.98 reads/s



read_id                              run_id                                   channel start_time sequence_length_template mean_qscore_template calibration_strand_genome_template path                                                                                                                                                
5b7fadd0-c646-4c7b-9800-66ee658a5ca8 40ebe55356ada6c830fa793745ef4c498d896c73 150     37         468                      7.608                filtered_out                       /home/aleg/Programming/Packages/pycoQC/docs/Fast5_to_seq_summary/data/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_150_strand.fast5 
2c32553e-62c6-4c7a-bf05-249771364f04 40ebe55356ada6c830fa793745ef4c498d896c73 237     11         1151                     8.544                filtered_out                       /home/aleg/Programming/Packages/pycoQC/docs/Fast5_to_seq_summary/data/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_237_strand.fast5 
e6a8e4d0-7b3c-471a-