File: strand-conventions.txt


    Note on IPD classifier training conventions
    ===========================================

    Getting the strand and context correct is notoriously confusing.
    In the output of the basemods package (kineticsTools), strand refers to the
    _template_ strand, because that's where the modifications that we're detecting
    are.  The PacBio basecaller and dye/base mapping always work in the product strand.
    
    The IPD model itself is trained to take sequences in the template strand and make predictions about
    the IPD that will be observed when synthesizing the product strand.  In KEC, we will be working entirely
    with the product strand sequence. We set up the Kinetic Model code to accept product strand sequences and give
    product strand predictions.

    Here's a reference for the context windows and strands:

    Case 1: the + strand is the product strand:
        - The input sequence to pass to the classifier is the - strand sequence, indexed according to the top numbers
	- The IPD GBM model prediction is for the G incorporation in the product strand synthesis

    Case 2: the - strand is the product strand:
        - The input sequence to pass to the classifier is the + strand sequence, indexed according to the bottom numbers
        - The IPD GBM model prediction is for the C incorporation in the product strand synthesis


    strand                  sequence                 pol motion
                          14        4   0
                          |         |   |
      -       3'-xxxxxxxxxNNNNNNNNNNCNNNNxxxxx-5'      <-
      +       5'-xxxxxxxxxNNNNNNNNNNGNNNNxxxxx-3'      ->
                          |         |   |
                          0         10  14
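
    The two cases above can be sketched in code. This is a minimal illustration, not
    kineticsTools code: the names COMPLEMENT, reverse_complement and classifier_input are
    hypothetical, and it assumes the 15-base window (the N positions in the diagram) is
    given as the + strand written 5'->3', i.e. in the order of the bottom numbers.
    Reading the diagram literally, indexing the - strand by the top numbers amounts to
    taking the reverse complement of that window.

```python
# Hypothetical helpers for illustration only; not part of kineticsTools.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def classifier_input(plus_window, product_strand):
    """Pick the sequence to pass to the classifier.

    plus_window    -- the context window on the + strand, 5'->3'
                      (bottom numbers 0..14 in the diagram)
    product_strand -- "+" or "-", the strand being synthesized
    """
    if product_strand == "+":
        # Case 1: pass the - strand, indexed by the top numbers
        # (index 0 is the rightmost column of the diagram).
        return reverse_complement(plus_window)
    if product_strand == "-":
        # Case 2: pass the + strand, indexed by the bottom numbers.
        return plus_window
    raise ValueError("product_strand must be '+' or '-'")


# Toy window with the G of the diagram at bottom index 10.
plus = "AAAAAAAAAAGAAAA"
case1 = classifier_input(plus, "+")  # - strand, C at top index 4
case2 = classifier_input(plus, "-")  # + strand, G at bottom index 10
```

    With this toy window, the marked column lands at index 4 of the Case 1 input (the C)
    and at index 10 of the Case 2 input (the G), matching the top and bottom numbers in
    the diagram.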