# Structure of the code
* mirtop/bam
  * __bam.py__
    * `read_bam`: reads BAM files with pysam and stores the hits in a key-value object
  * __filter.py__
    * `tune`: if the `--clean` option is on, filters hits according to generic rules
    * `clean_hits`: gets the top hits
* mirtop/gff
  * __init.py__: wraps the conversion process to GFF3
  * __body.py__
    * `create`: creates a line according to the established GFF format
    * `read_gff_line`: called inside a for loop over the lines of the file; returns a structured key:value dictionary with one entry per column
  * __header.py__: generates and reads the header section
  * __check.py__: checks that the header and individual lines are valid according to the GFF format (NOT IMPLEMENTED)
  * __stats.py__: GFF stats, counting the number of isomiRs and their total and average expression
  * __query.py__: accepts SQLite queries passed with the option `-q ""`
  * __convert.py__
    * `create_counts`: table of counts
    * allows filtering by attribute
    * allows collapsing by miRNA/isomiR type
  * __filter.py__: parses filters from the query (NOT IMPLEMENTED)
* mirtop/mirna
  * __fasta.py__
    * `read_precursor`: reads the precursor FASTA file into a key-value object
  * __realign.py__
    * `hits`: class that defines hits
    * `isomir`: class that defines each sequence
    * `cigar_correction`: function that uses the CIGAR string to build the sequence-to-miRNA alignment
    * `read_id` and `make_id`: shorter IDs for sequences
    * `make_cigar`: given an alignment, returns its CIGAR string
    * `reverse_complement`: returns the reverse complement of a sequence
    * `align`: uses Biopython to align two sequences of the same size
    * `expand_cigar`: expands a CIGAR string, e.g. from `12M` to `MMMMMMMMMMMM`
    * `cigar2snp`: converts a CIGAR string into a list of changes, each with its position and the reference and target nucleotides
  * __mapper.py__
    * `read_gtf`: maps genomic miRNA positions to precursor positions, so it needs the genomic positions of both the miRNA and the precursor. The return value looks like `{mirna: [start, end]}`
  * __annotate.py__
    * `annotate`: reads isomiRs and populates all attributes related to isomiRs
* mirtop/importer
  * seqbuster.py
  * prost.py
  * srnabench.py
  * isomirsea.py
* mirtop/exporter
  * isomirs.py: exports a file compatible with the [isomiRs BioC package](https://github.com/lpantano/isomiRs).
* data/examples/
  * check GFF files: examples of correct, invalid, and warning GFF files
  * check BAM file
  * check mapping from genome position to precursor position, with examples on the + and - strands, using `mirtop/mirna/mapper.read_gtf`
  * check clean option: for a sequence mapping to multiple precursors/miRNAs, keep the best score, using `mirtop/bam/filter.clean_hits`
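Several of the helpers listed above are small enough to illustrate directly. The following is a minimal sketch of what `expand_cigar`, `reverse_complement`, and the genome-to-precursor mapping could look like. These are independent illustrations, not the code in `realign.py` or `mapper.py`: the signatures, the `to_precursor_coords` name, and the coordinate convention (plain offsets from the precursor start, or from the precursor end on the minus strand) are all assumptions.

```python
import re

# Complement table for DNA bases (N stays N); an assumption for this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def expand_cigar(cigar):
    """Expand a run-length CIGAR string, e.g. '3M1I2M' -> 'MMMIMM'."""
    pairs = re.findall(r"(\d+)([A-Z=])", cigar)
    return "".join(op * int(count) for count, op in pairs)


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_precursor_coords(mirna, precursor, strand):
    """Map genomic [start, end] of a miRNA to offsets inside its precursor.

    Hypothetical helper: assumes both intervals use the same genomic
    coordinate system and that minus-strand offsets count from the
    precursor's genomic end.
    """
    m_start, m_end = mirna
    p_start, p_end = precursor
    if strand == "+":
        return [m_start - p_start, m_end - p_start]
    return [p_end - m_end, p_end - m_start]


if __name__ == "__main__":
    print(expand_cigar("12M"))
    print(reverse_complement("AACG"))
    print(to_precursor_coords((1010, 1031), (1000, 1080), "+"))
    print(to_precursor_coords((1010, 1031), (1000, 1080), "-"))
```

The minus-strand branch is the reason both the miRNA and the precursor genomic positions are needed: the offset has to be measured from the opposite end of the precursor.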

To add new sub-commands, modify the following:
* mirtop/lib/parse.py
* query: TODO
* transform: TODO
* create: TODO
* check: TODO
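
As a rough illustration of what registering a sub-command involves, here is a minimal sketch using `argparse` sub-parsers from the standard library. It assumes the command line is built with `argparse`, which this section does not confirm; the `build_parser` function, the `mirtop-sketch` program name, and the `--gff` option are hypothetical, and only the `-q` option comes from the description of `query.py` above.

```python
import argparse


def build_parser():
    """Build a toy command line with one sub-command (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="mirtop-sketch")
    subparsers = parser.add_subparsers(dest="command")

    # Each sub-command gets its own parser and its own options.
    query = subparsers.add_parser("query", help="run a query against a GFF file")
    query.add_argument("-q", "--query", help="query string to run")
    query.add_argument("--gff", help="input GFF file (hypothetical option)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["query", "-q", "select * from data", "--gff", "example.gff"]
    )
    print(args.command, args.query, args.gff)
```

Under this layout, adding `transform`, `create`, or `check` would mean one more `add_parser` call each, plus the code that dispatches on `args.command`.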