.. _outputs:


***************
Outputs
***************

seqcluster
==========

* ``counts.tsv``: count matrix that can be used as input for downstream analyses. ``nloci`` is 0 whenever the meta-cluster has been resolved successfully. It is not 0 when, for instance, a bunch of sequences map to hundreds of different places on the genome: seqcluster doesn't resolve that case and puts everything under the larger region covered by those sequences. In short, rows with ``nloci`` equal to 0 are the good ones. The ``ann`` column lists the features that the meta-cluster overlaps. A name can appear several times if different locations of the meta-cluster map to different copies of that feature, or if the annotation file has multiple lines for it.
* ``read_stats.tsv``: number of reads for each sample after each step in the analysis. It is meant to give a hint of whether a lot of information is lost along the way.
* ``size_counts.tsv``: size distribution of the small RNAs by annotation group (position, reads, cluster).
* ``seqcluster.json``: JSON file containing all the information. This file is used as the input of the report suite.
* ``log/run.log``: all messages at debug level
* ``log/trace.log``: a trace of the algorithm's decisions
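
As a minimal sketch of how ``counts.tsv`` might be filtered before downstream analysis, the snippet below keeps only the rows where ``nloci`` is 0. It assumes the file is tab-separated with a header row. The ``nloci`` and ``ann`` column names come from the description above; the ``id`` and ``sampleA`` columns and the helper name ``resolved_rows`` are hypothetical and used only for illustration.

.. code-block:: python

    import csv
    import io


    def resolved_rows(handle):
        """Yield the rows of a counts table whose ``nloci`` value is 0."""
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            if int(row["nloci"]) == 0:
                yield row


    # Tiny made-up table, for illustration only; real files will differ.
    example = io.StringIO(
        "id\tnloci\tann\tsampleA\n"
        "1\t0\tmiRNA\t120\n"
        "2\t3\trepeat\t45\n"
    )
    kept = list(resolved_rows(example))
    print([row["id"] for row in kept])

For a real run, ``open("counts.tsv", newline="")`` would be passed in place of the in-memory example.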


Report
======

Besides the static HTML report that you can get with the ``report`` `subcommand <http://seqcluster.readthedocs.org/getting_started.html#report>`_, you can download `this <https://github.com/lpantano/seqclusterViz/archive/master.zip>`_ HTML viewer. (Watch the repository to get notifications of new releases.)

* Go into the ``seqclusterViz`` folder.
* Open ``reader.html``.
* Upload the ``seqcluster.db`` file generated by the ``report`` subcommand.
* Start browsing your data!

Meaning of different sections:

* Top-left table: list of meta-clusters. You can filter it by numeric ID or by keywords.
* Top-right table: positions where the selected meta-cluster has been detected.
* Expression profile along the precursor: the lines show the number of reads at each position of the precursor. The value is the sum of the log2 RPM of the expression for each sample.
* Table: raw counts for each sample and sequence. Only the top 100 are shown.
* Secondary structure: the region of the meta-cluster with the most sequences is used to plot the secondary structure. Colors refer to the abundance at each position. Darker means more abundant.
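
The expression profile value described above (the sum of the log2 RPM across samples) can be sketched as follows. This is only an illustration under stated assumptions, not the code used by the viewer: RPM is taken to be reads per million reads in the sample's library, and a pseudocount of 1 is added before the logarithm so that zero counts are defined. The function names and the pseudocount are assumptions.

.. code-block:: python

    import math


    def log2_rpm(count, library_size, pseudocount=1.0):
        """Return log2 of reads per million, plus an assumed pseudocount."""
        rpm = count * 1_000_000 / library_size
        return math.log2(rpm + pseudocount)


    def profile_height(counts_by_sample, library_sizes):
        """Sum the log2 RPM over all samples for one precursor position."""
        return sum(
            log2_rpm(count, library_sizes[sample])
            for sample, count in counts_by_sample.items()
        )


    # Made-up read counts at one position, with two libraries of one million reads.
    counts = {"sampleA": 1, "sampleB": 3}
    sizes = {"sampleA": 1_000_000, "sampleB": 1_000_000}
    height = profile_height(counts, sizes)
    print(height)

Summing log-scale values keeps a single highly expressed sample from dominating the profile, which may be why the plot is built this way.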

An example of the HTML viewer in use:

.. image:: http://i.makeagif.com/media/7-03-2016/M0GjW2.gif
  :target: https://youtu.be/Zjzte8n2-Sg