.. _outputs:
***************
Outputs
***************
seqcluster
==========
* ``counts.tsv``: count matrix that can be used as input for downstream analyses. `nloci` is 0 whenever the meta-cluster has been resolved successfully. A non-zero value can occur, for instance, when a group of sequences maps to hundreds of different places on the genome: seqcluster does not resolve that case and puts everything under the larger region covered by those sequences. So, in general, rows with 0 are good rows. The `ann` column lists the features that the meta-cluster overlaps. A name can appear several times if different locations of the meta-cluster map to different copies of that feature, or if the annotation file contained multiple lines for it.
* ``read_stats.tsv``: number of reads for each sample after each step in the analysis. It is meant to hint at whether a lot of information is lost at any step.
* ``size_counts.tsv``: size distribution of the small RNA by annotation group. (position, reads, cluster)
* ``seqcluster.json``: JSON file containing all information. This file is used as the input of the report suite.
* ``log/run.log``: all messages at debug level
* ``log/trace.log``: a trace of the algorithm's decisions
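
The note on `nloci` above suggests a simple first filter for downstream work: keep only the rows where `nloci` is 0. The following is a minimal sketch, not part of seqcluster itself. It assumes ``counts.tsv`` is tab-separated with a header row containing a column named ``nloci``. The ``resolved_rows`` helper and the miniature table are hypothetical and only illustrate the idea.

```python
import csv
import io


def resolved_rows(handle, nloci_column="nloci"):
    """Yield rows of a tab-separated count matrix whose nloci value is 0."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # nloci == 0 marks a meta-cluster that was resolved successfully.
        if int(row[nloci_column]) == 0:
            yield row


# Hypothetical miniature table (assumed layout); a real file has one
# count column per sample.
example = io.StringIO(
    "id\tnloci\tann\tsample1\tsample2\n"
    "1\t0\tmiRNA\t120\t98\n"
    "2\t250\trepeat\t15\t22\n"
    "3\t0\ttRNA\t40\t51\n"
)
kept = list(resolved_rows(example))
print([row["id"] for row in kept])
```

For a real run, the same function could be given a file handle instead, for example ``open("counts.tsv", newline="")``, provided the column names match these assumptions.
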
Report
======
Besides the static HTML report that you can get using the ``report`` `subcommand <http://seqcluster.readthedocs.org/getting_started.html#report>`_, you can download `this <https://github.com/lpantano/seqclusterViz/archive/master.zip>`_ HTML. (Watch the repository to get notifications of new releases.)
* Go inside the ``seqclusterViz`` folder
* Open ``reader.html``
* Upload the ``seqcluster.db`` file generated by the ``report`` subcommand.
* Start browsing your data!
Meaning of the different sections:
* Top-left table: list of meta-clusters. You can filter by numeric ID or by keywords.
* Top-right table: positions where the selected meta-cluster has been detected.
* Expression profile along precursor: lines show the number of reads at each position of the precursor, computed as the sum of the log2 RPM of the expression in each sample.
* Table: raw counts for each sample and sequence. Only the top 100 are shown.
* Secondary structure: the region with the most sequences inside the meta-cluster is used to plot the secondary structure. Colors refer to the abundance at each position. Darker means more abundant.
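
To make the expression profile more concrete, here is a rough sketch of how a "sum of log2 RPM across samples" could be computed per position. It is an illustration under stated assumptions, not the code used by the report. RPM is taken as reads per million of a sample's library size, and a pseudocount of 1 is added before taking the logarithm so that positions with zero reads are defined. Both the pseudocount and the ``log2_rpm_profile`` helper are assumptions introduced here.

```python
import math


def log2_rpm_profile(counts_by_sample, library_sizes, pseudocount=1.0):
    """Sum log2(RPM + pseudocount) across samples for each position.

    counts_by_sample: {sample: {position: read_count}}
    library_sizes:    {sample: total_reads_in_sample}
    """
    positions = sorted(
        {pos for counts in counts_by_sample.values() for pos in counts}
    )
    profile = {}
    for pos in positions:
        total = 0.0
        for sample, counts in counts_by_sample.items():
            # Reads per million; a missing position counts as 0 reads.
            rpm = counts.get(pos, 0) * 1e6 / library_sizes[sample]
            # The pseudocount is an assumption to avoid log2(0).
            total += math.log2(rpm + pseudocount)
        profile[pos] = total
    return profile


# Hypothetical toy data: two samples, two positions on a precursor.
counts_by_sample = {"s1": {10: 4, 11: 8}, "s2": {10: 1}}
library_sizes = {"s1": 1_000_000, "s2": 1_000_000}
profile = log2_rpm_profile(counts_by_sample, library_sizes)
print(profile)
```

With these toy numbers, position 10 sums ``log2(4 + 1)`` and ``log2(1 + 1)``, while position 11 gets ``log2(8 + 1)`` from the first sample and 0 from the second.
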
An example of the HTML report in use:
.. image:: http://i.makeagif.com/media/7-03-2016/M0GjW2.gif
   :target: https://youtu.be/Zjzte8n2-Sg