|
************
Introduction
************
**sideRETRO** is a bioinformatic tool designed to detect somatic
(*de novo*) **retrocopy insertions** in whole genome and whole exome
sequencing data (WGS, WES). The program was written from scratch in C
and uses the `HTSlib <http://www.htslib.org/>`_ and
`SQLite3 <https://www.sqlite.org>`_ libraries to handle SAM/BAM/CRAM
reading and data analysis. The source code is distributed under the
**GNU General Public License**.


Wait, what is a retrocopy?
==========================
In short, a retrocopy is the result of the **reverse transcription**
of a mature **mRNA** molecule into **cDNA** and its insertion into a
new position in the genome.


.. image:: images/retrocopy.png
   :scale: 50%
   :align: center

Interested? For a more detailed explanation of what a retrocopy is,
see the section :ref:`Retrocopy in a nutshell <chap_retrocopy>`.

Features
========
Besides detecting retrocopy mobilization, sideRETRO annotates several
other features of each event:


Parental gene
   The **gene** that **underwent retrotransposition**, giving rise to
   the retrocopy.
Genomic position
   The genomic **coordinates** where the retrocopy **integration**
   occurred (chromosome:start-end), including the **insertion point**.
Strandness
   The orientation of the insertion (+/-), i.e., whether the retrocopy
   lies on the **forward** (+) or **reverse** (-) strand of the
   reference genome.
Genomic context
   The context of the integration site: whether the retrotransposition
   event occurred in an **intergenic** or **intragenic** region. The
   latter can be further split into **exonic** and **intronic**
   according to the host gene.
Genotype
   When **multiple** individuals are analysed, the events are annotated
   for each one, making it possible to **distinguish** whether an event
   is **exclusive** to one individual or **shared** among the cohort.
Haplotype
   The zygosity of the event, i.e., whether it occurs in one or both
   **homologous** chromosomes (heterozygous or homozygous,
   respectively).

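
The intergenic/exonic/intronic distinction above can be illustrated
with a minimal shell sketch. It is not sideRETRO's own implementation,
which works on the data stored in its SQLite3 database; it simply
assumes a GTF annotation containing :code:`gene` and :code:`exon`
feature lines and a single insertion point given as a chromosome and a
1-based position. The helper name :code:`classify_context` is
hypothetical.

.. code-block:: sh

   # Hypothetical helper (not part of sideRETRO): classify one insertion
   # point as exonic, intronic or intergenic against a GTF annotation.
   # Usage: classify_context <annotation.gtf> <chrom> <pos>  (pos is 1-based)
   classify_context() {
      local gtf="$1" chrom="$2" pos="$3"
      awk -F '\t' -v chr="$chrom" -v pos="$pos" '
         # Skip GTF header/comment lines
         /^#/ { next }
         # GTF columns: 1 seqname, 3 feature, 4 start, 5 end (1-based, inclusive)
         $1 == chr && $4 <= pos && pos <= $5 {
            if ($3 == "exon") exonic = 1
            else if ($3 == "gene") intragenic = 1
         }
         END {
            if (exonic) print "exonic"              # inside an exon
            else if (intragenic) print "intronic"   # inside a gene, outside its exons
            else print "intergenic"                 # outside every annotated gene
         }
      ' "$gtf"
   }

When genes overlap, this sketch reports :code:`exonic` if any exon
covers the position; that is one possible tie-breaking rule, not
necessarily the one sideRETRO applies.
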
How it works
============
sideRETRO compiles to an executable called :code:`sider`,
which has three subcommands: :code:`process-sample`,
:code:`merge-call` and :code:`make-vcf`. The :code:`process-sample`
subcommand reads a list of SAM/BAM/CRAM files and captures
**abnormal reads** that are likely related to a retrocopy event.
All these data are saved to a **SQLite3 database**, which the second
step, :code:`merge-call`, **processes** in order to **annotate** all
the retrocopies found. Finally, the :code:`make-vcf` subcommand
generates an annotated retrocopy
`VCF <https://samtools.github.io/hts-specs/VCFv4.2.pdf>`_.


.. code-block:: sh

   # List of BAM files
   $ cat 'my-bam-list.txt'
   /path/to/file1.bam
   /path/to/file2.bam
   /path/to/file3.bam
   ...

   # Run process-sample step
   $ sider process-sample \
       --annotation-file='my-annotation.gtf' \
       --input-file='my-bam-list.txt'

   $ ls -1
   my-genome.fa
   my-annotation.gtf
   my-bam-list.txt
   out.db

   # Run merge-call step
   $ sider merge-call --in-place out.db

   # Run make-vcf step
   $ sider make-vcf \
       --reference-file='my-genome.fa' out.db

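
The three steps above can also be chained in a small script. This is
only a sketch: it uses just the options that appear in the example,
assumes the default :code:`out.db` output shown by :code:`ls -1`, and
the function name :code:`run_sider_pipeline` and its arguments are
hypothetical. See the :ref:`usage <chap_usage>` chapter for the full
set of options.

.. code-block:: sh

   #!/usr/bin/env bash
   # Abort on the first failing command, unset variable or broken pipe
   set -euo pipefail

   # Hypothetical wrapper around the three sider subcommands.
   # Assumes process-sample writes its default database, out.db, here.
   # Usage: run_sider_pipeline <bam_dir> <annotation.gtf> <genome.fa>
   run_sider_pipeline() {
      local bam_dir="$1" gtf="$2" genome="$3"

      # Build the BAM list expected by --input-file (one path per line)
      find "$bam_dir" -name '*.bam' | sort > my-bam-list.txt

      # Step 1: capture abnormal reads into the SQLite3 database
      sider process-sample \
         --annotation-file="$gtf" \
         --input-file=my-bam-list.txt

      # Step 2: process the database and annotate the retrocopies in place
      sider merge-call --in-place out.db

      # Step 3: write the annotated retrocopy VCF
      sider make-vcf --reference-file="$genome" out.db
   }

For example, :code:`run_sider_pipeline /path/to/bams my-annotation.gtf
my-genome.fa`. Passing an absolute BAM directory keeps the paths
written to :code:`my-bam-list.txt` valid wherever :code:`sider` is run.
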
See the manual pages on :ref:`installation <chap_installation>`
and :ref:`usage <chap_usage>` for more information. For details about
the algorithm, see our :ref:`methodology <chap_methodology>`.

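
Once :code:`make-vcf` has produced the VCF, the **Genotype** and
**Haplotype** annotations can be summarized per event. The sketch
below rests on an explicit assumption: that each sample's genotype is
stored in the standard VCF :code:`GT` subfield (check the header of
your own output). The helper name :code:`summarize_genotypes` is
hypothetical. For each record it counts the samples carrying a
non-reference allele, labels the event *exclusive* (one carrier) or
*shared* (two or more), and reports how many carriers are homozygous
or heterozygous.

.. code-block:: sh

   # Hypothetical helper (not part of sideRETRO): summarize how many
   # samples carry each event and whether it is exclusive or shared.
   # ASSUMPTION: per-sample genotypes are stored in the standard GT subfield.
   # Usage: summarize_genotypes <file.vcf>
   summarize_genotypes() {
      awk -F '\t' '
         /^#/ { next }                 # skip meta-information and header lines
         {
            # Locate GT inside the FORMAT column (column 9)
            n = split($9, fmt, ":")
            gt_idx = 0
            for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt_idx = i
            if (gt_idx == 0) next

            carriers = hom = het = 0
            for (s = 10; s <= NF; s++) {   # one column per sample
               split($s, val, ":")
               # Alleles are separated by "/" (unphased) or "|" (phased)
               m = split(val[gt_idx], al, "[/|]")
               ref = alt = 0
               for (j = 1; j <= m; j++) {
                  if (al[j] == ".") continue   # missing allele
                  if (al[j] == "0") ref++; else alt++
               }
               if (alt == 0) continue          # sample does not carry the event
               carriers++
               if (ref == 0) hom++; else het++
            }
            if (carriers == 0) next
            printf "%s:%s\t%s\thom=%d\thet=%d\n", $1, $2,
               (carriers == 1 ? "exclusive" : "shared"), hom, het
         }
      ' "$1"
   }

Each output line is tab-separated: the event position as
CHROM:POS, the label :code:`exclusive` or :code:`shared`, and the
homozygous and heterozygous carrier counts.
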
Obtaining sideRETRO
===================
The source code for the program is available on its `GitHub
<https://github.com/galantelab/sideRETRO>`_ page. From the command
line, you can clone the repository::

   $ git clone https://github.com/galantelab/sideRETRO.git

No Warranty
===========
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
`GNU General Public License
<https://www.gnu.org/licenses/gpl-3.0.en.html>`_
for more details.

Reporting Bugs
==============
If you find a bug, or have any issue, please inform us in the
`github issues tab <https://github.com/galantelab/sideRETRO/issues>`_.
All bug reports should include:
- The version number of sideRETRO
- A description of the bug behavior
Citation
========
If sideRETRO was useful in your research, please cite it:


.. code-block:: bib

   @article{10.1093/bioinformatics/btaa689,
       author = {Miller, Thiago L A and Orpinelli, Fernanda and Buzzo, José Leonel L and Galante, Pedro A F},
       title = "{sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies}",
       journal = {Bioinformatics},
       year = {2020},
       month = {07},
       issn = {1367-4803},
       doi = {10.1093/bioinformatics/btaa689},
       url = {https://doi.org/10.1093/bioinformatics/btaa689},
       note = {btaa689},
   }

Further Information
===================
If you need additional information or closer contact with the authors
(*we are always looking for coffee and good company*), contact us by
email; see :ref:`authors <chap_authors>`.

Our bioinformatics group has a website; feel free to visit us at
https://www.bioinfo.mochsl.org.br/.

|