<p align="center"><a href="https://sideretro.readthedocs.io/en/latest/?badge=latest"><img src="docs/images/logo_sideRETRO.png" alt="sideRETRO" width="200"></a></p>
<h2 align="center">A pipeline for detecting Somatic Insertion of DE novo RETROcopies</h2>

<p align="center">
  <a href="https://github.com/galantelab/sideRETRO/actions/workflows/ci.yml"><img alt="" src="https://github.com/galantelab/sideRETRO/actions/workflows/ci.yml/badge.svg?branch=master" align="center"></a>
  <a href="https://github.com/galantelab/sideRETRO/actions?query=workflow:%22CodeQL%22"><img src="https://github.com/galantelab/sideRETRO/workflows/CodeQL/badge.svg" alt="CodeQL" align="center"></a>
  <a href="https://coveralls.io/github/galantelab/sideRETRO?branch=master"><img alt="" src="https://coveralls.io/repos/github/galantelab/sideRETRO/badge.svg?branch=master&kill_cache=1" align="center"></a>
  <a href="https://sideretro.readthedocs.io/en/latest/?badge=latest"><img alt="" src="https://readthedocs.org/projects/sideretro/badge/?version=latest" align="center"></a>
  <a href="https://github.com/galantelab/sideRETRO/tags/"><img src="https://img.shields.io/github/tag/galantelab/sideRETRO?include_prereleases=&sort=semver&color=blue" alt="GitHub tag" align="center"></a>
  <a href="https://hub.docker.com/r/galantelab/sider/tags"><img alt="" src="https://img.shields.io/badge/docker-%230db7ed.svg?style=flat&logo=docker&logoColor=white" align="center"></a>
</p>

**sideRETRO** is a bioinformatics tool devoted to the detection of somatic **retrocopy** insertions, also known as
**retroCNVs**, in whole-genome and whole-exome sequencing data (WGS, WES). The program was written from scratch
in C and uses the [HTSlib](http://www.htslib.org/) and [SQLite3](https://www.sqlite.org) libraries to handle
**SAM/BAM/CRAM** reading and data analysis.

For full documentation, please visit <https://sideretro.readthedocs.io>.

### Features

When detecting retrocopies, **sideRETRO** can annotate several other features related to each event:

* **Parental gene**

   The gene that underwent the retrotransposition process.

* **Genomic position**

   The genomic coordinates where the retrocopy integration event occurred (chromosome:start-end).
   They include the insertion point (the expected exact position of each retrocopy insertion).

* **Strandness**

   Detects the orientation of the insertion (+/-), that is, whether the retrocopy was inserted in the
   leading (+) or lagging (-) DNA strand.

* **Genomic context**

   The context of the retrocopy integration site: whether the retrotransposition event occurred in an intergenic or
   intragenic region. Intragenic events are further split into exonic and intronic according to the host gene.

* **Genotype**

   When multiple individuals (genomes) are analyzed, **sideRETRO** reports which events are found in each one.
   That way, it is possible to tell whether an event is exclusive to one individual or shared among the analyzed cohort.

* **Haplotype**

   Our tool provides information about the zygosity of the event, i.e., whether it occurs in one or both homologous
   chromosomes (heterozygous or homozygous, respectively).
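
As a rough sketch of how the genotype and haplotype information could be summarized for a cohort, the shell function below reads a multi-sample VCF and, for each event, counts the carriers, the heterozygous and homozygous calls, and labels the event as exclusive or shared. It assumes standard VCF conventions (sample columns from the 10th onward, `GT` as the first `FORMAT` key, `0/1` and `1/1` style genotypes); that the VCF written by `make-vcf` follows exactly this layout is an assumption, and the function name `summarize_genotypes` is hypothetical, not part of sideRETRO.

```sh
# Hypothetical helper: summarize_genotypes FILE.vcf
# Prints, per event: CHROM, POS, ID, carriers, het, hom, and a label
# ("exclusive" = one carrier, "shared" = more than one, "absent" = none).
# Assumes GT is the first FORMAT sub-field, as the VCF specification
# requires whenever GT is present.
summarize_genotypes() {
    awk -F '\t' -v OFS='\t' '
        /^#/ { next }                        # skip meta-information and header lines
        {
            carriers = het = hom = 0
            for (i = 10; i <= NF; i++) {     # sample columns start at column 10
                split($i, fmt, ":")          # GT is the first sub-field
                gt = fmt[1]
                gsub(/[|]/, "/", gt)         # treat phased and unphased calls alike
                if (gt == "0/1" || gt == "1/0") { het++; carriers++ }
                else if (gt == "1/1")           { hom++; carriers++ }
            }
            status = (carriers == 1) ? "exclusive" : (carriers > 1 ? "shared" : "absent")
            print $1, $2, $3, carriers, het, hom, status
        }
    ' "$1"
}
```

For example, `summarize_genotypes out.vcf` would print one tab-separated line per event; the file name `out.vcf` is only a placeholder here.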

## Getting Started

### Installation

The project depends on the [Meson build system](https://mesonbuild.com) and [Ninja](https://github.com/ninja-build/ninja)
to manage the configuration and compilation process. Both can be obtained from a package manager or built from source. For example,
on an [Ubuntu](https://ubuntu.com) distribution:

```sh
$ sudo apt-get install python3 \
                       python3-pip \
                       python3-setuptools \
                       python3-wheel \
                       ninja-build
```

and then:

`$ pip3 install --user meson`

(or: `$ sudo apt install meson`)


Finally, clone this repository:

`$ git clone https://github.com/galantelab/sideRETRO.git`

Inside the sideRETRO directory, run:

`$ meson build && ninja -C build`

The `sider` executable is built inside `build/src`. Optionally, install it to the system directories with:

`$ sudo ninja -C build install`
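
Before running the steps above, it can help to confirm that the required tools are on your `PATH`. A minimal sketch, using only POSIX `command -v`; the function name `require_tools` is hypothetical and not shipped with sideRETRO:

```sh
# Hypothetical helper: require_tools TOOL...
# Reports every tool that is not found in PATH (on stderr) and
# returns non-zero if at least one of them is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the tools used in the installation steps above.
# require_tools git meson ninja || echo "install the missing tools first"
```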

### Usage

**sideRETRO** compiles to an executable called `sider`, which has three subcommands: `process-sample`, `merge-call`
and `make-vcf`. The `process-sample` subcommand processes a list of **SAM/BAM/CRAM** files and captures abnormal reads
that may be related to a retrocopy event. All of this data is saved to a **SQLite3 database**, which is the input of
the second step, `merge-call`: it processes the database and annotates all the retrocopies found. Finally, the
`make-vcf` subcommand generates a file in **VCF** format with the retrocopies and further information about them.
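
The `--input-file` option of `process-sample` takes a plain-text list with one alignment path per line (see `my-bam-list.txt` in the usage example). A minimal sketch for generating such a list from a directory tree, assuming the alignments use the `.bam`, `.cram` or `.sam` extensions; the function name `make_bam_list` is hypothetical:

```sh
# Hypothetical helper: make_bam_list DIR OUTPUT
# Writes the absolute paths of all SAM/BAM/CRAM files found under DIR
# to OUTPUT, one per line, sorted so repeated runs produce the same list.
make_bam_list() {
    dir=$(cd "$1" && pwd) || return 1   # resolve DIR to an absolute path
    find "$dir" -type f \( -name '*.bam' -o -name '*.cram' -o -name '*.sam' \) \
        | sort > "$2"
}

# Example: make_bam_list /path/to/alignments my-bam-list.txt
```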

```sh
# List of BAM files
$ cat 'my-bam-list.txt'
/path/to/file1.bam
/path/to/file2.bam
/path/to/file3.bam

# Run process-sample step
$ sider process-sample \
    --annotation-file='my-annotation.gtf' \
    --input-file='my-bam-list.txt'

$ ls -1
my-annotation.gtf
my-bam-list.txt
my-genome.fa
out.db

# Run merge-call step
$ sider merge-call --in-place out.db

# Run make-vcf step
$ sider make-vcf \
    --reference-file='my-genome.fa' out.db
```
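
The three steps can also be chained in a small wrapper that stops at the first failure. This is a sketch, not part of sideRETRO: the function name `run_sider` is hypothetical, it uses only the options shown in the example, and it assumes `process-sample` writes its database to `out.db` in the current directory, as in the listing above.

```sh
# Hypothetical wrapper: run_sider ANNOTATION BAM_LIST REFERENCE
# Runs process-sample, merge-call and make-vcf in order and
# returns non-zero as soon as one of the steps fails.
run_sider() {
    annotation=$1 bam_list=$2 reference=$3

    sider process-sample \
        --annotation-file="$annotation" \
        --input-file="$bam_list" || return 1

    # Assumes process-sample produced out.db in the current directory
    sider merge-call --in-place out.db || return 1

    sider make-vcf --reference-file="$reference" out.db
}

# Example: run_sider my-annotation.gtf my-bam-list.txt my-genome.fa
```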

## Citation

If sideRETRO was useful in your research, please cite it:

```bib
@article{10.1093/bioinformatics/btaa689,
  author = {Miller, Thiago L A and Orpinelli, Fernanda and Buzzo, José Leonel L and Galante, Pedro A F},
  title = "{sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies}",
  journal = {Bioinformatics},
  year = {2020},
  month = {07},
  issn = {1367-4803},
  doi = {10.1093/bioinformatics/btaa689},
  url = {https://doi.org/10.1093/bioinformatics/btaa689},
  note = {btaa689},
}
```

## License

This is free software, licensed under:

`The GNU General Public License, Version 3, June 2007`