Version 1.1.6
* Use 'mandir' variable to handle man page
installation. This will be useful when
packaging for distributions such as Debian.
* Fix an English typo in usage.rst.
Version 1.1.5
* Now, by default, compile and install a man
page alongside the sider executable when
sphinx-build is present.
* Add CodeQL to test vulnerabilities in code.
* Add a security policy.
Version 1.1.4
* Fix an English typo in cluster.c (cluster).
* Fix readthedocs build.
* Update Ubuntu image version for Docker and
GitHub workflows. Use Ubuntu 24.04.
Version 1.1.3
* Improve memory usage when reading unsorted,
non-collated alignment files. To handle an
unsorted file, read it three times: first,
index all abnormal read querynames into a
hash table; then remove low-quality
fragments; and finally dump all indexed
fragments to the database.
Version 1.1.2
* Fix a possible vulnerability in 'strncat'
and 'strcpy' inside a loop, replacing them
with 'memcpy'.
* Use a multi-stage build to decrease the final
Docker image size.
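To illustrate the 'memcpy' fix above, here is a minimal
sketch of joining strings inside a loop while tracking
the write offset and checking the remaining capacity
before every copy. The helper 'join_fields' and its
signature are hypothetical and are not taken from the
sideRETRO source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the sideRETRO source.
 * Joins 'n' strings into 'dst' (capacity 'cap' bytes, including
 * the terminating NUL), separated by 'sep'. Returns the length
 * written, or (size_t) -1 if the result would not fit. */
size_t
join_fields (char *dst, size_t cap, const char *const *parts,
             size_t n, char sep)
{
  size_t off = 0;

  if (cap == 0)
    return (size_t) -1;

  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (parts[i]);
      size_t need = len + (i > 0 ? 1 : 0);

      /* Keep one byte free for the terminating NUL */
      if (need >= cap - off)
        {
          dst[off] = '\0';
          return (size_t) -1;
        }

      if (i > 0)
        dst[off++] = sep;

      /* The length is already known, so copy it in one step
       * instead of letting 'strncat' rescan 'dst' every time */
      memcpy (dst + off, parts[i], len);
      off += len;
    }

  dst[off] = '\0';
  return off;
}
```

Because the offset is carried across iterations, each copy
is bounded explicitly and the destination is never scanned
again for its end.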
Version 1.1.1
* Fix testing. Point to the right 'libcheck'
version (>= 0.15.0) in 'meson.build'.
Version 1.1.0
* Fix bug in 'merge-call' subcommand when
merging databases with the option 'in-place'
active.
* Fix testing to handle SIGABRT and SIGSEGV.
* Move from Travis-CI to GitHub Actions. Test
the code, build and deploy to Docker Hub.
* Update Dockerfile base image to Ubuntu
version 20.04.
Version 1.0.0
* Now sideRETRO works with the CRAM file
format too.
* Fix bug when reading GTF, and possibly GFF3
as well, when there is a space inside the attr
value: gene_name "My gene name". Now, the
attr is correctly split into:
key=gene_name and value='My gene name'.
* Add citation option. Print it in BibTeX
format.
* Add 'gz' interface, in order to wrap the
'libz' library and remove duplicated
code.
* Add the copying notice header to each
source file.
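The GTF attribute fix above can be sketched as follows:
only the first run of whitespace separates key and value,
so spaces inside the quoted value survive. The function
'gtf_attr_split' is a hypothetical illustration and is
not the parser used by sideRETRO.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the sideRETRO parser itself.
 * Splits one GTF attribute such as: gene_name "My gene name";
 * into key and value. Returns 1 on success, or 0 when the key
 * or the value is empty or does not fit into its buffer. */
int
gtf_attr_split (const char *attr, char *key, size_t key_cap,
                char *value, size_t value_cap)
{
  const char *p = attr;
  const char *start, *end;
  size_t len;

  /* Skip leading whitespace */
  while (*p != '\0' && isspace ((unsigned char) *p))
    p++;

  /* The key runs up to the first whitespace */
  start = p;
  while (*p != '\0' && !isspace ((unsigned char) *p))
    p++;

  len = (size_t) (p - start);
  if (len == 0 || len >= key_cap)
    return 0;

  memcpy (key, start, len);
  key[len] = '\0';

  /* Skip the separator between key and value */
  while (*p != '\0' && isspace ((unsigned char) *p))
    p++;

  /* Trim trailing whitespace and the optional ';' */
  end = p + strlen (p);
  while (end > p
         && (isspace ((unsigned char) end[-1]) || end[-1] == ';'))
    end--;

  /* Strip the surrounding double quotes, if present */
  if (end - p >= 2 && *p == '"' && end[-1] == '"')
    {
      p++;
      end--;
    }

  len = (size_t) (end - p);
  if (len == 0 || len >= value_cap)
    return 0;

  /* Everything left, inner spaces included, is the value */
  memcpy (value, p, len);
  value[len] = '\0';
  return 1;
}
```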
Version 0.14.1
* Add 'git' to Dockerfile. It will avoid
errors when picking the right version.
* Fix the default value for 'phred-quality'.
* Fix building: 'config.h' is dynamically
generated from the template 'config.h.in', so
it needs to be made before compiling. In order
to ensure the required order, declare
'config.h' as a dependency of the library and
executable.
Version 0.14.0
* Implement genotype estimation using the
likelihood approach as defined in
Heng Li's paper: "A statistical framework for
SNP calling, mutation discovery, association
mapping and population genetical parameter
estimation from sequencing data - 2.2 (eq2)".
* Add reference and alternate depth counts to
VCF's info.
* Evidence for reads covering the reference
allele is now calculated from the overlap
between the read range and the insertion point
+/- read half decile. This is necessary in
order to avoid overestimating the reads
covering the reference allele due to mapping
errors.
* Fix the case where two overlapping clusters
share the same parental gene (maybe the edge
points are reachable, not core points, in
DBSCAN), which used to be annotated as
overlapped parentals. Now it is annotated as
PASS and the parental gene name won't be
duplicated (e.g. PTEN/PTEN).
* Add Docker image for this project.
* Add documentation using Sphinx. So far, the
topics on installation, usage, molecular
gene biology method and simulation results
are ready.
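As a rough illustration of the likelihood approach cited
above, the sketch below evaluates the genotype likelihood
as we understand equation 2 of the Heng Li paper: for
ploidy m and g reference alleles, each read supporting
the reference contributes ((m - g) * e + g * (1 - e)) / m
and each read supporting the alternate allele contributes
((m - g) * (1 - e) + g * e) / m, where e is that read's
error probability. The function name and the way errors
are passed in are assumptions, not the sideRETRO
implementation.

```c
#include <stddef.h>

/* Hypothetical sketch, not the sideRETRO implementation.
 * Likelihood of a genotype carrying 'g' reference alleles
 * (0 <= g <= m) at a site of ploidy 'm'. 'ref_err' and
 * 'alt_err' hold the error probabilities of the reads that
 * support the reference and the alternate allele. */
double
genotype_likelihood (int g, int m,
                     const double *ref_err, size_t n_ref,
                     const double *alt_err, size_t n_alt)
{
  double lik = 1.0;

  /* Reads agreeing with the reference allele */
  for (size_t j = 0; j < n_ref; j++)
    lik *= ((m - g) * ref_err[j] + g * (1.0 - ref_err[j])) / m;

  /* Reads agreeing with the alternate allele */
  for (size_t j = 0; j < n_alt; j++)
    lik *= ((m - g) * (1.0 - alt_err[j]) + g * alt_err[j]) / m;

  return lik;
}
```

With m = 2, calling it for g = 0, 1 and 2 and keeping the
largest value picks among homozygous alternate,
heterozygous and homozygous reference. The plain product
keeps the sketch short; with deep coverage it may
underflow, so summing logarithms would be the safer
choice.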
Version 0.13.0
* Improve usage help by splitting all options
according to their category.
* Log all user-given options and arguments.
* Add a text processing interface to read the
'FASTA' file format.
* Add a VCF writer interface. It was necessary
to create new INFO and ALT tags to handle
information related to the retrocopy, such as:
'PG' for parental gene, 'PGTYPE' for
parental gene type, EXONIC/INTRONIC/NEAR
for retrocopy genomic position (all those
tags for INFO) and <INS:ME:RTC> for ALT.
* Add subcommand 'make-vcf' in order to
manage VCF file generation.
Version 0.12.0
* Add genotype and haplotype analysis. If
the BAM index is found, perform a fast
search for each retrocopy region; otherwise,
index all regions inside an interval
tree and make a slow linear search over
the whole file.
* To perform genotype and haplotype analysis,
it is required to go back to the BAM files,
whose paths are saved inside the 'source'
table. So, it is necessary that the user
keeps the files where the program expects to
find them.
* Remove bwa subproject, because it won't be
used any longer.
Version 0.11.0
* Add retrocopy annotation: Insertion window
and point, orientation rho and p-value,
level PASS, OVERLAPPED, NEAR, HOTSPOT,
AMBIGUOUS.
* Add deduplication capability. Remove all
duplicated reads but one, which is called
the primary read. Other tools, specialized
in removing duplicates, use some metric to
choose the primary reads. For us, only the
coordinates matter, so the primary
reads are chosen randomly (mostly the
first one to appear).
* There is the possibility that different BAM
files share reads with the same query name.
In order to avoid mix-ups when finding the
right mate, use the source_id along with qname
when required to match reads from the same
fragment.
Version 0.10.0
* Add a reclustering step in order to filter
out low numbers of reads coming from a given
source (BAM). When those reads are removed,
the cluster may become rarefied and,
therefore, invalid according to DBSCAN
constraints.
* Add a text processing interface to read the
'BED' file format.
* Add blacklist interface and tables blacklist,
overlapping_blacklist. The interface was
inspired by 'exon.c'. Also add more
command-line options for indexing blacklisted
regions from GTF/GFF3/BED files.
* Add GFF filtering capabilities. The user must
initiate a type GffFilter with the feature_type
to be filtered (e.g. gene, transcript, exon)
and may add attributes as well (e.g.
gene_type=protein_coding). The attributes are
hard and soft: hard attributes must all match
the pattern (AND), whereas at least one soft
attribute must match its pattern (OR).
* Cluster filter: NONE, CHR, DIST, REGION, SUPPORT.
The philosophy now is to keep all clusters and
subclusters and add a new column to handle
the filtering steps.
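The hard (AND) and soft (OR) attribute rule described
above can be sketched as below. The 'Attr' type and
'filter_match' function are hypothetical and do not
mirror the real GffFilter API. Exact string comparison
stands in for pattern matching, and an empty soft list is
assumed to impose no constraint.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type, not sideRETRO's GffFilter */
typedef struct
{
  const char *key;
  const char *value;
} Attr;

/* Return 1 if 'pattern' appears among the record attributes */
static int
attr_in (const Attr *pattern, const Attr *attrs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (pattern->key, attrs[i].key) == 0
        && strcmp (pattern->value, attrs[i].value) == 0)
      return 1;

  return 0;
}

/* Return 1 if the record passes the filter, 0 otherwise */
int
filter_match (const Attr *attrs, size_t n_attrs,
              const Attr *hard, size_t n_hard,
              const Attr *soft, size_t n_soft)
{
  /* Hard attributes: every one of them must match (AND) */
  for (size_t i = 0; i < n_hard; i++)
    if (!attr_in (&hard[i], attrs, n_attrs))
      return 0;

  /* Assumption: no soft attributes means no extra constraint */
  if (n_soft == 0)
    return 1;

  /* Soft attributes: one match is enough (OR) */
  for (size_t i = 0; i < n_soft; i++)
    if (attr_in (&soft[i], attrs, n_attrs))
      return 1;

  return 0;
}
```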
Version 0.9.0
* Add options '--blacklist-chr' and
'--parental-distance' to 'merge-call' command.
This way, it's possible to filter reads from
chromosomes (such as chrM), and avoid clustering
inside, or very near, their own parental gene.
* 'DBSCAN' is not entirely deterministic: border
points that are reachable from more than one
cluster can be part of either cluster,
depending on the order the data are processed
(Wikipedia).
Fix the 'REACHABLE' points which are shared
among multiple clusters.
* Add statistics 'pearson' and 'spearman'
correlation brought from 'GSL'. They will be
necessary for calculating the retrocopy
orientation.
Version 0.8.0
* Add new extensible hash algorithm. Move from
chaining hash type to extensible hash. Now there
is no need to declare the hash size in the 'new'
function, because it is dynamically allocated.
* Add a 'set' interface based on 'hash', instead
of 'list'. It was necessary to speed up
'DBSCAN' when making the union operation.
* In 'process-sample' command, add phred quality
score filter option in order to avoid reads with
low mapping quality.
* Add SQL INDEX for alignment(qname) table to
speed up queries.
* Update clustering query statement in order to
avoid the abnormal reads that fall into their
own parental gene.
* Fix bug in 'exon.c'. The alignment_id was
included inside the ExonTree object, which in
turn was shared among all threads. With no
mutex, the alignment_id values could clash
among threads. In order to fix it, a new private
struct keeps 'ExonTree' and 'alignment_id'
separately.
* Move from 'Autotools' to 'Meson build system'.
Version 0.7.0
* Change abnormal interface to handle sorted
and unsorted SAM/BAM files.
* processing-sample subcommand automatically
detects if the alignment file is queryname
sorted or not and then chooses the right
abnormal interface.
* Add a new table 'schema' with a single column
called 'version'. Its function is to keep
track of the database schema state in a
versioned way.
Version 0.6.0
* Add DBSCAN algorithm - Density-Based Spatial
Clustering of Applications with Noise. Its
purpose is to make a one-dimensional
grouping of all reads per chromosome.
* Add CLI - Command Line Interface - based
on subcommands. For now, there are two
subcommands: processing-sample (alias
ps) and merge-call (alias mc).
* Add low-level wrappers on SQLite3 functions
in order to avoid repeated error checking
after each statement.
* Improve test coverage with gcov, lcov
and the Coveralls platform.
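To make the one-dimensional DBSCAN grouping above more
concrete, here is a minimal sketch over positions sorted
in ascending order. It is an illustration under stated
assumptions, not the sideRETRO code: the name 'dbscan_1d',
its parameters and the label convention are all
hypothetical. A point is core when at least 'min_pts'
points (itself included) lie within 'eps'. Neighboring
core points are chained into one cluster. A border point
joins the nearest core point on its left or, failing that,
the first core point on its right within 'eps'.

```c
#include <stddef.h>

#define DBSCAN_NOISE (-1)

/* Hypothetical sketch of DBSCAN in one dimension.
 * 'pos' must be sorted in ascending order and 'eps' >= 0.
 * Writes a cluster id (0, 1, ...) or DBSCAN_NOISE for each
 * point into 'label' and returns the number of clusters. */
int
dbscan_1d (const long *pos, size_t n, long eps,
           size_t min_pts, int *label)
{
  size_t lo = 0, hi = 0, last_core = 0;
  int n_clusters = 0, have_core = 0;

  /* Pass 1: find core points with a sliding window and
   * chain neighboring core points into clusters */
  for (size_t i = 0; i < n; i++)
    {
      while (pos[i] - pos[lo] > eps)
        lo++;
      if (hi < i)
        hi = i;
      while (hi + 1 < n && pos[hi + 1] - pos[i] <= eps)
        hi++;

      if (hi - lo + 1 < min_pts)
        {
          label[i] = DBSCAN_NOISE;
          continue;
        }

      /* A gap wider than eps between cores opens a cluster */
      if (!have_core || pos[i] - pos[last_core] > eps)
        n_clusters++;

      label[i] = n_clusters - 1;
      last_core = i;
      have_core = 1;
    }

  /* Pass 2: attach border points to a core within eps.
   * Labels ahead of 'i' are still untouched, so any label
   * other than DBSCAN_NOISE there marks a core point */
  have_core = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (label[i] != DBSCAN_NOISE)
        {
          last_core = i;
          have_core = 1;
          continue;
        }

      if (have_core && pos[i] - pos[last_core] <= eps)
        {
          label[i] = label[last_core];
          continue;
        }

      for (size_t j = i + 1; j < n && pos[j] - pos[i] <= eps; j++)
        if (label[j] != DBSCAN_NOISE)
          {
            label[i] = label[j];
            break;
          }
    }

  return n_clusters;
}
```

Preferring the left neighbor for border points is an
arbitrary but deterministic choice made for this sketch.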
Version 0.5.0
* Add a 'str' interface to handle string
memory manipulation more efficiently.
* 'ibitree' lookup mechanism reshaped: Besides
the range to search for, it now accepts the node
and interval overlap fractions and the
boolean operator (AND or OR) for testing whether
both node and interval must overlap each other
at the same percentage, or whether just one
being true is enough. Also, 'ibitree' keeps track
of the position and length of overlapping regions.
* Add 'thpool' interface for thread pool.
* Add 'SQLite3' library to manage intermediate
and possibly final results.
* 'abnormal' filtering is working. It selects
the so-called abnormal alignments: those
alignments in which the sequenced reads of a
fragment fall into different chromosomes, or
are far away from their mates. It also catches
supplementary alignments. All the results
are recorded into the 'SQLite3' database.
* Add 'exon' interface. Its job is to search for
abnormal alignments that overlap some exon.
Version 0.4.0
* Add an 'align' interface to save 'bwa mem'
output in 'bwa' format.
* Borrow the 'samtools sort' algorithm and
implement it in the sam_sort interface.
* Add a 'binary tree' data structure and, on top
of it, implement an 'AVL interval tree' to handle
searching genomic annotation positions more
efficiently.
* Add a text processing interface to read the
'gff/gtf' file format.
* Initiate 'abnormal' filter algorithm.
Version 0.3.0
* Use 'git submodule' to handle local building of
'htslib' and 'bwa'. Both are statically
linked against our software.
* Add 'bwa mem' and 'sam' wrappers.
Version 0.2.0
* Add array, list and hash data structures.
* Add logging interface to manage message levels.
* Wrap standard C functions for allocating
memory and opening files.
Version 0.1.0
* Initial version. Project was set to work with
'autotools' and testing with the framework 'check'.