
# Jellyfish
## Overview
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. Jellyfish can count k-mers an order of magnitude faster, and with an order of magnitude less memory, than other k-mer counting packages. It does so by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
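To make the definition concrete, here is a minimal Python sketch of what counting k-mers means. It is only an illustration of the concept, not how Jellyfish is implemented: it uses an ordinary in-memory dictionary rather than Jellyfish's compact, lock-free hash table, and the function name `count_kmers` is made up for this example.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every substring of length k in the DNA string seq."""
    counts = Counter()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # Print each 3-mer of a toy sequence with its number of occurrences.
    for kmer, n in sorted(count_kmers("ACGTACGT", 3).items()):
        print(kmer, n)
```

For the toy sequence `ACGTACGT` and k = 3, the 3-mers `ACG` and `CGT` each occur twice, while `GTA` and `TAC` occur once.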
Jellyfish is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the `jellyfish dump` command, or queried for specific k-mers with `jellyfish query`. See the [documentation](doc/Readme.md) for details.
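As a rough sketch of that workflow, the Python snippet below drives the `jellyfish count`, `jellyfish dump` and `jellyfish query` subcommands through `subprocess`. The helper names, the input file `reads.fa`, the output file `mer_counts.jf` and the parameter values (k = 21, hash size `100M`, 4 threads) are assumptions chosen for illustration. Check `jellyfish count --help` for the options supported by your installed version.

```python
import os
import shutil
import subprocess


def count_cmd(fasta, k=21, size="100M", threads=4, out="mer_counts.jf"):
    """Build a 'jellyfish count' command line (values are illustrative)."""
    return ["jellyfish", "count", "-m", str(k), "-s", size,
            "-t", str(threads), "-o", out, fasta]


def dump_cmd(db):
    """Build a 'jellyfish dump' command; -c requests column output."""
    return ["jellyfish", "dump", "-c", db]


def query_cmd(db, *kmers):
    """Build a 'jellyfish query' command for one or more k-mers."""
    return ["jellyfish", "query", db, *kmers]


def run(cmd):
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    fasta = "reads.fa"  # hypothetical input file
    if shutil.which("jellyfish") is None or not os.path.exists(fasta):
        print("jellyfish or the example input file is not available")
    else:
        run(count_cmd(fasta))                 # writes the binary counts file
        print(run(dump_cmd("mer_counts.jf")))  # human-readable counts
```

Building the command as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.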
If you use Jellyfish in your research, please cite:
Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 ([first published online January 7, 2011](http://bioinformatics.oxfordjournals.org/cgi/content/abstract/27/6/764 "Paper on Oxford Bioinformatics website")) doi:10.1093/bioinformatics/btr011
## Installation
### Linux Binaries
On Debian and Ubuntu with `apt`:
```Shell
sudo apt update
sudo apt install jellyfish
```
On Arch, it is available from [AUR](https://aur.archlinux.org/packages/jellyfish/).
### FreeBSD
Jellyfish can be installed on FreeBSD via the FreeBSD ports system.
To install via the binary package, simply run:
```Shell
pkg install Jellyfish
```
To install from source:
```Shell
cd /usr/ports/biology/jellyfish
make install
```
### Windows
With [Cygwin](https://www.cygwin.com/), Jellyfish can be compiled from source as [explained below](#from-source).
A simpler way on Windows 10 is to first install [WSL](https://docs.microsoft.com/en-us/windows/wsl/install-win10) and then install a Linux distribution that carries Jellyfish (e.g., Ubuntu) from the Windows Store.
Finally, install Jellyfish from within that distribution:
```Shell
sudo apt update
sudo apt install jellyfish
```
### From source
To get a packaged tarball of the source code that is easier to compile, download a release from the [GitHub releases page][3]. You need make and g++ version 4.4 or higher. To install in your home directory, run:
```Shell
./configure --prefix=$HOME
make -j 4
make install
```
To compile from the git tree, you will also need autoconf, automake, libtool, gettext, pkg-config and [yaggo](https://github.com/gmarcais/yaggo/releases "Yaggo release on github"). Then compile and install (in `/usr/local` in this example) with:
```Shell
autoreconf -i
./configure
make -j 4
sudo make install
```
If the software is installed in system directories (hint: you needed `sudo` to install), as in the example above, then the system library cache must be updated:
```Shell
sudo ldconfig
```
## Usage
Instructions for use are available in the [doc](https://github.com/gmarcais/Jellyfish/tree/master/doc) directory.
## Extra / Examples
The examples directory contains potentially useful extra programs to query or manipulate Jellyfish output files, using the Jellyfish shared library from C++ or from scripting languages. The examples are not compiled by default. Each subdirectory of examples is independent and is compiled with a simple invocation of `make`.
## Binding to script languages
Bindings to Ruby, Python and Perl are provided. These bindings allow Jellyfish output files to be read directly from a scripting language. Compiling the bindings is easier from the [release tarball][3]. The development files of the target scripting language are required.
Compiling the bindings from the git tree requires [SWIG][2] version 3 and adding the switch `--enable-swig` to the configure command lines shown below.
To compile all three bindings, configure and compile with:
```Shell
./configure --enable-ruby-binding --enable-python-binding --enable-perl-binding
make -j 4
sudo make install
```
By default, Jellyfish is installed in `/usr/local` and the bindings are installed in the proper system location. When the `--prefix` switch is passed, the bindings are installed in the given directory. For example:
```Shell
./configure --prefix=$HOME --enable-python-binding
make -j 4
make install
```
This will install the Python binding in `$HOME/lib/python2.7/site-packages` (adjust the path to match your Python version).
Then, for Python, Ruby or Perl to find the binding, an environment variable may need to be adjusted (`PYTHONPATH`, `RUBYLIB` and `PERL5LIB` respectively). For example:
```Shell
export PYTHONPATH=$HOME/lib/python2.7/site-packages
```
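Since the directory name depends on the Python version, the lookup path can also be computed at run time instead of being hard-coded. The sketch below assumes the `<prefix>/lib/pythonX.Y/site-packages` layout from the example above. The helper names are hypothetical, and some systems use a different layout (for instance `lib64`), so verify where `make install` actually placed the binding.

```python
import os
import sys


def binding_dir(prefix):
    """Return <prefix>/lib/pythonX.Y/site-packages for the running Python."""
    version = "python{}.{}".format(sys.version_info.major,
                                   sys.version_info.minor)
    return os.path.join(prefix, "lib", version, "site-packages")


def add_binding_to_path(prefix):
    """Put the binding directory at the front of sys.path and return it."""
    path = binding_dir(prefix)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    # Mirrors the --prefix=$HOME installation shown above.
    print(add_binding_to_path(os.path.expanduser("~")))
```

Calling `add_binding_to_path` before importing the binding has the same effect as setting `PYTHONPATH`, but only for the current process.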
See the [swig directory](../../tree/master/swig) for examples on how to use the bindings.
[1]: http://www.genome.umd.edu/jellyfish.html "Genome group at University of Maryland"
[2]: http://www.swig.org/
[3]: https://github.com/gmarcais/Jellyfish/releases "Jellyfish release"