
# Kalign
Kalign is a fast multiple sequence alignment program for biological
sequences. It aligns protein, DNA, and RNA sequences using a progressive
alignment approach with multi-threading support.
## Installation
### From source
Prerequisites: C compiler (GCC or Clang), CMake 3.18+, optionally OpenMP.
```bash
mkdir build && cd build
cmake ..
make
make test
make install
```
On macOS, run `brew install libomp` first to enable OpenMP support.
### Zig build (alternative)
Requires Zig 0.12.
```bash
zig build
```
### Python
```bash
pip install kalign-python
```
See [README-python.md](README-python.md) for the full Python documentation.
## Usage
```
kalign -i <input> -o <output>
```
Kalign v3.5 has three modes:
| Mode | Flag | Description |
|------|------|-------------|
| default | (none) | Best general-purpose mode. |
| fast | `--fast` | Fastest mode. Same as Kalign v3.4. |
| precise | `--precise` | Highest accuracy, roughly 10x slower. |
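
When the mode is chosen by a wrapper script or pipeline configuration, the table can be turned into a small lookup. The following is a minimal sketch; `kalign_mode_flag` is a hypothetical helper written for this README, not something shipped with Kalign.

```bash
# Hypothetical helper (not part of Kalign): map a mode name from the
# table above to the matching command-line flag.
kalign_mode_flag() {
    case "$1" in
        default|"") : ;;                          # default mode takes no flag
        fast)       printf '%s\n' '--fast' ;;
        precise)    printf '%s\n' '--precise' ;;
        *)          echo "unknown mode: $1" >&2
                    return 1 ;;
    esac
}
```

Leave the substitution unquoted when calling Kalign, so that the empty result for the default mode expands to no argument at all: `kalign $(kalign_mode_flag precise) -i sequences.fa -o aligned.fa`.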
### Examples
```bash
# Align sequences
kalign -i sequences.fa -o aligned.fa
# Fast mode
kalign --fast -i sequences.fa -o aligned.fa
# Precise mode (ensemble + realign)
kalign --precise -i sequences.fa -o aligned.fa
# Read from stdin
cat input.fa | kalign -i - -o aligned.fa
# Combine multiple input files
kalign seqsA.fa seqsB.fa -o combined.fa
# Save ensemble consensus for re-thresholding
kalign --precise -i seqs.fa -o out.fa --save-poar consensus.poar
kalign -i seqs.fa -o out2.fa --load-poar consensus.poar --min-support 3
```
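
To align many files in one go, the single-file invocation above can be wrapped in a loop. This is a sketch under a few assumptions: inputs end in `.fa`, each output is named `<name>.aligned.fa`, and `KALIGN_BIN` is a hypothetical environment variable used only to point the script at a specific binary.

```bash
# Sketch: align every .fa file in a directory, writing one output per input.
# KALIGN_BIN is a hypothetical override, not a Kalign feature; when unset,
# the kalign found on PATH is used.
align_dir() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*.fa; do
        [ -e "$f" ] || continue               # glob matched nothing
        name=$(basename "$f" .fa)
        "${KALIGN_BIN:-kalign}" -i "$f" -o "$out_dir/$name.aligned.fa" || return 1
    done
}
```

Call it as `align_dir raw_seqs aligned`. The function stops at the first file that fails to align, so a partially filled output directory indicates where the run ended.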
### Options
```
--format Output format: fasta, msf, clu. [fasta]
--type Sequence type: protein, dna, rna, divergent. [auto]
--gpo Gap open penalty. [auto]
--gpe Gap extension penalty. [auto]
--tgpe Terminal gap extension penalty. [auto]
--ensemble N Run N ensemble alignments. [off]
--refine Refinement: none, all, confident. [none]
-n Number of threads. [auto]
```
### Output formats
```bash
kalign -i input.fa -f msf -o output.msf
kalign -i input.fa -f clu -o output.clu
```
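
In scripts it can be convenient to derive the format from the output file name. The sketch below assumes a particular extension mapping (`.msf`, `.clu` or `.aln`, anything else treated as FASTA); only the format names `fasta`, `msf`, and `clu` come from the options listed above, and `kalign_format_for` is a hypothetical helper.

```bash
# Hypothetical helper: pick a --format value from the output file name.
# The extension mapping is an assumption made for this example.
kalign_format_for() {
    case "$1" in
        *.msf)        echo msf ;;
        *.clu|*.aln)  echo clu ;;
        *)            echo fasta ;;    # Kalign's documented default format
    esac
}
```

For example: `kalign -i input.fa --format "$(kalign_format_for output.msf)" -o output.msf`.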
## C library
Link Kalign into your C/C++ project:
```cmake
find_package(kalign)
target_link_libraries(<target> kalign::kalign)
```
Or add the Kalign source tree to your build directly:
```cmake
add_subdirectory(<path>/kalign EXCLUDE_FROM_ALL)
target_link_libraries(<target> kalign::kalign)
```
## Benchmarks
### BAliBASE

### BRAliBase

## Citation
Lassmann, Timo. "Kalign 3: multiple sequence alignment of large data sets."
Bioinformatics (2019). [DOI](https://doi.org/10.1093/bioinformatics/btz795)
## License
Apache License, Version 2.0. See [COPYING](COPYING).