# Prepared databases

```{contents}
```

## Modern databases

We provide a number of pre-built collections and indexed databases
that you can use with sourmash.  As of August 2025, we provide
databases in zip and RocksDB formats; older databases are available in
a variety of [legacy formats](legacy-databases.md).

[GTDB RS220](databases-md/gtdb220.md) -- Bacterial and Archaeal genomes from GTDB RS220.

[GTDB RS226](databases-md/gtdb226.md) -- Bacterial and Archaeal genomes from GTDB RS226.

[NCBI Viruses (Jan 2025)](databases-md/ncbi_viruses_2025_01.md) -- All viruses from NCBI (NCBI:txid10239) as of January 2025.

[NCBI Eukaryotes (Jan 2025)](databases-md/ncbi_euks_2025_01.md) -- All eukaryotic reference genomes from NCBI (NCBI:txid2759) as of January 2025.

## Database formats and sourmash versions

Zip format databases can be used with sourmash v4.1.0 and later (May
2021), while RocksDB databases can be used with sourmash v4.9.0 and
later (May 2025). Legacy database formats also work with these versions
of sourmash, and we recommend using the latest available release.
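To make the version requirements above concrete, here is a minimal Python sketch that checks whether a given sourmash version string meets the minimum for a database format. The `parse_version` and `supports_format` helpers and the `MIN_VERSION` table are hypothetical illustrations, not part of sourmash. The sketch assumes `X.Y` or `X.Y.Z` version strings and ignores any trailing pre-release or build suffix.

```python
import re

# Minimum sourmash version for each modern database format,
# taken from the compatibility notes above (hypothetical helper table).
MIN_VERSION = {
    "zip": (4, 1, 0),
    "rocksdb": (4, 9, 0),
}


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse the leading 'X.Y' or 'X.Y.Z' of a version string into integers.

    Any suffix after the numeric part (e.g. 'rc1', '.dev0') is ignored.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))


def supports_format(sourmash_version: str, db_format: str) -> bool:
    """Return True if this sourmash version can read the given database format."""
    try:
        minimum = MIN_VERSION[db_format]
    except KeyError:
        raise ValueError(f"unknown database format: {db_format!r}") from None
    # Tuples compare element by element, so 4.10.0 correctly sorts after 4.9.0.
    return parse_version(sourmash_version) >= minimum
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `"4.10.0" < "4.9.0"` lexically. To check an installed copy, you could pass `importlib.metadata.version("sourmash")` as the version string.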

## Legacy database information (2024 and before)

Databases released in 2024 and earlier are listed on the [legacy databases](legacy-databases.md) page.