Drawbacks
=========

If you repeatedly check whether sequence names exist in a FASTA file by using the ``in`` operator on a FASTA object, like this:

.. code:: python

	>>> import pyfastx
	>>> fa = pyfastx.Fasta('tests/data/test.fa.gz')
	>>> # Suppose seqnames has 100000 names
	>>> for seqname in seqnames:
	...     if seqname in fa:  # each test queries the index database
	...         pass  # do something

This will take a long time to finish. Because pyfastx does not load the index into memory, each ``in`` operation runs an SQL existence query against the index database. A faster alternative is:

.. code:: python

	>>> import pyfastx
	>>> fa = pyfastx.Fasta('tests/data/test.fa.gz')
	>>> # load all sequence names into a set object
	>>> all_names = set(fa.keys())
	>>> for seqname in seqnames:
	...     if seqname in all_names:  # in-memory lookup, no database query
	...         pass  # do something
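
The difference between the two approaches can be illustrated without pyfastx at all. The following is a minimal sketch that uses Python's built-in ``sqlite3`` module as a stand-in for an index database. The table name ``seq``, the column ``name`` and the helper ``exists_via_query`` are hypothetical and chosen only for illustration; they are not pyfastx's actual schema or API.

```python
import sqlite3
import timeit

# Hypothetical stand-in for an index database. The table and column
# names are assumptions for illustration, not pyfastx's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO seq VALUES (?)",
    ((f"seq{i}",) for i in range(1000)),
)


def exists_via_query(name):
    """Run one SQL existence query per lookup, analogous to ``name in fa``."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM seq WHERE name = ?)", (name,)
    ).fetchone()
    return bool(row[0])


# Load every name once into a set, analogous to ``set(fa.keys())``.
all_names = {row[0] for row in conn.execute("SELECT name FROM seq")}

queries = ["seq5", "seq999", "missing"]
via_query = [exists_via_query(q) for q in queries]
via_set = [q in all_names for q in queries]

# Both approaches give the same answers; only the cost per lookup differs.
lookups = [f"seq{i}" for i in range(2000)]
t_query = timeit.timeit(lambda: [exists_via_query(q) for q in lookups], number=5)
t_set = timeit.timeit(lambda: [q in all_names for q in lookups], number=5)
print(f"per-lookup SQL query: {t_query:.4f} s, set membership: {t_set:.4f} s")
```

The set-based approach holds every sequence name in memory at once, so it trades memory for speed. For a file with a very large number of sequences, that cost is worth keeping in mind.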