Drawbacks
=========

If you repeatedly check whether sequence names exist in a FASTA file by using the ``in`` operator on a FASTA object, like:

.. code:: python

	>>> import pyfastx
	>>> fa = pyfastx.Fasta('tests/data/test.fa.gz')
	>>> # Suppose seqnames has 100000 names
	>>> for seqname in seqnames:
	...     if seqname in fa:
	...         pass  # do something with seqname

This will take a long time to finish, because pyfastx does not load the index into memory: each ``in`` operation corresponds to an SQL existence query against the index database. A faster alternative is:

.. code:: python

	>>> fa = pyfastx.Fasta('tests/data/test.fa.gz')
	>>> # load all sequence names into a set object
	>>> all_names = set(fa.keys())
	>>> for seqname in seqnames:
	...     if seqname in all_names:
	...         pass  # do something with seqname
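
The difference between the two approaches can be illustrated without pyfastx itself. The following is a minimal sketch using Python's standard ``sqlite3`` module. The table name ``seq`` and the helpers ``build_index``, ``exists_via_query`` and ``load_names`` are hypothetical and are not pyfastx's actual index schema or API. They only mimic the pattern of one query per lookup versus one bulk read into a set.

```python
import sqlite3


def build_index(names):
    """Create an in-memory SQLite table with one row per sequence name.

    Hypothetical stand-in for an on-disk index; not pyfastx's real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seq (name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO seq (name) VALUES (?)", ((n,) for n in names))
    conn.commit()
    return conn


def exists_via_query(conn, name):
    """Run one SQL existence query per lookup (the per-name pattern)."""
    row = conn.execute(
        "SELECT 1 FROM seq WHERE name = ? LIMIT 1", (name,)
    ).fetchone()
    return row is not None


def load_names(conn):
    """Read every name once into a set so later lookups are hash lookups."""
    return {name for (name,) in conn.execute("SELECT name FROM seq")}


indexed = [f"seq{i}" for i in range(1000)]
queries = ["seq5", "seq999", "missing"]

conn = build_index(indexed)

# Pattern 1: one database query for every membership test.
slow_hits = [q for q in queries if exists_via_query(conn, q)]

# Pattern 2: one bulk read, then in-memory set membership tests.
all_names = load_names(conn)
fast_hits = [q for q in queries if q in all_names]
```

Both patterns return the same matches. The second one replaces many database queries with a single read, at the cost of holding every name in memory.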