.. _filter_fasta:

.. index:: filter_fasta.py

*filter_fasta.py* -- This script can be applied to remove sequences from a fasta file based on input criteria.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Description:**




**Usage:** :file:`filter_fasta.py [options]`

**Input Arguments:**

.. note::

	
	**[REQUIRED]**
		
	-f, `-`-input_fasta_fp
		Path to the input fasta file
	-o, `-`-output_fasta_fp
		The output fasta filepath
	
	**[OPTIONAL]**
		
	-m, `-`-otu_map
		An OTU map whose sequence ids are the ones that should be retained
	-s, `-`-seq_id_fp
		A list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field) which should be retained
	-a, `-`-subject_fasta_fp
		A fasta file whose seq ids should be retained
	-p, `-`-seq_id_prefix
		Keep seqs where seq_id starts with this prefix
	-n, `-`-negate
		Discard the passed seq ids rather than keeping them [default: False]
	`-`-mapping_fp
		Mapping file path (for use with ``--valid_states``) [default: None]
	`-`-valid_states
		Description of sample ids to retain (for use with ``--mapping_fp``) [default: None]
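
The keep/discard behaviour described by the options above can be sketched in a few lines of Python. This is only an illustration, not the script's actual implementation: the helper names ``parse_fasta`` and ``filter_records`` are hypothetical, and the sketch assumes that a sequence id is the first whitespace-delimited token of the fasta header line.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(records, ids, negate=False):
    """Keep records whose id is in ids; with negate=True, discard them instead."""
    for header, seq in records:
        fields = header.split()
        # Assumption: the seq id is the first whitespace-delimited header token.
        seq_id = fields[0] if fields else ""
        if (seq_id in ids) != negate:
            yield header, seq


example = [">s1 sample A", "ACGT", ">s2 sample B", "GG", "TT", ">s3", "CCCC"]
kept = list(filter_records(parse_fasta(example), {"s1", "s3"}))
dropped = list(filter_records(parse_fasta(example), {"s1", "s3"}, negate=True))
```

Here ``kept`` holds the records for ``s1`` and ``s3``, while ``dropped`` (the ``-n`` case) holds only the record for ``s2``.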


**Output:**

The filtered sequences are written in fasta format to the path given by -o.




**Keep all sequences that show up in an OTU map:**

::

	filter_fasta.py -f inseqs.fasta -o filtered_seqs.fasta -m uclust_ref_otus.txt
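
As a rough sketch of what "show up in an OTU map" means, the snippet below collects the sequence ids from an OTU map. It assumes the usual tab-delimited layout in which the first field of each line is the OTU id and the remaining fields are sequence ids. The function name ``seq_ids_from_otu_map`` is hypothetical and is not part of QIIME.

```python
def seq_ids_from_otu_map(lines):
    """Collect every sequence id listed in a tab-delimited OTU map."""
    ids = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: field 0 is the OTU id, all later fields are seq ids.
        ids.update(field for field in fields[1:] if field)
    return ids


otu_map = ["0\ts1_1\ts1_5\n", "1\ts2_3\n", "\n"]
ids = seq_ids_from_otu_map(otu_map)
```

Any sequence in the input fasta file whose id is in ``ids`` would be retained, unless ``-n`` is passed.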

**Discard all sequences that show up in chimera checking output. NOTE: It is very important to pass -n here, as this tells the script to negate the request, that is, to discard all sequences listed via -s. This is necessary to remove the identified chimeras from inseqs.fasta:**

::

	filter_fasta.py -f inseqs.fasta -o non_chimeric_seqs.fasta -s chimeric_seqs.txt -n
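
The effect of combining ``-s`` with ``-n`` can be illustrated as follows. This is a minimal sketch under stated assumptions, not the script's own code: ``read_seq_ids`` is a hypothetical helper, the id is taken to be the first tab-delimited field of each non-empty line (as the ``-s`` description states), and the sample ids are invented for the example.

```python
def read_seq_ids(lines):
    """Return the set of ids found in the first tab-delimited field of each line."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ids.add(line.split("\t")[0].strip())
    return ids


# Invented example: two chimeras, one listed with extra tab-delimited fields.
chimera_lines = ["s2\tparentA\tparentB\n", "s4\n", "\n"]
chimeric = read_seq_ids(chimera_lines)

all_ids = ["s1", "s2", "s3", "s4"]
# With -n, the listed ids are the ones removed rather than the ones kept.
non_chimeric = [seq_id for seq_id in all_ids if seq_id not in chimeric]
```

Without ``-n`` the same id list would select ``s2`` and ``s4`` for retention, which is the opposite of what chimera removal needs.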

**Keep all sequences listed in a text file:**

::

	filter_fasta.py -f inseqs.fasta -o filtered_seqs.fasta -s seqs_to_keep.txt