.. _extract_seqs_by_sample_id:

.. index:: extract_seqs_by_sample_id.py

*extract_seqs_by_sample_id.py* -- Extract sequences based on the SampleID
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Description:**

This script creates a fasta file containing only the sequences that ARE associated with a set of sample IDs or, when -n is passed, all sequences that are NOT associated with that set of sample IDs.


**Usage:** :file:`extract_seqs_by_sample_id.py [options]`

**Input Arguments:**

.. note::

	
	**[REQUIRED]**
		
	-i, `-`-input_fasta_fp
		Path to the input fasta file
	-o, `-`-output_fasta_fp
		The output fasta file
	
	**[OPTIONAL]**
		
	-n, `-`-negate
		Negate the sample ID list (i.e., output sequences from sample ids not passed via -s) [default: False]
	-s, `-`-sample_ids
		Comma-separated sample_ids to include in the output fasta file (or exclude if --negate is passed), or a string describing mapping file states that define the sample ids (mapping_fp must be provided for the latter)
	-m, `-`-mapping_fp
		The mapping filepath


**Output:**

The script produces a fasta file containing only the sequences associated with the specified SampleIDs (or, with -n, only the sequences not associated with them).


**Examples:**

Create the file outseqs_by_sample.fasta (-o), which will be a subset of inseqs.fasta (-i) containing only the sequences THAT ARE associated with sample ids S2, S3, S4 (-s). As always, sample IDs are case-sensitive:

::

	extract_seqs_by_sample_id.py -i inseqs.fasta -o outseqs_by_sample.fasta -s S2,S3,S4

Create the file outseqs_by_sample_negated.fasta (-o), which will be a subset of inseqs.fasta (-i) containing only the sequences THAT ARE NOT (-n) associated with sample ids S2, S3, S4 (-s). As always, sample IDs are case-sensitive:

::

	extract_seqs_by_sample_id.py -i inseqs.fasta -o outseqs_by_sample_negated.fasta -s S2,S3,S4 -n
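
The selection performed by -s and -n in the two examples above can be pictured with a short Python sketch. This is not the script's own code. It assumes that each fasta label begins with a token of the form ``<SampleID>_<sequence number>`` (for example ``S2_1``), and the helper names ``parse_fasta``, ``sample_id_from_label`` and ``filter_by_sample_id`` are hypothetical:

```python
def sample_id_from_label(label):
    # Assumption: the first whitespace-delimited token of the label is
    # "<SampleID>_<sequence number>"; strip the trailing "_<number>".
    return label.split()[0].rsplit("_", 1)[0]


def parse_fasta(lines):
    # Minimal fasta reader: yields (label, sequence) pairs.
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def filter_by_sample_id(records, sample_ids, negate=False):
    # Keep records whose sample ID is in sample_ids, or, when negate is
    # True, keep only records whose sample ID is NOT in sample_ids.
    wanted = set(sample_ids)  # exact, case-sensitive matching
    for label, seq in records:
        in_set = sample_id_from_label(label) in wanted
        if in_set != negate:
            yield label, seq


example = """>S1_0 some comment
ACGT
>S2_1
GGCC
>S3_2
TTAA
>S5_3
CCGG
""".splitlines()

ids = ["S2", "S3", "S4"]
kept = list(filter_by_sample_id(parse_fasta(example), ids))
dropped = list(filter_by_sample_id(parse_fasta(example), ids, negate=True))
```

Here ``kept`` holds the records for S2 and S3 (no sequence belongs to S4), while ``dropped`` holds the records for S1 and S5, mirroring the non-negated and negated commands.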

Create the file outseqs_by_mapping_field.fasta (-o), which will be a subset of inseqs.fasta (-i) containing only the sequences THAT ARE associated with sample ids whose "Treatment" value is "Fast" in the mapping file map.txt (-m):

::

	extract_seqs_by_sample_id.py -i inseqs.fasta -o outseqs_by_mapping_field.fasta -m map.txt -s "Treatment:Fast"
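
How a state string such as "Treatment:Fast" could be resolved to a list of sample ids is sketched below. This is an illustration only, under stated assumptions: the mapping file is tab-delimited with a header line beginning ``#SampleID``, and only the simple ``Field:Value`` form is handled. The function name ``sample_ids_from_state`` is hypothetical and is not part of the script:

```python
def sample_ids_from_state(mapping_lines, state):
    # Assumption: state has the simple form "Field:Value".
    field, value = state.split(":", 1)
    header = None
    matches = []
    for line in mapping_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            # The first "#" line is taken as the header; any later
            # "#" lines are treated as comments and skipped.
            if header is None:
                header = line[1:].split("\t")
            continue
        if header is None:
            raise ValueError("mapping data found before a header line")
        row = dict(zip(header, line.split("\t")))
        if row.get(field) == value:
            matches.append(row["SampleID"])
    return matches


mapping = [
    "#SampleID\tTreatment\tDescription",
    "#Example mapping file (made-up data)",
    "S1\tControl\tmouse_1",
    "S2\tFast\tmouse_2",
    "S3\tFast\tmouse_3",
]

fast_ids = sample_ids_from_state(mapping, "Treatment:Fast")
```

With this made-up mapping data, ``fast_ids`` contains S2 and S3, which are the sample ids whose sequences the command above would write to the output file.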