File: get_random_sequence_names_from_fasta.py

"""Randomly sample sequence names from FASTA files.
Usage:
	python3 get_random_sequence_names_from_fasta.py seed prop fastas...
@seed, random seed
@prop, proportion of sequences to extract, 0-1
For each input FASTA, the sampled names are written to <fasta>.list
"""

import sys
import math
import random
import pyfastx

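# Require a seed, a proportion, and at least one FASTA file
if len(sys.argv) < 4:
	sys.exit(__doc__)

# Seed the generator so the same seed reproduces the same sample
random.seed(sys.argv[1])
prop = float(sys.argv[2])

# random.sample raises ValueError if asked for more items than exist
if not 0 <= prop <= 1:
	sys.exit("prop must be between 0 and 1")

for fafile in sys.argv[3:]:
	fa = pyfastx.Fasta(fafile)
	# Number of sequences to draw, rounded up
	num = math.ceil(len(fa)*prop)
	ids = fa.keys()

	# Draw num distinct sequence indexes without replacement
	samples = random.sample(range(len(fa)), num)

	# Write one sampled sequence name per line to <fasta>.list
	with open("{}.list".format(fafile), 'w') as fw:
		for i in samples:
			fw.write("{}\n".format(ids[i]))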