File: extractseq.txt

package info (click to toggle)
emboss 6.6.0%2Bdfsg-7
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 571,544 kB
  • sloc: ansic: 460,579; java: 29,439; perl: 13,573; sh: 12,754; makefile: 3,283; csh: 706; asm: 351; xml: 239; pascal: 237; modula3: 8
file content (588 lines) | stat: -rw-r--r-- 23,509 bytes parent folder | download | duplicates (7)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
                                 extractseq



Wiki

   The master copies of EMBOSS documentation are available at
   http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

   Please help by correcting and extending the Wiki pages.

Function

   Extract regions from a sequence

Description

   extractseq reads a sequence and writes sub-sequences from it to file.
   The set of regions to extract is specified on the command-line or in a
   file as pairs of start and end positions. The regions are written in
   the order in which they are specified. Thus, if the sequence AAAGGGTTT
   has been input and the regions: 7-9, 3-4 have been specified, then the
   output sequence will be: TTTAG. Optionally, each region may be written
   out as a separate sequence.

Usage

   Here is a sample session with extractseq

   Extract the region from position 10 to 20:


% extractseq tembl:x65923 result.seq -regions "10-20"
Extract regions from a sequence


   Go to the input files for this example
   Go to the output files for this example

   Example 2

   Extract the regions 10 to 20, 30 to 45, 533 to 537:


% extractseq tembl:x65921 result2.seq -regions "10-20 30-45 533-537"
Extract regions from a sequence


   Go to the input files for this example
   Go to the output files for this example

   Example 3

   Extract the regions 782-856, 951-1095, 1557-1612 and 1787-1912:


% extractseq tembl:x65921 -reg "782..856,951..1095,1557..1612,1787..1912" stdout

Extract regions from a sequence

>X65921 X65921.1 H.sapiens fau 1 gene
atgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacg
gtcgcccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtc
gtgctcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggag
gccctgactaccctggaagtagcaggccgcatgcttggaggtaaagtccatggttccctg
gcccgtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaag
aagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtg
cccacctttggcaagaagaagggccccaatgccaactcttaa


   Example 4

   Extract the regions 782-856, 951-1095, 1557-1612 and 1787-1912 all to
   separate output sequences:


% extractseq tembl:x65921 -reg "782..856,951..1095,1557..1612,1787..1912" stdout
 -separate
Extract regions from a sequence

>X65921_782_856 H.sapiens fau 1 gene
atgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacg
gtcgcccagatcaag
>X65921_951_1095 H.sapiens fau 1 gene
gctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggc
gcgcccctggaggatgaggccactctgggccagtgcggggtggaggccctgactaccctg
gaagtagcaggccgcatgcttggag
>X65921_1557_1612 H.sapiens fau 1 gene
gtaaagtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaag
>X65921_1787_1912 H.sapiens fau 1 gene
gtggccaaacaggagaagaagaagaagaagacaggtcgggctaagcggcggatgcagtac
aaccggcgctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgccaac
tcttaa


Command line arguments

Extract regions from a sequence
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers:
  [-sequence]          sequence   Sequence filename and optional format, or
                                  reference (input USA)
   -regions            range      [Whole sequence] Regions to extract.
                                  A set of regions is specified by a set of
                                  pairs of positions.
                                  The positions are integers.
                                  They are separated by any non-digit,
                                  non-alpha character.
                                  Examples of region specifications are:
                                  24-45, 56-78
                                  1:45, 67=99;765..888
                                  1,5,8,10,23,45,57,99
  [-outseq]            seqoutall  [.] Sequence set(s)
                                  filename and optional format (output USA)

   Additional (Optional) qualifiers:
   -separate           boolean    [N] If this is set true then each specified
                                  region is written out as a separate
                                  sequence. The name of the sequence is
                                  created from the name of the original
                                  sequence with the start and end positions of
                                  the range appended with underscore
                                  characters between them, eg: XYZ region 2 to
                                  34 is written as: XYZ_2_34

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -scircular1         boolean    Sequence is circular
   -squick1            boolean    Read id and sequence only
   -sformat1           string     Input sequence format
   -iquery1            string     Input query fields or ID list
   -ioffset1           integer    Input start position offset
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outseq" associated qualifiers
   -osformat2          string     Output seq format
   -osextension2       string     File name extension
   -osname2            string     Base file name
   -osdirectory2       string     Output directory
   -osdbname2          string     Database name to add
   -ossingle2          boolean    Separate file for each entry
   -oufo2              string     UFO features
   -offormat2          string     Features format
   -ofname2            string     Features file name
   -ofdirectory2       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit


Input file format

   extractseq reads a single nucleotide or protein sequence.

   The input is a standard EMBOSS sequence query (also known as a 'USA').

   Major sequence database sources defined as standard in EMBOSS
   installations include srs:embl, srs:uniprot and ensembl

   Data can also be read from sequence output in any supported format
   written by an EMBOSS or third-party application.

   The input format can be specified by using the command-line qualifier
   -sformat xxx, where 'xxx' is replaced by the name of the required
   format. The available format names are: gff (gff3), gff2, embl (em),
   genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw),
   dasgff and debug.

   See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further
   information on sequence formats.

  Input files for usage example

   'tembl:x65923' is a sequence entry in the example nucleic acid database
   'tembl'

  Database entry: tembl:x65923

ID   X65923; SV 1; linear; mRNA; STD; HUM; 518 BP.
XX
AC   X65923;
XX
DT   13-MAY-1992 (Rel. 31, Created)
DT   18-APR-2005 (Rel. 83, Last updated, Version 11)
XX
DE   H.sapiens fau mRNA
XX
KW   fau gene.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
XX
RN   [1]
RP   1-518
RA   Michiels L.M.R.;
RT   ;
RL   Submitted (29-APR-1992) to the INSDC.
RL   L.M.R. Michiels, University of Antwerp, Dept of Biochemistry,
RL   Universiteisplein 1, 2610 Wilrijk, BELGIUM
XX
RN   [2]
RP   1-518
RX   PUBMED; 8395683.
RA   Michiels L., Van der Rauwelaert E., Van Hasselt F., Kas K., Merregaert J.;
RT   "fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as
RT   an antisense sequence in the Finkel-Biskis-Reilly murine sarcoma virus";
RL   Oncogene 8(9):2537-2546(1993).
XX
DR   Ensembl-Gn; ENSG00000149806; Homo_sapiens.
DR   Ensembl-Tr; ENST00000279259; Homo_sapiens.
DR   Ensembl-Tr; ENST00000434372; Homo_sapiens.
DR   Ensembl-Tr; ENST00000525297; Homo_sapiens.
DR   Ensembl-Tr; ENST00000526555; Homo_sapiens.
DR   Ensembl-Tr; ENST00000527548; Homo_sapiens.
DR   Ensembl-Tr; ENST00000529259; Homo_sapiens.
DR   Ensembl-Tr; ENST00000529639; Homo_sapiens.
DR   Ensembl-Tr; ENST00000531743; Homo_sapiens.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..518
FT                   /organism="Homo sapiens"
FT                   /chromosome="11q"
FT                   /map="13"
FT                   /mol_type="mRNA"
FT                   /clone_lib="cDNA"
FT                   /clone="pUIA 631"
FT                   /tissue_type="placenta"
FT                   /db_xref="taxon:9606"
FT   misc_feature    57..278
FT                   /note="ubiquitin like part"
FT   CDS             57..458
FT                   /gene="fau"
FT                   /db_xref="GDB:135476"
FT                   /db_xref="GOA:P35544"
FT                   /db_xref="GOA:P62861"
FT                   /db_xref="H-InvDB:HIT000322806.14"
FT                   /db_xref="HGNC:3597"
FT                   /db_xref="InterPro:IPR000626"
FT                   /db_xref="InterPro:IPR006846"
FT                   /db_xref="InterPro:IPR019954"
FT                   /db_xref="InterPro:IPR019955"
FT                   /db_xref="InterPro:IPR019956"
FT                   /db_xref="PDB:2L7R"
FT                   /db_xref="UniProtKB/Swiss-Prot:P35544"
FT                   /db_xref="UniProtKB/Swiss-Prot:P62861"
FT                   /protein_id="CAA46716.1"
FT                   /translation="MQLFVRAQELHTFEVTGQETVAQIKAHVASLEGIAPEDQVVLLAG
FT                   APLEDEATLGQCGVEALTTLEVAGRMLGGKVHGSLARAGKVRGQTPKVAKQEKKKKKTG
FT                   RAKRRMQYNRRFVNVVPTFGKKKGPNANS"
FT   misc_feature    98..102
FT                   /note="nucleolar localization signal"
FT   misc_feature    279..458
FT                   /note="S30 part"
FT   polyA_signal    484..489
FT   polyA_site      509
XX
SQ   Sequence 518 BP; 125 A; 139 C; 148 G; 106 T; 0 other;
     ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc gccaatatgc        60
     agctctttgt ccgcgcccag gagctacaca ccttcgaggt gaccggccag gaaacggtcg       120
     cccagatcaa ggctcatgta gcctcactgg agggcattgc cccggaagat caagtcgtgc       180
     tcctggcagg cgcgcccctg gaggatgagg ccactctggg ccagtgcggg gtggaggccc       240
     tgactaccct ggaagtagca ggccgcatgc ttggaggtaa agttcatggt tccctggccc       300
     gtgctggaaa agtgagaggt cagactccta aggtggccaa acaggagaag aagaagaaga       360
     agacaggtcg ggctaagcgg cggatgcagt acaaccggcg ctttgtcaac gttgtgccca       420
     cctttggcaa gaagaagggc cccaatgcca actcttaagt cttttgtaat tctggctttc       480
     tctaataaaa aagccactta gttcagtcaa aaaaaaaa                               518
//

  Input files for usage example 2

  Database entry: tembl:x65921

ID   X65921; SV 1; linear; genomic DNA; STD; HUM; 2016 BP.
XX
AC   X65921; S45242;
XX
DT   13-MAY-1992 (Rel. 31, Created)
DT   14-NOV-2006 (Rel. 89, Last updated, Version 7)
XX
DE   H.sapiens fau 1 gene
XX
KW   fau 1 gene.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
XX
RN   [1]
RP   1-2016
RA   Kas K.;
RT   ;
RL   Submitted (29-APR-1992) to the INSDC.
RL   K. Kas, University of Antwerp, Dept of Biochemistry T3.22,
RL   Universiteitsplein 1, 2610 Wilrijk, BELGIUM
XX
RN   [2]
RP   1-2016
RX   DOI; 10.1016/0006-291X(92)91286-Y.
RX   PUBMED; 1326960.
RA   Kas K., Michiels L., Merregaert J.;
RT   "Genomic structure and expression of the human fau gene: encoding the
RT   ribosomal protein S30 fused to a ubiquitin-like protein";
RL   Biochem. Biophys. Res. Commun. 187(2):927-933(1992).
XX
DR   Ensembl-Gn; ENSG00000149806; Homo_sapiens.
DR   Ensembl-Tr; ENST00000279259; Homo_sapiens.
DR   Ensembl-Tr; ENST00000434372; Homo_sapiens.
DR   Ensembl-Tr; ENST00000525297; Homo_sapiens.
DR   Ensembl-Tr; ENST00000526555; Homo_sapiens.
DR   Ensembl-Tr; ENST00000527548; Homo_sapiens.
DR   Ensembl-Tr; ENST00000529259; Homo_sapiens.
DR   Ensembl-Tr; ENST00000529639; Homo_sapiens.
DR   Ensembl-Tr; ENST00000531743; Homo_sapiens.
DR   GDB; 191789.
DR   GDB; 191790.
DR   GDB; 354872.
DR   GDB; 4590236.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..2016


  [Part of this file has been deleted for brevity]

FT                   RAKRRMQYNRRFVNVVPTFGKKKGPNANS"
FT   intron          857..950
FT                   /number=2
FT   exon            951..1095
FT                   /number=3
FT   intron          1096..1556
FT                   /number=3
FT   exon            1557..1612
FT                   /number=4
FT   intron          1613..1786
FT                   /number=4
FT   exon            1787..>1912
FT                   /number=5
FT   polyA_signal    1938..1943
XX
SQ   Sequence 2016 BP; 421 A; 562 C; 538 G; 495 T; 0 other;
     ctaccatttt ccctctcgat tctatatgta cactcgggac aagttctcct gatcgaaaac        60
     ggcaaaacta aggccccaag taggaatgcc ttagttttcg gggttaacaa tgattaacac       120
     tgagcctcac acccacgcga tgccctcagc tcctcgctca gcgctctcac caacagccgt       180
     agcccgcagc cccgctggac accggttctc catccccgca gcgtagcccg gaacatggta       240
     gctgccatct ttacctgcta cgccagcctt ctgtgcgcgc aactgtctgg tcccgccccg       300
     tcctgcgcga gctgctgccc aggcaggttc gccggtgcga gcgtaaaggg gcggagctag       360
     gactgccttg ggcggtacaa atagcaggga accgcgcggt cgctcagcag tgacgtgaca       420
     cgcagcccac ggtctgtact gacgcgccct cgcttcttcc tctttctcga ctccatcttc       480
     gcggtagctg ggaccgccgt tcaggtaaga atggggcctt ggctggatcc gaagggcttg       540
     tagcaggttg gctgcggggt cagaaggcgc ggggggaacc gaagaacggg gcctgctccg       600
     tggccctgct ccagtcccta tccgaactcc ttgggaggca ctggccttcc gcacgtgagc       660
     cgccgcgacc accatcccgt cgcgatcgtt tctggaccgc tttccactcc caaatctcct       720
     ttatcccaga gcatttcttg gcttctctta caagccgtct tttctttact cagtcgccaa       780
     tatgcagctc tttgtccgcg cccaggagct acacaccttc gaggtgaccg gccaggaaac       840
     ggtcgcccag atcaaggtaa ggctgcttgg tgcgccctgg gttccatttt cttgtgctct       900
     tcactctcgc ggcccgaggg aacgcttacg agccttatct ttccctgtag gctcatgtag       960
     cctcactgga gggcattgcc ccggaagatc aagtcgtgct cctggcaggc gcgcccctgg      1020
     aggatgaggc cactctgggc cagtgcgggg tggaggccct gactaccctg gaagtagcag      1080
     gccgcatgct tggaggtgag tgagagagga atgttctttg aagtaccggt aagcgtctag      1140
     tgagtgtggg gtgcatagtc ctgacagctg agtgtcacac ctatggtaat agagtacttc      1200
     tcactgtctt cagttcagag tgattcttcc tgtttacatc cctcatgttg aacacagacg      1260
     tccatgggag actgagccag agtgtagttg tatttcagtc acatcacgag atcctagtct      1320
     ggttatcagc ttccacacta aaaattaggt cagaccaggc cccaaagtgc tctataaatt      1380
     agaagctgga agatcctgaa atgaaactta agatttcaag gtcaaatatc tgcaactttg      1440
     ttctcattac ctattgggcg cagcttctct ttaaaggctt gaattgagaa aagaggggtt      1500
     ctgctgggtg gcaccttctt gctcttacct gctggtgcct tcctttccca ctacaggtaa      1560
     agtccatggt tccctggccc gtgctggaaa agtgagaggt cagactccta aggtgagtga      1620
     gagtattagt ggtcatggtg ttaggacttt ttttcctttc acagctaaac caagtccctg      1680
     ggctcttact cggtttgcct tctccctccc tggagatgag cctgagggaa gggatgctag      1740
     gtgtggaaga caggaaccag ggcctgatta accttccctt ctccaggtgg ccaaacagga      1800
     gaagaagaag aagaagacag gtcgggctaa gcggcggatg cagtacaacc ggcgctttgt      1860
     caacgttgtg cccacctttg gcaagaagaa gggccccaat gccaactctt aagtcttttg      1920
     taattctggc tttctctaat aaaaaagcca cttagttcag tcatcgcatt gtttcatctt      1980
     tacttgcaag gcctcaggga gaggtgtgct tctcgg                                2016
//

   You can specify a file of ranges to extract by giving the '-regions'
   qualifier the value '@' followed by the name of the file containing the
   ranges. (eg: '-regions @myfile').

   The format of the range file is:
     * Comment lines start with '#' in the first column.
     * Comment lines and blank lines are ignored.
     * The line may start with white-space.
     * There are two positive (integer) numbers per line separated by one
       or more space or TAB characters.
     * The second number must be greater or equal to the first number.
     * There can be optional text after the two numbers to annotate the
       line.
     * White-space before or after the text is removed.

   An example range file is:

# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region


Output file format

   The output is a standard EMBOSS sequence file.

   The results can be output in one of several styles by using the
   command-line qualifier -osformat xxx, where 'xxx' is replaced by the
   name of the required format. The available format names are: embl,
   genbank, gff, pir, swiss, dasgff, debug, listfile, dbmotif, diffseq,
   excel, feattable, motif, nametable, regions, seqtable, simple, srs,
   table, tagseq.

   See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further
   information on sequence formats.

  Output files for usage example

  File: result.seq

>X65923 X65923.1 H.sapiens fau mRNA
ctcgactccat

  Output files for usage example 2

  File: result2.seq

>X65921 X65921.1 H.sapiens fau 1 gene
tccctctcgatacactcgggacaagttagggc

   If the option -separate is used then each specified region is written
   to the output file as a separate sequence. The name of the sequence is
   created from the name of the original sequence with the start and end
   positions of the range appended with underscore characters between
   them,

   For example: "XYZ region 2 to 34" is written as: "XYZ_2_34"

Data files

   None.

Notes

   extractseq allows you to specify one or more regions of a sequence to
   extract sub-sequences from to build up a contiguous output sequence.
   This is modelled on the cell's process of splicing out exons from mRNA,
   but the program is generally applicable to any cutting and splicing or
   editing operation on a single sequence.

   Where the -regions option is used to specify output as separate
   sequences, the name of the sequence is created from the name of the
   original sequence with the start and end positions of the range
   appended with underscore characters between them, eg: XYZ region 2 to
   34 is written as: XYZ_2_34. In such cases the output sequence format
   must be capable of supporting multiple sequences.

References

   None.

Warnings

   None.

Diagnostic Error Messages

   Several warning messages about malformed region specifications:
     * Non-digit found in region ...
     * Unpaired start of a region found in ...
     * Non-digit found in region ...
     * The start of a pair of region positions must be smaller than the
       end in ...

Exit status

   It exits with status 0, unless a region is badly constructed.

Known bugs

   None noted.

Comments

See also

   Program name     Description
   aligncopy        Read and write alignments
   aligncopypair    Read and write pairs from alignments
   biosed           Replace or delete sequence sections
   codcopy          Copy and reformat a codon usage table
   cutseq           Remove a section from a sequence
   degapseq         Remove non-alphabetic (e.g. gap) characters from sequences
   descseq          Alter the name or description of a sequence
   entret           Retrieve sequence entries from flatfile databases and files
   extractalign     Extract regions from a sequence alignment
   extractfeat      Extract features from sequence(s)
   featcopy         Read and write a feature table
   featmerge        Merge two overlapping feature tables
   featreport       Read and write a feature table
   feattext         Return a feature table original text
   listor           Write a list file of the logical OR of two sets of sequences
   makenucseq       Create random nucleotide sequences
   makeprotseq      Create random protein sequences
   maskambignuc     Mask all ambiguity characters in nucleotide sequences with
                    N
   maskambigprot    Mask all ambiguity characters in protein sequences with X
   maskfeat         Write a sequence with masked features
   maskseq          Write a sequence with masked regions
   newseq           Create a sequence file from a typed-in sequence
   nohtml           Remove mark-up (e.g. HTML tags) from an ASCII text file
   noreturn         Remove carriage return from ASCII files
   nospace          Remove whitespace from an ASCII text file
   notab            Replace tabs with spaces in an ASCII text file
   notseq           Write to file a subset of an input stream of sequences
   nthseq           Write to file a single sequence from an input stream of
                    sequences
   nthseqset        Read and write (return) one set of sequences from many
   pasteseq         Insert one sequence into another
   revseq           Reverse and complement a nucleotide sequence
   seqcount         Read and count sequences
   seqret           Read and write (return) sequences
   seqretsetall     Read and write (return) many sets of sequences
   seqretsplit      Read sequences and write them to individual files
   sizeseq          Sort sequences by size
   skipredundant    Remove redundant sequences from an input set
   skipseq          Read and write (return) sequences, skipping first few
   splitsource      Split sequence(s) into original source sequences
   splitter         Split sequence(s) into smaller sequences
   trimest          Remove poly-A tails from nucleotide sequences
   trimseq          Remove unwanted characters from start and end of sequence(s)
   trimspace        Remove extra whitespace from an ASCII text file
   union            Concatenate multiple sequences into a single sequence
   vectorstrip      Remove vectors from the ends of nucleotide sequence(s)
   yank             Add a sequence reference (a full USA) to a list file

Author(s)

   Gary Williams formerly at:
   MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust
   Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

   Please report all bugs to the EMBOSS bug team
   (emboss-bug (c) emboss.open-bio.org) not to the original author.

History

   Written (2000) - Gary Williams

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.

Comments

   None