File: using-varscan.txt

VarScan User's Manual
=====================

VarScan is written in Java and should be run from the command line
(Terminal in Linux/UNIX/OS X, or Command Prompt in MS Windows). For variant
calling, you will need a pileup file. See the How to Build A Pileup File
section for details. Running VarScan with no arguments prints the usage
information. Because some fields changed as of VarScan v2.2.3, we are providing
updated documentation for the current release. For documentation of v2.2.2 and
prior, see below.


VarScan Documentation (v2.2.3 and later)


        USAGE: varscan  [COMMAND] [OPTIONS]

        COMMANDS:

        Single-sample Calling:
        pileup2snp [pileup file]
        pileup2indel [pileup file]
        pileup2cns [pileup file]

        Multi-sample Calling:
        mpileup2snp [mpileup file]
        mpileup2indel [mpileup file]
        mpileup2cns [mpileup file]

        Tumor-normal Comparison:
        somatic [normal pileup] [tumor pileup] or [normal-tumor mpileup]
        copynumber [normal pileup] [tumor pileup] or [normal-tumor mpileup]

        Variant Filtering:
        filter [variants file]
        somaticFilter [mutations file]

        Utility Functions:
        limit [variants file]
        readcounts [pileup file]
        compare [file1] [file2]


pileup2snp

This command calls SNPs from a pileup file based on user-defined parameters:

        USAGE: varscan pileup2snp [pileup file] OPTIONS
        pileup file - The SAMtools pileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --p-value       Default p-value threshold for calling variants [99e-02]

        OUTPUT
        Tab-delimited SNP calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Cons            Consensus genotype of sample in IUPAC format.
        Reads1          reads supporting reference allele
        Reads2          reads supporting variant allele
        VarFreq         frequency of variant allele by read count
        Strands1        strands on which reference allele was observed
        Strands2        strands on which variant allele was observed
        Qual1           average base quality of reference-supporting read bases
        Qual2           average base quality of variant-supporting read bases
        Pvalue          Significance of variant read count vs. expected baseline error
        MapQual1        Average map quality of ref reads (only useful if in pileup)
        MapQual2        Average map quality of var reads (only useful if in pileup)
        Reads1Plus      Number of reference-supporting reads on + strand
        Reads1Minus     Number of reference-supporting reads on - strand
        Reads2Plus      Number of variant-supporting reads on + strand
        Reads2Minus     Number of variant-supporting reads on - strand
        VarAllele       Most frequent non-reference allele observed
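
The column list above maps directly onto a small parser. The following is a
minimal Python sketch, not part of VarScan: the function name parse_snp_line
and the sample line are hypothetical, and it assumes that VarFreq may be
written either as a fraction or with a trailing percent sign.

```python
# Column names exactly as listed in the OUTPUT table above.
COLUMNS = [
    "Chrom", "Position", "Ref", "Cons", "Reads1", "Reads2", "VarFreq",
    "Strands1", "Strands2", "Qual1", "Qual2", "Pvalue", "MapQual1",
    "MapQual2", "Reads1Plus", "Reads1Minus", "Reads2Plus", "Reads2Minus",
    "VarAllele",
]

# Columns that hold read or strand counts and can be read as integers.
INT_COLUMNS = [
    "Position", "Reads1", "Reads2", "Strands1", "Strands2",
    "Reads1Plus", "Reads1Minus", "Reads2Plus", "Reads2Minus",
]


def parse_snp_line(line):
    """Turn one tab-delimited SNP call into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(fields)))
    row = dict(zip(COLUMNS, fields))
    for name in INT_COLUMNS:
        row[name] = int(row[name])
    # Assumption: VarFreq may carry a trailing '%'; store it as a fraction.
    freq = row["VarFreq"]
    row["VarFreq"] = float(freq[:-1]) / 100.0 if freq.endswith("%") else float(freq)
    row["Pvalue"] = float(row["Pvalue"])
    return row


# Hypothetical example line, invented for illustration only.
example = "chr1\t12345\tA\tR\t20\t12\t37.5%\t2\t2\t30\t28\t0.0001\t60\t60\t10\t10\t6\t6\tG"
row = parse_snp_line(example)
print(row["Chrom"], row["Position"], row["Reads2"], row["VarFreq"])
```

A header line, if present in the file, would need to be skipped before calling
the parser.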


pileup2indel

This command calls indels from a pileup file based on user-defined parameters:

        USAGE: varscan pileup2indel [pileup file] OPTIONS
        pileup file - The SAMtools pileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --p-value       Default p-value threshold for calling variants [99e-02]

        OUTPUT
        Tab-delimited indel calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Cons            Consensus genotype of sample; */(var) indicates heterozygous
        Reads1          reads supporting reference allele
        Reads2          reads supporting variant allele
        VarFreq         frequency of variant allele by read count
        Strands1        strands on which reference allele was observed
        Strands2        strands on which variant allele was observed
        Qual1           average base quality of reference-supporting read bases
        Qual2           average base quality of variant-supporting read bases
        Pvalue          Significance of variant read count vs. expected baseline error
        MapQual1        Average map quality of ref reads (only useful if in pileup)
        MapQual2        Average map quality of var reads (only useful if in pileup)
        Reads1Plus      Number of reference-supporting reads on + strand
        Reads1Minus     Number of reference-supporting reads on - strand
        Reads2Plus      Number of variant-supporting reads on + strand
        Reads2Minus     Number of variant-supporting reads on - strand
        VarAllele       Most frequent non-reference allele observed


pileup2cns

This command makes consensus calls (SNP/Indel/Reference) from a pileup file
based on user-defined parameters:

        USAGE: varscan pileup2cns [pileup file] OPTIONS
        pileup file - The SAMtools pileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --p-value       Default p-value threshold for calling variants [99e-02]

        OUTPUT
        Tab-delimited consensus calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Cons            Consensus genotype of sample; */(var) indicates heterozygous
        Reads1          reads supporting reference allele
        Reads2          reads supporting variant allele
        VarFreq         frequency of variant allele by read count
        Strands1        strands on which reference allele was observed
        Strands2        strands on which variant allele was observed
        Qual1           average base quality of reference-supporting read bases
        Qual2           average base quality of variant-supporting read bases
        Pvalue          Significance of variant read count vs. expected baseline error
        MapQual1        Average map quality of ref reads (only useful if in pileup)
        MapQual2        Average map quality of var reads (only useful if in pileup)
        Reads1Plus      Number of reference-supporting reads on + strand
        Reads1Minus     Number of reference-supporting reads on - strand
        Reads2Plus      Number of variant-supporting reads on + strand
        Reads2Minus     Number of variant-supporting reads on - strand
        VarAllele       Most frequent non-reference allele observed

mpileup2snp

This command calls SNPs from an mpileup file based on user-defined parameters:

        USAGE: varscan mpileup2snp [mpileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --variants      Report only variant (SNP/indel) positions (mpileup2cns only) [0]
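
How the two frequency thresholds interact can be sketched in Python. This is
an illustration only, under the assumption that frequency is reads2 divided by
reads1 + reads2; it ignores --min-coverage, --min-reads2, --p-value and the
strand filter, which VarScan also applies. The function name is hypothetical.

```python
def classify_by_frequency(reads1, reads2, min_var_freq=0.01, min_freq_for_hom=0.75):
    """Label a site as hom, het or ref from its variant allele frequency."""
    total = reads1 + reads2
    if total == 0:
        return "no-call"
    freq = reads2 / total
    if freq >= min_freq_for_hom:
        return "hom"  # at or above --min-freq-for-hom
    if freq >= min_var_freq:
        return "het"  # at or above --min-var-freq but below the hom cutoff
    return "ref"


print(classify_by_frequency(2, 18))    # frequency 0.90
print(classify_by_frequency(12, 8))    # frequency 0.40
print(classify_by_frequency(1000, 1))  # frequency just under 0.001
```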


        OUTPUT
        Tab-delimited SNP calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Var             variant allele observed
        PoolCall        Cross-sample call using all data (Cons:Cov:Reads1:Reads2:Freq:P-value)
                        Cons - consensus genotype in IUPAC format
                        Cov - total depth of coverage
                        Reads1 - number of reads supporting reference
                        Reads2 - number of reads supporting variant
                        Freq - the variant allele frequency by read count
                        P-value - FET p-value of observed reads vs expected non-variant
        StrandFilt      Information to look for strand bias using all reads (R1+:R1-:R2+:R2-:pval)
                        R1+ = reference supporting reads on forward strand
                        R1- = reference supporting reads on reverse strand
                        R2+ = variant supporting reads on forward strand
                        R2- = variant supporting reads on reverse strand
                        pval = FET p-value for strand distribution, R1 versus R2
        SamplesRef      Number of samples called reference (wildtype)
        SamplesHet      Number of samples called heterozygous-variant
        SamplesHom      Number of samples called homozygous-variant
        SamplesNC       Number of samples not covered / not called
        SampleCalls     The calls for each sample in the mpileup, space-delimited
                        Each sample has six values separated by colons:
                        Cons - consensus genotype in IUPAC format
                        Cov - total depth of coverage
                        Reads1 - number of reads supporting reference
                        Reads2 - number of reads supporting variant
                        Freq - the variant allele frequency by read count
                        P-value - FET p-value of observed reads vs expected non-variant
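
The SampleCalls layout described above (space-delimited samples, six
colon-separated values each) can be unpacked as follows. This is a minimal
Python sketch; the function name and the example string are hypothetical, and
values are kept as strings because the manual does not specify how uncovered
samples are written.

```python
# Field order as given in the SampleCalls description above.
FIELDS = ("Cons", "Cov", "Reads1", "Reads2", "Freq", "P-value")


def parse_sample_calls(column):
    """Split a SampleCalls column into one dict per sample."""
    calls = []
    for token in column.split():
        values = token.split(":")
        if len(values) != len(FIELDS):
            raise ValueError("expected six colon-separated values: %r" % token)
        calls.append(dict(zip(FIELDS, values)))
    return calls


# Hypothetical two-sample column, invented for illustration only.
calls = parse_sample_calls("R:30:18:12:40%:1.2E-4 A:25:25:0:0%:1E0")
for index, call in enumerate(calls, start=1):
    print(index, call["Cons"], call["Cov"], call["Reads2"])
```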


mpileup2indel

This command calls indels from an mpileup file based on user-defined parameters:

        USAGE: varscan mpileup2indel [mpileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --variants      Report only variant (SNP/indel) positions (mpileup2cns only) [0]


        OUTPUT
        Tab-delimited indel calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Var             variant allele observed
        PoolCall        Cross-sample call using all data (Cons:Cov:Reads1:Reads2:Freq:P-value)
                                Cons - consensus genotype in IUPAC format
                                Cov - total depth of coverage
                                Reads1 - number of reads supporting reference
                                Reads2 - number of reads supporting variant
                                Freq - the variant allele frequency by read count
                                P-value - FET p-value of observed reads vs expected non-variant
        StrandFilt      Information to look for strand bias using all reads, format R1+:R1-:R2+:R2-:pval
                                R1+ = reference supporting reads on forward strand
                                R1- = reference supporting reads on reverse strand
                                R2+ = variant supporting reads on forward strand
                                R2- = variant supporting reads on reverse strand
                                pval = FET p-value for strand distribution, R1 versus R2
        SamplesRef      Number of samples called reference (wildtype)
        SamplesHet      Number of samples called heterozygous-variant
        SamplesHom      Number of samples called homozygous-variant
        SamplesNC       Number of samples not covered / not called
        SampleCalls     The calls for each sample in the mpileup, space-delimited
                        Each sample has six values separated by colons:
                        Cons - consensus genotype in IUPAC format
                        Cov - total depth of coverage
                        Reads1 - number of reads supporting reference
                        Reads2 - number of reads supporting variant
                        Freq - the variant allele frequency by read count
                        P-value - FET p-value of observed reads vs expected non-variant
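
The StrandFilt column and the --strand-filter option can be related with a
short Python sketch. It assumes that "support on one strand" means the
fraction of variant-supporting reads (R2+ and R2-) on the more common strand;
the function name and the example values are hypothetical.

```python
def strand_bias(strand_filt, max_fraction=0.9):
    """Return (fraction of variant reads on the dominant strand, flagged?)."""
    r1_plus, r1_minus, r2_plus, r2_minus, pval = strand_filt.split(":")
    plus, minus = int(r2_plus), int(r2_minus)
    total = plus + minus
    if total == 0:
        return 0.0, False  # no variant reads, nothing to flag
    fraction = max(plus, minus) / total
    # Flag only when more than 90% of variant reads sit on one strand.
    return fraction, fraction > max_fraction


print(strand_bias("10:12:19:1:0.01"))  # 19 of 20 variant reads on + strand
print(strand_bias("10:12:6:6:0.9"))    # evenly split
```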


mpileup2cns

This command makes consensus calls (SNP/Indel/Reference) from an mpileup file
based on user-defined parameters:

        USAGE: varscan mpileup2cns [mpileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --variants      Report only variant (SNP/indel) positions (mpileup2cns only) [0]


        OUTPUT
        Tab-delimited consensus calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Var             variant allele observed
        PoolCall        Cross-sample call using all data (Cons:Cov:Reads1:Reads2:Freq:P-value)
                                Cons - consensus genotype in IUPAC format
                                Cov - total depth of coverage
                                Reads1 - number of reads supporting reference
                                Reads2 - number of reads supporting variant
                                Freq - the variant allele frequency by read count
                                P-value - FET p-value of observed reads vs expected non-variant
        StrandFilt      Information to look for strand bias using all reads, format R1+:R1-:R2+:R2-:pval
                                R1+ = reference supporting reads on forward strand
                                R1- = reference supporting reads on reverse strand
                                R2+ = variant supporting reads on forward strand
                                R2- = variant supporting reads on reverse strand
                                pval = FET p-value for strand distribution, R1 versus R2
        SamplesRef      Number of samples called reference (wildtype)
        SamplesHet      Number of samples called heterozygous-variant
        SamplesHom      Number of samples called homozygous-variant
        SamplesNC       Number of samples not covered / not called
        SampleCalls     The calls for each sample in the mpileup, space-delimited
                        Each sample has six values separated by colons:
                        Cons - consensus genotype in IUPAC format
                        Cov - total depth of coverage
                        Reads1 - number of reads supporting reference
                        Reads2 - number of reads supporting variant
                        Freq - the variant allele frequency by read count
                        P-value - FET p-value of observed reads vs expected non-variant
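
The Cons field uses IUPAC nucleotide codes. The Python sketch below decodes
the single-base and two-base ambiguity codes into allele pairs. Treating a
plain A, C, G or T as a homozygous call is an assumption for a diploid sample,
and indel notation and N are deliberately not handled. The function names are
hypothetical.

```python
# Standard IUPAC codes for one base or a pair of bases.
IUPAC_GENOTYPES = {
    "A": ("A", "A"), "C": ("C", "C"), "G": ("G", "G"), "T": ("T", "T"),
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}


def decode_consensus(code):
    """Return the two alleles implied by an IUPAC consensus code."""
    alleles = IUPAC_GENOTYPES.get(code.upper())
    if alleles is None:
        raise ValueError("unsupported consensus code: %r" % code)
    return alleles


def is_heterozygous(code):
    first, second = decode_consensus(code)
    return first != second


print(decode_consensus("R"), is_heterozygous("R"))
print(decode_consensus("T"), is_heterozygous("T"))
```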


somatic

This command calls variants and identifies their somatic status (Germline/LOH/
Somatic) using pileup files from a matched tumor-normal pair.

        USAGE: varscan somatic [normal_pileup] [tumor_pileup] [output] OPTIONS
        normal_pileup - The SAMtools pileup file for Normal
        tumor_pileup - The SAMtools pileup file for Tumor
        output - Output base name for SNP and indel output

You can also give it a single mpileup file with normal and tumor data.


        USAGE: varscan somatic [normal-tumor.mpileup] [output] --mpileup 1 OPTIONS
        normal-tumor.mpileup - The SAMtools mpileup file with normal and then tumor
        output - Output base name for SNP and indel output

Both formats of the command share these common options:


        OPTIONS:
        --output-snp - Output file for SNP calls [default: output.snp]
        --output-indel - Output file for indel calls [default: output.indel]
        --min-coverage - Minimum coverage in normal and tumor to call variant [8]
        --min-coverage-normal - Minimum coverage in normal to call somatic [8]
        --min-coverage-tumor - Minimum coverage in tumor to call somatic [6]
        --min-var-freq - Minimum variant frequency to call a heterozygote [0.10]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --normal-purity - Estimated purity (non-tumor content) of normal sample [1.00]
        --tumor-purity - Estimated purity (tumor content) of tumor sample [1.00]
        --p-value - P-value threshold to call a heterozygote [0.99]
        --somatic-p-value - P-value threshold to call a somatic site [0.05]
        --strand-filter - If set to 1, removes variants with >90% strand bias
        --validation - If set to 1, outputs all compared positions even if non-variant

Note that more specific options (e.g. min-coverage-normal) will override the
default or specified value of less specific options (e.g. min-coverage).

The normal and tumor purity values should be a value between 0 and 1. The
default (1) implies that the normal is 100% pure with no contaminating tumor
cells, and the tumor is 100% pure with no contaminating stromal or other
non-malignant cells. You would change tumor-purity to something less than 1 if
you have a low-purity tumor sample and thus expect lower variant allele
frequencies for mutations. You would change normal-purity to something less
than 1 only if it's possible that there will be some tumor content in your
"normal" sample, e.g. adjacent normal tissue for a solid tumor, malignant blood
cells in the skin punch normal for some liquid tumors, etc.
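
To see why a low-purity tumor leads to lower variant allele frequencies, here
is a rough Python sketch. It is a simplification that is not taken from
VarScan: it assumes a heterozygous mutation in a diploid, copy-neutral region
and a normal sample with no tumor content, so the expected frequency is about
half the tumor purity. The function name is hypothetical.

```python
def expected_het_vaf(tumor_purity):
    """Expected variant allele frequency of a heterozygous somatic mutation."""
    if not 0.0 <= tumor_purity <= 1.0:
        raise ValueError("purity should be a value between 0 and 1")
    # One of two alleles is mutated, and only in the tumor fraction of cells.
    return 0.5 * tumor_purity


for purity in (1.0, 0.4, 0.15):
    vaf = expected_het_vaf(purity)
    # 0.10 is the default --min-var-freq listed for the somatic command.
    print(purity, vaf, "below default min-var-freq" if vaf < 0.10 else "ok")
```

Under these assumptions, a tumor at 15% purity would be expected to show
heterozygous mutations below the default 0.10 frequency threshold.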

There are two p-value options. One (p-value) is the significance threshold for
the first-pass algorithm that determines, for each position, if either normal
or tumor is variant at that position. The second (somatic-p-value) is more
important; this is the threshold below which read count differences between
tumor and normal are deemed significant enough to classify the variant as a
somatic mutation or an LOH event. In the case of a shared (germline) variant,
this p-value is used to determine if the combined normal and tumor evidence
differ significantly enough from the null hypothesis (no variant with same
coverage) to report the variant. See the somatic mutation calling section for
details.
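
As an illustration of comparing read counts between tumor and normal, the
Python sketch below computes a one-sided Fisher's exact test on a 2x2 table of
reference and variant reads. This is an assumption made for illustration: the
manual refers to FET p-values in its column descriptions, but the exact test
and direction that VarScan uses for somatic-p-value are not spelled out here.
The function name is hypothetical.

```python
from math import comb  # available in Python 3.8 and later


def somatic_p_value(normal_ref, normal_var, tumor_ref, tumor_var):
    """P(tumor has at least tumor_var variant reads) with fixed margins."""
    total = normal_ref + normal_var + tumor_ref + tumor_var
    if total == 0:
        return 1.0
    total_var = normal_var + tumor_var
    tumor_total = tumor_ref + tumor_var
    upper = min(total_var, tumor_total)
    # Hypergeometric tail: sum over tables at least as extreme as observed.
    numerator = sum(
        comb(total_var, x) * comb(total - total_var, tumor_total - x)
        for x in range(tumor_var, upper + 1)
    )
    return numerator / comb(total, tumor_total)


p = somatic_p_value(normal_ref=20, normal_var=0, tumor_ref=10, tumor_var=10)
print(p, "significant at 0.05" if p < 0.05 else "not significant at 0.05")
```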


        OUTPUT
        Two tab-delimited files (SNPs and Indels) with the following columns:
        chrom                   chromosome name
        position                position (1-based from the pileup)
        ref                     reference allele at this position
        var                     variant allele at this position
        normal_reads1           reads supporting reference allele
        normal_reads2           reads supporting variant allele
        normal_var_freq         frequency of variant allele by read count
        normal_gt               genotype call for Normal sample
        tumor_reads1            reads supporting reference allele
        tumor_reads2            reads supporting variant allele
        tumor_var_freq          frequency of variant allele by read count
        tumor_gt                genotype call for Tumor sample
        somatic_status          status of variant (Germline, Somatic, or LOH)
        variant_p_value         Significance of variant read count vs. baseline error rate
        somatic_p_value         Significance of tumor read count vs. normal read count
        tumor_reads1_plus       Ref-supporting reads from + strand in tumor
        tumor_reads1_minus      Ref-supporting reads from - strand in tumor
        tumor_reads2_plus       Var-supporting reads from + strand in tumor
        tumor_reads2_minus      Var-supporting reads from - strand in tumor


copynumber

This command reports regions of relative copy number change (the log2 ratio of
tumor depth to normal depth) using pileup files from a matched tumor-normal
pair.

        USAGE: varscan copynumber [normal_pileup] [tumor_pileup] [output] OPTIONS
        normal_pileup - The SAMtools pileup file for Normal
        tumor_pileup - The SAMtools pileup file for Tumor
        output - Output base name for the copy number output

You can also give it a single mpileup file with normal and tumor data.


        USAGE: varscan copynumber [normal-tumor.mpileup] [output] --mpileup 1 OPTIONS
        normal-tumor.mpileup - The SAMtools mpileup file with normal and then tumor
        output - Output base name for the copy number output

Both formats of the command share these common options:


        OPTIONS:
        --min-base-qual - Minimum base quality to count for coverage [20]
        --min-map-qual - Minimum read mapping quality to count for coverage [20]
        --min-coverage - Minimum coverage threshold for copynumber segments [20]
        --min-segment-size - Minimum number of consecutive bases to report a segment [10]
        --max-segment-size - Max size before a new segment is made [100]
        --p-value - P-value threshold for significant copynumber change-point [0.01]
        --data-ratio - The normal/tumor input data ratio for copynumber adjustment [1.0]

Note: The data ratio is intended to help you account for overall differences in
the amount of sequencing coverage between normal and tumor, which might
otherwise give the appearance of global copy number differences. If normal has
more data than tumor, set this to something greater than 1. If tumor has more
data than normal, adjust it to something below 1. A basic formula for data
ratio might be something like ratio = normal_unique_bp / tumor_unique_bp where
unique base pairs are computed as mapped_non_dup_reads * read_length.
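
The basic formula in the note above can be written out as a short Python
sketch. The function names and the read counts are hypothetical.

```python
def unique_bp(mapped_non_dup_reads, read_length):
    """Unique base pairs = mapped, non-duplicate reads * read length."""
    return mapped_non_dup_reads * read_length


def data_ratio(normal_reads, normal_read_length, tumor_reads, tumor_read_length):
    """ratio = normal_unique_bp / tumor_unique_bp, as in the note above."""
    return (unique_bp(normal_reads, normal_read_length)
            / unique_bp(tumor_reads, tumor_read_length))


# Hypothetical counts: normal has twice as much data as tumor.
ratio = data_ratio(200_000_000, 100, 100_000_000, 100)
print(ratio)  # a value above 1, to be passed as --data-ratio
```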


        OUTPUT
        chrom                   Chromosome name
        chr_start               Region start position (1-based from the pileup)
        chr_stop                Region stop position (1-based from the pileup)
        num_positions           Size of the region in base pairs
        normal_depth            Average normal sequence depth for the region
        tumor_depth             Average tumor sequence depth for the region
        log2_ratio              Log-base-2 ratio of: adjusted tumor depth over normal depth
        gc_content              Estimated GC content of the region (0-100)

The raw regions reported by VarScan are delineated by drops in coverage or
changes in the tumor/normal ratio, so there are many small, nearby regions with
similar copy number. It is therefore recommended that raw VarScan copynumber
output be processed with circular binary segmentation (CBS) or a similar
algorithm, which will generate larger segments delineated by statistically
significant change points. See the copy number calling section for details.
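
The log2_ratio column can be reproduced from the depth columns with the Python
sketch below. It assumes that the "adjusted" tumor depth is the tumor depth
multiplied by the data ratio, which matches the direction described for
--data-ratio but is not stated explicitly in this manual. The function name is
hypothetical.

```python
from math import log2


def log2_ratio(normal_depth, tumor_depth, data_ratio=1.0):
    """Log-base-2 ratio of adjusted tumor depth over normal depth."""
    if normal_depth <= 0 or tumor_depth <= 0:
        raise ValueError("depths must be positive to take a log ratio")
    # Assumption: the adjustment scales tumor depth by the data ratio.
    adjusted_tumor_depth = tumor_depth * data_ratio
    return log2(adjusted_tumor_depth / normal_depth)


print(log2_ratio(30.0, 60.0))                  # tumor at twice the normal depth
print(log2_ratio(60.0, 30.0, data_ratio=2.0))  # halved coverage, corrected
```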

filter

This command filters variants in a file by coverage, supporting reads, variant
frequency, or average base quality. It is for use with output from pileup2snp
or pileup2indel.

        USAGE: varscan filter [variants file] OPTIONS
        variants file - A file of SNP or indel calls from VarScan pileup2snp or pileup2indel

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [10]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-strands2  Minimum # of strands on which variant observed (1 or 2) [1]
        --min-avg-qual  Minimum average base quality for variant-supporting reads [20]
        --min-var-freq  Minimum variant allele frequency threshold [0.20]
        --p-value       Default p-value threshold for calling variants [1e-01]
        --indel-file    File of indels for filtering nearby SNPs, from pileup2indel command
        --output-file   File to contain variants passing filters
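
The thresholds above can be pictured as a set of checks applied to each call.
The Python sketch below is an illustration, not VarScan's implementation: it
assumes coverage is Reads1 + Reads2, that VarFreq has already been converted
to a fraction, and that the row is a dict keyed by the pileup2snp column
names. The function name and the example row are hypothetical.

```python
# Default thresholds as listed in the OPTIONS above.
DEFAULTS = {
    "min_coverage": 10,
    "min_reads2": 2,
    "min_strands2": 1,
    "min_avg_qual": 20,
    "min_var_freq": 0.20,
    "p_value": 0.1,
}


def passes_filter(row, **overrides):
    """Return True if a variant call meets every threshold."""
    opts = {**DEFAULTS, **overrides}
    coverage = row["Reads1"] + row["Reads2"]  # assumption, see lead-in
    return (
        coverage >= opts["min_coverage"]
        and row["Reads2"] >= opts["min_reads2"]
        and row["Strands2"] >= opts["min_strands2"]
        and row["Qual2"] >= opts["min_avg_qual"]
        and row["VarFreq"] >= opts["min_var_freq"]
        and row["Pvalue"] <= opts["p_value"]
    )


# Hypothetical call, invented for illustration only.
call = {"Reads1": 20, "Reads2": 12, "Strands2": 2, "Qual2": 28,
        "VarFreq": 0.375, "Pvalue": 1e-4}
print(passes_filter(call))
print(passes_filter(call, min_var_freq=0.5))
```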



somaticFilter

This command filters somatic mutation calls to remove clusters of false
positives and SNV calls near indels. Note: this is a basic filter. More
advanced filtering strategies consider mapping quality, read mismatches,
soft-trimming, and other factors when deciding whether or not to filter a
variant. See the VarScan 2 publication (Koboldt et al, Genome Research, Feb
2012) for details.

        USAGE: varscan somaticFilter [mutations file] OPTIONS
        mutations file - A file of SNVs from VarScan somatic

        OPTIONS:
        --min-coverage  Minimum read depth [10]
        --min-reads2    Minimum supporting reads for a variant [2]
        --min-strands2  Minimum # of strands on which variant observed (1 or 2) [1]
        --min-avg-qual  Minimum average base quality for variant-supporting reads [20]
        --min-var-freq  Minimum variant allele frequency threshold [0.20]
        --p-value       Default p-value threshold for calling variants [1e-01]
        --indel-file    File of indels for filtering nearby SNPs
        --output-file   Optional output file for filtered variants


limit

This command limits variants in a file to a set of positions or regions:

        USAGE: varscan limit [infile] OPTIONS
        infile - A file of chromosome-positions, tab-delimited

        OPTIONS:
        --positions-file - a file of chromosome-positions, tab delimited
        --regions-file - a file of chromosome-start-stops, tab delimited
        --output-file - Output file for the matching variants
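
The effect of the positions and regions files can be sketched in Python as
follows. This is an illustration, not VarScan's code: it assumes that region
coordinates are 1-based and inclusive, and that the variant lines carry no
header row. The function name and the example data are hypothetical.

```python
def limit_variants(variant_lines, positions=(), regions=()):
    """Keep lines whose chromosome and position match a position or region.

    positions: iterable of (chrom, position) pairs
    regions:   iterable of (chrom, start, stop) triples, inclusive
    """
    wanted = set(positions)
    kept = []
    for line in variant_lines:
        chrom, pos = line.rstrip("\n").split("\t")[:2]
        pos = int(pos)
        in_region = any(
            chrom == region_chrom and start <= pos <= stop
            for region_chrom, start, stop in regions
        )
        if (chrom, pos) in wanted or in_region:
            kept.append(line)
    return kept


# Hypothetical variants, invented for illustration only.
variants = ["chr1\t100\tA\tG", "chr1\t250\tC\tT", "chr2\t100\tG\tA"]
print(limit_variants(variants, regions=[("chr1", 200, 300)]))
print(limit_variants(variants, positions=[("chr2", 100)]))
```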


readcounts

This command reports the read counts for each base at positions in a pileup
file:

        USAGE: varscan readcounts [pileup file] OPTIONS
        pileup file - The SAMtools pileup file

        OPTIONS:
        --variants-file A list of variants at which to report readcounts
        --output-file   Output file to contain the readcounts
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-base-qual Minimum base quality at a position to count a read [30]


compare

This command performs set-comparison operations on two files of variants.

        USAGE: varscan compare [file1] [file2] [type] [output] OPTIONS
        file1 - A file of chromosome-positions, tab-delimited
        file2 - A file of chromosome-positions, tab-delimited
        type - Type of comparison [intersect|merge|unique1|unique2]
        output - Output file for the comparison result
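
The four comparison types correspond to ordinary set operations on
chromosome-position keys. The Python sketch below illustrates that reading;
it is not VarScan's implementation, and the function name and example keys
are hypothetical.

```python
def compare_positions(keys1, keys2, kind):
    """Apply one of intersect, merge, unique1 or unique2 to two key lists."""
    set1, set2 = set(keys1), set(keys2)
    operations = {
        "intersect": set1 & set2,  # present in both files
        "merge": set1 | set2,      # present in either file
        "unique1": set1 - set2,    # only in file1
        "unique2": set2 - set1,    # only in file2
    }
    if kind not in operations:
        raise ValueError("type must be one of: " + ", ".join(operations))
    return sorted(operations[kind])


# Hypothetical (chromosome, position) keys, invented for illustration only.
file1 = [("chr1", 100), ("chr1", 250)]
file2 = [("chr1", 250), ("chr2", 100)]
for kind in ("intersect", "merge", "unique1", "unique2"):
    print(kind, compare_positions(file1, file2, kind))
```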



For detailed usage information, see the VarScan JavaDoc.




VarScan Documentation (v2.2.2 and before)


        USAGE: varscan  [COMMAND] [OPTIONS]

        COMMANDS
        pileup2snp [pileup file]
        pileup2indel [pileup file]
        pileup2cns [pileup file]
        somatic [normal pileup] [tumor pileup]
        filter [variants file]
        somaticFilter [mutations file]
        limit [variants file]
        readcounts [pileup file]
        compare [file1] [file2]



pileup2snp

This command calls SNPs from a pileup file based on user-defined parameters:

        USAGE: varscan pileup2snp [pileup file] OPTIONS
        pileup file - The SAMtools pileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [10]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --p-value       Default p-value threshold for calling variants [99e-02]

        OUTPUT
        Tab-delimited SNP calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Var             variant allele at this position
        Reads1          reads supporting reference allele
        Reads2          reads supporting variant allele
        VarFreq         frequency of variant allele by read count
        Strands1        strands on which reference allele was observed
        Strands2        strands on which variant allele was observed
        Qual1           average base quality of reference-supporting read bases
        Qual2           average base quality of variant-supporting read bases
        Pvalue          Significance of variant read count vs. expected baseline error
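
VarFreq is described above as the frequency of the variant allele by read
count. As a hedged illustration, the sketch below recomputes that fraction as
Reads2 / (Reads1 + Reads2). It assumes this is the intended definition, that
the file has exactly the columns listed above in that order (Reads1 in column
5, Reads2 in column 6), and that the first line is a header. The function name
recompute_varfreq is hypothetical.

```shell
# Hypothetical helper: print Chrom, Position, and the variant allele fraction
# recomputed from the read counts. Rows with no counted reads are skipped.
# $1 = pileup2snp output file
recompute_varfreq() {
    awk -F '\t' 'NR > 1 && ($5 + $6) > 0 {
        printf "%s\t%s\t%.4f\n", $1, $2, $6 / ($5 + $6)
    }' "$1"
}
```

A row with Reads1 = 30 and Reads2 = 10 gives 10 / 40, printed as 0.2500.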


pileup2indel

This command calls indels from a pileup file based on user-defined parameters:

        USAGE: varscan pileup2indel [pileup file] OPTIONS
        pileup file - The SAMtools pileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --p-value       Default p-value threshold for calling variants [99e-02]

        OUTPUT
        Tab-delimited indel calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Var             variant allele at this position
        Reads1          reads supporting reference allele
        Reads2          reads supporting variant allele
        VarFreq         frequency of variant allele by read count
        Strands1        strands on which reference allele was observed
        Strands2        strands on which variant allele was observed
        Qual1           average base quality of reference-supporting read bases
        Qual2           average base quality of variant-supporting read bases
        Pvalue          Significance of variant read count vs. expected baseline error


pileup2cns

This command makes consensus calls (SNP/Indel/Reference) from a pileup file
based on user-defined parameters:

        USAGE: varscan pileup2cns [pileup file] OPTIONS
        pileup file - The SAMtools pileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --p-value       Default p-value threshold for calling variants [99e-02]

        OUTPUT
        Tab-delimited consensus calls with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Var             consensus call (reference, IUPAC SNP code, or indel)
        Reads1          reads supporting reference allele
        Reads2          reads supporting variant allele
        VarFreq         frequency of variant allele by read count
        Strands1        strands on which reference allele was observed
        Strands2        strands on which variant allele was observed
        Qual1           average base quality of reference-supporting read bases
        Qual2           average base quality of variant-supporting read bases
        Pvalue          Significance of variant read count vs. expected baseline error
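
Because pileup2cns reports reference positions as well as variants, a common
follow-up is to keep only the non-reference calls. The sketch below assumes the
columns appear in the order listed above, that the first line is a header, and
that a reference call repeats the reference base in the Var column; none of
these is confirmed by the usage text. The function name nonreference_calls is
hypothetical.

```shell
# Hypothetical helper: keep the header plus rows whose consensus call
# (column 4) differs from the reference allele (column 3).
# $1 = pileup2cns output file
nonreference_calls() {
    awk -F '\t' 'NR == 1 || $4 != $3' "$1"
}
```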


somatic

This command calls variants and identifies their somatic status (Germline/LOH/
Somatic) using pileup files from a matched tumor-normal pair.

        USAGE: varscan somatic [normal_pileup] [tumor_pileup] [output] OPTIONS
        normal_pileup - The SAMtools pileup file for Normal
        tumor_pileup - The SAMtools pileup file for Tumor
        output - Output base name for SNP and indel output

        OPTIONS:
        --output-snp    Output file for SNP calls [output.snp]
        --output-indel  Output file for indel calls [output.indel]
        --min-coverage  Minimum coverage in normal and tumor to call variant [10]
        --min-coverage-normal   Minimum coverage in normal to call somatic [10]
        --min-coverage-tumor    Minimum coverage in tumor to call somatic [5]
        --min-var-freq  Minimum variant frequency to call a heterozygote [0.20]
        --p-value       P-value threshold to call a heterozygote [1.0e-01]
        --somatic-p-value       P-value threshold to call a somatic site [1.0e-04]

        OUTPUT
        Two tab-delimited files (SNPs and Indels) with the following columns:
        Chrom           chromosome name
        Position        position (1-based)
        Ref             reference allele at this position
        Var             variant allele at this position
        Normal_Reads1   reads supporting reference allele
        Normal_Reads2   reads supporting variant allele
        Normal_VarFreq  frequency of variant allele by read count
        Normal_Gt       genotype call for Normal sample
        Tumor_Reads1    reads supporting reference allele
        Tumor_Reads2    reads supporting variant allele
        Tumor_VarFreq   frequency of variant allele by read count
        Tumor_Gt        genotype call for Tumor sample
        Somatic_Status  status of variant (Germline, Somatic, or LOH)
        Pvalue          Significance of variant read count vs. expected baseline error
        Somatic_Pvalue  Significance of tumor read count vs. normal read count
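
A minimal sketch of running the somatic command and then extracting only the
calls labelled Somatic is given here. The function names run_somatic and
somatic_only are hypothetical, and a varscan executable on the PATH is assumed.
The helper also assumes that the output columns appear in the order listed
above (Somatic_Status in column 13) and that the first line is a header. The
p-value thresholds are the documented defaults written out explicitly.

```shell
# Hypothetical wrapper: call variants for a matched tumor-normal pair.
# $1 = normal pileup, $2 = tumor pileup, $3 = output base name
run_somatic() {
    varscan somatic "$1" "$2" "$3" --p-value 1.0e-01 --somatic-p-value 1.0e-04
}

# Hypothetical helper: keep the header line plus rows whose Somatic_Status
# (column 13) is "Somatic".
# $1 = SNP or indel output file from the somatic command
somatic_only() {
    awk -F '\t' 'NR == 1 || $13 == "Somatic"' "$1"
}
```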


filter

This command filters variants in a file by coverage, supporting reads, variant
frequency, or average base quality.

        USAGE: varscan filter [variants file] OPTIONS
        variants file - A file of SNP or indel calls from VarScan

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --p-value       Default p-value threshold for calling variants [99e-02]


somaticFilter

This command filters somatic mutation calls to remove clusters of false
positives and SNV calls near indels.

        USAGE: varscan somaticFilter [mutations file] OPTIONS
        mutations file - A file of SNVs from VarScan somatic

        OPTIONS:
        --min-coverage  Minimum read depth [10]
        --min-reads2    Minimum supporting reads for a variant [2]
        --min-strands2  Minimum # of strands on which variant observed (1 or 2) [1]
        --min-avg-qual  Minimum average base quality for variant-supporting reads [20]
        --min-var-freq  Minimum variant allele frequency threshold [0.20]
        --p-value       Default p-value threshold for calling variants [1e-01]
        --indel-file    File of indels for filtering nearby SNPs
        --output-file   Optional output file for filtered variants
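
As a hedged example, the sketch below filters SNV calls against an indel file
and requires the variant to be seen on both strands, which is stricter than the
default of 1. The function name filter_somatic_snvs and the file names are
hypothetical, and a varscan executable on the PATH is assumed.

```shell
# Hypothetical wrapper: filter somatic SNV calls, removing those near indels.
# $1 = SNV file from "varscan somatic", $2 = indel file, $3 = filtered output
filter_somatic_snvs() {
    varscan somaticFilter "$1" \
        --indel-file "$2" \
        --min-strands2 2 \
        --output-file "$3"
}
```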


limit

This command limits variants in a file to a set of positions or regions.

        USAGE: varscan limit [infile] OPTIONS
        infile - A file of chromosome-positions, tab-delimited

        OPTIONS:
        --positions-file  A file of chromosome-positions, tab-delimited
        --regions-file    A file of chromosome-start-stops, tab-delimited
        --output-file     Output file for the matching variants


readcounts

This command reports the read counts for each base at positions in a pileup
file.

        USAGE: varscan readcounts [pileup file] OPTIONS
        pileup file - The SAMtools pileup file

        OPTIONS:
        --variants-file A list of variants at which to report readcounts
        --output-file   Output file to contain the readcounts
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-base-qual Minimum base quality at a position to count a read [30]


compare

This command performs set-comparison operations on two files of variants.

        USAGE: varscan compare [file1] [file2] [type] [output] OPTIONS
        file1 - A file of chromosome-positions, tab-delimited
        file2 - A file of chromosome-positions, tab-delimited
        type - Type of comparison [intersect|merge|unique1|unique2]
        output - Output file for the comparison result



For detailed usage information, see the VarScan JavaDoc.


How to Build a SAMtools (m)pileup File


The variant calling features of VarScan for single samples (pileup2snp,
pileup2indel, pileup2cns) and multiple samples (mpileup2snp, mpileup2indel,
mpileup2cns, and somatic) expect input in SAMtools pileup or mpileup format. In
current versions of SAMtools, the "pileup" command has been replaced by the
"mpileup" command. For a single sample, the two commands operate in a very
similar fashion and produce output in the same format, except that mpileup
applies BAQ adjustments by default. When you give it multiple BAM files,
however, SAMtools mpileup generates a multi-sample pileup format that must be
processed with the mpileup2* commands in VarScan. To build an mpileup file, you
will need:

  • One or more BAM files ("myData.bam") that have been sorted using the sort
    command of SAMtools.
  • The reference sequence ("reference.fasta") to which reads were aligned, in
    FASTA format.
  • The SAMtools software package.


Generate an mpileup file with the following command:


samtools mpileup -f [reference sequence] [BAM file(s)] >myData.mpileup


To save disk space and file I/O, you can pipe mpileup output directly into
VarScan instead of writing it to a file. For example:

One sample:
samtools mpileup -f reference.fasta myData.bam | java -jar VarScan.v2.2.jar pileup2snp

Multiple samples:
samtools mpileup -f reference.fasta sample1.bam sample2.bam | java -jar VarScan.v2.2.jar mpileup2snp
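
The rule stated earlier, that multi-sample mpileup output must go to the
mpileup2* commands, can be sketched as a small wrapper that picks the SNP
caller from the number of BAM files. The function name call_snps is
hypothetical; the jar name VarScan.v2.2.jar is taken from the examples above
and should be adjusted to the installed version.

```shell
# Hypothetical wrapper: stream mpileup output into the matching SNP caller.
# $1 = reference FASTA, remaining arguments = one or more sorted BAM files
call_snps() {
    if [ "$#" -lt 2 ]; then
        echo "usage: call_snps reference.fasta file.bam [file.bam ...]" >&2
        return 1
    fi
    ref=$1
    shift
    if [ "$#" -gt 1 ]; then
        caller=mpileup2snp   # multi-sample pileup needs the mpileup2* commands
    else
        caller=pileup2snp    # single-sample pileup
    fi
    samtools mpileup -f "$ref" "$@" | java -jar VarScan.v2.2.jar "$caller"
}
```

For example, call_snps reference.fasta sample1.bam sample2.bam >calls.snp
would use mpileup2snp, while a single BAM file would be routed to pileup2snp.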

Copyright © 2009-2013 by Washington University in St. Louis.