File: multisample.feature

Package: bio-vcf 0.9.5-4
  • area: main
  • in suites: forky, sid
  • size: 1,208 kB
  • sloc: ruby: 2,812; sh: 74; lisp: 48; makefile: 4
File content: 90 lines, 4,494 bytes
@multi
Feature: Multi-sample VCF

  Here we take a multi-sample VCF line and parse the FORMAT fields of each
  sample, looking samples up by the names given in the header line.
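
  A minimal Ruby sketch of that mapping, assuming only the VCF column layout
  (nine fixed columns up to FORMAT, then one column per sample). The name
  `parse_samples` is hypothetical and this is not bio-vcf's own parser. It
  splits on any whitespace, since the header below mixes tabs and spaces:

  ```ruby
  def parse_samples(header_line, data_line)     # hypothetical helper
    names  = header_line.split(/\s+/).drop(9)   # sample names follow #CHROM..FORMAT
    fields = data_line.split(/\s+/)
    keys   = fields[8].split(":")                # FORMAT column, e.g. GT:AD:DP:GQ:PL
    names.zip(fields.drop(9)).to_h do |name, column|
      [name, keys.zip(column.split(":")).to_h]   # "./." leaves AD, DP, ... as nil
    end
  end
  ```

  On the first record below, `parse_samples(header, line)["s3t2"]["PL"]` gives
  the raw string "20,0,522"; the scenario expects `rec.s3t2.pl` to return the
  parsed integers [20,0,522].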

  Scenario: When parsing a record
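
    Besides field access, this scenario checks two genotype helpers: `gti`
    turns a GT value such as 0/1 into allele indices [0,1], and `gts` maps
    those indices onto REF and ALT, giving ["C","T"] for the first record.
    The sketch below reproduces that behaviour with hypothetical standalone
    functions (not bio-vcf's implementation), treating "/" (unphased) and
    "|" (phased) alike and assuming a no-call such as ./. yields nil:

    ```ruby
    def gti(gt)                          # hypothetical helper, not bio-vcf's API
      return nil if gt.include?(".")     # assumed: "./." (no-call) gives nil
      gt.split(/[\/|]/).map(&:to_i)      # "0/1" and "0|1" both give [0, 1]
    end

    def gts(gt, ref, alt)                # hypothetical helper, not bio-vcf's API
      alleles = [ref] + alt              # index 0 is REF, 1.. index into ALT
      gti(gt)&.map { |i| alleles[i] }    # a no-call stays nil
    end
    ```

    For the phased record, `gts("0|1", "C", ["G"])` returns ["C","G"], as the
    r.original.gts step expects.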

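    The last record guards against INFO keys that share a tail: a plain
    substring or regex lookup for "END=" would first hit the end of
    "CIEND=999" and return 999 instead of 111. A minimal sketch that avoids
    this by splitting INFO into a hash keyed on whole field names, upper-cased
    because the steps in this scenario treat names case-insensitively
    (rec.info.fields lists "DP", while rec.info['dp'] also works). The names
    `parse_info` and `convert` are hypothetical helpers, not bio-vcf's API:

    ```ruby
    def convert(value)                           # hypothetical helper
      return true if value.nil?                  # flag without "=", e.g. DB
      Integer(value, 10, exception: false) ||    # "111" -> 111
        Float(value, exception: false) ||        # "0.667" -> 0.667
        value                                    # anything else stays a string
    end

    def parse_info(info)                         # hypothetical helper
      info.split(";").to_h do |field|
        key, value = field.split("=", 2)         # split once; values may contain "="
        [key.upcase, convert(value)]             # whole-name keys: END != CIEND
      end
    end
    ```

    Applied to the last record's INFO column, this gives "END" => 111 and
    "CIEND" => 999.
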
    Given the multi sample header line
    """
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Original	s1t1	s2t1	s3t1	s1t2	s2t2	s3t2
    """
    When I parse the header
    Given multisample vcf line
    """
1       10321   .       C       T       106.30  .       AC=5;AF=0.357;AN=14;BaseQRankSum=3.045;DP=1537;Dels=0.01;FS=5.835;HaplotypeScore=220.1531;MLEAC=5;MLEAF=0.357;MQ=26.69;MQ0=258;MQRankSum=-4.870;QD=0.10;ReadPosRankSum=0.815    GT:AD:DP:GQ:PL  0/1:189,25:218:30:30,0,810      0/0:219,22:246:24:0,24,593      0/1:218,27:248:34:34,0,1134     0/0:220,22:248:56:0,56,1207     0/1:168,23:193:19:19,0,493      0/1:139,22:164:46:46,0,689      0/1:167,26:196:20:20,0,522    
    """
    When I parse the record
    Then I expect rec.valid? to be true
    And I expect rec.chrom to contain "1"
    And I expect rec.pos to contain 10321
    And I expect rec.ref to contain "C"
    And I expect multisample rec.alt to contain ["T"]
    And I expect rec.qual to be 106.30
    And I expect rec.info.ac to be 5
    And I expect rec.info.af to be 0.357
    And I expect rec.info.dp to be 1537
    And I expect rec.info['dp'] to be 1537
    And I expect rec.info.readposranksum to be 0.815
    And I expect rec.info['ReadPosRankSum'] to be 0.815
    And I expect rec.info.fields to contain ["AC", "AF", "AN", "BASEQRANKSUM", "DP", "DELS", "FS", "HAPLOTYPESCORE", "MLEAC", "MLEAF", "MQ", "MQ0", "MQRANKSUM", "QD", "READPOSRANKSUM"]
    And I expect rec.sample['Original'].ad to be [189,25]
    And I expect rec.sample['Original'].gt to be "0/1"
    And I expect rec.sample['s3t2'].ad to be [167,26]
    And I expect rec.sample['s3t2'].dp to be 196 
    And I expect rec.sample['s3t2'].gq to be 20
    And I expect rec.sample['s3t2'].pl to be [20,0,522]
    # The same fields, with sample names resolved as method calls
    And I expect rec.sample.original.gt to be "0/1"
    And I expect rec.sample.s3t2.pl to be [20,0,522]
    # Shorter still: sample names resolve directly on the record
    And I expect rec.original.gt? to be true
    And I expect rec.original.gt to be "0/1"
    And I expect rec.s3t2.pl to be [20,0,522]
    # Check for missing data
    And I expect test rec.missing_samples? to be false 
    And I expect test rec.original? to be true
    # Genotype helpers: gti gives allele indices, gts gives allele strings
    And I expect r.original? to be true
    And I expect r.original.gti? to be true
    And I expect r.original.gti to be [0,1]
    And I expect r.original.gti[1] to be 1
    And I expect r.original.gts? to be true
    And I expect r.original.gts to be ["C","T"]
    And I expect r.original.gts[1] to be "T"

    Given multisample vcf line
    """
1 10723 . C G 73.85 . AC=4;AF=0.667;AN=6;BaseQRankSum=1.300;DP=18;Dels=0.00;FS=3.680;HaplotypeScore=0.0000;MLEAC=4;MLEAF=0.667;MQ=20.49;MQ0=11;MQRankSum=1.754;QD=8.21;ReadPosRankSum=0.000 GT:AD:DP:GQ:PL  ./. ./. 1/1:2,2:4:6:66,6,0  1/1:4,1:5:3:36,3,0  ./. ./.  0/0:6,0:6:3:0,3,33
    """
    When I parse the record
    Then I expect rec.pos to contain 10723
    And I expect rec.valid? to be true
    And I expect rec.original? to be false
    And I expect rec.sample.s1t1? to be false
    And I expect rec.sample.s3t2? to be true
    And I expect rec.missing_samples? to be true

    # Phased genotype
    Given multisample vcf line
    """
1 10723 . C G 73.85 . AC=4;AF=0.667;AN=6;BaseQRankSum=1.300;DP=18;Dels=0.00;FS=3.680;HaplotypeScore=0.0000;MLEAC=4;MLEAF=0.667;MQ=20.49;MQ0=11;MQRankSum=1.754;QD=8.21;ReadPosRankSum=0.000 GT:AD:DP:GQ:PL  0|1 ./. 1/1:2,2:4:6:66,6,0  1/1:4,1:5:3:36,3,0  ./. ./.  0/0:6,0:6:3:0,3,33
    """
    When I parse the record
    Then I expect rec.pos to contain 10723
    And I expect rec.valid? to be true
    And I expect r.original? to be true
    And I expect r.original.gts? to be true
    And I expect r.original.gts to be ["C","G"]
    And I expect r.original.gts[0] to be "C"
    And I expect r.original.gts[1] to be "G"
    
    # INFO fields whose names share a tail (END vs CIEND) must not be confused
    Given multisample vcf line
    """
1 10723 . C G 73.85 . AC=4;AF=0.667;CIEND=999;END=111;AN=6;BaseQRankSum=1.300;DP=18;Dels=0.00;FS=3.680;HaplotypeScore=0.0000;MLEAC=4;MLEAF=0.667;MQ=20.49;MQ0=11;MQRankSum=1.754;QD=8.21;ReadPosRankSum=0.000 GT:AD:DP:GQ:PL  0|1 ./. 1/1:2,2:4:6:66,6,0  1/1:4,1:5:3:36,3,0  ./. ./.  0/0:6,0:6:3:0,3,33
    """
    When I parse the record
    Then I expect r.info.end to be 111
    And I expect r.info.ciend to be 999