File: writing.rst

package info (click to toggle)
cyvcf2 0.32.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 1,516 kB
  • sloc: python: 1,646; ansic: 240; makefile: 228; sh: 91
file content (93 lines) | stat: -rw-r--r-- 3,457 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
Modifying Existing VCFs
=======================

`cyvcf2` is optimized for fast reading and extraction from existing files.
However, it also offers some means of modifying existing VCFs.

Modifying the INFO field
------------------------

.. this is the same example as on the index.rst page

Here, we show an example of how to modify the INFO field at a locus to annotate
variants with the genes that they overlap.

.. code-block:: python

    from cyvcf2 import VCF, Writer
    vcf = VCF(VCF_PATH)
    # adjust the header to contain the new field
    # the keys 'ID', 'Description', 'Type', and 'Number' are required.
    vcf.add_info_to_header({'ID': 'gene', 'Description': 'overlapping gene',
        'Type':'Character', 'Number': '1'})

    # create a new vcf Writer using the input vcf as a template.
    fname = "out.vcf"
    w = Writer(fname, vcf)

    for v in vcf:
        # The get_gene_intersections function is not shown.
        # This could be any operation to find intersections
        # or any manipulation required by the user.
        genes = get_gene_intersections(v)
        if genes is not None:
            v.INFO["gene"] = ",".join(genes)
        w.write_record(v)

    w.close(); vcf.close()

Modifying genotypes and the FORMAT field
----------------------------------------

Here is an example where we filter some calls from records in a VCF. This demonstrates
how to add to the FORMAT field and how to modify genotypes at a locus.

.. code-block:: python

    from cyvcf2 import VCF, Writer
    vcf = VCF(VCF_PATH)
    # adjust the header to contain the new field
    # the keys 'ID', 'Description', 'Type', and 'Number' are required.
    vcf.add_format_to_header({
        'ID': 'FILTER_CODE',
        'Description': 'Numeric code for filtering reason',
        'Type': 'Integer',
        'Number': '1'
    })

    # create a new vcf Writer using the input vcf as a template.
    fname = "out.vcf"
    w = Writer(fname, vcf)

    for v in vcf:
        # The filter_samples function is not shown.
        # This could be any manipulation required by the user.
        # Since we specified the FILTER_CODE format field is an Integer
        # with Number=1, reasons must be an n x 1 numpy array of integers
        # where n is the number of samples.
        indicies, reasons = filter_samples(v)
        if indicies:
            # add the reasons array to the format dictionary at this locus
            v.set_format('FILTER_CODE', reasons)
            for index in indicies:
                # overwrite the genotypes of each filtered locus to be nocalls
                # Note: until the reassignment to v.genotypes below, this
                # leaves the v Variant object in an inconsistent state
                v.genotypes[index] = [-1]*v.ploidy + [False]
            # it is necessary to reassign the genotypes field
            # so that the v Variant object reprocess it and future calls
            # to functions like v.genotype.array() properly reflect
            # the changed genotypes
            v.genotypes = v.genotypes

        w.write_record(v)

    w.close(); vcf.close()

.. _Limitations with writing:

Known Limitations with Writing VCFs
-----------------------------------
* `cyvcf2` currently does not support writing VCFs encoded with UTF-8 with non-ASCII characters in the contents of string-typed FORMAT fields.
* `cyvcf2` currently does not support writing string type format fields with number>1.