========
Glossary
========

.. glossary::
   :sorted:

   cigar
      Stands for Compact Idiosyncratic Gapped Alignment Report and
      represents a compressed (run-length encoded) pairwise alignment
      format.  It was first defined by the Exonerate aligner, but was later
      adapted and adopted as part of the :term:`SAM` standard and by many
      other aligners.  In the Python API, the cigar alignment is presented
      as a list of tuples ``(operation, length)``.  For example, the list
      ``[(0, 3), (1, 5), (0, 2)]`` refers to an alignment with 3 matches,
      5 insertions and another 2 matches.
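
      As a minimal sketch of how such a list maps back to the textual
      form, the snippet below renders the tuples as a CIGAR string. The
      helper name ``cigar_to_string`` is hypothetical and not part of
      pysam; the operation letters follow the numbering used in the SAM
      specification (0 is ``M``, 1 is ``I``, and so on).

      .. code-block:: python

         # Operation codes 0..8 in SAM specification order.
         CIGAR_OPS = "MIDNSHP=X"


         def cigar_to_string(cigar):
             """Render a list of (operation, length) tuples as a CIGAR string."""
             return "".join(f"{length}{CIGAR_OPS[op]}" for op, length in cigar)


         # The example from the text: 3 matches, 5 insertions, 2 matches.
         print(cigar_to_string([(0, 3), (1, 5), (0, 2)]))  # prints 3M5I2M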

   region
      A genomic region, stated relative to a reference sequence. A
      region consists of reference name ('chr1'), start (10000), and
      end (20000). Start and end can be omitted for regions spanning
      a whole chromosome. If end is missing, the region will span from
      start to the end of the chromosome. Within pysam, coordinates
      are 0-based, half-open intervals, i.e., the position 10,000 is
      part of the interval, but 20,000 is not. An exception is
      :term:`samtools` compatible region strings such as
      'chr1:10000-20000', which are 1-based and closed, i.e., both
      positions 10,000 and 20,000 are part of the interval.
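
      The difference between the two conventions can be sketched as a
      small conversion. The helper ``parse_region`` below is hypothetical
      and not part of pysam; it assumes the region string is 1-based and
      closed, as samtools region strings are, and returns 0-based,
      half-open coordinates, with ``None`` for any omitted part.

      .. code-block:: python

         def parse_region(region):
             """Convert a samtools-style region string to (contig, start, end).

             The input is treated as 1-based and closed; the result is
             0-based and half-open. Omitted parts are returned as None.
             """
             contig, sep, interval = region.rpartition(":")
             if not sep:
                 # No coordinates given: the region spans the whole contig.
                 return region, None, None
             start_text, _, end_text = interval.replace(",", "").partition("-")
             start = int(start_text) - 1  # shift 1-based start to 0-based
             end = int(end_text) if end_text else None  # closed end == half-open end
             return contig, start, end


         print(parse_region("chr1:10000-20000"))  # prints ('chr1', 9999, 20000)

      Only the start changes: a closed 1-based end and a half-open
      0-based end denote the same boundary.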

   column
      Reads that are aligned to a base in the :term:`reference` sequence.

   tid
      The :term:`target` id. The target id is 0 or a positive integer mapping to
      entries within the sequence dictionary in the header section of
      a :term:`TAM` file or :term:`BAM` file.
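
      As an illustrative sketch, the relationship between a tid and a
      sequence name amounts to list indexing. The ``references`` list and
      both helper functions below are hypothetical stand-ins for a real
      header, not pysam API.

      .. code-block:: python

         # Hypothetical sequence dictionary, in the order given by the header.
         references = ["chr1", "chr2", "contig123"]


         def name_for_tid(tid):
             """Return the sequence name stored at position ``tid``."""
             return references[tid]


         def tid_for_name(name):
             """Return the tid (list position) of the sequence ``name``."""
             return references.index(name)


         print(name_for_tid(0))            # prints chr1
         print(tid_for_name("contig123"))  # prints 2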

   contig
      The sequence that a :term:`tid` refers to. For example ``chr1``, ``contig123``.

   Reference
      Synonym for :term:`contig`.

   SAM
       A textual format for storing genomic alignment information.

   BAM
       Binary SAM format. BAM files are binary formatted, indexed and
       allow random access.

   TAM
       Text SAM file. TAM files are human readable files of
       tab-separated fields. TAM files do not allow random access.

   sam file
       A file containing aligned reads. The :term:`sam file` can either
       be a :term:`BAM` file or a :term:`TAM` file.

   pileup
      Pileup

   samtools
      The samtools_ package.

   csamtools
      The samtools_ C-API.

   fetching
      Retrieving all reads mapped to a :term:`region`.

   target
      The sequence that a read has been aligned to. Target
      sequences have both a numerical identifier (:term:`tid`)
      and an alphanumeric name (:term:`Reference`).

   tabix file
      A sorted, compressed and indexed tab-separated file created
      by the command line tool :file:`tabix` or the commands
      :meth:`tabix_compress` and :meth:`tabix_index`. The file
      is indexed by chromosomal coordinates.

   tabix row
      A row in a :term:`tabix file`. Fields within a row are
      tab-separated.

   soft clipping
   soft clipped

      In alignments with soft clipping, part of the query sequence
      is not aligned. The unaligned query sequence is still part
      of the alignment record. This is in contrast to
      :term:`hard clipped` reads.

   hard clipping
   hard clipped

      In hard clipped reads, part of the sequence has been removed
      prior to alignment. The fact that only a subsequence is aligned
      might be recorded in the :term:`cigar` alignment, but the removed
      sequence will not be part of the alignment record, in contrast
      to :term:`soft clipped` reads.
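
      The practical difference between the two kinds of clipping can be
      sketched from a list of ``(operation, length)`` tuples. The two
      helpers below are hypothetical and not part of pysam; they assume
      the SAM operation numbering, in which 4 is a soft clip and 5 is a
      hard clip.

      .. code-block:: python

         HARD_CLIP = 5
         # Operations whose bases are stored in the record: M, I, S, =, X.
         STORED_QUERY_OPS = {0, 1, 4, 7, 8}


         def stored_query_length(cigar):
             """Number of query bases kept in the alignment record."""
             return sum(n for op, n in cigar if op in STORED_QUERY_OPS)


         def original_read_length(cigar):
             """Length of the read before any hard clipping was applied."""
             return sum(
                 n for op, n in cigar if op in STORED_QUERY_OPS or op == HARD_CLIP
             )


         soft = [(4, 5), (0, 90), (4, 5)]  # 5S90M5S
         hard = [(5, 5), (0, 90), (5, 5)]  # 5H90M5H
         print(stored_query_length(soft), original_read_length(soft))  # prints 100 100
         print(stored_query_length(hard), original_read_length(hard))  # prints 90 100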

   VCF
      Variant call format

   BCF
      Binary :term:`VCF`

   tabix
      Utility in the htslib package to index :term:`bgzip` compressed
      files.

   faidx
      Utility in the samtools package to index :term:`fasta` formatted
      files.

   bgzip
      Utility in the htslib package to block compress genomic data
      files.