\name{snpSummary}

\alias{snpSummary}
\alias{snpSummary,CollapsedVCF-method}


\title{Counts and distribution statistics for SNPs in a VCF object}

\description{
  Computes genotype counts, allele frequencies and Hardy-Weinberg
  equilibrium statistics for SNPs in a VCF object.
}

\usage{
  \S4method{snpSummary}{CollapsedVCF}(x, ...)
}

\arguments{
  \item{x}{
    A \link{CollapsedVCF} object.
  }
  \item{\dots}{
    Additional arguments to methods.
  }
}

\details{
  Genotype counts, allele frequencies and Hardy-Weinberg equilibrium
  (HWE) statistics are calculated for single nucleotide variants
  in a \link{CollapsedVCF} object. HWE has been established as a
  useful quality filter on genotype data. This equilibrium should
  be attained in a single generation of random mating. Departures
  from HWE are indicated by small p-values and almost invariably
  point to a problem with the genotype calls.

  The following caveats apply:
  \itemize{
    \item No distinction is made between phased and unphased genotypes. 
    \item Only diploid calls are included.
    \item Only `valid' SNPs are included. A `valid' SNP is defined
          as having a reference allele of length 1 and a single 
          alternate allele of length 1.
  }
  Statistics for variants that do not meet these criteria are set
  to \code{NA}.
}

\value{
  The object returned is a \code{data.frame} with seven columns.
  \describe{
    \item{g00}{
      Counts for genotype 00 (homozygous reference).
    }
    \item{g01}{
      Counts for genotype 01 or 10 (heterozygous).
    }
    \item{g11}{
      Counts for genotype 11 (homozygous alternate).
    }
    \item{a0Freq}{
      Frequency of the reference allele.
    }
    \item{a1Freq}{
      Frequency of the alternate allele.
    }
    \item{HWEzscore}{
      Z-score for departure from a null hypothesis of Hardy-Weinberg equilibrium.
    }
    \item{HWEpvalue}{
      p-value for departure from a null hypothesis of Hardy-Weinberg equilibrium.
    }
  }
}

\author{
  Chris Wallace <cew54@cam.ac.uk>
}

\seealso{
  \link{genotypeToSnpMatrix},
  \link{probabilityToSnpMatrix}
}

\examples{
  fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
  vcf <- readVcf(fl, "hg19")

  ## The return value is a data.frame with genotype counts
  ## and allele frequencies.
  df <- snpSummary(vcf)
  df
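
  ## A hand count for comparison: a minimal sketch, not the
  ## package's own code. 'gt' is a made-up genotype vector used
  ## only for illustration; for real data the calls are held in
  ## geno(vcf)$GT. Phasing is ignored, as described in Details.
  gt <- c("0|0", "1|0", "1/1", "0/1", "0/0")
  unphased <- gsub("|", "/", gt, fixed=TRUE)
  counts <- c(g00 = sum(unphased == "0/0"),
              g01 = sum(unphased == "0/1" | unphased == "1/0"),
              g11 = sum(unphased == "1/1"))
  counts

  ## Each diploid call contributes two alleles.
  a0Freq <- unname((2 * counts["g00"] + counts["g01"]) / (2 * sum(counts)))
  c(a0Freq = a0Freq, a1Freq = 1 - a0Freq)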

  ## Compare to ranges in the VCF object:
  rowRanges(vcf)

  ## No statistics were computed for the variants in rows 3, 4
  ## and 5. Their values are NA because row 3 has two alternate
  ## alleles, row 4 has none and row 5 is not a SNP.
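
  ## How an HWE statistic can be derived from genotype counts.
  ## This is a sketch of the textbook chi-square test with one
  ## degree of freedom; it is an assumption that this matches
  ## the test used by snpSummary, so the values may differ.
  ## 'hweSketch' is a hypothetical helper, not part of
  ## VariantAnnotation, and the counts below are made up.
  hweSketch <- function(g00, g01, g11) {
      n <- g00 + g01 + g11                 # number of diploid calls
      p <- (2 * g00 + g01) / (2 * n)       # reference allele frequency
      q <- 1 - p                           # alternate allele frequency
      expected <- n * c(p^2, 2 * p * q, q^2)
      observed <- c(g00, g01, g11)
      ## Undefined (NaN) for monomorphic sites, where an expected
      ## count is zero.
      chisq <- sum((observed - expected)^2 / expected)
      c(a0Freq = p, a1Freq = q, chisq = chisq,
        pvalue = pchisq(chisq, df=1, lower.tail=FALSE))
  }
  hweSketch(g00=30, g01=50, g11=20)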
}

\keyword{manip}