---
title: "TCGAbiolinks: Searching, downloading and visualizing mutation files"
date: "`r BiocStyle::doc_date()`"
vignette: >
%\VignetteIndexEntry{"5. Mutation data"}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(progress = FALSE)
```
```{r message=FALSE, warning=FALSE, include=FALSE}
library(TCGAbiolinks)
library(SummarizedExperiment)
library(dplyr)
library(DT)
```
# Search and Download
**TCGAbiolinks** provides a few functions to download mutation data from GDC.
There are three options to download the data:

1. Use `GDCquery`, `GDCdownload` and `GDCprepare` to download MAF files aligned against hg38
2. Use `GDCquery`, `GDCdownload` and `GDCprepare` to download MAF files aligned against hg19
3. Use `getMC3MAF()` to download the MC3 MAF from https://gdc.cancer.gov/about-data/publications/mc3-2017
## Mutation data (hg38)
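This example downloads the open-access masked somatic mutation MAFs (hg38) produced by the GDC aliquot ensemble somatic variant workflow; `GDCprepare` then combines them into a single table.
For more information, please see https://github.com/NCI-GDC/gdc-maf-tool and the
[GDC docs](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/).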
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=F}
query <- GDCquery(
project = "TCGA-CHOL",
data.category = "Simple Nucleotide Variation",
access = "open",
legacy = FALSE,
data.type = "Masked Somatic Mutation",
workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query)
maf <- GDCprepare(query)
```
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=T,include=F}
maf <- chol_maf@data
```
```{r echo = TRUE, message = FALSE, warning = FALSE}
# Show only the first 20 rows to make rendering faster
datatable(maf[1:20,],
filter = 'top',
options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
rownames = FALSE)
```
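Once the MAF is loaded as a data frame, a common first question is which genes are mutated in the most samples. The sketch below answers it with `dplyr`, assuming the standard MAF columns `Hugo_Symbol` and `Tumor_Sample_Barcode`; the small `toy_maf` data frame is hypothetical and only stands in for the `maf` object created above.

```r
# Minimal sketch: rank genes by the number of distinct samples carrying a mutation.
# toy_maf is hypothetical; it only mimics standard MAF columns.
library(dplyr)

toy_maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "KRAS", "IDH1", "TP53", "KRAS"),
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S3", "S3"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Missense_Mutation",
                             "Frame_Shift_Del", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

gene_counts <- toy_maf %>%
  group_by(Hugo_Symbol) %>%
  summarise(
    n_mutations = n(),                             # every MAF row for the gene
    n_samples = n_distinct(Tumor_Sample_Barcode),  # samples with at least one mutation
    .groups = "drop"
  ) %>%
  arrange(desc(n_samples), desc(n_mutations))

print(gene_counts)
```

To apply it to the downloaded data, replace `toy_maf` with `maf`. Counting distinct samples rather than rows keeps a gene with several mutations in one tumor from being over-ranked.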
## Mutation data (hg19)
This example downloads the legacy TCGA MAF (Mutation Annotation Format) files aligned against hg19.
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE}
query.maf.hg19 <- GDCquery(
project = "TCGA-CHOL",
data.category = "Simple nucleotide variation",
data.type = "Simple somatic mutation",
access = "open",
legacy = TRUE
)
```
```{r echo = TRUE, message = FALSE, warning = FALSE}
# List the available MAF files
getResults(query.maf.hg19) %>%
dplyr::select(-contains("sample_type")) %>%
dplyr::select(-contains("cases")) %>%
DT::datatable(
filter = 'top',
options = list(scrollX = TRUE, keys = TRUE, pageLength = 10),
rownames = FALSE
)
```
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=FALSE}
query.maf.hg19 <- GDCquery(project = "TCGA-CHOL",
data.category = "Simple nucleotide variation",
data.type = "Simple somatic mutation",
access = "open",
file.type = "bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf",
legacy = TRUE)
GDCdownload(query.maf.hg19)
maf <- GDCprepare(query.maf.hg19)
```
```{r message=FALSE, warning=FALSE, include=FALSE}
maf <- bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf
```
```{r echo = TRUE, message = FALSE, warning = FALSE}
# Show only the first 20 rows to make rendering faster
datatable(maf[1:20,],
filter = 'top',
options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
rownames = FALSE)
```
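Many downstream analyses only consider protein-altering variants. A minimal sketch of that filter is shown below, assuming the MAF has a `Variant_Classification` column; the list of classes mirrors the default `vc_nonSyn` set used by `maftools::read.maf`, and `toy_maf` is a hypothetical stand-in for the `maf` table loaded above.

```r
# Minimal sketch: keep only variant classes that maftools treats as
# non-synonymous by default (vc_nonSyn in maftools::read.maf).
library(dplyr)

non_syn_classes <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)

# Hypothetical toy rows standing in for the MAF table
toy_maf <- data.frame(
  Hugo_Symbol = c("ARID1A", "BAP1", "PBRM1", "IDH1"),
  Variant_Classification = c("Nonsense_Mutation", "Silent", "3'UTR", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

non_syn <- toy_maf %>%
  filter(Variant_Classification %in% non_syn_classes)

table(non_syn$Variant_Classification)
```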
## Mutation data MC3 file
This downloads the MC3 MAF file from https://gdc.cancer.gov/about-data/publications/mc3-2017
and annotates each sample with the project it belongs to.
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=FALSE}
maf <- getMC3MAF()
```
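Like the GDC MAFs, the MC3 file identifies each tumor by an aliquot barcode in `Tumor_Sample_Barcode`. A minimal sketch for deriving patient and sample-type fields from such barcodes is shown below; it relies on the TCGA barcode layout (the first 12 characters identify the patient, characters 14-15 give the sample type code, with 01-09 for tumor and 10-19 for normal samples). The barcodes themselves are hypothetical.

```r
# Minimal sketch: derive patient and sample-type fields from TCGA aliquot
# barcodes. The barcodes below are hypothetical but follow the TCGA layout.
barcodes <- c(
  "TCGA-AA-0001-01A-11D-A000-09",
  "TCGA-AA-0002-01A-21D-A000-09",
  "TCGA-AA-0001-11A-01D-A000-09"
)

barcode_info <- data.frame(
  barcode = barcodes,
  patient = substr(barcodes, 1, 12),       # TCGA-XX-XXXX participant ID
  sample_type = substr(barcodes, 14, 15),  # e.g. "01" primary solid tumor
  stringsAsFactors = FALSE
)
# Sample type codes 01-09 denote tumor samples, 10-19 normal samples
barcode_info$is_tumor <- as.integer(barcode_info$sample_type) < 10

barcode_info
```

Collapsing to `patient` is useful when a patient has more than one sequenced aliquot, or when joining mutations with clinical data.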
# Visualize the data
To visualize the data you can use the Bioconductor package [maftools](https://bioconductor.org/packages/release/bioc/html/maftools.html). For more information, please check its [vignette](https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html#rainfall-plots).
```{r results = "hide",echo = TRUE, message = FALSE, warning = FALSE, eval=FALSE}
library(maftools)
library(dplyr)
query <- GDCquery(
project = "TCGA-CHOL",
data.category = "Simple Nucleotide Variation",
access = "open",
legacy = FALSE,
data.type = "Masked Somatic Mutation",
workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query)
maf <- GDCprepare(query)
maf <- maf %>% maftools::read.maf
```
```{r message=FALSE, warning=FALSE, include=FALSE}
library(maftools)
library(dplyr)
maf <- chol_maf
```
```{r results = "hide",echo = TRUE, message = FALSE, warning = FALSE}
datatable(getSampleSummary(maf),
filter = 'top',
options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
rownames = FALSE)
plotmafSummary(maf = maf, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE)
```
```{r echo = TRUE, message = FALSE,eval = FALSE, warning = FALSE}
oncoplot(maf = maf, top = 10, removeNonMutated = TRUE)
titv.res <- titv(maf = maf, plot = FALSE, useSyn = TRUE)
# Plot the transition/transversion summary
plotTiTv(res = titv.res)
```
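`titv()` summarizes single nucleotide variants as transitions (purine to purine or pyrimidine to pyrimidine, i.e. A<->G and C<->T) and transversions (all other substitutions). The sketch below illustrates only that classification and the resulting Ti/Tv ratio; it is not maftools' implementation, which reports more detail. It assumes the MAF columns `Reference_Allele` and `Tumor_Seq_Allele2`, and the toy alleles are hypothetical.

```r
# Minimal sketch of transition/transversion classification for SNVs.
# Not maftools' implementation; the toy alleles are hypothetical.
classify_snv <- function(ref, alt) {
  purines <- c("A", "G")
  pyrimidines <- c("C", "T")
  # Assumes ref != alt (true SNVs). A transition keeps the base class.
  is_transition <- (ref %in% purines & alt %in% purines) |
    (ref %in% pyrimidines & alt %in% pyrimidines)
  ifelse(is_transition, "Ti", "Tv")
}

snvs <- data.frame(
  Reference_Allele = c("C", "G", "A", "T", "C"),
  Tumor_Seq_Allele2 = c("T", "A", "C", "G", "A"),
  stringsAsFactors = FALSE
)
snvs$class <- classify_snv(snvs$Reference_Allele, snvs$Tumor_Seq_Allele2)

# Ratio of transitions to transversions
ti_tv_ratio <- sum(snvs$class == "Ti") / sum(snvs$class == "Tv")
ti_tv_ratio
```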