1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
|
---
title: "Making TxDb Objects"
author:
- name: Marc Carlson
- name: Patrick Aboyoun
- name: Hervé Pagès
- name: Seth Falcon
- name: Martin Morgan
date: "Compiled `r doc_date()`; Modified 4 April 2024"
package: txdbmaker
vignette: |
%\VignetteIndexEntry{Making TxDb Objects}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignetteKeywords{annotation, database}
output:
BiocStyle::html_document
---
# Introduction
The `r Biocpkg("txdbmaker")` package provides functions to make `TxDb`
objects from genomic annotation provided by the UCSC Genome Browser
(https://genome.ucsc.edu/), Ensembl (https://ensembl.org/),
BioMart (http://www.biomart.org/), or directly from a GFF or GTF file.
In this document we will quickly demonstrate the use of these
functions.
Note that the package also provides a lower-level utility, `makeTxDb()`,
for creating `TxDb` objects from data directly supplied by the user.
Please refer to its man page (`?makeTxDb`) for more information.
See vignette in the `r Biocpkg("GenomicFeatures")` package for an
introduction to `TxDb` objects.
# Installing the `txdbmaker` package
Install the package with:
```{r installtxdbmaker, eval=FALSE}
if (!require("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("txdbmaker")
```
Then load it with:
```{r loadtxdbmaker}
suppressPackageStartupMessages(library(txdbmaker))
```
# Using `makeTxDbFromUCSC`
The function `makeTxDbFromUCSC` downloads UCSC
Genome Bioinformatics transcript tables (e.g. `knownGene`,
`refGene`, `ensGene`) for a genome build (e.g.
`mm9`, `hg19`). Use the `supportedUCSCtables`
utility function to get the list of tables known to work with
`makeTxDbFromUCSC`.
```{r supportedUCSCtables}
supportedUCSCtables(genome="mm9")
mm9KG_txdb <- makeTxDbFromUCSC(genome="mm9", tablename="knownGene")
mm9KG_txdb
```
See `?makeTxDbFromUCSC` for more information.
# Using `makeTxDbFromBiomart`
Retrieve data from BioMart by specifying the mart and the data set to
the `makeTxDbFromBiomart` function (not all BioMart
data sets are currently supported):
```{r makeTxDbFromBiomart, eval=FALSE}
mmusculusEnsembl <- makeTxDbFromBiomart(dataset="mmusculus_gene_ensembl")
```
As with the `makeTxDbFromUCSC` function, the
`makeTxDbFromBiomart` function also has a
`circ_seqs` argument that will default to using the contents
of the `DEFAULT_CIRC_SEQS` vector. And just like those UCSC
sources, there is also a helper function called
`getChromInfoFromBiomart` that can show what the different
chromosomes are called for a given source.
Using the `makeTxDbFromBiomart`
`makeTxDbFromUCSC` functions can take a while and
may also require some bandwidth as these methods have to download and
then assemble a database from their respective sources. It is not
expected that most users will want to do this step every time.
Instead, we suggest that you save your annotation objects and label
them with an appropriate time stamp so as to facilitate reproducible
research.
See `?makeTxDbFromBiomart` for more information.
# Using `makeTxDbFromEnsembl`
The `makeTxDbFromEnsembl` function creates a `TxDb` object
for a given organism by importing the genomic locations of its transcripts,
exons, CDS, and genes from an Ensembl database.
See `?makeTxDbFromEnsembl` for more information.
# Using `makeTxDbFromGFF`
You can also extract transcript information from either GFF3 or GTF
files by using the `makeTxDbFromGFF` function.
Usage is similar to `makeTxDbFromBiomart` and
`makeTxDbFromUCSC`.
See `?makeTxDbFromGFF` for more information.
# Saving and Loading a `TxDb` Object
Once a `TxDb` object has been created, it can be saved
to avoid the time and bandwidth costs of recreating it and to make it
possible to reproduce results with identical genomic feature data at a
later date. Since `TxDb` objects are backed by a
SQLite database, the save format is a SQLite database file (which
could be accessed from programs other than R if desired). Note that
it is not possible to serialize a `TxDb` object using
R's `save` function.
```{r saveDb, results="hide"}
saveDb(mm9KG_txdb, file="mm9KG_txdb.sqlite")
```
And as was mentioned earlier, a saved `TxDb` object can
be initialized from a .sqlite file by simply using `loadDb`.
```{r loadDb}
mm9KG_txdb <- loadDb("mm9KG_txdb.sqlite")
```
# Using `makeTxDbPackageFromUCSC` and `makeTxDbPackageFromBiomart`
It is often much more convenient to just make an annotation package
out of your annotations. If you are finding that this is the case,
then you should consider the convenience functions:
`makeTxDbPackageFromUCSC` and
`makeTxDbPackageFromBiomart`. These functions are similar
to `makeTxDbFromUCSC` and
`makeTxDbFromBiomart` except that they will take the
extra step of actually wrapping the database up into an annotation
package for you. This package can then be installed and used as of
the standard TxDb packages found on in the Bioconductor
repository.
# Session Information
```{r SessionInfo, echo=FALSE}
sessionInfo()
```
|