File: txdbmaker.Rmd

package info (click to toggle)
r-bioc-txdbmaker 1.2.1%2Bds-2
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 3,168 kB
  • sloc: makefile: 2
file content (168 lines) | stat: -rw-r--r-- 5,050 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
---
title: "Making TxDb Objects"
author:
- name: Marc Carlson
- name: Patrick Aboyoun
- name: Hervé Pagès
- name: Seth Falcon
- name: Martin Morgan
date: "Compiled `r doc_date()`;  Modified 4 April 2024"
package: txdbmaker
vignette: |
  %\VignetteIndexEntry{Making TxDb Objects}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteKeywords{annotation, database}
output:
  BiocStyle::html_document
---


# Introduction

The `r Biocpkg("txdbmaker")` package provides functions to make `TxDb`
objects from genomic annotation provided by the UCSC Genome Browser
(https://genome.ucsc.edu/), Ensembl (https://ensembl.org/),
BioMart (http://www.biomart.org/), or directly from a GFF or GTF file.

In this document we will quickly demonstrate the use of these
functions.

Note that the package also provides a lower-level utility, `makeTxDb()`,
for creating `TxDb` objects from data directly supplied by the user.
Please refer to its man page (`?makeTxDb`) for more information.

See vignette in the `r Biocpkg("GenomicFeatures")` package for an
introduction to `TxDb` objects.


# Installing the `txdbmaker` package

Install the package with:

```{r installtxdbmaker, eval=FALSE}
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("txdbmaker")
```

Then load it with:

```{r loadtxdbmaker}
suppressPackageStartupMessages(library(txdbmaker))
```


# Using `makeTxDbFromUCSC`

The function `makeTxDbFromUCSC` downloads UCSC
Genome Bioinformatics transcript tables (e.g. `knownGene`,
`refGene`, `ensGene`) for a genome build (e.g.
`mm9`, `hg19`).  Use the `supportedUCSCtables`
utility function to get the list of tables known to work with
`makeTxDbFromUCSC`.

```{r supportedUCSCtables}
supportedUCSCtables(genome="mm9")

mm9KG_txdb <- makeTxDbFromUCSC(genome="mm9", tablename="knownGene")
mm9KG_txdb
```

See `?makeTxDbFromUCSC` for more information.


# Using `makeTxDbFromBiomart`

Retrieve data from BioMart by specifying the mart and the data set to
the `makeTxDbFromBiomart` function (not all BioMart
data sets are currently supported):

```{r makeTxDbFromBiomart, eval=FALSE}
mmusculusEnsembl <- makeTxDbFromBiomart(dataset="mmusculus_gene_ensembl")
```

As with the `makeTxDbFromUCSC` function, the
`makeTxDbFromBiomart` function also has a
`circ_seqs` argument that will default to using the contents
of the `DEFAULT_CIRC_SEQS` vector.  And just like those UCSC
sources, there is also a helper function called
`getChromInfoFromBiomart` that can show what the different
chromosomes are called for a given source.

Using the `makeTxDbFromBiomart`
`makeTxDbFromUCSC` functions can take a while and
may also require some bandwidth as these methods have to download and
then assemble a database from their respective sources.  It is not
expected that most users will want to do this step every time.
Instead, we suggest that you save your annotation objects and label
them with an appropriate time stamp so as to facilitate reproducible
research.

See `?makeTxDbFromBiomart` for more information.


# Using `makeTxDbFromEnsembl`

The `makeTxDbFromEnsembl` function creates a `TxDb` object
for a given organism by importing the genomic locations of its transcripts,
exons, CDS, and genes from an Ensembl database.

See `?makeTxDbFromEnsembl` for more information.


# Using `makeTxDbFromGFF`

You can also extract transcript information from either GFF3 or GTF
files by using the `makeTxDbFromGFF` function.
Usage is similar to `makeTxDbFromBiomart` and
`makeTxDbFromUCSC`.

See `?makeTxDbFromGFF` for more information.


# Saving and Loading a `TxDb` Object

Once a `TxDb` object has been created, it can be saved
to avoid the time and bandwidth costs of recreating it and to make it
possible to reproduce results with identical genomic feature data at a
later date.  Since `TxDb` objects are backed by a
SQLite database, the save format is a SQLite database file (which
could be accessed from programs other than R if desired).  Note that
it is not possible to serialize a `TxDb` object using
R's `save` function.

```{r saveDb, results="hide"}
saveDb(mm9KG_txdb, file="mm9KG_txdb.sqlite")
```

And as was mentioned earlier, a saved `TxDb` object can
be initialized from a .sqlite file by simply using `loadDb`.

```{r loadDb}
mm9KG_txdb <- loadDb("mm9KG_txdb.sqlite")
```


# Using `makeTxDbPackageFromUCSC` and `makeTxDbPackageFromBiomart`

It is often much more convenient to just make an annotation package
out of your annotations.  If you are finding that this is the case,
then you should consider the convenience functions:
`makeTxDbPackageFromUCSC` and
`makeTxDbPackageFromBiomart`.  These functions are similar
to `makeTxDbFromUCSC` and
`makeTxDbFromBiomart` except that they will take the
extra step of actually wrapping the database up into an annotation
package for you.  This package can then be installed and used as of
the standard TxDb packages found on in the Bioconductor
repository.


# Session Information

```{r SessionInfo, echo=FALSE}
sessionInfo()
```