% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Classes.R, R/Methods-Filter.R, R/Methods.R
\docType{class}
\name{Filter-classes}
\alias{Filter-classes}
\alias{OnlyCodingTxFilter-class}
\alias{OnlyCodingTxFilter}
\alias{ProtDomIdFilter-class}
\alias{ProtDomIdFilter}
\alias{ProteinDomainIdFilter-class}
\alias{ProteinDomainIdFilter}
\alias{ProteinDomainSourceFilter-class}
\alias{ProteinDomainSourceFilter}
\alias{UniprotDbFilter-class}
\alias{UniprotDbFilter}
\alias{UniprotMappingTypeFilter-class}
\alias{UniprotMappingTypeFilter}
\alias{TxSupportLevelFilter-class}
\alias{TxSupportLevelFilter}
\alias{seqnames,GRangesFilter-method}
\alias{seqlevels,GRangesFilter-method}
\alias{supportedFilters,EnsDb-method}
\title{Filters supported by ensembldb}
\usage{
OnlyCodingTxFilter()

ProtDomIdFilter(value, condition = "==")

ProteinDomainIdFilter(value, condition = "==")

ProteinDomainSourceFilter(value, condition = "==")

UniprotDbFilter(value, condition = "==")

UniprotMappingTypeFilter(value, condition = "==")

TxSupportLevelFilter(value, condition = "==")

\S4method{seqnames}{GRangesFilter}(x)

\S4method{seqlevels}{GRangesFilter}(x)

\S4method{supportedFilters}{EnsDb}(object, ...)
}
\arguments{
\item{value}{The value(s) for the filter. For \code{GRangesFilter} it has to be a
\code{GRanges} object.}

\item{condition}{\code{character(1)} specifying the \emph{condition} of the
filter. For \code{character}-based filters (such as
\code{GeneIdFilter}) \code{"=="}, \code{"!="}, \code{"startsWith"} and \code{"endsWith"} are
supported. Allowed values for \code{integer}-based filters (such as
\code{GeneStartFilter}) are \code{"=="}, \code{"!="}, \code{"<"}, \code{"<="}, \code{">"} and \code{">="}.}

\item{x}{For \code{seqnames}, \code{seqlevels}: a \code{GRangesFilter} object.}

\item{object}{For \code{supportedFilters}: an \code{EnsDb} object.}

\item{...}{For \code{supportedFilters}: currently not used.}
}
\value{
For \code{ProtDomIdFilter}: A \code{ProtDomIdFilter} object.

For \code{ProteinDomainIdFilter}: A \code{ProteinDomainIdFilter} object.

For \code{ProteinDomainSourceFilter}: A \code{ProteinDomainSourceFilter}
object.

For \code{UniprotDbFilter}: A \code{UniprotDbFilter} object.

For \code{UniprotMappingTypeFilter}: A \code{UniprotMappingTypeFilter} object.

For \code{TxSupportLevelFilter}: A \code{TxSupportLevelFilter} object.

For \code{supportedFilters}: a \code{data.frame} with the names and
the corresponding field of the supported filter classes.
}
\description{
\code{ensembldb} supports most of the filters from the \link{AnnotationFilter}
package to retrieve specific content from \link{EnsDb} databases. These filters
can be passed to methods such as \code{\link[=genes]{genes()}} with the \code{filter} parameter
or can be added as a \emph{global} filter to an \code{EnsDb} object (see
\code{\link[=addFilter]{addFilter()}} for more details). Use \code{\link[=supportedFilters]{supportedFilters()}} to get an
overview of all filters supported by an \code{EnsDb} object.

\code{seqnames}: accessor for the sequence names of the \code{GRanges}
object within a \code{GRangesFilter}.

\code{seqlevels}: accessor for the \code{seqlevels} of the \code{GRanges}
object within a \code{GRangesFilter}.

\code{supportedFilters} returns a \code{data.frame} with the
names of all filters and the corresponding field supported by the
\code{EnsDb} object.
}
\details{
\code{ensembldb} supports the following filters from the \code{AnnotationFilter}
package:
\itemize{
\item \code{GeneIdFilter}: filter based on the Ensembl gene ID.
\item \code{GeneNameFilter}: filter based on the name of the gene as provided
by Ensembl. In most cases this will correspond to the official gene symbol.
\item \code{SymbolFilter}: filter based on the gene names. \code{EnsDb} objects don't
have a dedicated \emph{symbol} column; the filtering is hence based on the
gene names.
\item \code{GeneBiotypeFilter}: filter based on the biotype of genes (e.g.
\code{"protein_coding"}).
\item \code{GeneStartFilter}: filter based on the genomic start coordinate of genes.
\item \code{GeneEndFilter}: filter based on the genomic end coordinate of genes.
\item \code{EntrezFilter}: filter based on the genes' NCBI Entrezgene ID.
\item \code{TxIdFilter}: filter based on the Ensembl transcript ID.
\item \code{TxNameFilter}: filter based on the Ensembl transcript ID; no transcript
names are provided in \code{EnsDb} databases.
\item \code{TxBiotypeFilter}: filter based on the transcripts' biotype.
\item \code{TxStartFilter}: filter based on the genomic start coordinate of the
transcripts.
\item \code{TxEndFilter}: filter based on the genomic end coordinate of the
transcripts.
\item \code{ExonIdFilter}: filter based on Ensembl exon IDs.
\item \code{ExonRankFilter}: filter based on the index/rank of the exon within the
transcript.
\item \code{ExonStartFilter}: filter based on the genomic start coordinates of the
exons.
\item \code{ExonEndFilter}: filter based on the genomic end coordinates of the exons.
\item \code{GRangesFilter}: allows fetching features within or overlapping the specified
genomic region(s)/range(s). This filter takes a \code{GRanges} object
as input and, if \code{type = "any"} (the default), will restrict results to
features (genes, transcripts or exons) that at least partially overlap the
region. Alternatively, by specifying \code{type = "within"} it will
return only features located within the range. In addition, the \code{GRangesFilter}
supports \code{type = "start"}, \code{type = "end"} and \code{type = "equal"},
filtering for features with the same start or end coordinate or that are
equal to the \code{GRanges}.

Note that the type of feature on which the filter is applied depends on
the method that is called, i.e. \code{\link[=genes]{genes()}} will filter on the
genomic coordinates of genes, \code{\link[=transcripts]{transcripts()}} on those of
transcripts and \code{\link[=exons]{exons()}} on exon coordinates.

Calls to the methods \code{\link[=exonsBy]{exonsBy()}}, \code{\link[=cdsBy]{cdsBy()}} and
\code{\link[=transcriptsBy]{transcriptsBy()}} use the start and end coordinates of the
feature type specified with argument \code{by} (i.e. \code{"gene"},
\code{"transcript"} or \code{"exon"}) for the filtering.

If the specified \code{GRanges} object defines multiple regions, all
features within (or overlapping) any of these regions are returned.

Chromosome names/seqnames can be provided in UCSC format (e.g.
\code{"chrX"}) or Ensembl format (e.g. \code{"X"}); see \code{\link[=seqlevelsStyle]{seqlevelsStyle()}} for
more information.
\item \code{SeqNameFilter}: filter based on chromosome names.
\item \code{SeqStrandFilter}: filter based on the chromosome strand. The strand can
be specified with \code{value = "+"}, \code{value = "-"}, \code{value = -1} or
\code{value = 1}.
\item \code{ProteinIdFilter}: filter based on Ensembl protein IDs. This filter is
only supported if the \code{EnsDb} provides protein annotations; use the
\code{\link[=hasProteinData]{hasProteinData()}} method to check.
\item \code{UniprotFilter}: filter based on Uniprot IDs. This filter is only
supported if the \code{EnsDb} provides protein annotations; use the
\code{\link[=hasProteinData]{hasProteinData()}} method to check.
}

In addition, the following filters are defined by \code{ensembldb}:
\itemize{
\item \code{TxSupportLevelFilter}: filters results using the provided transcript
support level. Support levels for transcripts are defined by Ensembl
based on the available evidence for a transcript, with 1 being the
highest evidence grade and 5 the lowest. This filter is only
supported on \code{EnsDb} databases with a db schema version higher than 2.1.
\item \code{UniprotDbFilter}: filters results based on the specified Uniprot
database name(s).
\item \code{UniprotMappingTypeFilter}: filters results based on the mapping
method/type that was used to assign Uniprot IDs to Ensembl protein IDs.
\item \code{ProtDomIdFilter}, \code{ProteinDomainIdFilter}: retrieve entries
from the database matching the provided filter criteria based on their
protein domain ID (\emph{protein_domain_id}).
\item \code{ProteinDomainSourceFilter}: filters results based on the source
(database/method) defining the protein domain (e.g. \code{"pfam"}).
\item \code{OnlyCodingTxFilter}: retrieves entries only for protein coding
transcripts, i.e. transcripts with a CDS. This filter does not take any
input arguments.
}
}
\note{
For users of \code{ensembldb} version < 2.0: in the \code{GRangesFilter} from the
\code{AnnotationFilter} package the \code{condition} parameter was renamed to \code{type}
(to be consistent with the \code{IRanges} package). In addition,
\code{condition = "overlapping"} is no longer recognized. To retrieve all
features overlapping the range, \code{type = "any"} has to be used.

Protein annotation based filters can only be used if the
\code{EnsDb} database contains protein annotations, i.e. if \code{hasProteinData}
is \code{TRUE}. Also, only protein coding transcripts have protein
annotations available; non-coding transcripts/genes will thus not be
returned by queries using protein annotation filters.
}
\examples{

## Create a filter that could be used to retrieve all information for
## the respective gene.
gif <- GeneIdFilter("ENSG00000012817")
gif

## Create a filter for a chromosomal end position of a gene
sef <- GeneEndFilter(10000, condition = ">")
sef

## For additional examples see the help page of "genes".


## Example for GRangesFilter:
## retrieve all genes overlapping the specified region
grf <- GRangesFilter(GRanges("11", ranges = IRanges(114129278, 114129328),
                             strand = "+"), type = "any")
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86
genes(edb, filter = grf)

## Get also all transcripts overlapping that region.
transcripts(edb, filter = grf)

## Retrieve all transcripts for the above gene
gn <- genes(edb, filter = grf)
txs <- transcripts(edb, filter = GeneNameFilter(gn$gene_name))
## Next we simply plot their start and end coordinates.
plot(3, 3, pch = NA, xlim = c(start(gn), end(gn)), ylim = c(0, length(txs)),
     yaxt = "n", ylab = "")
## Highlight the GRangesFilter region
rect(xleft = start(grf), xright = end(grf), ybottom = 0, ytop = length(txs),
     col = "red", border = "red")
## Draw one box per transcript, labeled with its transcript ID
for (i in seq_along(txs)) {
    current <- txs[i]
    rect(xleft = start(current), xright = end(current),
         ybottom = i - 0.975, ytop = i - 0.125, border = "grey")
    text(start(current), y = i - 0.5, pos = 4, cex = 0.75,
         labels = current$tx_id)
}
## Thus, we can see that only 4 transcripts of that gene are indeed
## overlapping the region.


## No exon is overlapping that region, thus we're not getting anything
exons(edb, filter = grf)
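
## A short sketch of the GRangesFilter accessors documented on this page:
## seqnames() and seqlevels() return the sequence names and the seqlevels
## of the GRanges object stored within the filter. It reuses the 'grf'
## filter created above.
seqnames(grf)
seqlevels(grf)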


## Example for ExonRankFilter
## Extract all exons 1 and (if present) 2 for all genes encoded on the
## Y chromosome
exons(edb, columns = c("tx_id", "exon_idx"),
      filter=list(SeqNameFilter("Y"),
                  ExonRankFilter(3, condition = "<")))


## Get all transcripts for the gene SKA2
transcripts(edb, filter = GeneNameFilter("SKA2"))

## Which is the same as using a SymbolFilter
transcripts(edb, filter = SymbolFilter("SKA2"))


## Create a ProteinIdFilter:
pf <- ProteinIdFilter("ENSP00000362111")
pf
## Using this filter would retrieve all database entries that are associated
## with a protein with the ID "ENSP00000362111"
if (hasProteinData(edb)) {
    res <- genes(edb, filter = pf)
    res
}

## UniprotFilter:
uf <- UniprotFilter("O60762")
## Get the transcripts encoding that protein:
if (hasProteinData(edb)) {
    transcripts(edb, filter = uf)
    ## The mapping of Ensembl protein IDs to Uniprot IDs can however be 1:n:
    transcripts(edb, filter = TxIdFilter("ENST00000371588"),
        columns = c("protein_id", "uniprot_id"))
}

## ProtDomIdFilter:
pdf <- ProtDomIdFilter("PF00335")
## Also here we could get all transcripts related to that protein domain
if (hasProteinData(edb)) {
    transcripts(edb, filter = pdf, columns = "protein_id")
}
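
## supportedFilters: list all filters supported by the EnsDb object. The
## result is a data.frame with the filter names and corresponding fields.
supportedFilters(edb)

## OnlyCodingTxFilter: a minimal sketch, assuming that filters can be
## combined in a list as in the ExonRankFilter example above. The filter
## takes no arguments and should restrict the result to those transcripts
## of SKA2 that have a CDS.
transcripts(edb, filter = list(GeneNameFilter("SKA2"),
                               OnlyCodingTxFilter()))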

}
\seealso{
\code{\link[=supportedFilters]{supportedFilters()}} to list all filters supported for \code{EnsDb} objects.

\code{\link[=listUniprotDbs]{listUniprotDbs()}} and \code{\link[=listUniprotMappingTypes]{listUniprotMappingTypes()}} to list all Uniprot
database names and mapping method types, respectively, available in the database.

\code{\link[=GeneIdFilter]{GeneIdFilter()}} in the \code{AnnotationFilter} package for more details on the
filter objects.

\code{\link[=genes]{genes()}}, \code{\link[=transcripts]{transcripts()}}, \code{\link[=exons]{exons()}}, \code{\link[=listGenebiotypes]{listGenebiotypes()}},
\code{\link[=listTxbiotypes]{listTxbiotypes()}}.

\code{\link[=addFilter]{addFilter()}} and \code{\link[=filter]{filter()}} for globally adding filters to an \code{EnsDb}.
}
\author{
Johannes Rainer
}