% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/genomeToX.R
\name{genomeToProtein}
\alias{genomeToProtein}
\title{Map genomic coordinates to protein coordinates}
\usage{
genomeToProtein(x, db)
}
\arguments{
\item{x}{\code{GRanges} with the genomic coordinates that should be mapped to
within-protein coordinates.}

\item{db}{\code{EnsDb} object.}
}
\value{
An \code{IRangesList} with each element representing the mapping of one of the
\code{GRanges} in \code{x} (i.e. the length of the \code{IRangesList} is \code{length(x)}).
Each \code{IRanges} element provides the coordinates within the protein
sequence, with names being the Ensembl IDs of the proteins. The ID of the
transcript encoding the protein, the ID of the exon within which the
genomic coordinates are located and the exon's rank in the transcript are provided
in metadata columns \code{"tx_id"}, \code{"exon_id"} and \code{"exon_rank"}. Metadata
column \code{"cds_ok"} indicates whether the length of the CDS matches the
length of the encoded protein. Coordinates for which \code{cds_ok = FALSE} should
be treated with caution, as they might not be correct. Metadata columns
\code{"seq_start"}, \code{"seq_end"}, \code{"seq_name"} and \code{"seq_strand"} contain the
input genomic coordinates.

For genomic coordinates that cannot be mapped to within-protein coordinates,
an \code{IRanges} with a start coordinate of -1 is returned.
}
\description{
Map positions along the genome to positions within the protein sequence if
a protein is encoded at the location. The provided coordinates must lie
completely within an exon of a protein coding
transcript (see \code{\link[=genomeToTranscript]{genomeToTranscript()}} for details). Also, the provided
positions must be within the genomic region encoding the CDS of a
transcript, excluding its stop codon (see \code{\link[=transcriptToProtein]{transcriptToProtein()}} for
details).

For genomic positions that cannot be mapped, an \code{IRanges} with
negative coordinates (i.e. a start position of -1) is returned.
}
\details{
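\code{genomeToProtein} combines calls to \code{\link[=genomeToTranscript]{genomeToTranscript()}},
which maps the genomic coordinates to positions within transcripts, and
\code{\link[=transcriptToProtein]{transcriptToProtein()}}, which maps these transcript positions to
positions within the encoded proteins.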
}
\examples{

library(EnsDb.Hsapiens.v86)
## Restrict all further queries to chromosome X to speed up the examples
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## In the example below we define 4 genomic regions:
## 630898: corresponds to the first nt of the CDS of ENST00000381578
## 644636: last nt of the CDS of ENST00000381578
## 644633: last nt before the stop codon in ENST00000381578
## 634829: position within an intron.
gnm <- GRanges("X", IRanges(start = c(630898, 644636, 644633, 634829),
    width = c(5, 1, 1, 3)))
res <- genomeToProtein(gnm, edbx)

## The result is an IRangesList with the same length as gnm
length(res)
length(gnm)
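
## Positions that could not be mapped to any protein are represented only by
## ranges with a negative start (-1). One simple, illustrative way to flag
## such elements is to check whether all ranges of an element are negative
## (the variable name 'unmapped' is just chosen for this example).
unmapped <- vapply(res, function(z) all(start(z) < 0), logical(1))
unmapped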

## The first element represents the mapping for the first GRanges:
## the region is mapped to the start of the protein(s), i.e. to a range
## beginning at the first amino acid. The genomic coordinates can be mapped
## to several transcripts (and hence proteins).
res[[1]]
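
## Transcript and exon information as well as the cds_ok flag are stored in
## the metadata columns of each IRanges and can be accessed with mcols().
## Keeping only mappings with cds_ok being TRUE is one possible (not the
## only) way to drop potentially unreliable coordinates; which() guards
## against missing values in cds_ok.
mcols(res[[1]])[, c("tx_id", "exon_id", "exon_rank", "cds_ok")]
res[[1]][which(mcols(res[[1]])$cds_ok)]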

## The stop codon is not translated, thus the mapping for the second
## GRanges fails
res[[2]]

## The 3rd GRanges is mapped to the last amino acid.
res[[3]]

## Mapping of intronic positions fails
res[[4]]
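
## genomeToProtein combines genomeToTranscript and transcriptToProtein.
## As a rough illustration, the first region can also be mapped in two
## steps: first to positions within transcripts (IRanges named by the
## transcript IDs) and then to positions within the encoded proteins.
## Transcripts whose CDS does not cover the region are expected to yield
## a start of -1 (possibly with a warning) in the second step.
txs <- genomeToTranscript(gnm[1], edbx)
txs[[1]]
transcriptToProtein(txs[[1]], edbx)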
}
\seealso{
Other coordinate mapping functions: 
\code{\link{cdsToTranscript}()},
\code{\link{genomeToTranscript}()},
\code{\link{proteinToGenome}()},
\code{\link{proteinToTranscript}()},
\code{\link{transcriptToCds}()},
\code{\link{transcriptToGenome}()},
\code{\link{transcriptToProtein}()}
}
\author{
Johannes Rainer
}
\concept{coordinate mapping functions}