% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/linkedTxome.R
\name{linkedTxome}
\alias{linkedTxome}
\alias{makeLinkedTxome}
\alias{loadLinkedTxome}
\title{Make and load linked transcriptomes ("linkedTxome")}
\usage{
makeLinkedTxome(
  indexDir,
  source,
  organism,
  release,
  genome,
  fasta,
  gtf,
  write = TRUE,
  jsonFile
)

loadLinkedTxome(jsonFile)
}
\arguments{
\item{indexDir}{the local path to the Salmon index}

\item{source}{the source of the transcriptome (e.g. "de-novo").
Note: if you specify "GENCODE" or "Ensembl", this will trigger
behavior by tximeta that may not be desired: e.g. attempts to
download canonical transcriptome data from AnnotationHub
(unless useHub=FALSE when running tximeta) and parsing of
Ensembl GTF using ensembldb (which may fail if the GTF file
has been modified). For transcriptomes that are defined by
local GTF files, it is recommended to use the terms "LocalGENCODE"
or "LocalEnsembl". Setting "LocalEnsembl" will also strip
version numbers from the FASTA transcript IDs to enable matching
with the Ensembl GTF.}

\item{organism}{organism (e.g. "Homo sapiens")}

\item{release}{release number (e.g. "27")}

\item{genome}{genome (e.g. "GRCh38", or "none")}

\item{fasta}{location(s) of the FASTA transcript sequences
(the transcripts used to build the index must be equal to, or a subset of,
these sequences). This can be a local path, or an HTTP or FTP URL.}

\item{gtf}{location of the GTF/GFF file
(the transcripts used to build the index must be equal to, or a subset of,
the transcripts in this file).
This can be a local path, or an HTTP or FTP URL.
While the \code{fasta} argument can take a vector of length greater than one
(more than one FASTA file containing transcripts used in indexing),
the \code{gtf} argument has to be a single GTF/GFF file.
This can also be a serialized GRanges object (location of a .rds file)
imported with rtracklayer.
If transcripts were added to a standard set of reference transcripts (e.g. fusion genes,
or pathogen transcripts), it is recommended that the tximeta user manually
add these to the GTF/GFF file, and post the modified GTF/GFF publicly, such as
on Zenodo. This enables consistent annotation and downstream tasks that
rely on it, such as \code{summarizeToGene}.}

\item{write}{logical, should a JSON file be written out
which documents the transcriptome checksum and metadata? (default is TRUE)}

\item{jsonFile}{the path to the JSON file for the linkedTxome}
}
\value{
nothing, the function is run for its side effects
}
\description{
\code{makeLinkedTxome} reads the checksum associated with a Salmon
index at \code{indexDir}, and links it to key information
about the transcriptome, including the \code{source}, \code{organism},
\code{release}, and \code{genome} (these are custom character strings),
as well as the locations (e.g. local, HTTP, or FTP) for one or more \code{fasta}
files and one \code{gtf} file. \code{loadLinkedTxome} loads this
information from a JSON file. See Details.
}
\details{
\code{makeLinkedTxome} links the information about the transcriptome
used for quantification in two ways:
1) the function will store a record in tximeta's cache such that
future import of quantification data will automatically access and
parse the GTF as if the transcriptome were one of those automatically
detected by tximeta. Then all features of tximeta (e.g. summarization
to gene, programmatic adding of IDs or metadata) will be available;
2) it will by default write out a JSON file
that can be shared, or posted online, and which can be read by
\code{loadLinkedTxome}, which stores the information in tximeta's
cache. This should make the full quantification-import pipeline
computationally reproducible and auditable, even for transcriptomes
which differ from those provided by reference sources (GENCODE, Ensembl,
RefSeq).

For further details please see the "Linked transcriptomes"
section of the tximeta vignette.
}
\examples{

# point to a Salmon quantification file with an additional artificial transcript
dir <- system.file("extdata/salmon_dm", package="tximportData")
file <- file.path(dir, "SRR1197474.plus", "quant.sf")
coldata <- data.frame(files=file, names="SRR1197474", sample="1",
                      stringsAsFactors=FALSE)

# now point to the Salmon index itself to create a linkedTxome
# as the index will not match a known txome
indexDir <- file.path(dir, "Dm.BDGP6.22.98.plus_salmon-0.14.1")

# point to the source FASTA and GTF:
fastaFTP <- c("ftp://ftp.ensembl.org/pub/release-98/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.22.cdna.all.fa.gz",
              "ftp://ftp.ensembl.org/pub/release-98/fasta/drosophila_melanogaster/ncrna/Drosophila_melanogaster.BDGP6.22.ncrna.fa.gz",
              "extra_transcript.fa.gz")
gtfPath <- file.path(dir, "Drosophila_melanogaster.BDGP6.22.98.plus.gtf.gz")

# now create a linkedTxome, linking the Salmon index to its FASTA and GTF sources
makeLinkedTxome(indexDir=indexDir, source="Ensembl", organism="Drosophila melanogaster",
                release="98", genome="BDGP6.22", fasta=fastaFTP, gtf=gtfPath, write=FALSE)
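
# A minimal sketch, not part of the original example: write the linkedTxome
# to a JSON file so it can be shared, then read it back with loadLinkedTxome.
# The JSON location (a file in tempdir() named after the index) is an
# assumption chosen for illustration; any writable path should work.
jsonFile <- file.path(tempdir(), paste0(basename(indexDir), ".json"))
makeLinkedTxome(indexDir=indexDir, source="Ensembl", organism="Drosophila melanogaster",
                release="98", genome="BDGP6.22", fasta=fastaFTP, gtf=gtfPath,
                write=TRUE, jsonFile=jsonFile)

# on another machine, or in a fresh session, the shared JSON file can be loaded
# to store the same record in tximeta's cache
loadLinkedTxome(jsonFile)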

# to clear the entire linkedTxome table
# (don't run unless you want to clear this table!)
# bfcloc <- getTximetaBFC()
# bfc <- BiocFileCache(bfcloc)
# bfcremove(bfc, bfcquery(bfc, "linkedTxomeTbl")$rid)

}