% File: parse.folder.Rd (Debian package r-cran-tcr 2.3.2+ds-1)
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/parsing.R
\name{parse.folder}
\alias{parse.folder}
\alias{parse.file.list}
\alias{parse.file}
\alias{parse.mitcr}
\alias{parse.mitcrbc}
\alias{parse.migec}
\alias{parse.vdjtools}
\alias{parse.immunoseq}
\alias{parse.immunoseq2}
\alias{parse.immunoseq3}
\alias{parse.tcr}
\alias{parse.mixcr}
\alias{parse.imseq}
\alias{parse.migmap}
\title{Parse input table files with immune receptor repertoire data.}
\usage{
parse.file(.filename, 
.format = c('mitcr', 'mitcrbc', 'migec', 'vdjtools', 'immunoseq', 
'mixcr', 'imseq', 'tcr'), ...)

parse.file.list(.filenames, 
.format = c('mitcr', 'mitcrbc', 'migec', 'vdjtools', 'immunoseq', 
'mixcr', 'imseq', 'tcr'), .namelist = NA)

parse.folder(.folderpath, 
.format = c('mitcr', 'mitcrbc', 'migec', 'vdjtools', 'immunoseq', 
'mixcr', 'imseq', 'tcr'), ...)

parse.mitcr(.filename)

parse.mitcrbc(.filename)

parse.migec(.filename)

parse.vdjtools(.filename)

parse.immunoseq(.filename)

parse.immunoseq2(.filename)

parse.immunoseq3(.filename)

parse.mixcr(.filename)

parse.imseq(.filename)

parse.tcr(.filename)

parse.migmap(.filename)
}
\arguments{
\item{.folderpath}{Path to the folder with text cloneset files.}

\item{.format}{Character string specifying the input format. See Description for the supported values.}

\item{...}{Parameters passed to \code{parse.cloneset}.}

\item{.filename}{Path to the input file with cloneset data.}

\item{.filenames}{Vector or list with paths to files with cloneset data.}

\item{.namelist}{Either \code{NA} or a character vector of the same length as \code{.filenames} with names for the output data frames.}
}
\value{
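A data frame (\code{parse.file} and the format-specific parsers) or a list of data frames (\code{parse.file.list} and \code{parse.folder}) with immune receptor repertoire data. Each row of a data frame corresponds to one clonotype.
Each data frame has the following columns: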

- "Umi.count" - number of barcodes (events, UMIs);

- "Umi.proportion" - proportion of barcodes (events, UMIs);

- "Read.count" - number of reads;

- "Read.proportion" - proportion of reads;

- "CDR3.nucleotide.sequence" - CDR3 nucleotide sequence;

- "CDR3.amino.acid.sequence" - CDR3 amino acid sequence;

- "V.gene" - names of aligned Variable gene segments;

- "J.gene" - names of aligned Joining gene segments;

- "D.gene" - names of aligned Diversity gene segments;

- "V.end" - last positions of aligned V gene segments (1-based);

- "J.start" - first positions of aligned J gene segments (1-based);

- "D5.end" - positions of the 5' end of aligned D gene segments (1-based);

- "D3.end" - positions of the 3' end of aligned D gene segments (1-based);

- "VD.insertions" - number of inserted nucleotides (N-nucleotides) at V-D junction (-1 for receptors with VJ recombination);

- "DJ.insertions" - number of inserted nucleotides (N-nucleotides) at D-J junction (-1 for receptors with VJ recombination);

- "Total.insertions" - total number of inserted nucleotides (number of N-nucleotides at V-J junction for receptors with VJ recombination).
}
\description{
Load TCR data from the given file into a data frame, or load all files from the given folder into a list of data frames.
The folder must contain only files in the specified format.
Input files can be plain text files or files compressed with gzip ("filename.txt.gz") or bzip2 ("filename.txt.bz2").
For a general parser see \code{\link{parse.cloneset}}.

Parsers are available for:
MiTCR ("mitcr"), MiTCR with UMIs ("mitcrbc"), MiGEC ("migec"), VDJtools ("vdjtools"),
ImmunoSEQ ("immunoseq" and "immunoseq2" for the old and new formats, respectively),
MiXCR ("mixcr"), IMSEQ ("imseq") and tcR ("tcr", data frames saved with the \code{\link{repSave}} function).

MiXCR output should contain either all hits or only the best hit for each gene segment.

IMSEQ output should be generated with the \code{-on} parameter. Because of limitations of the IMSEQ output format,
the resulting data frame contains no positions of aligned gene segments.
}
\examples{
\dontrun{
# Parse file in "~/mitcr/immdata1.txt" as a MiTCR file.
immdata1 <- parse.file("~/mitcr/immdata1.txt", 'mitcr')
# Parse a gzip-compressed VDJtools file.
immdata3 <- parse.file("~/mitcr/immdata3.txt.gz", 'vdjtools')
# Parse files "~/data/immdata1.txt" and "~/data/immdata2.txt" as MiGEC files.
immdata12 <- parse.file.list(c("~/data/immdata1.txt",
                             "~/data/immdata2.txt"), 'migec')
# Parse all files in "~/data/" as MiGEC files.
immdata <- parse.folder("~/data/", 'migec')
}
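# A self-contained sketch that needs no input files. It builds a toy data
# frame with a few of the documented output columns (all values are made up)
# and illustrates two assumptions about how these columns relate; neither is
# guaranteed here by the parsers themselves:
#   1. "Read.proportion" equals Read.count divided by the total read count.
#   2. For VDJ receptors, "Total.insertions" equals VD.insertions plus
#      DJ.insertions, while VJ receptors have -1 in both of those columns.
toy <- data.frame(Read.count = c(60, 30, 10),
                  VD.insertions = c(3, 2, -1),
                  DJ.insertions = c(4, 1, -1),
                  Total.insertions = c(7, 3, 5))
# Recompute read proportions from the counts (assumption 1).
toy$Read.proportion <- toy$Read.count / sum(toy$Read.count)
stopifnot(isTRUE(all.equal(sum(toy$Read.proportion), 1)))
# Flag VDJ clonotypes via the -1 value used for VJ recombination (assumption 2).
is.vdj <- toy$VD.insertions >= 0
stopifnot(all(toy$Total.insertions[is.vdj] ==
                toy$VD.insertions[is.vdj] + toy$DJ.insertions[is.vdj]))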
}
\seealso{
\link{parse.cloneset}, \link{repSave}, \link{repLoad}
}