% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hello-bigmemory.R
\docType{package}
\name{bigmemory-package}
\alias{bigmemory-package}
\alias{bigmemory}
\title{Manage massive matrices with shared memory and memory-mapped files.}
\description{
Create, store, access, and manipulate massive matrices. Matrices are, by
default, allocated to shared memory and may use memory-mapped files.
Packages \pkg{biganalytics}, \pkg{synchronicity}, \pkg{bigalgebra}, and
\pkg{bigtabulate} provide advanced functionality. Access to and
manipulation of a \code{\link{big.matrix}} object is exposed in an S4
class whose interface is similar to that of a \code{\link{matrix}}. Use of
these packages in parallel environments can provide substantial speed and
memory efficiencies. \pkg{bigmemory} also provides a \acronym{C++}
framework for the development of new tools that can work both with
\code{big.matrix} and native \code{matrix} objects.
}
\details{
Index of functions/methods (grouped in a friendly way): \preformatted{
big.matrix, filebacked.big.matrix, as.big.matrix
is.big.matrix, is.separated, is.filebacked
describe, attach.big.matrix, attach.resource
sub.big.matrix, is.sub.big.matrix
dim, dimnames, nrow, ncol, print, head, tail, typeof, length
read.big.matrix, write.big.matrix
mwhich
morder, mpermute
deepcopy
flush }
Multi-gigabyte data sets challenge and frustrate users, even on
well-equipped hardware. Use of \acronym{C/C++} can provide efficiencies, but
is cumbersome for interactive data analysis and lacks the flexibility and
power of R's rich statistical programming environment. The package
\pkg{bigmemory} and associated packages \pkg{biganalytics},
\pkg{synchronicity}, \pkg{bigtabulate}, and \pkg{bigalgebra} bridge
this gap, implementing massive matrices and supporting their manipulation
and exploration. The data
structures may be allocated to shared memory, allowing separate processes on
the same computer to share access to a single copy of the data set. The
data structures may also be file-backed, allowing users to easily manage and
analyze data sets larger than available RAM and share them across nodes of a
cluster. These features of the Bigmemory Project open the door for powerful
and memory-efficient parallel analyses and data mining of massive data sets.
This project (\pkg{bigmemory} and its sister packages) is still actively
developed, although the design and current features can be viewed as
"stable." Please feel free to email us with any questions:
bigmemoryauthors@gmail.com.
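As a minimal sketch of the file-backed case described above (the file names
\code{example.bin} and \code{example.desc} are illustrative, and a temporary
directory is assumed as the backing path): \preformatted{
library(bigmemory)
# Illustrative file names; tempdir() is an assumption made for this sketch.
fb <- filebacked.big.matrix(4, 2, type = "double", init = 0,
                            backingfile = "example.bin",
                            descriptorfile = "example.desc",
                            backingpath = tempdir())
fb[, 1] <- 1:4       # writes go to the memory-mapped backing file
is.filebacked(fb)    # TRUE for a file-backed matrix
flush(fb)            # ask for pending changes to be written to disk
}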
}
\note{
Various options are available.
\code{options(bigmemory.typecast.warning)} can be set to \code{FALSE} to
suppress the warnings that occur if, for example, you assign values (typically
of type double) to char, short, or integer \code{\link{big.matrix}} objects.
\code{options(bigmemory.print.warning)} protects against extracting and
printing a massive matrix (which would involve the creation of a second
massive copy of the matrix). \code{options(bigmemory.allow.dimnames)} by
default prevents the setting of \code{dimnames} attributes, because they
aren't allocated to shared memory and changes will not be visible across
processes. \code{options(bigmemory.default.type)} is \code{"double"} by
default (a change in default behavior as of 4.1.1) but may be changed by the
user.
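The options described above can be set and inspected as in the following
sketch. It assumes that setting \code{bigmemory.typecast.warning} to
\code{FALSE} silences the down-casting warning; the matrix \code{x} is
purely illustrative. \preformatted{
library(bigmemory)
old <- options(bigmemory.typecast.warning = FALSE)  # keep the old setting
x <- big.matrix(3, 1, type = "integer", init = 0)
x[, 1] <- c(1, 2, 3)     # doubles assigned to an integer big.matrix
options(old)             # restore the previous setting
getOption("bigmemory.default.type")   # current default storage type
}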
Note that a \code{big.matrix} cannot simply be passed to most existing
functions (e.g. \code{\link{lm}}, \code{\link{kmeans}}). One nice exception
is \code{\link{split}}, because this function only accesses subsets of the
matrix.
}
\section{Memory considerations}{
The memory used by a \code{big.matrix} is managed outside the R memory pool
available to the garbage collector, so the memory occupied by the
\code{big.matrix} is not visible to R.
This has subtle implications:
\itemize{
\item Memory usage is not visible via general R functions (e.g. the \code{gc()} function).
\item The garbage collector is misled by the very small memory footprint of the \code{big.matrix}
object (which acts merely as a pointer to the external memory structure), which can make it
much less eager to collect unused \code{big.matrix} objects.
After removing the last reference to a large \code{big.matrix}, the user should manually run
\code{gc()} to reclaim the memory.
\item Attaching the description of an already finalized \code{big.matrix} and accessing that object
results in undefined behavior, which in practice means it will crash the current R session
with no hope of saving the data in it. To prevent R from de-allocating (finalizing) the
matrices, the user should keep at least one \code{big.matrix} object somewhere in R memory in at
least one R session on the current machine.
\item An R session that is closed abruptly (e.g. via a task manager) has no chance to finalize its
\code{big.matrix} objects. This results in a memory leak, because the matrices remain in
memory (perhaps under obfuscated names) with no easy way to reconnect R to them.
}
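The points above can be illustrated with a minimal sketch. The object names
are illustrative, and the descriptor is attached within the same R session
only to keep the example self-contained: \preformatted{
library(bigmemory)
x <- big.matrix(1000, 10, type = "double", init = 0)
desc <- describe(x)            # usable only while a reference such as x exists
y <- attach.big.matrix(desc)   # second reference to the same external memory
y[1, 1] <- 42
x[1, 1]                        # reads the value written through y
rm(x, y)                       # drop every reference to the matrix ...
gc()                           # ... then ask R to finalize and free the memory
# Do not attach desc after this point: the matrix it describes is gone.
}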
}
\examples{
# Our examples are all trivial in size, rather than burning huge amounts
# of memory.
x <- big.matrix(5, 2, type="integer", init=0,
dimnames=list(NULL, c("alpha", "beta")))
x
x[1:2,]
x[,1] <- 1:5
x[,"alpha"]
colnames(x)
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- NULL
x[,]
}
\seealso{
For example, \code{\link{big.matrix}}, \code{\link{mwhich}},
\code{\link{read.big.matrix}}
}
\author{
Michael J. Kane, John W. Emerson, Peter Haverty, and Charles Determan Jr.
Maintainer: Michael J. Kane \email{bigmemoryauthors@gmail.com}
}
\keyword{package}