% $Id%
%
\name{frameApply}
\alias{frameApply}
\title{Subset analysis on data frames}
\description{Apply a function to row subsets of a data frame. 
}
\usage{
frameApply(x, by = NULL, on = by[1], fun = function(xi) c(Count =
nrow(xi)), subset = TRUE, simplify = TRUE, byvar.sep = "\\$\\@\\$", ...)
}
\arguments{
  \item{x}{a data frame}
  \item{by}{names of columns in \code{x} specifying the variables to use
    to form the subgroups.
    None of the \code{by} variables may be named "sep"; a column with
    that name causes an error (a limitation of the current
    implementation). Unused levels of
    the \code{by} variables will be dropped. Use \code{by = NULL} (the
    default) to indicate that all of the data is to be treated as a
    single (trivial) subgroup.}
  \item{on}{names of columns in \code{x} specifying columns over which
    \code{fun} is to be applied. These can include columns specified in
    \code{by} (as with the default), although that is not usually the case.}
  \item{fun}{a function that can operate on data frames that are row
    subsets of \code{x[on]}. If \code{simplify = TRUE},
    the return value of the function should always be either a try-error
    (see \code{\link{try}}), or a vector of
    fixed length (i.e. same length for every subset), preferably with
    named elements.}
  \item{subset}{logical vector (can be specified in terms of variables
    in \code{x}). This row subset of \code{x} is taken before doing
    anything else.}
  \item{simplify}{logical. If TRUE (the default), return value will
    be a data frame including the \code{by} columns and a column for
    each element of the return vector of \code{fun}. If FALSE, the
    return value will be a list, sometimes necessary for less structured
    output (see description of return value below).}
  \item{byvar.sep}{character. This can be any character string not
    found anywhere in the values of the \code{by} variables. The
    \code{by} variables will be pasted together using this as the
    separator, and the result will be used as the index to form the
    subgroups.  }
  \item{...}{additional arguments to \code{fun}.}
}
\value{a data frame if \code{simplify = TRUE} (the default), assuming
  there is sufficiently structured output from \code{fun}. If
  \code{simplify = FALSE} and \code{by} is not NULL, the return value will be a list with two
  elements. The first element, named "by", will be a data frame with the
  unique rows of \code{x[by]}, and the second element, named "result",
  will be a list whose ith component gives the result for the ith row
  of the "by" element.
}
\details{This function accomplishes something similar to
  \code{\link{by}}. The main difference is that \code{frameApply} is
  designed to return data frames and lists instead of objects of class
  'by'. Also, \code{frameApply} works only on the unique combinations of
  the \code{by} variables that are actually present in the data, not on
  the entire Cartesian product of the \code{by} variables. In some cases
  this results in great gains in efficiency, although \code{frameApply}
  is hardly an efficient function.}

\examples{

data(ELISA)

# Default is slightly unintuitive, but commonly useful: 
frameApply(ELISA, by = c("PlateDay", "Read"))

# Wouldn't actually recommend this model! Just a demo:
frameApply(ELISA, on = c("Signal", "Concentration"), by = c("PlateDay", "Read"),
           fun = function(dat) coef(lm(Signal ~ Concentration, data = dat)))

frameApply(ELISA, on = "Signal", by = "Concentration",
           fun = function(dat, ...) {
                    x <- dat[[1]]
                    # Return the named summary vector directly
                    c(Mean = mean(x, ...),
                      SD = sd(x, ...),
                      N = sum(!is.na(x)))
                  },
           na.rm = TRUE,
           subset = !is.na(Concentration))
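
# A hedged base-R sketch of the grouping idea described for byvar.sep.
# This is NOT the actual frameApply code, only an illustration under
# stated assumptions: the data frame 'toy', its columns, and the
# separator "|" are all hypothetical and chosen for this demo.
toy <- data.frame(g1 = c("a", "a", "b"),
                  g2 = c("x", "y", "x"),
                  val = c(1, 2, 3))
# Paste the grouping columns into a single key per row
key <- do.call(paste, c(toy[c("g1", "g2")], sep = "|"))
# Split the remaining column on that key; only combinations that occur
# in the data produce a group (the combination b/y is absent here)
groups <- split(toy["val"], key)
sapply(groups, nrow)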
}
\author{Jim Rogers \email{james\_a\_rogers@groton.pfizer.com}}
\keyword{manip}