File: Merge.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Merge.r
\name{Merge}
\alias{Merge}
\title{Merge Multiple Data Frames or Data Tables}
\usage{
Merge(..., id = NULL, all = TRUE, verbose = TRUE)
}
\arguments{
\item{\dots}{two or more data frames or data tables}

\item{id}{a formula (e.g. \code{~ sid}) containing all the identification variables, such that the combination of these variables uniquely identifies subjects or records of interest.  May be omitted for data tables; in that case the \code{key} function is used to retrieve the id variables.}

\item{all}{set to \code{FALSE} to drop observations not found in second and later data frames (only applies if not using \code{data.table})}

\item{verbose}{set to \code{FALSE} to suppress printing of information about observations}
}
\description{
Merges an arbitrarily long series of data frames or data tables that share common \code{id} variables.  The number of observations and the number of unique \code{id}s in each individual dataset and in the final merged dataset are printed.  The first data frame/table has special meaning: all of its observations are kept whether or not they match \code{id}s in the other data frames.  For all other data frames, non-matching observations are dropped by default.  The first data frame is also the one against which counts of unique \code{id}s are compared.  \code{merge} sometimes drops variable attributes such as \code{labels} and \code{units}; \code{Merge} restores them.
}
\examples{
\dontrun{
a <- data.frame(sid=1:3, age=c(20,30,40))
b <- data.frame(sid=c(1,2,2), bp=c(120,130,140))
d <- data.frame(sid=c(1,3,4), wt=c(170,180,190))
all <- Merge(a, b, d, id = ~ sid)
# The first dataset should be the master file and must
# contain all ids that ever occur.  Ids not in the master will
# not be merged from other datasets.
library(data.table)   # provides data.table() and setkey()
a <- data.table(a); setkey(a, sid)
# data.table also does not allow duplicates without allow.cartesian=TRUE
b <- data.table(sid=1:2, bp=c(120,130)); setkey(b, sid)
d <- data.table(d); setkey(d, sid)
all <- Merge(a, b, d)
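
# A minimal sketch, not part of the original examples, of the attribute
# restoration mentioned in the description.  It assumes the attributes are
# set with the Hmisc functions label() and units(); the names x, y and m
# are hypothetical and chosen only for this illustration.
library(Hmisc)
x <- data.frame(sid=1:3, age=c(20,30,40))
label(x$age) <- 'Age'      # variable label attribute
units(x$age) <- 'years'    # units attribute
y <- data.frame(sid=1:3, bp=c(120,130,140))
m <- Merge(x, y, id = ~ sid, verbose=FALSE)
# If the attributes were carried through as described, these should
# still report the label and units that were set on x$age
label(m$age)
units(m$age)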
}
}