% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/make.pconsecutive_pbalanced.R
\name{make.pbalanced}
\alias{make.pbalanced}
\alias{make.pbalanced.pdata.frame}
\alias{make.pbalanced.pseries}
\alias{make.pbalanced.data.frame}
\title{Make data balanced}
\usage{
make.pbalanced(
  x,
  balance.type = c("fill", "shared.times", "shared.individuals"),
  ...
)

\method{make.pbalanced}{pdata.frame}(
  x,
  balance.type = c("fill", "shared.times", "shared.individuals"),
  ...
)

\method{make.pbalanced}{pseries}(
  x,
  balance.type = c("fill", "shared.times", "shared.individuals"),
  ...
)

\method{make.pbalanced}{data.frame}(
  x,
  balance.type = c("fill", "shared.times", "shared.individuals"),
  index = NULL,
  ...
)
}
\arguments{
\item{x}{an object of class \code{pdata.frame}, \code{data.frame},
or \code{pseries};}

\item{balance.type}{character, one of \code{"fill"},
\code{"shared.times"}, or \code{"shared.individuals"}, see
\strong{Details},}

\item{\dots}{further arguments.}

\item{index}{only relevant for the \code{data.frame} interface; if
\code{NULL}, the first two columns of the data.frame are
assumed to be the index variables; if not \code{NULL}, both
dimensions ('individual', 'time') need to be specified by
\code{index} as a character vector of length 2; for
further details see \code{\link[=pdata.frame]{pdata.frame()}}.}
}
\value{
An object of the same class as the input \code{x}, i.e., a
pdata.frame, data.frame or a pseries which is made balanced
based on the index variables. The returned data are sorted as a
stacked time series.
}
\description{
This function makes the data balanced, i.e., each individual has the same
time periods, by filling in or dropping observations.
}
\details{
(p)data.frame and pseries objects are made balanced, meaning each
individual has the same time periods.  Depending on the value of
\code{balance.type}, the balancing is done in different ways:
\itemize{ \item \code{balance.type = "fill"} (default): The union
of available time periods over all individuals is taken (without
\code{NA} values).  Missing time periods for an individual are
identified and corresponding rows (elements for pseries) are
inserted and filled with \code{NA} for the non-index variables
(elements for a pseries).  This means that only time periods present
for at least one individual are inserted, if missing.

\item \code{balance.type = "shared.times"}: The intersection of available time
periods over all individuals is taken (without \code{NA} values).  Thus, time
periods not available for all individuals are discarded, i.e., only time
periods shared by all individuals are left in the result.

\item \code{balance.type = "shared.individuals"}: All available time periods
are kept and those individuals for which not all time periods are available
are dropped, i.e., only individuals shared by all time periods are left
in the result (symmetric to \code{"shared.times"}).  }

The data are not necessarily made consecutive (regular time series
with distance 1), because balancedness does not imply
consecutiveness. For making the data consecutive, use
\code{\link[=make.pconsecutive]{make.pconsecutive()}} (and, optionally, set argument
\code{balanced = TRUE} to make the data consecutive and balanced); see also
\strong{Examples} for a comparison of the two functions.

Note: Rows of (p)data.frames (elements for pseries) with \code{NA}
values in the individual or time index are not examined but silently
dropped before the data are made balanced, because it cannot
be inferred which individual or time period is meant by the missing
value(s) (see also \strong{Examples}).  In particular, \code{NA} values
in the first/last position of the original time periods for an
individual, which usually mark the beginning and end of the time
series for that individual, are dropped.  Thus, one might want to check
for \code{NA} values in the index variables before applying
\code{make.pbalanced}, especially in the first and last position for
each individual in the original data, and, if there are any, set those
to some meaningful begin/end value for the time series.
}
\examples{

# take data and make it unbalanced
# by deletion of 2nd row (2nd time period for first individual)
data("Grunfeld", package = "plm")
nrow(Grunfeld)                            # 200 rows
Grunfeld_missing_period <- Grunfeld[-2, ]
pdim(Grunfeld_missing_period)$balanced    # check if balanced: FALSE
make.pbalanced(Grunfeld_missing_period)   # make it balanced (by filling)
make.pbalanced(Grunfeld_missing_period, balance.type = "shared.times") # (shared periods)
nrow(make.pbalanced(Grunfeld_missing_period))
nrow(make.pbalanced(Grunfeld_missing_period, balance.type = "shared.times"))
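
# illustrative sketch (an addition to the original examples):
# the third balancing mode, "shared.individuals", keeps all time periods
# and drops every individual for which at least one period is missing;
# the object name g_shared_ind is chosen here for illustration only
g_shared_ind <- make.pbalanced(Grunfeld_missing_period,
                               balance.type = "shared.individuals")
pdim(g_shared_ind)$balanced               # TRUE
unique(g_shared_ind$firm)                 # firm 1 is dropped
nrow(g_shared_ind)                        # 180 rows (9 firms x 20 years)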

# more complex data:
# First, make data unbalanced (and non-consecutive) 
# by deletion of 2nd time period (year 1936) for all individuals
# and more time periods for first individual only
Grunfeld_unbalanced <- Grunfeld[Grunfeld$year != 1936, ]
Grunfeld_unbalanced <- Grunfeld_unbalanced[-c(1,4), ]
pdim(Grunfeld_unbalanced)$balanced        # FALSE
all(is.pconsecutive(Grunfeld_unbalanced)) # FALSE

g_bal <- make.pbalanced(Grunfeld_unbalanced)
pdim(g_bal)$balanced        # TRUE
unique(g_bal$year)          # all years but 1936
nrow(g_bal)                 # 190 rows
head(g_bal)                 # 1st individual: years 1935, 1939 are NA

# NA in 1st, 3rd time period (years 1935, 1937) for first individual
Grunfeld_NA <- Grunfeld
Grunfeld_NA[c(1, 3), "year"] <- NA
g_bal_NA <- make.pbalanced(Grunfeld_NA)
head(g_bal_NA)        # years 1935, 1937: NA for non-index vars
nrow(g_bal_NA)        # 200
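
# illustrative sketch (an addition to the original examples):
# as rows with NA in an index variable are silently dropped, it can help
# to look for such rows before balancing; this assumes the index variables
# are the first two columns of Grunfeld (firm, year)
anyNA(Grunfeld_NA$firm)           # FALSE
anyNA(Grunfeld_NA$year)           # TRUE
which(is.na(Grunfeld_NA$year))    # rows 1 and 3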

# pdata.frame interface
pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period)
make.pbalanced(pGrunfeld_missing_period)

# pseries interface
make.pbalanced(pGrunfeld_missing_period$inv)
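
# illustrative sketch (an addition to the original examples):
# data.frame interface with the index variables named explicitly via
# argument index (a character vector of length 2: individual, time);
# the object name g_idx is chosen here for illustration only
g_idx <- make.pbalanced(Grunfeld_missing_period, index = c("firm", "year"))
pdim(g_idx)$balanced              # TRUE
nrow(g_idx)                       # 200 rows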

# comparison to make.pconsecutive
g_consec <- make.pconsecutive(Grunfeld_unbalanced)
all(is.pconsecutive(g_consec)) # TRUE
pdim(g_consec)$balanced        # FALSE
head(g_consec, 22)             # 1st individual:    no years 1935/6; 1939 is NA;
                               # other individuals: years 1935-1954, 1936 is NA
nrow(g_consec)                 # 198 rows

g_consec_bal <- make.pconsecutive(Grunfeld_unbalanced, balanced = TRUE)
all(is.pconsecutive(g_consec_bal)) # TRUE
pdim(g_consec_bal)$balanced        # TRUE
head(g_consec_bal)                 # year 1936 is NA for all individuals
nrow(g_consec_bal)                 # 200 rows

head(g_bal)                        # no year 1936 at all
nrow(g_bal)                        # 190 rows

}
\seealso{
\code{\link[=is.pbalanced]{is.pbalanced()}} to check if data are balanced;
\code{\link[=is.pconsecutive]{is.pconsecutive()}} to check if data are consecutive;
\code{\link[=make.pconsecutive]{make.pconsecutive()}} to make data consecutive (and,
optionally, also balanced).\cr \code{\link[=punbalancedness]{punbalancedness()}}
for two measures of unbalancedness, \code{\link[=pdim]{pdim()}} to check
the dimensions of a 'pdata.frame' (and other objects),
\code{\link[=pvar]{pvar()}} to check for individual and time variation
of a 'pdata.frame' (and other objects), \code{\link[=lag]{lag()}} for
lagging (and leading) values of a 'pseries' object.\cr
\code{\link[=pseries]{pseries()}}, \code{\link[=data.frame]{data.frame()}},
\code{\link[=pdata.frame]{pdata.frame()}}.
}
\author{
Kevin Tappe
}
\keyword{attribute}