File: read.dta.Rd

package info (click to toggle)
foreign 0.8.91-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 1,712 kB
  • sloc: ansic: 7,572; asm: 4; makefile: 2
file content (103 lines) | stat: -rw-r--r-- 4,300 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
% This file is part of the 'foreign' package for R
% It is distributed under the GPL version 2 or later

\name{read.dta}
\alias{read.dta}
\title{Read Stata Binary Files}
\description{
  Reads a file in Stata version 5--12 binary format into a data frame. 
  
  Frozen: will not support Stata formats after 12.
}
\usage{
read.dta(file, convert.dates = TRUE, convert.factors = TRUE,
         missing.type = FALSE,
         convert.underscore = FALSE, warn.missing.labels = TRUE)
}
\arguments{
  \item{file}{a filename or URL as a character string.}
  \item{convert.dates}{Convert Stata dates to \code{Date} class, and
    date-times to \code{POSIXct} class?}
  \item{convert.factors}{Use Stata value labels to create factors?
    (Version 6.0 or later).}
  \item{missing.type}{For version 8 or later, store information about
    different types of missing data?}
  \item{convert.underscore}{Convert \code{"_"} in Stata variable names
    to \code{"."} in R names?}
  \item{warn.missing.labels}{Warn if a variable is specified with value
    labels and those value labels are not present in the file.}
}
\details{
  If the filename appears to be a URL (of schemes \samp{http:},
  \samp{ftp:} or \samp{https:}) the URL is first downloaded to a
  temporary file and then read.  (\samp{https:} is only supported on
  some platforms.)
  
  The variables in the Stata data set become the columns of the data
  frame.  Missing values are correctly handled.  The data label,
  variable labels, timestamp, and variable/dataset characteristics
  are stored as attributes of the data frame.

  By default Stata dates (\%d and \%td formats) are converted to \R's
  \code{Date} class, and variables with Stata value labels are
  converted to factors.  Ordinarily, \code{read.dta} will not convert
  a variable to a factor unless a label is present for every level.  Use
  \code{convert.factors = NA} to override this.  In any case the value
  label and format information is stored as attributes on the returned
  data frame.  Stata's date formats are sketchily documented: if
  necessary use \code{convert.dates = FALSE} and examine the attributes
  to work out how to post-process the dates.
  
  Stata 8 introduced a system of 27 different missing data values.  If
  \code{missing.type} is \code{TRUE} a separate list is created with the
  same variable names as the loaded data.  For string variables the list
  value is \code{NULL}.  For other variables the value is \code{NA}
  where the observation is not missing and 0--26 when the observation is
  missing.  This is attached as the \code{"missing"} attribute of the
  returned value.

  The default file format for Stata 13, \code{format-115}, is
  substantially different from those for Stata 5--12.
}
\value{
  A data frame with attributes.  These will include \code{"datalabel"},
  \code{"time.stamp"}, \code{"formats"}, \code{"types"},
  \code{"val.labels"}, \code{"var.labels"} and \code{"version"} and may
  include \code{"label.table"} and \code{"expansion.table"}.  
  Possible versions are \code{5, 6, 7},
  \code{-7} (Stata 7SE, \sQuote{format-111}), \code{8} (Stata 8 and 9,
  \sQuote{format-113}), \code{10} (Stata 10 and 11, \sQuote{format-114}).
  and \code{12} (Stata 12, \sQuote{format-115}).
  
  The value labels in attribute \code{"val.labels"} name a table for
  each variable, or are an empty string.  The tables are elements of the
  named list attribute \code{"label.table"}: each is an integer vector with
  names.
}
\references{
  Stata Users Manual (versions 5 & 6), Programming manual (version 7),
  or online help (version 8 and later) describe the format of the files.
  Or directly at \url{https://www.stata.com/help.cgi?dta_114} and
  \url{https://www.stata.com/help.cgi?dta_113}, but note that these have
  been changed since first published.
}

\author{
  Thomas Lumley and R-core members: support for value labels by
  Brian Quistorff.
}
\seealso{
  Different approaches are available in package \pkg{memisc} (see its
  help for \code{Stata.file}), function \code{read_dta} in package
  \pkg{haven} and package \pkg{readstata13}.

  \code{\link{write.dta}},
  \code{\link{attributes}},
  \code{\link{Date}},
  \code{\link{factor}}
}
\examples{
write.dta(swiss,swissfile <- tempfile())
read.dta(swissfile)
}
\keyword{file}