% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_read.R, R/data_write.R
\name{data_read}
\alias{data_read}
\alias{data_write}
\title{Read (import) data files from various sources}
\usage{
data_read(
  path,
  path_catalog = NULL,
  encoding = NULL,
  convert_factors = TRUE,
  verbose = TRUE,
  ...
)

data_write(
  data,
  path,
  delimiter = ",",
  convert_factors = FALSE,
  save_labels = FALSE,
  verbose = TRUE,
  ...
)
}
\arguments{
\item{path}{Character string, the file path to the data file.}

\item{path_catalog}{Character string, path to the catalog file. Only relevant
for SAS data files.}

\item{encoding}{The character encoding used for the file. Usually not needed.}

\item{convert_factors}{If \code{TRUE} (default), numeric variables in which
every value has a value label are assumed to be categorical and are converted
into factors. If \code{FALSE}, no variable types are guessed and no numeric
variables are converted into factors. For \code{data_read()}, this
argument only applies to file types with \emph{labelled data}, e.g. files from
SPSS, SAS or Stata. See also section 'Differences to other packages that read
foreign data formats'. For \code{data_write()}, this argument only applies to
text (e.g. \code{.txt} or \code{.csv}) or spreadsheet file formats (like
\code{.xlsx}). Converting to factors can be useful for these formats because
labelled numeric variables are then converted into factors and exported as
character columns. Otherwise, value labels would be lost and only the numeric
values would be written to the file.}

\item{verbose}{Toggle warnings and messages.}

\item{...}{Arguments passed to the related \verb{read_*()} or \verb{write_*()} functions.}

\item{data}{The data frame that should be written to a file.}

\item{delimiter}{For CSV files, specifies the delimiter. Defaults to \code{","}.
In many European locales, \code{";"} can be a useful alternative,
especially when the exported CSV files are to be opened in Excel.}

\item{save_labels}{Only applies to CSV files. If \code{TRUE}, value and variable
labels (if any) will be saved as an additional CSV file. This file has the same
file name as the exported CSV file, but includes a \code{"_labels"} suffix (e.g.
when the file name is \code{"mydat.csv"}, the additional file with value and
variable labels is named \code{"mydat_labels.csv"}).}
}
\value{
A data frame.
}
\description{
This function imports data from various file types. It is a small wrapper
around \code{haven::read_spss()}, \code{haven::read_stata()}, \code{haven::read_sas()},
\code{readxl::read_excel()} and \code{data.table::fread()} or \code{readr::read_delim()}
(the latter if package \strong{data.table} is not installed). Thus, supported file
types for importing data are data files from SPSS, SAS or Stata, Excel files
or text files (like '.csv' files). All other file types are passed to
\code{rio::import()}. \code{data_write()} works in a similar way.
}
\section{Supported file types}{

\itemize{
\item \code{data_read()} is a wrapper around the \strong{haven}, \strong{data.table}, \strong{readr},
\strong{readxl} and \strong{rio} packages. Currently supported file types are \code{.txt},
\code{.csv}, \code{.xls}, \code{.xlsx}, \code{.sav}, \code{.por}, \code{.dta} and \code{.sas} (and related
files). All other file types are passed to \code{rio::import()}.
\item \code{data_write()} is a wrapper around the \strong{haven}, \strong{readr} and \strong{rio}
packages, and supports writing files into all formats supported by these
packages.
}
}

\section{Compressed files (zip) and URLs}{

\code{data_read()} can also read the above-mentioned file types from URLs or from
inside zip-compressed files. Thus, \code{path} can also be a URL to a file like
\code{"http://www.url.com/file.csv"}. When \code{path} points to a zip-compressed file,
and there are multiple files inside the zip-archive, then the first supported
file is extracted and loaded.
}

\section{General behaviour}{

\code{data_read()} detects the appropriate \verb{read_*()} function based on the
file-extension of the data file. Thus, in most cases it should be enough to
only specify the \code{path} argument. However, if more control is needed, all
arguments in \code{...} are passed down to the related \verb{read_*()} function. The
same applies to \code{data_write()}, i.e. based on the file extension provided in
\code{path}, the appropriate \verb{write_*()} function is used automatically.
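
As a minimal sketch of this extension-based dispatch (assuming the
\strong{readr} package is installed for writing CSV files; the temporary file
and the built-in \code{iris} data are purely illustrative), a data frame can be
written to a \code{.csv} file and read back by specifying only \code{path}:

\preformatted{library(datawizard)

# Write a small data frame to a temporary file. The ".csv" extension
# determines which write function is used.
tmp <- tempfile(fileext = ".csv")
data_write(head(iris), tmp)

# Read the file back. Only `path` is needed; the reader is chosen from
# the file extension.
dat <- data_read(tmp)
str(dat)

# Clean up the temporary file.
unlink(tmp)
}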
}

\section{SPSS specific behaviour}{

\code{data_read()} does \emph{not} import user-defined ("tagged") \code{NA} values from
SPSS, i.e. argument \code{user_na} is always set to \code{FALSE} when importing SPSS
data with the \strong{haven} package. Use \code{convert_to_na()} to define missing
values in the imported data, if necessary. Furthermore, \code{data_write()}
compresses SPSS files by default. If this causes problems with (older) SPSS
versions, use \code{compress = "none"}, for example
\code{data_write(data, "myfile.sav", compress = "none")}.
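
The following sketch illustrates how missing values might be defined after
import. It uses a small in-memory data frame as a stand-in for data returned by
\code{data_read()}, and it assumes, purely for illustration, that \code{-9} was
used as a missing-value code in the hypothetical column \code{score}:

\preformatted{library(datawizard)

# Stand-in for imported SPSS data; -9 is a hypothetical missing code.
dat <- data.frame(score = c(1, 2, -9, 4, -9))

# Convert the value -9 to NA in all numeric columns.
dat <- convert_to_na(dat, na = -9)
dat$score
}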
}

\section{Differences to other packages that read foreign data formats}{

\code{data_read()} is most comparable to \code{rio::import()}. The major
difference concerns data files from SPSS, SAS or Stata, i.e. file types that
support \emph{labelled data}, where variables are converted into their most
appropriate type. \code{data_read()} automatically converts fully labelled numeric
variables into factors, where imported value labels will be set as factor
levels. If a numeric variable has \emph{no} value labels or fewer value labels than
values, it is not converted to a factor. In this case, value labels are
preserved as the \code{"labels"} attribute. Character vectors are preserved. Use
\code{convert_factors = FALSE} to disable the automatic conversion of numeric
variables to factors.
}