% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/variableKey.R
\name{keyImport}
\alias{keyImport}
\title{Import/validate a key object or import/validate a key from a file.}
\usage{
keyImport(
key,
ignoreCase = TRUE,
sep = c(character = "\\\\|", logical = "\\\\|", integer = "\\\\|", factor = "\\\\|",
ordered = "[\\\\|<]", numeric = "\\\\|"),
na.strings = c("\\\\.", "", "\\\\s+", "N/A"),
missSymbol = ".",
...,
keynames = NULL
)
}
\arguments{
\item{key}{A key object (class key or keylong) or a file name
character string (ending in csv, xlsx or rds).}
\item{ignoreCase}{In the use of this key, should we ignore
differences in capitalization of the "name_old" variable?
Sometimes there are inadvertent misspellings due to changes in
capitalization. Columns named "var01" and "Var01" and "VAR01"
probably should receive the same treatment, even if the key
has name_old equal to "Var01".}
\item{sep}{Character separator in \code{value_old} and
\code{value_new} strings in a wide key. The default is "|".
It is also allowed to use "<" for ordered variables. Supply
separator values as regular expressions.}
\item{na.strings}{Values that should be converted to missing data.
This is relevant in \code{name_new} as well as
\code{value_new}. In spreadsheet cells, we treat "empty" cells
(the string ""), or values like "." or "N/A", as missing. The
defaults are ".", "", "\\s+" (white space), and "N/A". Change this
argument if those values should not be treated as missing.}
\item{missSymbol}{Defaults to period "." as missing value
indicator.}
\item{...}{additional arguments for read.csv or read.xlsx.}
\item{keynames}{Don't use this unless you are very careful. In our
current scheme, the column names in a key should be
c("name_old", "name_new", "class_old", "class_new",
"value_old", "value_new", "missings", "recodes"). If your key
does not use those column names, it is necessary to provide
keynames in a format "our_name"="your_name". For example,
keynames = c(name_old = "oldvar", name_new = "newname",
class_old = "vartype", class_new = "class", value_old =
"score", value_new = "val").}
}
\value{
A key object in the same format ("wide" or "long") as the input.
Missing symbols in value_old and value_new are converted to ".".
}
\description{
After the researcher has updated the key by filling in new names
and values, we import that key file. This function can import the
file by its name, after deducing the file type from the suffix, or
it can receive a key object from memory.
}
\details{
This can be either a wide or long format key file.

This cleans up variables in the following ways: 1) \code{name_old}
and \code{name_new} have leading and trailing spaces removed; 2)
\code{value_old} and \code{value_new} have leading and trailing
spaces removed, and if they are empty or contain only blank spaces,
the new values are set to NA.

Policy change concerning empty "value_new" cells in input keys
(20170929).

There is confusion about what ought to happen in a wide key when
the user leaves value_new empty or missing. Taken literally, this
means all values are converted to missing, which does not seem
reasonable. Hence, when a key is wide and value_new is one of the
na.strings elements, we assume that value_new is to be copied
from value_old. That is to say, if value_new is not supplied,
the values remain the same as in the old data.

In a long key, the behavior is different. Since the user can
specify each value for a variable in a separate row, the na.strings
appearing in value_new are treated as missing scores in the new
data set to be created.
}
\examples{
mydf.key.path <- system.file("extdata", "mydf.key.csv", package = "kutils")
mydf.key <- keyImport(mydf.key.path)
## Create some dupes
mydf.key <- rbind(mydf.key, mydf.key[c(1,7), ])
mydf.key2 <- keyImport(mydf.key)
mydf.key2
## create some empty value_new cells
mydf.key[c(3, 5, 7) , "value_new"] <- ""
mydf.key3 <- keyImport(mydf.key)
mydf.key3
mydf.keylong.path <- system.file("extdata", "mydf.key_long.csv", package = "kutils")
mydf.keylong <- keyImport(mydf.keylong.path)
## testDF is a slightly more elaborate version created for unit testing:
testdf.path <- system.file("extdata", "testDF.csv", package = "kutils")
testdf <- read.csv(testdf.path, header = TRUE)
keytemp <- keyTemplate(testdf, long = TRUE)
## A "hand edited key file"
keyPath <- system.file("extdata", "testDF-key.csv", package="kutils")
key <- keyImport(keyPath)
keydiff <- keyDiff(keytemp, key)
key2 <- rbind(key, keydiff$neworaltered)
## Drop duplicated rows from the combined key (not from the original key)
key2 <- unique(key2)
if(interactive())View(key2)
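## A minimal base-R sketch (not keyImport's internal code) of why "sep" is
## given as a regular expression: a wide key packs all values of a variable
## into one delimited string. The strings and the "sepdemo" object names are
## made up for illustration. The bracket classes "[|]" and "[|<]" are assumed
## stand-ins that split these strings the same way the default patterns would.
sepdemo.values <- "1|2|3"
sepdemo.levels <- "low<medium<high"
## Split on the pipe only, as for character, integer, or factor variables
strsplit(sepdemo.values, "[|]")[[1]]
## Split on either "|" or "<", as allowed for ordered variables
strsplit(sepdemo.levels, "[|<]")[[1]]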
}
\author{
Paul Johnson <pauljohn@ku.edu>
}