% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/peek.R
\name{peek}
\alias{peek}
\alias{histOMatic}
\title{Show variables, one at a time, QUICKLY and EASILY.}
\usage{
peek(
dat,
sort = TRUE,
file = NULL,
textout = FALSE,
ask,
...,
xlabstub = "kutils peek: ",
freq = FALSE,
histargs = list(probability = !freq),
barargs = list(horiz = TRUE, las = 1)
)
}
\arguments{
\item{dat}{An R data frame or something that can be coerced to a
data frame by \code{as.data.frame}}
\item{sort}{Default TRUE. Should the columns be displayed in
alphabetical order?}
\item{file}{Name of a file to receive the output instead of the
screen. Default is NULL, meaning show on screen. If you supply a
file name, PDF output is written into it.}
\item{textout}{If TRUE, counts from histogram bins and tables will
appear in the console.}
\item{ask}{As in the old style R \code{par(ask = TRUE)}: should R
wait for keyboard interaction before advancing to the next plot?
Defaults to FALSE if the file argument is non-null. If file is
null, setting ask = FALSE will cause graphs to whir by without
pausing.}
\item{...}{Additional arguments for the pdf, histogram, table, or
barplot functions. Please see Details below.}
\item{xlabstub}{A text stub that will appear in the x axis
label. Currently it includes advertising for this package.}
\item{freq}{As in the \code{freq} argument of \code{hist}. Should
graphs show counts (freq = TRUE) or proportions (AKA densities)
(freq = FALSE)?}
\item{histargs}{A list of arguments to be passed to the
\code{hist} function.}
\item{barargs}{A list of arguments to be passed to the
\code{barplot} function.}
}
\value{
A vector of column names that were plotted
}
\description{
This makes it easy to quickly scan through all of the columns in a
data frame to spot unexpected patterns or data entry errors.
Numeric variables are depicted as histograms, while factor and
character variables are summarized by the R \code{table} function
and then presented as barplots. This is most useful with a large
screen graphic device. Try the function provided with this package,
\code{dev.create(height=7, width=7)}, or any other method you
prefer to create a large device.
}
\section{Try the Defaults}{
Every effort has been made to make this
simple and easy to use. Please run the examples as they are
before becoming too concerned about customization. This
function is intended for getting a quick look at each
variable, one by one; it is not intended to create publication
quality histograms. For the sake of fastidious users, a lot
of settings can be adjusted. Users can control the parameters
for presentation of histograms (parameters for \code{hist})
and barplots (parameters for \code{barplot}). The function can
also create frequency tables, which users can control by
providing additional named arguments.
}
\section{Style}{
The histograms are standard, upright histograms.
The barplots are horizontal. I chose to make the bars
horizontal because long value labels are more easily
accommodated on the left axis. The code measures the width
(in inches) of the label strings and increases the margin
accordingly. The examples include a demonstration of that
effect.
}
\section{Dealing with Dots}{
Additional named arguments,
\code{...}, are inspected and sorted into groups intended to
control use of R functions \code{hist}, \code{barplot},
\code{table} and \code{pdf}. \cr \cr The parameters
c("exclude", "dnn", "useNA", "deparse.level") will go to
the \code{table} function, which is used to make barplots for
factor and character variables. These named arguments are
extracted and sent to the pdf function: c("width", "height",
"onefile", "family", "title", "fonts", "version", "paper",
"encoding", "bg", "fg", "pointsize", "pagecentre",
"colormodel", "useDingbats", "useKerning", "fillOddEven",
"compress"). Any other arguments that are unique to
\code{hist} or \code{barplot} are sorted out and sent only to
those functions. \cr \cr All remaining arguments, including
graphical parameters (named arguments of \code{par}, for
example), are sent to both the histogram and barplot functions,
which is a convenient way to obtain a uniform appearance.
However, to target some arguments to \code{hist}, but not
\code{barplot}, use the \code{histargs} list argument.
Similarly, use \code{barargs} to send arguments only to the
\code{barplot} function. Warning: the defaults for
\code{histargs} and \code{barargs} include some settings that
are needed for the existing design. If new lists for
\code{histargs} or \code{barargs} are supplied, the previously
specified defaults are lost. Hence, users should include the
existing members of those lists, possibly with revised
values. \cr \cr All of this argument sorting is done in order
to reduce the large number of warnings that were observed in
previous editions of this function.
}
\examples{
\donttest{
set.seed(234234)
N <- 200
mydf <- data.frame(x5 = rnorm(N), x4 = rnorm(N), x3 = rnorm(N),
x2 = letters[sample(1:24, 200, replace = TRUE)],
x1 = factor(sample(c("cindy", "bobby", "marsha",
"greg", "chris"), 200, replace = TRUE)),
stringsAsFactors = FALSE)
## Insert 16 missings
mydf$x1[sample(1:150, 16)] <- NA
## Four-digit years need \%Y; \%y would read only the first two digits
mydf$adate <- as.Date(c("1jan1960", "2jan1960", "31mar1960", "30jul1960"), format = "\%d\%b\%Y")
peek(mydf)
peek(mydf, sort = FALSE)
## Demonstrate the dot-dot-dot usage to pass in hist params
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE)
## Not Run: file output
## peek(mydf, sort = FALSE, file = "three_histograms.pdf")
## Use some objects from the datasets package
library(datasets)
peek(cars, xlabstub = "R cars data: ")
peek(EuStockMarkets, xlabstub = "Euro Market Data: ")
peek(EuStockMarkets, xlabstub = "Euro Market Data: ", breaks = 50,
freq = TRUE)
## Not run: file output
## peek(EuStockMarkets, breaks = 50, file = "myeuro.pdf",
## height = 4, width=3, family = "Times")
## peek(EuStockMarkets, breaks = 50, file = "myeuro-\%03d.pdf",
## onefile = FALSE, family = "Times", textout = TRUE)
## ylab goes into "..." and affects both histograms and barplots
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities",
freq = TRUE)
## xlab is added in the barargs list.
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities",
freq = TRUE, barargs = list(horiz = TRUE, las = 1, xlab = "I'm in barargs"))
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE,
barargs = list(horiz = TRUE, las = 1, xlim = c(0, 100),
xlab = "I'm in barargs, not in histargs"))
levels(mydf$x1) <- c(levels(mydf$x1), "arthur philpot smythe")
mydf$x1[4] <- "arthur philpot smythe"
mydf$x2[1] <- "I forgot what letter"
peek(mydf, breaks = 30,
barargs = list(horiz = TRUE, las = 1))
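## A hedged sketch of the argument sorting described in "Dealing with
## Dots"; these two calls are illustrations, not original examples.
## useNA is one of the names routed to table(), so this call is
## expected to add a bar for missing values to each barplot.
peek(mydf, useNA = "always")
## A new histargs list replaces the default list, so the default
## member (probability = !freq, which is TRUE here) is restated
## alongside the new one. The col value is only illustrative and is
## assumed to be passed through to hist() unchanged.
peek(mydf, histargs = list(probability = TRUE, col = "gray80"))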
}
}
\author{
Paul Johnson <pauljohn@ku.edu>
}