1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325
|
% -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
\documentclass[nojss]{jss}
\usepackage{dsfont}
\usepackage{bbm}
\usepackage{amsfonts}
\usepackage{wasysym}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% just as usual
\author{Robin K. S. Hankin\\Auckland University of Technology}
\title{Urn Sampling Without Replacement: Enumerative Combinatorics In
\proglang{R}}
%\VignetteIndexEntry{Urn sampling without replacement}
%% for pretty printing and a nice hypersummary also set:
\Plainauthor{Robin K. S. Hankin}
\Plaintitle{Urn sampling without replacement: enumerative combinatorics in
R}
\Shorttitle{Urn sampling without replacement}
%% an abstract and keywords
\Abstract{This vignette is based on~\cite{hankin2007}.
This short paper introduces a code snippet in the form of
two new \proglang{R} functions that enumerate possible draws from an urn
without replacement; these functions call \proglang{C}
code, written by the author. Some simple combinatorial problems are solved
using the software.
For reasons of performance, this vignette uses pre-calculated
answers. To calculate everything from scratch, set variable {\tt
calculate\_from\_scratch} in the first chunk to {\tt TRUE}.
}
\Keywords{Urn problems, drawing without replacement, enumerative
combinatorics, Scrabble, \proglang{R}}
\Plainkeywords{Urn problems, drawing without replacement, enumerative
combinatorics, Scrabble, R}
%% publication information
%% NOTE: This needs to filled out ONLY IF THE PAPER WAS ACCEPTED.
%% If it was not (yet) accepted, leave them commented.
%% \Volume{13}
%% \Issue{9}
%% \Month{September}
%% \Year{2004}
%% \Submitdate{2004-09-29}
%% \Acceptdate{2004-09-29}
%% The address of (at least) one author should be given
%% in the following format:
\Address{
Robin K. S. Hankin\\
Auckland University of Technology\\
AUT Tower\\
Wakefield Street\\
Auckland, New Zealand\\
E-mail: \email{hankin.robin@gmail.com}\\
{\rule{10mm}{0mm}}\hfill\includegraphics[width=1in]{\Sexpr{system.file("help/figures/partitions.png",package="partitions")}}
}
%% It is also possible to add a telephone and fax number
%% before the e-mail in the following format:
%% Telephone: +43/1/31336-5053
%% Fax: +43/1/31336-734
%% for those who use Sweave please include the following line (with % symbols):
%% need no \usepackage{Sweave.sty}
%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newsymbol\leqslant 1336
\SweaveOpts{echo=FALSE}
\begin{document}
<<set_cutdownproblem,echo=FALSE,print=FALSE>>=
<<results=hide>>=
do_from_scratch <- TRUE
@
\hfill\includegraphics[width=1in]{\Sexpr{system.file("help/figures/partitions.png",package="partitions")}}
\section{Introduction}
Drawing balls from an urn without replacement is a classical paradigm
in probability~\citep{feller1968}. It is useful in practice, and many
elementary statistics textbooks use it to introduce the binomial and
hypergeometric distributions.
In this paper, I introduce software, written by the author, that
enumerates\footnote{Enumerate: ``to mention [a number of things] one
by one, as if for the purpose of counting''} all possible draws from
an urn containing specified numbers of balls of each of a finite
number of types. Order is not important in the sense that drawing AAB
is equivalent to drawing ABA or BAA. Formally, drawing~$n$ balls
without replacement from an urn containing $f_1,f_2,\ldots,f_S$ balls
of types $1,2,\ldots, S$ is equivalent to choosing a solution
$a_1,\ldots,a_S$ to the Diophantine equation
\begin{equation}
\label{basic_equation}
\sum_{i=1}^Sa_i=n,\qquad 0\leqslant a_i\leqslant f_i,
\end{equation}
with
probability~$\left.\left(\begin{array}{c}f_i\\a_i\end{array}\right)\right/
\left(\begin{array}{c}\sum f_i\\n\end{array}\right)$.
Combinatorial enumeration is often necessary in the context of integer
optimization: the appropriate configurations are enumerated and the
optimal one reported. Sometimes explicit enumeration is needed to
count solutions satisfying some condition: simply enumerate candidate
solutions, then test them one by one.
The software discussed here comprises two new \proglang{R}
\citep{rmanual} functions \code{S()} and \code{blockparts()}, which
are currently part of the \pkg{partitions} package~\citep{hankin2006},
version 1.3-3. These functions call \proglang{C} code, also written
by the author, which is available as part of the package.
All software is available from CRAN, \url{https://CRAN.r-project.org/}.
\section{Examples}
The software associated with this snippet is now used to answer a
variety of combinatorial questions, written in textbook example style,
that require enumerative techniques to solve.
{\bf\em Question\/} A chess player is considering endgames in which
White has a king, no pawns, and exactly three other pieces. What
combinations of white pieces are possible? No promotions have
occurred.
{\bf\em Answer\/} This is an urn problem with a pool of~7 objects, in
this case non-king chess pieces. An enumeration of the size-3 draws
is required, which is given by new function {\tt blockparts()}. This
function enumerates the distinct solutions to
equation~\ref{basic_equation} in columns which appear in
lexicographical order:
<<load_library,echo=FALSE,print=FALSE>>=
library(partitions)
library(polynom)
options(width=85)
<<chess_problem,echo=TRUE,print=TRUE>>=
blockparts(c(Bishops=2,Knights=2,Rooks=2,Queens=1),3)
@
The first sample appears as the first column: this is the first
lexicographically, as all draws are from as low an index of \code{f}
as possible. Subsequent draws are in lexicographical order. Starting
with a draw \code{d}---a vector of \code{length(y)} elements---the
next draw is obtained by the following algorithm:
\begin{enumerate}
\item Starting at the beginning of the vector, find the first block
that can be moved one square to the right, and move it. If no such
block exists, all draws have been enumerated: {\bf stop}.
\item Take all blocks to the left of the one that has moved, and place
them sequentially in the frame, starting from the left.
\item Go to item 1.
\end{enumerate}
Figure~\ref{blocks} shows an example of this in action. The left
diagram shows a draw of a bishop and two knights, corresponding to the
the second column of the matrix returned by \code{blockparts()} above.
The first block that can be moved is one of the knights. This moves
one place to the right and becomes a rook. The remaining pieces (that
is, one bishop and one knight) are redistributed starting from the
left; they become two bishops.
There are thus~\Sexpr{S(c(1,2,2,2),3)} combinations: note that the
majority of them have no Queen. The solutions are in lexicographical
order, which is useful in some contexts.
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=10cm]{blocks}
\caption{A pictorial description of the algorithm used in \label{blocks}
function \code{blockparts()}. Three blocks (grey squares) are
arranged in a tableau (B, N, R, Q, representing the chess pieces
under consideration) in two consecutive configurations, the left
one first. The larger, line squares above each piece name show the
maximum number of chess pieces allowed; thus the two knights in the
left diagram completely fill the `N' column and this indicates that
a maximum of two knights may be drawn. The left diagram thus
corresponds to one bishop and two knights: this is column two in the
matrix returned by \code{blockparts()} in the \proglang{R} chunk
above. The right diagram shows the next lexicographical
arrangement, corresponding to column three; the algorithm for the
change is described in the text} \end{center} \end{figure}
{\bf\em Question\/} ``Scrabble'' is a popular word board game that
involves choosing, at random, a rack of~7 tiles from a pool of~100
with the following frequencies:
<<scrabble_tiles>>=
scrabble <-
c(a=9,b=2,c=2,d=4,e=12,f=2,g=3,h=2,i=9,j=1,k=1,l=4,m=2,n=6,o=8,p=2,q=1,r=6,s=4,t=6,u=4,v=2,w=2,x=1,y=2,z=1," "=2)
<<echo=TRUE>>=
scrabble
@
(note the last entry: two of the tiles are blank).
\begin{itemize}
\item[({\bf\em i\/})] how many distinct racks are possible?
\item[({\bf\em ii\/})] what proportion of racks have no blanks?
\item[({\bf\em iii\/})] what is the most probable rack and what is its frequency?
\end{itemize}
{\bf\em Answer}
({\bf\em i\/}). The number of draws is given by function {\tt S()}.
This function returns the number of solutions to
equation~\ref{basic_equation} by determining the coefficient of $x^n$
in the generating function~$\prod_{i=1}^S\sum_{j=0}^{f_i} x^j$ using
the \pkg{polynom} \citep{venables2006} package:
<<call_S,echo=TRUE>>=
S(scrabble,7)
@
({\bf\em ii\/}). The number of racks with no blank is given by function {\tt
S()}, applied with a suitably shortened vector argument; the
proportion of racks with no blank is then:
<<call_S_noblanks,print=TRUE,echo=TRUE>>=
S(scrabble[-27],7)/S(scrabble,7)
@
Note that this question is distinct from that of determining the {\em
probability} of drawing no blanks, which is given by elementary
combinatorial arguments as~{\small $\left.
\left(\begin{array}{c}98 \\7\end{array}\right)\right/
\left(\begin{array}{c}100\\7\end{array}\right)
$}, or about~\Sexpr{100*round(choose(98,7)/choose(100,7),2)}\%. This
value is larger because racks with one or more blanks have relatively
low probabilities of being drawn.
({\bf\em iii\/}). To determine the most probable rack, we note that the
probability of a given draw is given by
\[
\frac{
\prod\left(\begin{array}{c}f_i\\a_i\end{array}\right)}
{\left(\begin{array}{c}100\\7\end{array}\right)}.
\]
The appropriate \proglang{R} idiom would be to enumerate all possible
racks using \code{blockparts()}, and apply a function that calculates
the probability of each rack:
<<probOfRacks,print=FALSE,echo=FALSE,cache=TRUE>>=
f <- function(a){prod(choose(scrabble,a))/choose(sum(scrabble),7)}
if(do_from_scratch){
racks <- blockparts(scrabble,7)
probs <- apply(racks,2,f)
otarine_ans <- rep(names(scrabble), racks[, which.max(probs)])
max_probs <- max(probs)
mp <- min(probs)
a <- floor(log10(mp))
how_many_racks <- sum(probs==min(probs))
} else {
load("answers.Rdata")
}
@
\begin{Schunk}
\begin{Sinput}
> f <- function(a) { prod(choose(scrabble, a))/choose(sum(scrabble), 7) }
> racks <- blockparts(scrabble, 7)
> probs <- apply(racks, 2, f)
\end{Sinput}
\end{Schunk}
The draw of maximal probability is given by the maximal element of
\code{probs}. The corresponding rack is then:
\begin{Schunk}
\begin{Sinput}
> rep(names(scrabble), racks[, which.max(probs)])
\end{Sinput}
\end{Schunk}
<<print_otarine_letters,echo=FALSE,print=TRUE,cache=FALSE>>=
otarine_ans
@
In the context of the full Scrabble problem, there is only one
acceptable anagram of the most probable rack: ``otarine''. Its
probability is
\begin{Schunk}
\begin{Sinput}
> max(probs)
\end{Sinput}
\end{Schunk}
<<print_max_probs,echo=FALSE,print=TRUE,cache=FALSE>>=
max_probs
@
or just over once per~\Sexpr{ceiling(1/max_probs)} draws. It is
interesting to note that the {\em least} probable rack is not unique:
there are exactly~\Sexpr{how_many_racks} racks each with
minimal probability (about~$\Sexpr{round(mp/10^a,3)}\times
10^{\Sexpr{a}}$).
\section{Conclusions}
The software discussed in this code snippet enumerates the possible
draws from an urn made without replacement; it is used to answer
several combinatorial questions that require enumeration for their
answer. Further work might include enumeration of solutions of
arbitrary linear Diophantine equations.
\subsection*{Acknowledgement}
I would like to acknowledge the many stimulating and helpful comments
made by the R-help list while preparing this software.
I would also like to thank an anonymous referee who suggested that the
\code{polynom} package could be used to evaluate the generating
function appearing in \code{S()}.
\bibliography{partitions}
\end{document}
|