\documentclass[11pt]{article}
\usepackage{fullpage}
\usepackage{color}
\bibliographystyle{plain}
\input{pstricks}
% Taken from omegahat/Docs/WebMacros.tex
\def\strip#1>{}
\def\Escape#1{\def\next{#1}%
{\frenchspacing\expandafter\strip\meaning\next}}
\def\Env#1{{\gray\texttt{\Escape{#1}}}}
% I don't know why, but this line is a problem here. It works
% in other documents. dfs
%\def\file#1{\HREF{#1}{\textbf{\Escape{#1}}}}
\def\file#1{\textsl{#1}}
\def\dir#1{\textbf{\Escape{#1/}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Taken from omegahat/Docs/XMLMacros.tex
\def\XMLTag#1{\textit{#1}}
\def\XMLAttribute#1{\textsl{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\def\directory#1{\dir{#1}}
\def\XMLElement#1{\XMLTag{#1}}
\begin{document}
\title{Introduction to ggvis: multidimensional scaling as a plugin in {\tt ggobi}}
\author{Deborah F. Swayne and Andreas Buja}
\maketitle
\begin{abstract}
ggvis is a plugin for ggobi, and this short manual begins
to describe its use. It is a port of xgvis, previously
available as part of the xgobi software.
\end{abstract}
\section{The data}
First, you need some data. The ascii format supported by ggobi is not
adequate for the specification of edges, so you'll either use an xml file
or define the graph in R and feed it to ggobi using Rggobi.
If you use an xml file, it will have at least two datasets: a set of
cases or nodes, and a set of edges. The records in the node data must
have \XMLAttribute{id}s:
\begin{verbatim}
<record id="0"> ... </record>
<record id="1"> ... </record>
\end{verbatim}
The edge records use those \XMLAttribute{id}s to specify a
\XMLAttribute{source} and \XMLAttribute{destination}:
\begin{verbatim}
<record source="0" destination="1"> ... </record>
\end{verbatim}
No two records in the same xml file may have the same record id.
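The two constraints above (unique record ids, and edge endpoints that refer to existing ids) can be checked before a file is written. The following is a minimal sketch in base R; the vectors {\tt node\_ids} and {\tt edges} are hypothetical stand-ins for whatever you would write into the xml file, and nothing here calls ggobi.
\begin{verbatim}
# Hypothetical node ids and edge endpoints, as they would appear in the file
node_ids <- c("0", "1", "2", "3")
edges <- data.frame(source      = c("0", "1", "2"),
                    destination = c("1", "2", "4"),
                    stringsAsFactors = FALSE)

# Record ids must be unique
dup_ids <- node_ids[duplicated(node_ids)]

# Every edge endpoint must refer to an existing node id
dangling <- setdiff(c(edges$source, edges$destination), node_ids)

dup_ids    # empty: no duplicated ids
dangling   # "4": the third edge points at a node that does not exist
\end{verbatim}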
The sample data that is discussed here is part of the ggobi distribution:
\file{snetwork.xml}, an artificial social network of 140 people who are
connected by 205 edges, representing some (unspecified) form of social
contact. Notice that there are two variables recorded for each node,
and two variables for each edge.
In a ggobi scatterplot display, use the {\bf Edges} menu to
display the edges and, optionally, the ``arrowheads'' that indicate
edge direction.
\section{Initiating ggvis}
Initiate the plugin by selecting the ``ggvis (MDS) ...'' item on
ggobi's Tools menu. The ggvis control panel starts with a
``notebook'' widget with four tabs. In most cases, you will be
able to jump straight to the last tab, {\em Run}, and start
ggvis, but we will discuss the settings in each tab with reference
to two sample datasets.
The {\em snetwork.xml} data supplied with ggobi portrays a simple
social network, and we can use ggvis to lay out its graph. The {\em
morsecodes.xml} data comes from an experiment on how often Morse
codes are confused with one another; the confusion rates are
interpreted as dissimilarities among pairs of symbols, and we can use
ggvis to explore these dissimilarities. For both of these datasets,
the defaults that ggvis infers and displays in the first three
notebook tabs are correct.
\subsection{Specifying the datasets}
The first tab, ``Datasets,'' contains
two lists, one for datasets which can supply nodes and one for datasets
which can supply edges. In most cases, only one choice is possible,
but more complex arrangements can occur.
The social network data has exactly one set of nodes and one set of
edges. For this data, there are no choices to be made, and you can
simply move on to run multidimensional scaling.
The Morse code data has two sets of edges. The first one,
``distance,'' supplies the distances to be used in determining the
point positions. The second set, ``edges,'' is a set of edges that
can be used for display, because it emphasizes the structure of
the data.
Once the nodes and edges have been specified, move to the next tab.
\subsection{Task}
Under the next tab are some parameters which define the task
you're performing, and perhaps set some options. First, the program
wants to know whether you're going to perform {\em Dissimilarity
analysis}, as you would with the Morse code data, or {\em Graph
layout}, as you would with the social network data. ggvis has
attempted, in a primitive way, to guess which task you're doing:
If you have included an edge set named ``dist,'' ``distance,'' or
``dissim,'' then ggvis guesses that you are interested in
dissimilarity analysis; otherwise, it guesses that you're going
to lay out a graph. You may override that guess here.
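The guess described above amounts to a simple name lookup. The sketch below restates it in base R; it assumes an exact match on the edge set name, which is our reading of the rule rather than a description of the plugin's code, and the variable names are hypothetical.
\begin{verbatim}
# Hypothetical names of the edge sets found in a file
edge_set_names <- c("distance", "edges")

# Assumed rule: an exact match on any of these names
# means dissimilarity analysis
dissim_names <- c("dist", "distance", "dissim")
task <- if (any(edge_set_names %in% dissim_names)) {
  "Dissimilarity analysis"
} else {
  "Graph layout"
}
task   # "Dissimilarity analysis"
\end{verbatim}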
If you are in fact planning to lay out a graph, then you have two more
choices to make. In the simplest graph layout situation, as in the
case of the social network example, the pairwise distances are simply
defined as the minimum number of edges traversed in traveling between
any pair of nodes. (Edge direction is ignored.) By supplying edges,
you have specified the pairwise distances for only those nodes that
are directly connected; select {\em Complete graph distances} so that
the remaining pairwise distances will be calculated.
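To make the distance definition concrete, here is a minimal sketch in base R that fills in the complete matrix of graph distances from an edge list. It uses the Floyd-Warshall recurrence, which is our choice for brevity and not necessarily what ggvis does internally; the edge list is hypothetical.
\begin{verbatim}
# Hypothetical edge list on nodes 1..5: a path 1-2-3-4 plus the edge 2-5
src <- c(1, 2, 3, 2)
dst <- c(2, 3, 4, 5)
n <- 5

# Distance 1 for directly connected pairs, Inf for everything else
D <- matrix(Inf, n, n)
diag(D) <- 0
D[cbind(src, dst)] <- 1
D[cbind(dst, src)] <- 1   # edge direction is ignored

# Floyd-Warshall: allow each node k in turn as an intermediate stop
for (k in 1:n) {
  D <- pmin(D, outer(D[, k], D[k, ], "+"))
}
D[1, 4]   # 3 edges: 1-2, 2-3, 3-4
D[4, 5]   # 3 edges: 4-3, 3-2, 2-5
\end{verbatim}
Pairs that remain at {\tt Inf} after the loop would belong to different connected components.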
Your second choice is whether to supply a vector of edge weights,
which you can do by supplying a weighting variable in the edge
dataset. The social network example includes no edge weights.
\subsection{Dist}
The next tab is where you will specify the source of pairwise
distances (for dissimilarity analysis) or edge weights (for
weighted graph layout).
For the {\em morsecodes} data, the target distance matrix is a confusion
matrix representing the fraction of times that one character in the
Morse code was confused with another in an experiment. For ggvis,
we have represented that as an edge variable called {\bf D}, which
you will see has been selected by default.
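If your own dissimilarities start out as a square matrix, they need to be turned into edge records carrying one variable. The following base R sketch shows one way to do that for a symmetric matrix; the matrix values and labels are made up, and only the variable name {\bf D} is taken from the Morse code example.
\begin{verbatim}
# Hypothetical 3 x 3 symmetric dissimilarity matrix
labels <- c("A", "B", "C")
M <- matrix(c(0.0, 0.2, 0.7,
              0.2, 0.0, 0.5,
              0.7, 0.5, 0.0), 3, 3, dimnames = list(labels, labels))

# One edge record per unordered pair, taken from the upper triangle
idx <- which(upper.tri(M), arr.ind = TRUE)
edges <- data.frame(source      = labels[idx[, "row"]],
                    destination = labels[idx[, "col"]],
                    D           = M[idx])
edges   # three rows: A-B, A-C, B-C
\end{verbatim}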
\subsection{Run}
Once the target distance matrix D has been populated, a new
scatterplot display appears. Proceed to the next tab, ``Run,''
to begin running the multidimensional scaling algorithm.
Before clicking on the ``Run MDS'' button, choose the dimension
of the space into which you want to project the data. Both the
social network graph and the Morse code data project well into
3-dimensional spaces, but it's also interesting to see how they
look in the plane.
You can now click on the ``Run MDS'' button to start the
algorithm. You may want to start with a large step size, and
then reduce the step size as the layout converges.
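To give a feel for what the dimension and the step size control, the sketch below runs plain gradient descent on a raw stress function in base R. It is not the algorithm ggvis implements; the stress definition, the fixed step size, and the target distances are all our assumptions, chosen only to keep the example short.
\begin{verbatim}
# Target distances: hypothetical graph distances for a path on 4 nodes
D <- as.matrix(dist(1:4))         # D[i, j] = |i - j|
n <- nrow(D)
k <- 2                            # dimension of the layout space

set.seed(1)
X <- matrix(rnorm(n * k), n, k)   # random starting configuration

# Raw stress: squared differences between layout and target distances,
# each pair counted once
stress <- function(X, D) sum((as.matrix(dist(X)) - D)^2) / 2

step <- 0.05                      # hypothetical step size
s0 <- stress(X, D)
for (iter in 1:500) {
  d <- as.matrix(dist(X))
  diag(d) <- 1                    # avoid 0/0 on the diagonal
  W <- (d - D) / d                # residual relative to current distance
  diag(W) <- 0
  grad <- 2 * (rowSums(W) * X - W %*% X)
  X <- X - step * grad            # move every point downhill
}
s1 <- stress(X, D)
c(before = s0, after = s1)        # the stress should have decreased
\end{verbatim}
A larger {\tt step} moves the points further per iteration but can overshoot, which is why reducing it as the layout settles is a sensible habit.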
\section{Controlling MDS}
The sliders below the histogram of the target distances are quite
useful in tuning the layout. Increasing the exponent of {\bf D}
increases the influence of the large distances. For the social
network data, this slider controls the length of the edges close to
the center and the spread of the leaf nodes.
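As a small worked illustration, suppose the slider replaces each target distance $D_{ij}$ by $D_{ij}^{p}$; this power form and the symbol $p$ are our assumptions here, not something the control panel spells out. For two target distances $1$ and $4$, the ratio of the larger to the smaller is
\[
\frac{4^{p}}{1^{p}} = 4^{p} =
\left\{ \begin{array}{ll}
2 & \mbox{for } p = 1/2, \\
4 & \mbox{for } p = 1, \\
16 & \mbox{for } p = 2,
\end{array} \right.
\]
so a larger exponent stretches the long distances relative to the short ones, and the long distances weigh more heavily in the layout.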
The rest of the documentation for this section has not yet been
written, but we can refer you to documentation for the older {\em
xgvis} (\cite{xgvis_jcgs2002},
{\tt http://www.research.att.com/areas/stat/xgobi/}), which was used with
xgobi, ggobi's predecessor.
\section{Comparing sets of distances or edge weights}
If your input data contains two vectors of distances or edge weights,
then you can pause MDS, back up to the {\em Dist} tab, select an
alternative variable, and then begin to run MDS once more. The points
in the plot in the MDS window will smoothly migrate to their new
positions.
\section{Manipulations}
All the standard ggobi direct manipulations are available. Plots of
the graph can obviously be linked node-wise to plots of node covariates.
What might be less obvious is that they can also be linked to plots of
edge covariates, so that an edge in the graph corresponds to a node in the
plot of edge data. Nodes can be interactively brushed and ``identified;''
edges can be brushed to set their color, line type, and width.
The ``color schemes'' tool can be used to automatically color the nodes
or the edges.
Nodes can also be moved, which can help untangle the layout a bit.
Groups of nodes can be moved together: first brush them with
the same symbol and color, and then select {\tt Move brush group}
in the {\tt Move points} ViewMode.
% This functionality is to be implemented in a new plugin.
% There is only one added manipulation, and that puts it an appearance
% during the use of identification in a radial layout. When a node is
% made ``sticky,'' a set of nodes and edges are are highlighted using
% the current brushing color: the sticky node, its edges, and all
% nodes to which those edges are connected. To remove the
% highlighting, click again on the node to make it unsticky.
\subsection{Thinning the graph}
Brushing can be used to thin the plot by hiding, for example, nodes with
especially high in-degree or out-degree, or nodes with large values of depth.
Once those nodes are hidden, a new layout can be produced.
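If node degree is not already a variable in your data, it can be computed from the edge list before brushing. The base R sketch below does this for a hypothetical directed edge list; the threshold is arbitrary, and the step of actually hiding the flagged nodes in ggobi is not shown.
\begin{verbatim}
# Hypothetical directed edge list on node ids 1..5
src <- c(1, 1, 1, 2, 3)
dst <- c(2, 3, 4, 3, 5)
n <- 5

out_degree <- tabulate(src, nbins = n)   # edges leaving each node
in_degree  <- tabulate(dst, nbins = n)   # edges arriving at each node

# Flag nodes whose total degree exceeds a hypothetical threshold
hide <- (in_degree + out_degree) > 2
which(hide)   # nodes 1 and 3
\end{verbatim}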
\section{Plot control from R}
If you have launched ggobi from within R, you can use the API to
drive the plot. Here are some fragments of code in the S language
that I have used.
In the first example, I have two xml files representing two
related graphs, and I'm interested in comparing them. This is
part of the code used to highlight the edges the two graphs
have in common.
\begin{verbatim}
g1 <- ggobi("f1.xml")
setDisplayEdges.ggobi(.gobi=g1)
e1 <- getEdges.ggobi(.data=2, .gobi=g1)
g2 <- ggobi("f2.xml")
setDisplayEdges.ggobi(.gobi=g2)
e2 <- getEdges.ggobi(.data=2, .gobi=g2)
...
esame1 <- enames1 %in% enames2
edgecolors1 <- rep(1, nedges1)
edgecolors1[esame1] <- 6
setColors.ggobi (edgecolors1, .data=2, .gobi=g1)
...
\end{verbatim}
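The comparison with \verb|%in%| relies on each edge being reduced to a single string. The elided code is not reproduced here; the base R sketch below shows one possible way to build such names, assuming each edge list is a two-column matrix of source and destination labels. That layout, the separator, and the data are all hypothetical.
\begin{verbatim}
# Hypothetical edge lists; each row is one edge (source, destination)
e1 <- cbind(c("a", "a", "b"), c("b", "c", "c"))
e2 <- cbind(c("a", "b", "c"), c("b", "c", "d"))

# One string per edge, so that edges can be compared with %in%
enames1 <- paste(e1[, 1], e1[, 2], sep = "->")
enames2 <- paste(e2[, 1], e2[, 2], sep = "->")

esame1 <- enames1 %in% enames2
edgecolors1 <- rep(1, length(enames1))
edgecolors1[esame1] <- 6      # shared edges get color 6
edgecolors1                   # 6 1 6
\end{verbatim}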
In the second example, there is one graph, and its edge covariates are
the values of a variable recorded for each edge at each time step $t_i$.
I use R to animate the graph, using color to encode the edge weight; I
first chose a sequential color scheme. Similarly, one could use
setGlyphs.ggobi() to set node type or size. The same command sets edge
type or thickness when applied to the edge data. To hide some of the
edges, use setHiddenCases.ggobi().
\begin{verbatim}
edges <- getData.ggobi(2)
ntimesteps <- dim(edges)[2]
for (i in 1:ntimesteps) {
  tgraph (i, edges)
  ...
  colors <- integer(dim(edges)[1])
  colors[lnw>(3*mx/4)] <- 5
  colors[lnw>(mx/2) & lnw<=(3*mx/4)] <- 4
  colors[lnw>(mx/4) & lnw<=(mx/2)] <- 3
  colors[lnw>0 & lnw<=(mx/4)] <- 2
  setColors.ggobi (colors, 1:length(colors), 2)
}
\end{verbatim}
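The four chained comparisons in the loop sort the weights into quarters of the range from zero to {\tt mx}. As a sketch, the same binning can be written with {\tt cut()} from base R. Here {\tt lnw} is filled with made-up weights and {\tt mx} is assumed to be their maximum; the fragment above does not show how either is actually computed.
\begin{verbatim}
# Hypothetical edge weights for one time step
lnw <- c(0, 1, 2.5, 5, 7.5, 9, 10)
mx <- max(lnw)

# Right-closed bins: (-Inf, 0], (0, mx/4], (mx/4, mx/2], ...
bin <- cut(lnw, breaks = c(-Inf, 0, mx/4, mx/2, 3*mx/4, Inf),
           labels = FALSE)

# Weights at or below zero keep color 0, as with integer() above
colors <- c(0L, 2L, 3L, 4L, 5L)[bin]
colors   # 0 2 2 3 4 5 5
\end{verbatim}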
In an extension of the second example, the xml file includes three
datasets. The third is of dimension {\tt ntimesteps} by $2$, and its
single time series plot represents the sum of all measurements
for all edges at each time step. As the animation runs,
we highlight the corresponding point in a scatterplot of this time series.
\begin{verbatim}
tcolors <- integer(ntimesteps)
tcolors[1:length(tcolors)] <- 3
tcolors[i] <- 7
setColors.ggobi (tcolors, 1:length(tcolors), 3)
\end{verbatim}
\section{Related work}
Another plugin, GraphLayout, offers several methods for laying
out graphs. One of its methods, called neato, produces layouts
that are similar to those of ggvis.
In addition, there is a plugin (``GraphAction'') for manipulations
that are specific to graphs.
\bibliography{references}
\end{document}