\import{mcx.zmm}
\begin{pud::man}{
{name}{clm info}
{html_title}{The clm info manual}
{author}{Stijn van Dongen}
{section}{1}
{synstyle}{long}
{defstyle}{long}
\man_share
}
\${html}{\"pud::man::maketoc"}
\sec{name}{NAME}
\NAME{clm_info}{compute performance measures for graphs and clusterings.}
\disclaim_clm{info}
\sec{synopsis}{SYNOPSIS}
\par{
\clm{info} [options] <graph file> <cluster file> <cluster file>*
}
\par{
\clm{info}
\synoptopt{-o}{fname}{write to file \genopt{fname}}
\synoptopt{-pi}{f}{apply inflation beforehand}
\shared_synoptopt{-tf}
\synoptopt{-cl-tree}{fname}{expect file with nested clusterings}
\synoptopt{-cat-max}{num}{do at most \genopt{num} tree levels}
\synoptopt{-cl-ceil}{<num>}{skip clusters of size exceeding <num>}
\synoptopt{--node-self-measures}{dump measure for native cluster}
\synoptopt{--node-all-measures}{dump measure for incident cluster}
\stdsynopt
<matrix file> <cluster file> <cluster file>*
}
\sec{description}{DESCRIPTION}
\par{
\clm{info} computes several numbers indicative of the efficiency with
which a clustering captures the edge mass of a given graph.
Use it in conjunction with \clm{dist} to determine which clusterings
you accept. See the EXAMPLES section in \clm{dist}
for an example of \clm{dist} and \clm{info} (and \clm{meet}) usage.
Output can be generated for multiple clusterings at the same time.}
\par{
The \bf{efficiency} factor is described in [1] (see
the \secref{references} section). It tries to balance the dual aims of
capturing a lot of edges or edge weights and keeping the cluster footprint
or area fraction small. The efficiency number has several appealing
mathematical properties, cf. [1]. It is related to, but not derivable from,
the second and third numbers, the \it{mass fraction} and the
\it{area fraction}.}
\par{
The \bf{mass fraction} is defined as follows.
Let \bf{e} be an edge of the graph. The clustering \it{captures} \bf{e}
if the two nodes associated with \bf{e} are in the same cluster.
Now the mass fraction is the joint weight of all captured edges divided
by the joint weight of all edges in the input graph.}
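\par{
The definition above can be illustrated with a minimal sketch. It is not the
code used by \clm{info}; the function name \it{mass_fraction}, the edge list
layout and the node-to-cluster mapping \it{cluster_of} are assumptions made for
this example only, and each node is assumed to belong to exactly one cluster.}

```python
def mass_fraction(edges, cluster_of):
    """Joint weight of captured edges divided by the joint weight of all edges.

    edges      -- list of (u, v, weight) tuples, one per edge (assumed layout)
    cluster_of -- mapping from node to cluster label (assumed layout)
    """
    total = sum(w for _, _, w in edges)
    # An edge is captured when both of its nodes lie in the same cluster.
    captured = sum(w for u, v, w in edges if cluster_of[u] == cluster_of[v])
    return captured / total


# Hypothetical toy graph: four nodes, four weighted edges, two clusters.
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 4.0)]
cluster_of = dict([(0, "a"), (1, "a"), (2, "b"), (3, "b")])

# Edges (0,1) and (2,3) are captured: (2.0 + 3.0) / 10.0
print(mass_fraction(edges, cluster_of))
```

\par{
In this toy example half of the total edge weight is captured by the clustering.}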
\par{
The \bf{area fraction} is roughly the sum of the
squares of all cluster sizes for all clusters in the clustering, divided by
the square of the number of nodes in the graph. The qualifier \it{roughly}
applies because the actual formula uses the quantity \bf{N}*(\bf{N-1}) wherever
the description above says square (of \bf{N}). A low area fraction indicates a
fine-grained clustering; a high area fraction indicates a coarse one.}
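\par{
As a sketch of the formula just described, and assuming that \bf{N}*(\bf{N-1})
replaces the square in both the numerator and the denominator, the area
fraction can be computed from the cluster sizes alone. The function name
\it{area_fraction} and its arguments are hypothetical and are not part of
\clm{info}.}

```python
def area_fraction(cluster_sizes, n_nodes):
    """Sum of n*(n-1) over all cluster sizes n, divided by N*(N-1).

    cluster_sizes -- list with the size of each cluster (assumed layout)
    n_nodes       -- number of nodes N in the graph, assumed to be at least 2
    """
    covered = sum(n * (n - 1) for n in cluster_sizes)
    return covered / (n_nodes * (n_nodes - 1))


# Four nodes, three hypothetical clusterings from fine-grained to coarse.
print(area_fraction([1, 1, 1, 1], 4))  # all singletons
print(area_fraction([2, 2], 4))        # two clusters of two nodes
print(area_fraction([4], 4))           # one cluster holding every node
```

\par{
The all-singletons clustering gives the lowest value and the single-cluster
clustering the highest, matching the fine-grained versus coarse reading above.}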
\sec{options}{OPTIONS}
\'begin{itemize}{\mcx_itemopts}
\item{\defopt{-o}{fname}{output file name}}
\item{\defopt{-pi}{f}{apply inflation beforehand}}
\car{
Apply inflation to the graph matrix and compute the performance
measures for the result.}
\shared_itemopt{-tf}
\car{\shared_defopt{-tf}}
\items{
{\defopt{-cl-tree}{fname}{expect file with nested clusterings (cone format)}}
{\defopt{-cl-ceil}{<num>}{skip (nested) clusters of size exceeding <num>}}
}
\car{
The specified file should contain a hierarchy of nested
clusterings such as generated by \mclcm. The output is then
in a special format, undocumented but easy to understand.
Its purpose is to help cherrypick a single clustering
from a tree, in conjunction with the slightly experimental
and undocumented program \bf{mlmfifofum}.
}
\par{
The measure that is used is very slow to compute for large clusters, and
for such clusters it will generally be outside any interesting range (i.e. it
will be small). Use \genopt{-cl-ceil} to skip clusters exceeding the specified
size; \clm{info} will then proceed directly to subclusters if they exist.
}
\item{\defopt{-cat-max}{num}{do at most \genopt{num} tree levels}}
\car{
This option only has an effect when used with \genopt{-cl-tree}.
\clm{info} will start at the most fine-grained level and work upwards.
}
\items{
{\defopt{--node-all-measures}{dump node-wise criteria for all incident clusters}}
{\defopt{--node-self-measures}{dump node-wise criteria for native cluster}}
}
\car{
These options return a key-value based format, with the meaning of
the keys as follows.
}
\verbatim{\:/
nm file name (redundant unless multiple cluster files are provided)
ni node index
ci cluster index
nn number of neighbours of this node (constant for a given node)
nc cluster size (constant for a given cluster)
ef efficiency for this node/cluster combination
em max-efficiency for this node/cluster combination
mf mass fraction: percentage of edge weights for this node in this cluster
ma total mass of edge weights for this node in this cluster
xn number of neighbours of the node that are not in the cluster
xc number of nodes in the cluster that are not a neighbour of the node
ns number of neighbours of the node that are also in this cluster
ti the maximum of the edge weights for neighbours of this node that are in this cluster
to the maximum of the edge weights for neighbours of this node that are NOT in this cluster
al (alien) 1 if the node is not native to the cluster, 0 if the node is native}
\stddefopt
\end{itemize}
\sec{author}{AUTHOR}
\par{
Stijn van Dongen.}
\sec{seealso}{SEE ALSO}
\par{
\mysib{mclfamily} for an overview of all the documentation
and the utilities in the mcl family.}
\sec{references}{REFERENCES}
\par{
[1] Stijn van Dongen. \it{Performance criteria for graph clustering and Markov
cluster experiments}. Technical Report INS-R0012, National Research
Institute for Mathematics and Computer Science in the Netherlands,
Amsterdam, May 2000.\|
\httpref{http://www.cwi.nl/ftp/CWIreports/INS/INS-R0012.ps.Z}}
\end{pud::man}