1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183

% This file is part of the Stanford GraphBase (c) Stanford University 1993
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
\def\title{GB\_\,SORT}
@* Introduction. This short GraphBase module provides a simple utility
routine called gb_linksort, which is used in many of the other programs.
@p
#include <stdio.h> /* the \.{NULL} pointer (NULL) is defined here */
#include "gb_flip.h" /* we need to use the random number generator */
@h@#
@<Declarations@>@;
@<The gb_linksort routine@>
@ Most of the graphs obtained from GraphBase data are parameterized,
so that different effects can be obtained easily from the same
underlying body of information. In many cases the desired graph
is determined by selecting the ``heaviest'' vertices according to some
notion of ``weight,'' and/or by taking a random sample of vertices. For
example, the GraphBase routine words(n,wt_vector,wt_threshold,seed) creates
a graph based on the n most common fiveletter words of English, where
commonness is determined by a given weight vector. When several words have
equal weight, we want to choose between them at random. In particular, this
means that we can obtain a completely random choice of words if the weight
vector assigns the same weight to each word.
The gb_linksort routine is a convenient tool for this purpose. It takes a
given linked list of nodes and shuffles their link fields so that the
nodes can be read in decreasing order of weight, and so that equalweight
nodes appear in random order. {\sl Note: The random number generator of
{\sc GB\_\,FLIP} must be initialized before gb_linksort is called.}
The nodes sorted by gb_linksort can be records of any structure type,
provided only that the first field is `long key' and the second field
is `struct \\{this\_struct\_type} *link'. Further fields are not
examined. The node type defined in this section is the simplest possible
example of such a structure.
Sorting is done by means of the key fields, which must each contain
nonnegative integers less than $2^{31}$.
After sorting is complete, the data will appear in 128 linked lists:
gb_sorted[127], gb_sorted[126], \dots, gb_sorted[0]. To
examine the nodes in decreasing order of weight, one can read through
these lists with a routine such as
$$\vcenter{\halign{#\hfil\cr
{\cr
\quadint j;\cr
\quadnode *p;\cr
\noalign{\smallskip}
\quadfor (j=127; j>=0; j)\cr
\qquadfor (p=(node*)gb_sorted[j]; p; p=p>link)\cr
\qquad\qquad\\{look\_at}(p);\cr
}\cr}}$$
All nodes whose keys are in the range $j\cdot2^{24}\lekey<(j+1)\cdot2^{24}$
will appear in list gb_sorted[j]. Therefore the results will all be found
in the single list gb_sorted[0], if all the keys are strictly less
than~$2^{24}$.
@f node int
@<Declarations@>=
typedef struct node_struct {
long key; /* a numeric quantity, assumed nonnegative */
struct node_struct *link; /* the next node on a list */
} node; /* applications of gb_linksort may have other fields after link */
@ In the header file, gb_sorted is declared to be
an array of pointers to char, since
nodes may have different types in different applications. User programs
should cast gb_sorted to the appropriate type as in the example above.
@(gb_sort.h@>=
extern void gb_linksort(); /* procedure to sort a linked list */
extern char* gb_sorted[]; /* the results of gb_linksort */
@ Six passes of a radix sort, using radix 256, will accomplish the desired
objective rather quickly. (See, for example, Algorithm 5.2.5R in
{\sl Sorting and Searching}.) The first two passes use random numbers instead
of looking at the key fields, thereby effectively extending the keys
so that nodes with equal keys will appear in reasonably random order.
We move the nodes back and forth between two arrays of lists: the external
array gb_sorted and a private array called alt_sorted.
@<Declarations@>=
node *gb_sorted[256]; /* external bank of lists, for evennumbered passes */
static node *alt_sorted[256];
/* internal bank of lists, for oddnumbered passes */
@ So here we go with six passes over the data.
@<The gb_linksort routine@>=
void gb_linksort(l)
node *l;
{@+register long k; /* index to destination list */
register node **pp; /* current place in list of pointers */
register node *p, *q; /* pointers for list manipulation */
@<Partition the given list into 256 random sublists alt_sorted@>;
@<Partition the alt_sorted lists into 256 random sublists gb_sorted@>;
@<Partition the gb_sorted lists into alt_sorted by loworder byte@>;
@<Partition the alt_sorted lists into gb_sorted by secondlowest byte@>;
@<Partition the gb_sorted lists into alt_sorted by secondhighest byte@>;
@<Partition the alt_sorted lists into gb_sorted by highorder byte@>;
}
@ @<Partition the given list into 256 random sublists alt_sorted@>=
for (pp=alt_sorted+255; pp>=alt_sorted; pp) *pp=NULL;
/* empty all the destination lists */
for (p=l; p; p=q) {
k=gb_next_rand() >> 23; /* extract the eight most significant bits */
q=p>link;
p>link=alt_sorted[k];
alt_sorted[k]=p;
}
@ @<Partition the alt_sorted lists into 256 random sublists gb_sorted@>=
for (pp=gb_sorted+255; pp>=gb_sorted; pp) *pp=NULL;
/* empty all the destination lists */
for (pp=alt_sorted+255; pp>=alt_sorted; pp)
for (p=*pp; p; p=q) {
k=gb_next_rand() >> 23; /* extract the eight most significant bits */
q=p>link;
p>link=gb_sorted[k];
gb_sorted[k]=p;
}
@ @<Partition the gb_sorted lists into alt_sorted by loworder byte@>=
for (pp=alt_sorted+255; pp>=alt_sorted; pp) *pp=NULL;
/* empty all the destination lists */
for (pp=gb_sorted+255; pp>=gb_sorted; pp)
for (p=*pp; p; p=q) {
k=p>key & 0xff; /* extract the eight least significant bits */
q=p>link;
p>link=alt_sorted[k];
alt_sorted[k]=p;
}
@ Here we must read from alt_sorted from 0 to 255, not from 255 to 0,
to get the desired final order. (Each pass reverses the order of the lists;
it's tricky, but it works.)
@<Partition the alt_sorted lists into gb_sorted by secondlowest byte@>=
for (pp=gb_sorted+255; pp>=gb_sorted; pp) *pp=NULL;
/* empty all the destination lists */
for (pp=alt_sorted; pp<alt_sorted+256; pp++)
for (p=*pp; p; p=q) {
k=(p>key >> 8) & 0xff; /* extract the next eight bits */
q=p>link;
p>link=gb_sorted[k];
gb_sorted[k]=p;
}
@ @<Partition the gb_sorted lists into alt_sorted by secondhighest byte@>=
for (pp=alt_sorted+255; pp>=alt_sorted; pp) *pp=NULL;
/* empty all the destination lists */
for (pp=gb_sorted+255; pp>=gb_sorted; pp)
for (p=*pp; p; p=q) {
k=(p>key >> 16) & 0xff; /* extract the next eight bits */
q=p>link;
p>link=alt_sorted[k];
alt_sorted[k]=p;
}
@ The most significant bits will lie between 0 and 127, because we assumed
that the keys are nonnegative and less than $2^{31}$. (A similar routine
would be able to sort signed integers, or unsigned long integers, but
the \CEE/ code would not then be portable.)
@<Partition the alt_sorted lists into gb_sorted by highorder byte@>=
for (pp=gb_sorted+255; pp>=gb_sorted; pp) *pp=NULL;
/* empty all the destination lists */
for (pp=alt_sorted; pp<alt_sorted+256; pp++)
for (p=*pp; p; p=q) {
k=(p>key >> 24) & 0xff; /* extract the most significant bits */
q=p>link;
p>link=gb_sorted[k];
gb_sorted[k]=p;
}
@* Index. Here is a list that shows where the identifiers of this program are
defined and used.
