1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">
<HTML>
<HEAD>
<TITLE>NoSQL: Extracting information from rdbtables</TITLE>
</HEAD>
<BODY>
<A HREF="NoSQL-3.html">Previous</A>
<A HREF="NoSQL-5.html">Next</A>
<A HREF="NoSQL.html#toc4">Contents</A>
<HR>
<H2><A NAME="s4">4. Extracting information from rdbtables</A> </H2>
<H2><A NAME="ss4.1">4.1 General</A>
</H2>
<P>The following shows some examples of how the system is
usually used, which involves a combinations of operators.
Using the rdbtable named 'sample' the command:
<P>
<BLOCKQUOTE><CODE>
<PRE>
nsq-col NAME OTHER TYP AMT < sample | nsq-sort TYP AMT | nsq-pr
</PRE>
</CODE></BLOCKQUOTE>
<P>gives the output:
<P>
<BLOCKQUOTE><CODE>
<PRE>
NAME OTHER TYP AMT
------ -------- ---- -----
Hansen One A 23
Bush Another A 133
Perry And B 244
Hart So D 1111
Holmes On D 1111
Jones Here X 77
</PRE>
</CODE></BLOCKQUOTE>
<P>Note that columns COUNT and RIGHT were excluded by the
'nsq-col' operator, and that the order of the selected columns
was changed from that in the rdbtable. Of course to save
the output in a file, (redirection of STDOUT into a file)
something like the following is used:
<P>
<BLOCKQUOTE><CODE>
<PRE>
nsq-col ... < sample | nsq-sort ... | nsq-pr > file.out
</PRE>
</CODE></BLOCKQUOTE>
<P>An example using the operator 'nsq-row' on the rdbtable sample is:
<P>
<BLOCKQUOTE><CODE>
<PRE>
nsq-row AMT lt 900 < sample | nsq-col NAME AMT TYP RIGHT |\
nsq-sort l NAME | nsq-pr
</PRE>
</CODE></BLOCKQUOTE>
<P>Note that the '\' character at the end of the first line
of the above command is the signal to the UNIX shell
that the command is continued on the next line. Here we
select some rows using 'nsq-row', select some columns using
'nsq-col', sort what we have with 'nsq-sort', and print with
'nsq-pr'. The output is:
<P>
<BLOCKQUOTE><CODE>
<PRE>
NAME AMT TYP RIGHT
------ ----- ---- --------
Bush 133 A This
Hansen 23 A Is
Jones 77 X On
Perry 244 B The
</PRE>
</CODE></BLOCKQUOTE>
<P>
<H2><A NAME="ss4.2">4.2 A real world problem</A>
</H2>
<P>The following shell script shows how the NoSQL operators and
other UNIX utilities can be fitted together to solve a real
world problem. The problem was to find out if the rows in a
large rdbtable were unique over four columns. Since 'nsq-summ'
will tell us whether the rows of an rdbtable are unique on
a single column, we need to construct a temporary tdbtable.
<P>
To illustrate the solution on a small rdbtable, the script
below works on an rdbtable like 'sample' but with some
rows repeated. First the script selects four columns
and adds a dummy column named 'uniq' (using 'nsq-col').
It then puts the combined values of the four columns into
the dummy column (using 'nsq-compute'). Next it examines the
value of the dummy column 'uniq' for uniquness (using
'nsq-summ') and then uses the UNIX command 'egrep' to show
only the lines of interest, e.g. those lines that start
with something other than a '1'.
<P>
<BLOCKQUOTE><CODE>
<PRE>
nsq-col < table.rdb Unit Day Time MSN -a uniq 12 |\
nsq-compute uniq = Unit . Day . Time . MSN |\
nsq-summ -cuu uniq |\
egrep -v '^ *1'
</PRE>
</CODE></BLOCKQUOTE>
<P>The output was like the following:
<P>
<BLOCKQUOTE><CODE>
<PRE>
Rows: 9
Unique values for uniq: 6
2 Jones77X77
3 Perry77B244
</PRE>
</CODE></BLOCKQUOTE>
<P>meaning that there were (in this example) two rows that
had duplicates over the four columns with one set of
values, and three rows that had duplicate with another
set of values.
<P>
<H2><A NAME="ss4.3">4.3 Another real world problem</A>
</H2>
<P>This next one is a bit more complicated although very
useful, and it does demonstrate the use of 'nsq-compute' using
a newly created column. The idea is to make a summary of
the data in rdbtable sample3 (from the section on 'nsq-ed').
The command is:
<P>
<BLOCKQUOTE><CODE>
<PRE>
nsq-col name datatype -a nr 2 < sample3 |\
nsq-compute nr = '++$x' |\
nsq-sort datatype name | nsq-report sum.frm
</PRE>
</CODE></BLOCKQUOTE>
<P>Note that 'nsq-col' selects the two columns on which to
make the summary and adds a new (null) column 'nr'. Then
'nsq-compute' puts data into the new column by using the
PERL expression shown. The expression '++$x' merely
increments itself by one each time it is evaluated, an
easy way to get an increasing number. Finally 'nsq-sort'
sorts the newly created rdbtable and it is then printed
with 'nsq-report' using the form file 'sum.frm'.
<P>
<BLOCKQUOTE><CODE>
<PRE>
Table 19
FORM FILE (SUM.FRM)
format top =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<
_date_
Sample Three Database Summary by Datatype
Datatype Nr Name
-------- -- -----------------------------------------
.
format =
@>>>>>>> @> @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
datatype nr name .
Table 20
DATABASE SUMMARY EXAMPLE
Wed Dec 4 21:23:20 PST 1991
Sample Three Database Summary by Datatype
Datatype Nr Name
-------- -- -----------------------------------------
ABC 5 ABC Duplicate
ABC 4 ABC Original
ABC 8 LAF-S Air Floating Model
ATO 3 ATO (Air Tasking Orders) Original
ATO 6 ATO Number 2222-11
BDA 1 ACAS (Air Combat Assessment)
BDA 2 ACAS (Air Combat Assessment) BDA Sorties
BDA 7 BDA Supplement number 17
</PRE>
</CODE></BLOCKQUOTE>
<P>The form file is shown in Table 19 and the output in Table 20.
<P>
<HR>
<A HREF="NoSQL-3.html">Previous</A>
<A HREF="NoSQL-5.html">Next</A>
<A HREF="NoSQL.html#toc4">Contents</A>
</BODY>
</HTML>
|