File: NoSQL-4.html

package info (click to toggle)
nosql 0.9-0
  • links: PTS
  • area: main
  • in suites: hamm
  • size: 1,364 kB
  • ctags: 225
  • sloc: perl: 3,766; sh: 476; makefile: 41
file content (213 lines) | stat: -rw-r--r-- 6,615 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">
<HTML>
<HEAD>
<TITLE>NoSQL: Extracting information from rdbtables</TITLE>
</HEAD>
<BODY>
<A HREF="NoSQL-3.html">Previous</A>
<A HREF="NoSQL-5.html">Next</A>
<A HREF="NoSQL.html#toc4">Contents</A>
<HR>
<H2><A NAME="s4">4. Extracting information from rdbtables</A>  </H2>

<H2><A NAME="ss4.1">4.1 General</A>
        </H2>

<P>The following shows some examples of how the system is
usually used, which involves a combinations of operators.
Using the rdbtable named 'sample' the command:
<P>
<BLOCKQUOTE><CODE>
<PRE>
      nsq-col NAME OTHER TYP AMT &lt; sample | nsq-sort TYP AMT | nsq-pr
        
</PRE>
</CODE></BLOCKQUOTE>
<P>gives the output:
<P>
<BLOCKQUOTE><CODE>
<PRE>
    
      NAME    OTHER     TYP     AMT
      ------  --------  ----  -----
      Hansen  One       A        23
      Bush    Another   A       133
      Perry   And       B       244
      Hart    So        D      1111
      Holmes  On        D      1111
      Jones   Here      X        77
    
        
</PRE>
</CODE></BLOCKQUOTE>
<P>Note that columns COUNT and RIGHT were excluded by the
'nsq-col' operator, and that the order of the selected columns
was changed from that in the rdbtable.  Of course to save
the output in a file, (redirection of STDOUT into a file)
something like the following is used:
<P>
<BLOCKQUOTE><CODE>
<PRE>
      nsq-col ... &lt; sample | nsq-sort ... | nsq-pr &gt; file.out
        
</PRE>
</CODE></BLOCKQUOTE>
<P>An example using the operator 'nsq-row' on the rdbtable sample is:
<P>
<BLOCKQUOTE><CODE>
<PRE>
      nsq-row AMT lt 900 &lt; sample | nsq-col NAME AMT TYP RIGHT |\
            nsq-sort l NAME | nsq-pr
    
</PRE>
</CODE></BLOCKQUOTE>
<P>Note that the '\' character at the end of the first line
of the above command is the signal to the UNIX shell
that the command is continued on the next line. Here we
select some rows using 'nsq-row', select some columns using
'nsq-col', sort what we have with 'nsq-sort', and print with
'nsq-pr'. The output is:
<P>
<BLOCKQUOTE><CODE>
<PRE>

      NAME      AMT  TYP      RIGHT
      ------  -----  ----  --------
      Bush      133  A         This
      Hansen     23  A           Is
      Jones      77  X           On
      Perry     244  B          The
    
        
</PRE>
</CODE></BLOCKQUOTE>
<P>
<H2><A NAME="ss4.2">4.2 A real world problem</A>
        </H2>

<P>The following shell script shows how the NoSQL operators and
other UNIX utilities can be fitted together to solve a real
world problem. The problem was to find out if the rows in a
large rdbtable were unique over four columns. Since 'nsq-summ'
will tell us whether the rows of an rdbtable are unique on
a single column, we need to construct a temporary tdbtable.
<P>   
To illustrate the solution on a small rdbtable, the script
below works on an rdbtable like 'sample' but with some
rows repeated.  First the script selects four columns
and adds a dummy column named 'uniq' (using 'nsq-col').
It then puts the combined values of the four columns into
the dummy column (using 'nsq-compute').  Next it examines the
value of the dummy column 'uniq' for uniquness (using
'nsq-summ') and then uses the UNIX command 'egrep' to show
only the lines of interest, e.g. those lines that start
with something other than a '1'.
<P>
<BLOCKQUOTE><CODE>
<PRE>

      nsq-col &lt; table.rdb  Unit  Day  Time  MSN  -a uniq 12 |\
          nsq-compute uniq = Unit . Day . Time . MSN |\
          nsq-summ  -cuu  uniq  |\
          egrep -v  '^  *1'

        
</PRE>
</CODE></BLOCKQUOTE>
<P>The output was like the following:
<P>
<BLOCKQUOTE><CODE>
<PRE>

      Rows: 9
      Unique values for uniq: 6
             2 Jones77X77 
             3 Perry77B244
    
        
</PRE>
</CODE></BLOCKQUOTE>
<P>meaning that there were (in this example) two rows that
had duplicates over the four columns with one set of
values, and three rows that had duplicate with another
set of values.
<P>
<H2><A NAME="ss4.3">4.3 Another real world problem</A>
    </H2>

<P>This next one is a bit more complicated although very
useful, and it does demonstrate the use of 'nsq-compute' using
a newly created column. The idea is to make a summary of
the data in rdbtable sample3 (from the section on 'nsq-ed').
The command is:
<P>
<BLOCKQUOTE><CODE>
<PRE>

      nsq-col name datatype -a nr 2 &lt; sample3 |\
            nsq-compute nr = '++$x' |\
            nsq-sort datatype name | nsq-report sum.frm
    
        
</PRE>
</CODE></BLOCKQUOTE>
<P>Note that 'nsq-col' selects the two columns on which to
make the summary and adds a new (null) column 'nr'. Then
'nsq-compute' puts data into the new column by using the
PERL expression shown. The expression '++$x' merely
increments itself by one each time it is evaluated, an
easy way to get an increasing number. Finally 'nsq-sort'
sorts the newly created rdbtable and it is then printed
with 'nsq-report' using the form file 'sum.frm'.
<P>
<BLOCKQUOTE><CODE>
<PRE>

                               Table 19
  
                          FORM FILE (SUM.FRM)
  
      format top = 
                                                    @&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;
                                                      _date_
                  Sample Three Database Summary by Datatype
      
             Datatype  Nr  Name
             --------  --  -----------------------------------------
      .
      format =
             @&gt;&gt;&gt;&gt;&gt;&gt;&gt;  @&gt;  @&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;
             datatype  nr  name .
      


                               Table 20

                        DATABASE SUMMARY EXAMPLE
      
                                        Wed Dec  4 21:23:20 PST 1991
      Sample Three Database Summary by Datatype
      
      Datatype  Nr  Name
      --------  --  -----------------------------------------
      
      ABC   5  ABC Duplicate
      ABC   4  ABC Original
      ABC   8  LAF-S Air Floating Model
      ATO   3  ATO (Air Tasking Orders) Original
      ATO   6  ATO Number 2222-11
      BDA   1  ACAS (Air Combat Assessment)
      BDA   2  ACAS (Air Combat Assessment) BDA Sorties
      BDA   7  BDA Supplement number 17
    
    
</PRE>
</CODE></BLOCKQUOTE>
<P>The form file is shown in Table 19 and the output in Table 20.
<P>
<HR>
<A HREF="NoSQL-3.html">Previous</A>
<A HREF="NoSQL-5.html">Next</A>
<A HREF="NoSQL.html#toc4">Contents</A>
</BODY>
</HTML>