1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353
|
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
-
- This file is part of the OpenLink Software Virtuoso Open-Source (VOS)
- project.
-
- Copyright (C) 1998-2018 OpenLink Software
-
- This project is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License as published by the
- Free Software Foundation; only version 2 of the License, dated June 1991.
-
- This program is distributed in the hope that it will be useful, but
- WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- General Public License for more details.
-
- You should have received a copy of the GNU General Public License along
- with this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
-
-
-->
<sect1 id="sqlinverse"><title>SQL Inverse Functions</title>
<para>Many virtual database application scenarios require performing various
conversions on data in legacy tables. An example of this is currency conversions,
conversions between British and metric units, replacing complex composite keys with single numeric row id's and so forth.</para>
<para>SQL views are the normal way of implementing such conversions. However,
when making queries with selection criteria referencing the results of these
conversions we often need to first convert and then test, possibly having to
bring data from the source system to the virtual database. Further, we
will not be able to use indices that may exist on the legacy system if
we first have to convert and only then test the value of a column.</para>
<para>Virtuoso introduces an SQL extension for dealing with these issues.
Virtuoso allows the definition of two SQL functions that are inverses of each other,
for example <function>dollar_to_euro</function> and <function>euro_to_dollar</function> would be examples of inverse functions.</para>
<para>Thus, when we define a view like:</para>
<programlisting>create view euro_item as Select I_id, dollar_to_euro (I_price) as I_price, I_name from item;</programlisting>
<para>We can further declare:</para>
<programlisting>db..sinv_create_inverse ('euro_to_dollar', 'dollar_to_euro', 1);</programlisting>
<para>and perform the query:</para>
<programlisting>select * from euro_item where I_price > 100;</programlisting>
<para>Normally, this would internally perform the query:</para>
<programlisting>select * from item where dollar_to_euro (I_price) > 100;</programlisting>
<para>Now however it will make use of the inverse function information and we get:</para>
<programlisting>select * from item where I_price > euro_to_dollar (100)</programlisting>
<para>The second form is substantially more efficient since only one function
call needs to be done for the whole query, not to mention the fact that if the
currency conversion function were only defined on the vdb and the item table
were linked from another server, all data would have to be brought over to
the vdb for the conversion.</para>
<para>Now if the conversion function were defined only on the vdb, the amount
of euros would be converted to dollars and then passed as a parameter to</para>
<programlisting>Select * from item where I_price > ?</programlisting>
<para>Many simple conversions can be efficiently handled by this method.
For most unit and data type conversions, the conversion preserves collation
order. Amount a will be greater than amount b whether a and b be expressed
in euros or dollars, as long as both a and b are in the same currency.
Some other conversions such as compression, encryption, hashes and mappings
of arbitrary unique ids to another set of ids do not preserve collation.
Whether a particular mapping preserves collation is indicated by the third
argument of <link linkend="fn_sinv_create_inverse"><function>sinv_create_inverse</function></link>. In the above example we see a value of 1,
meaning that collation is preserved. Zero specifies that collation is not preserved.</para>
<para>Sometimes it will be desirable to map multi-part keys into a single key for
convenience of querying or for compatibility with another database schema.</para>
<para>We will use the orders and customer tables from the TPC C benchmark schema
to illustrate the feature. Both tables have a primary key consisting of three parts,
a warehouse id, a district id and then a customer or order id. The order references
the customer with a foreign key consisting of its own warehouse and district id
and of a customer id.</para>
<para>Now we may wish to simplify the situation and make a one-column order id and a
one column foreign key reference to customer for presenting the system to an outside
reporting application. We will do this completely without touching the original database,
all logic taking place on the virtual database.</para>
<para>In this case, the original keys consist of three numbers: n digits of warehouse
id, 2 digits of district id and 9 digits of order or customer id. We can simply concatenate
these numbers into a single decimal. To do this, we define the following functions:</para>
<programlisting>create function num_truncate (in n numeric)
{
return cast (cast (n as numeric) - 0.5 as numeric (40, 0));
}
create function num_mod (in x numeric, in y numeric)
{
return (x - num_truncate (cast (x as numeric) / y) * y);
}
/* This function takes the three key parts and returns a numeric where each occupies a fixed range of digits. */
create function o_iid (in w_id int, in d_id int, in o_id int) returns numeric
{
return (w_id * 100000000000 + d_id * 1000000000 + o_id);
}
/* The below three functions retrieve each of the fields encoded in the numeric produced by o_iid */
create function o_iid_w_id (in iid numeric) returns int
{
return cast (num_truncate (iid / 100000000000) as int);
}
create function o_iid_d_id (in iid numeric) returns int
{
return cast (num_mod (num_truncate (iid / 1000000000), 100) as int);
}
create function o_iid_o_id (in iid numeric)
{
return cast (num_mod (iid, 1000000000) as int);
}
db..sinv_create_inverse ('O_IID', VECTOR ('O_IID_W_ID', 'O_IID_D_ID', 'O_IID_O_ID'), 0);</programlisting>
<para>The <link linkend="fn_sinv_create_inverse"><function>sinv_create_inverse</function></link> now defines an accessory for each of the arguments
of o_iid. The first function in the vector retrieves the first argument of a call
to o_iid, the second the second and so forth. Since we are talking about mapping
a set of values into one value it does not make sense to speak of preserving a collation,
thus the last argument is 0. When making this declaration we also assert that for each
distinct combination of arguments of o_iid, we get a distinct value from which all the
original arguments can be retrieved using the functions mentioned.</para>
<para>Now we define the same functions for constructing a synthetic one-part customer id.
The functions are exactly the same as for the order id.</para>
<programlisting>create function c_iid (in w_id int, in d_id int, in c_id int) returns numeric
{
return (w_id * 100000000000 + d_id * 1000000000 + c_id);
}
create function c_iid_w_id (in iid numeric) returns int
{
return cast (num_truncate (iid / 100000000000) as int);
}
create function c_iid_d_id (in iid numeric) returns int
{
return cast (num_mod (num_truncate (iid / 1000000000), 100) as int);
}
create function c_iid_c_id (in iid numeric)
{
return cast (num_mod (iid, 1000000000) as int);
}
db..sinv_create_inverse ('C_IID', VECTOR ('C_IID_W_ID', 'C_IID_D_ID', 'C_IID_C_ID'), 0);</programlisting>
<para>Now we have defined functions for producing the synthetic keys we wish to use.
Now we bind these functions to the original database using views. We assume that the
original orders and customer tables are attached from a remote database under the
names orders and customer.</para>
<programlisting>create view orders_2 as select o_iid (o_w_id, o_d_id, o_id) as o_iid numeric, c_iid (o_w_id, o_d_id, o_c_id) as o_c_iid numeric, * from orders;
create view customer_2 as select c_iid (c_w_id, c_d_id, c_id) as c_iid numeric, * from customer;</programlisting>
<para>As a simple test, let us get the synthetic o_c_iid from an order using the synthetic o_iid key.</para>
<screen>
SQL> select o_c_iid from orders_2 where o_iid = 102000000002;
returns________
102000000935
</screen>
<para>The order #2 in district #2 of warehouse #1 refers to customer #935 or district #2 of warehouse #1.</para>
<screen>
<![CDATA[
SQL> explain ('select o_c_iid from orders_2 where o_iid = 102000000002');
returns
{
Precode:
0: $25 "callret" := Call O_IID_O_ID (<constant (102000000002)>)
7: $26 "callret" := Call O_IID_D_ID (<constant (102000000002)>)
14: $27 "callret" := Call O_IID_W_ID (<constant (102000000002)>)
21: BReturn 0
Remote SELECT "t2"."O_C_ID", "t2"."O_D_ID", "t2"."O_W_ID" FROM "DBA"."ORDERS" "t2" where "t2"."O_ID" = ? and "t2"."O_D_ID" = ? and "t2"."O_W_ID" = ?
Params ($25 "callret", $26 "callret", $27 "callret")
Output ($29 "t1.O_C_ID", $30 "t1.O_D_ID", $31 "t1.O_W_ID")
After code:
0: $40 "O_C_IID" := Call C_IID ($31 "t1.O_W_ID", $30 "t1.O_D_ID", $29 "t1.O_C_ID")
7: BReturn 0
Select ($40 "O_C_IID")
}]]>
</screen>
<para>In this query execution plan we see that the virtual database first decomposes
the synthetic o_iid into its component parts, passes these to a query run on the
remote database, retrieves the components of the synthetic o_c_iid and finally
assembles this and returns it to the client.</para>
<para>We may also use this mapping in more complex queries:</para>
<programlisting>select count (*) from orders_2, customer_2 where o_c_iid = c_iid;</programlisting>
<para>This uses the synthetic foreign and primary keys for joining. Normally,
since the functions for making these keys are only defined on the vdb side,
each order would have to be retrieved and then the corresponding customer fetched.
Now however we can take advantage of the inverse declaration and decompose the
comparison of the synthetic keys into comparisons of each of their arguments:</para>
<screen>
<![CDATA[
SQL> explain ('select count (*) from orders_2, customer_2 where o_c_iid = c_iid');
returns
{
Remote SELECT COUNT ( *) FROM "DBA"."ORDERS" "t2" , "DBA"."CUSTOMER" "t4" where "t4"."C_ID" = "t2"."O_C_ID" and "t4"."C_D_ID" = "t2"."O_D_ID" and "t4"."C_W_ID" = "t2"."O_W_ID"
Output ($26 "aggregate")
Select ($26 "aggregate")
}]]>
</screen>
<para>We see that the SQL compiler decomposed the comparison of the c_iid's into
a comparison of the original arguments of the function. This can be done because
we earlier asserted that each distinct argument combination would produce a
unique result of the function.</para>
<para>The examples covered thus far are simple in that there is a clear way of mapping
the data back and forth using SQL functions. This is not always possible, for example
if we wish to generate unique numeric id's from keys consisting of arbitrary strings.</para>
<para>To this effect, Virtuoso provides a mechanism for automatically generating mapping
functions, which take a combination of n arguments of an arbitrary searchable data type
and generate a unique id for each distinct combination. The ids are assigned as needed
and the mapping is persisted in an automatically generated table.</para>
<para>We could have generated the o_iid, o_iid_w_id, o_iid_d_id and o_iid_o_id functions with the following single call:</para>
<programlisting>db..sinv_create_key_mapping ('O_IID', vector ('W_ID', 'int', 'D_ID', 'int', 'O_ID', 'int'));</programlisting>
<para>The first argument is the name of the mapping function to generate. The following
vector lists the arguments of this function and their data types. The name of each argument
is appended to the name of the mapping function, separated by an underscore for forming the
inverse functions, one for each argument. Thus the functions are named exactly as in the
previous example where we defined them manually.</para>
<para>Internally, this generates a table for keeping the w_id, d_id and o_id for each
allocated o_iid. The o_iid's come from a sequence object. The functions manage this
table automatically, without requiring developer intervention. The name of the
sequence is the same as that of the mapping, thus sequence_set can be used for
setting the sequence counter. Note that sequence names are case sensitive.</para>
<para>When a new combination of w_id, d_id, o_id is seen by o_iid, it inserts a
row in the mapping table and assigns this a new unique number. This update of
the mapping table is committed at the same time the enclosing transaction commits.
The table is normally read with read committed isolation and updated with serializable
isolation. This guarantees that when a single w_id, d_id, o_id combination is seen at
the same time on two threads the result will be the same id being given on both threads.</para>
<para>Any value returned by the mapping function may at all times be used for the inverse
functions. The inverse functions will return NULL if given a value that was at no time
returned by the mapping function.</para>
<para>The table created for the mapping is named MAP_<name of the mapping function>.
The system does not define an automatic cleanup function for removing established
mappings because the efficient way of implementing this is quite application dependent.
Such a function can however be easily written by the developer if one is needed.</para>
<para>Because comparison of mapping functions is reduced into pair-wise comparison of their arguments, the query</para>
<programlisting>select count (*) from orders, customer where o_c_iid = c_iid</programlisting>
<para>Will work even if all orders or customers have not been assigned a mapping on the vdb
side. If a virtual database application wishes to retrieve rows from a remote database using
a synthetic key made by functions defined with <link linkend="fn_sinv_create_key_mapping"><function>sinv_create_key_mapping</function></link>, any value returned by
the mapping function in a transaction that was successfully committed is valid and will
retrieve the data intended.</para>
<sect2 id="updthoughinverses"><title>Updating through Inverses</title>
<para>A view that selects calls to functions, which have inverses, is updateable.
This means that the SQL compiler will use the inverse function on the value being
assigned before assigning the column. This is done for both insert and update
operations. No special declaration is needed. If the function with inverse
has multiple arguments, all of which are columns, then assigning the function
will assign all the argument columns, having called the appropriate inverse for
getting each of the separate argument values.</para>
<para>For example:</para>
<screen>
SQL> insert into orders_2 (o_iid) values (1102000000003);
Done. -- 1 msec.
SQL> select * from orders_2 where o_w_id = 11;
O_IID O_C_IID O_ID O_D_ID O_W_ID O_C_ID O_ENTRY_D O_CARRIER_ID O_OL_CNT O_ALL_LOCAL
DECIMAL DECIMAL INTEGER NOT NULL INTEGER NOT NULL INTEGER NOT NULL INTEGER DATE INTEGER INTEGER INTEGER
_______________________________________________________________________________
1102000000003 NULL 3 2 11 NULL NULL NULL NULL NULL
1 Rows. -- 1 msec.
SQL> update item_euro set i_price = 120 where i_id = 1234;
</screen>
</sect2>
<note>
<title>Conclusions</title>
<para>The examples in this document used tables attached from remote databases
because the features discussed here are most likely to be useful in such contexts.
Also the query execution plans clearly show which operations take place where. The
inverse function mechanism is however in no way limited to virtual database applications.
All the examples will work equally well with local tables.</para>
</note>
</sect1>
|