1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775
|
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
-
- This file is part of the OpenLink Software Virtuoso Open-Source (VOS)
- project.
-
- Copyright (C) 1998-2018 OpenLink Software
-
- This project is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License as published by the
- Free Software Foundation; only version 2 of the License, dated June 1991.
-
- This program is distributed in the hope that it will be useful, but
- WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- General Public License for more details.
-
- You should have received a copy of the GNU General Public License along
- with this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
-
-
-->
<sect2 id="vdbconcepts"><title>Virtual Database</title>
<!-- ======================================== -->
<sect3 id="linkrmttableauto">
<title>Linking Remote Tables & Views</title>
<para>The Virtuoso Server supports linking in of tables, views, procedures and
other components from a remote ODBC data-source, enabling them to be
queried as native objects in Virtuoso; this process is called
"attaching" or "linking". The easiest way to link to an external table is
to use the <link linkend="remotetables">Linking Remote Tables Wizard</link>,
part of the Visual Server Administration Interface. Alternatively you can
attach these objects programmatically, as this section explains;
finally you can attach tables manually - see <link linkend="MANSETRDS">Manually
Setting Up A Remote Data Source</link> which is useful for connections to
less-capable ODBC data-sources.</para>
<sect4 id="ATTACH">
<title>ATTACH TABLE Statement</title>
<programlisting><![CDATA[
ATTACH TABLE <table> [PRIMARY KEY '(' <column> [, ...] ')']
[AS <local_name>] FROM <dsn> [USER <uid> PASSWORD <pwd>]
[ON SELECT] [REMOTE AS <literal_table_name>]
]]></programlisting>
<variablelist>
<varlistentry>
<term>table</term>
<listitem><para>Adequately qualified table name of the form: identifier | identifier.identifier | identifier.identifier.identifier | identifier..identifier</para></listitem>
</varlistentry>
<varlistentry>
<term>column</term>
<listitem><para>column to assume primary key</para></listitem>
</varlistentry>
<varlistentry>
<term>local_name</term>
<listitem><para>fully qualified table name specifying local reference.</para></listitem>
</varlistentry>
<varlistentry>
<term>dsn</term>
<listitem><para>scalar_exp</para></listitem>
</varlistentry>
<varlistentry>
<term>user</term>
<listitem><para>scalar_exp</para></listitem>
</varlistentry>
<varlistentry>
<term>password</term>
<listitem><para>scalar_exp</para></listitem>
</varlistentry>
<varlistentry>
<term>literal_table_name</term>
<listitem><para>scalar_exp</para></listitem>
</varlistentry>
</variablelist>
<para>This SQL statement defines a remote data source, copies the schema
information of a given table to the local database and defines the table as a
remote table residing on the data source in question.
</para>
<para>
The table is a designation of the table's name on the remote data source dsn. It
may consist of an optional qualifier, optional owner and table names, separated
by dots. This must identify exactly one table on the remote dsn. The optional
local_name is an optionally qualified table name which will identify the table on
the local database. If qualifier or owner are omitted, these default to the current
qualifier 'dbname()' and the logged in user, as with CREATE TABLE. If the
local_name is not given it defaults to the <current qualifier>.<dsn>.<table name on
dsn>. The <dsn> will be the dsn with all alphabetic characters in upper case and
all non-alphanumeric characters replaced by underscores. The <table name on
dsn> will be the exact name as it is on the remote dsn, case unmodified.
</para>
<para>
The PRIMARY KEY option is only required for attaching views or tables where the primary key on the
remote table cannot be ascertained directly from the remote data source.</para>
<para>
If a dsn is not previously defined with vd_remote_data_source or ATTACH
TABLE, the USER and PASSWORD clauses have to be given.
</para>
<para>The <emphasis>REMOTE AS</emphasis> option allows you to provide a
string literal for referencing the remote table. This is useful when linking
tables from sources that do not support three-part qualification correctly.</para>
</sect4>
<sect4 id="attachview">
<title>Attaching views</title>
<para>
A view can be attached as a table if a set of uniquely identifying
columns is supplied.
</para>
<para>
This is done with the PRIMARY KEY option to ATTACH TABLE as follows:
</para>
<programlisting>
attach table T1_VIEW primary key (ROW_NO) from 'somedsn';
</programlisting>
<note>
<title>Note:</title>
<para>Views cannot be attached unless the PRIMARY KEY options is used.</para>
</note>
</sect4>
<sect4 id="linkrmttableautoexamples">
<title>Examples for Linking Remote Tables into OpenLink Virtuoso</title>
<itemizedlist mark="bullet">
<listitem><link linkend="vdbenginemclink">Oracle</link></listitem>
<listitem><link linkend="vdbengineprlnk">Progress</link></listitem>
<listitem><link linkend="vdbengineinglink">Ingres</link></listitem>
<listitem><link linkend="vdbengineinflink">IBM Informix</link></listitem>
<listitem><link linkend="vdbenginedb2link">IBM DB2</link></listitem>
<listitem><link linkend="vdbenginesyblink">Sybase</link></listitem>
<listitem><link linkend="vdbenginemsqllink">MySQL</link></listitem>
<listitem><link linkend="vdbenginepsqllink">PostgreSQL</link></listitem>
<listitem><link linkend="vdbengineodtjdlink">JDBC</link></listitem>
<listitem><link linkend="vdbengineodbcodlink">ODBC to ODBC</link></listitem>
<listitem><link linkend="vdbenginefirebdlink">Firebird</link></listitem>
<listitem><link linkend="vdbenginemsqlslink">Microsoft SQL Server</link></listitem>
</itemizedlist>
</sect4>
</sect3>
<!-- ======================================== -->
<sect3 id="linkrmtprocauto">
<title>Linking Remote Procedures</title>
<sect4 id="ATTACHPROC">
<title>ATTACH PROCEDURE Statement</title>
<programlisting><![CDATA[
ATTACH (PROCEDURE|FUNCTION) <proc_name> ([<parameter1>[,<parameter2>[...]]])
[ RETURNS <rettype> ] [AS <local_name>] FROM <dsn>
]]></programlisting>
<variablelist>
<varlistentry>
<term>dsn</term>
<listitem>
<para>scalar_exp</para>
</listitem>
</varlistentry>
<varlistentry>
<term>proc_name</term>
<listitem>
<para>identifier | identifier.identifier | identifier.identifier.identifier | identifier..identifier</para>
</listitem>
</varlistentry>
<varlistentry>
<term>parameter1..parameterN</term>
<listitem>
<para>parameters declaration (as in CREATE PROCEDURE)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>local_name</term>
<listitem>
<para>table</para>
</listitem>
</varlistentry>
</variablelist>
<para>The ATTACH PROCEDURE statement allows you to associate stored procedures
from remote datasources with Virtuoso so they can be used as if they were
defined directly within Virtuoso. Much like the ATTACH TABLE statement,
this SQL statement creates a local alias for a procedure on a given remote
data source so it can be considered locally defined. When this alias is called
called the procedure at the remote data source will actually be called.</para>
<para>Procedure generated result sets are not supported by the ATTACH PROCEDURE
statement. The only portable way to return values from a remote procedure is
to use INOUT or OUT parameters. Remote procedure result sets can be used by
combination of <function>rexecute()</function> and Virtuoso PL, but this is
left for the user to implement as required.</para>
<para>The ATTACH PROCEDURE statement is not able to define new connections to
remote data sources, the connection should be defined prior using either
vd_remote_data_source or by attaching a table or view using the ATTACH TABLE
statement with USER/PASSWORD supplied.</para>
<para>Note that when generating pass-through statements to a given remote,
any procedure call for an attached procedure is passed through if the current
DSN is the same as the remote procedure's DSN.</para>
<para>The <emphasis>proc_name</emphasis> is the designation of the
procedure's name on the remote data source, DSN. The remote procedure
name supplied should always be fully qualified to avoid ambiguity, it may
consist of an optional qualifier/catalog, optional owner and finally procedure
name, separated by dots. This must identify exactly one procedure on the
remote data source.</para>
<para>The optional <emphasis>local_name</emphasis> is an optionally qualified
procedure name which will identify the procedure on the local Virtuoso
database. If the local_name is not given it defaults to the <current
qualifier>.<dsn>.<proc name on dsn>. The <dsn> will be
the data source name in upper case and with all non-alphanumeric characters
replaced by underscores. The <proc name on dsn> will be the exact
name as it is on the remote dsn, case unmodified.</para>
<para>If a dsn is not previously defined with vd_remote_data_source or ATTACH
TABLE, the ATTACH PROCEDURE will fail.</para>
<example id="VDOCS-TRANS-01">
<title>Example:</title>
<para>
On remote Virtuoso (DSN name : remote_virt):
</para>
<programlisting>
CREATE PROCEDURE RPROC (IN PARAM VARCHAR) returns VARCHAR
{
return (ucase (PARAM));
}
</programlisting>
<para>
On the local virtuoso (DSN name : local_virt) :
</para>
<programlisting>
vd_remote_data_source ('remote_virt', '', 'demo', 'demopwd');
ATTACH PROCEDURE RPROC (IN PARAM VARCHAR) RETURNS VARCHAR from 'remote_virt';
</programlisting>
<para>
will result in creation of an procedure alias for RPROC in local_virt named DB.REMOTE_VIRT.RPROC
</para>
<para>
Calling it from the local_virt (using ISQL)
</para>
<programlisting>
select REMOTE_VIRT.RPROC ('MiXeD CaSe') as rproc_result;
rproc_result
---------------
MIXED CASE
1 rows
</programlisting>
</example>
</sect4>
<tip><title>See Also:</title>
<para>The Virtuoso Visual Server Administration Interface provides a
graphical user interface for
<link linkend="remoteprocedures">linking remote stored procedures</link>.</para></tip>
</sect3>
<!-- ======================================== -->
<sect3 id="TYPEMAPPING">
<title>Data Type Mappings</title>
<para>
If a statement is passed through to a remote data source, the types
returned by SQLDescribeCol are taken directly from the remote prepare
and converted to the closest Virtuoso supported type.
</para>
<para>
If a statement involves both local and remote resources all types are
taken from the Virtuoso copy of the data dictionary.
</para>
<para>
In executing remote selects Virtuoso binds output columns according
to the type and precision given by SQLDescribeCol after preparing the
remote statement.
</para>
<para>
When a table is attached from a remote data source the catalog is read and
the equivalent entries are created in Virtuoso. Since the types present on different
DBMS's vary, the following logic is used to map ODBC types to Virtuoso types.
</para>
<table colsep="1" frame="all" rowsep="0" shortentry="0" tocentry="1" tabstyle="decimalstyle" orient="land" pgwide="0">
<title>Attach Table Type Mappings</title>
<tgroup align="char" charoff="50" char="." cols="2">
<colspec align="left" colnum="1" colsep="0" colwidth="20pc"/>
<thead>
<row>
<entry>SQL Type</entry>
<entry>Mapped Type</entry>
</row>
</thead>
<tbody>
<row>
<entry>SQL_CHAR</entry>
<entry>varchar (precision)</entry>
</row>
<row>
<entry>SQL_VARCHAR</entry>
<entry>varchar (precision)</entry>
</row>
<row>
<entry>SQL_BIT</entry>
<entry>smallint </entry>
</row>
<row>
<entry>SQL_TINYINT</entry>
<entry>smallint</entry>
</row>
<row>
<entry>SQL_SMALLINT</entry>
<entry>smallint </entry>
</row>
<row>
<entry>SQL_INTEGER</entry>
<entry>integer </entry>
</row>
<row>
<entry>SQL_BIGINT</entry>
<entry>decimal (20)</entry>
</row>
<row>
<entry>SQL_DECIMAL</entry>
<entry/>
</row>
<row>
<entry>SQL_NUMERIC</entry>
<entry>
<para>smallint if precision < 5 and scale = 0</para>
<para>integer if precision < 10 and scale = 0</para>
<para>double precision if precision < 16</para>
<para>decimal (precision, scale) otherwise</para>
</entry>
</row>
<row>
<entry>SQL_REAL</entry>
<entry>real</entry>
</row>
<row>
<entry>SQL_FLOAT</entry>
<entry>double precision</entry>
</row>
<row>
<entry>SQL_DOUBLE</entry>
<entry>double precision</entry>
</row>
<row>
<entry>SQL_BINARY</entry>
<entry>varbinary (precision)</entry>
</row>
<row>
<entry>SQL_VARBINARY</entry>
<entry>varbinary (precision)</entry>
</row>
<row>
<entry>SQL_LONGVARCHAR</entry>
<entry>long varchar</entry>
</row>
<row>
<entry>SQL_LONGVARBINARY</entry>
<entry>long varbinary</entry>
</row>
<row>
<entry>SQL_DATE</entry>
<entry>date </entry>
</row>
<row>
<entry>SQL_TIME</entry>
<entry>time</entry>
</row>
<row>
<entry>SQL_TIMESTAMP</entry>
<entry>datetime</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<title>Note:</title>
<para>The general case of decimal and numeric both revert to the Virtuoso
decimal type, which is a variable precision decimal floating point.</para>
</note>
</sect3>
<!-- ======================================== -->
<sect3 id="TRANSMODEL">
<title>Transaction Model</title>
<para>
One transaction on the Virtuoso VDBMS server may contain operations
on multiple remote data sources. As a general rule remote connections
are in manual commit mode and Virtuoso either commits or rolls back the
transaction on each of the remote connections as the main transaction
terminates.
</para>
<para>
ODBC does not support two phase commit. Therefore a transaction that
succeeds on one remote party may fail on another.
</para>
<para>
A transaction involving local tables and tables on one remote data source
will always complete correctly since the remote is committed before the
local and the local is rolled back if the remote commit fails.
</para>
<para>
Note that even though the client to Virtuoso connection may be in
autocommit mode the continuing connections will typically not be
autocommitting.
</para>
<para>
A remote connection is in autocommit mode if the Virtuoso connection
is and the statement is passed through unmodified. In all other cases
remote connections are in manual commit mode.
</para>
<para>Virtuoso supports 2PC - Two Phase Commit. See the
<link linkend="twopcimplementation">Distributed Transaction & Two Phase
Commit</link> section for more information.</para>
</sect3>
<!-- ======================================== -->
<sect3 id="VDB_SQLFUNCTIONS">
<title>Virtual Database and SQL Functions</title>
<para>
Different DBMS's support slightly different sets of SQL built-in functions
with slightly differing names. For example, what is called substring on Virtuoso
is called substr on Informix. Virtuoso allows declaring equivalences between local
user-defined or built-in functions and user defined or built-in functions on remote servers.
Knowing that a function exists on a remote database allows passing processing closer to the data,
resulting in evident gains in performance.
</para>
<para>
To declare that substring is called substr on DSN inf10, you can execute:
</para>
<programlisting>
db..vd_pass_through_function ('inf10', 'substring', 'substr');
</programlisting>
<para>
The first argument is the name of the remote database, as used with attach table and related statements.
If user defined functions with qualified names are involved, the names should be qualified in
the vd_pass_through_function call also. If many qualified or unqualified forms of the name are in use,
one should declare the mapping for them all.
</para>
<para>
To verify whether this declaration takes effect, one can use explain to see the query execution plan, for example:
</para>
<programlisting>
explain ('select substring (str, 1, 2) from inf10.sample_table');
</programlisting>
<para>
The declarations are persistent and can be dropped by using a last argument of NULL
for a given function. The declarations are kept at the level of a DSN and not at the level
of the type of DBMS because different instances can have different user defined functions defined.
</para>
</sect3>
<!-- ======================================== -->
<sect3 id="VDB_SQLOPTSTATISTICS">
<title>Virtual Database and SQL Optimizer Statistics</title>
<para>
If a query can be executed in its entirety on a single remote database, then optimizing this query
is exclusively the business of the remote database, as it gets the whole query text.
Virtuoso rewrites some things and suggests a join order but these are not binding on the remote database.
</para>
<para>
If a query involves tables from multiple remote databases or a a mix of local and remote tables,
knowing basic SQL statistics on the tables is essential for producing a meaningful query plan.
Virtuoso has information on indices existing on remote tables and if the remote table is attached
from a known type of DBMS, Virtuoso may read the DBMS specific statistics at attach time.
</para>
<para>
Note that the statistics of the remote database should be up to date before attaching.
</para>
<para>
The function sys_db_stat can be used for forcing a refresh of Virtuoso's statistics on remote tables.
</para>
<programlisting>
sys_db_stat (5, 0)
</programlisting>
<para>
will go through all tables, local and remote. For local tables, it will update statistics with a 5 percent
sampling rate and for remote tables it will refresh the statistics if the type of the host DBMS is among
the supported ones. If the remote DBMS is of an unknown type, Virtuoso will take the count of the remote
table and select the 1000 first rows to get a sample of data lengths and column cardinalities.
This is not very precise but will be better than nothing.
</para>
<para>
In order to force a full read of a remote table for statistics gathering, one can use
</para>
<programlisting>
db..sys_stat_analyze ('fully qualified table name', 0, 0);
</programlisting>
<para>
The table name is case sensitive, with all qualifiers, as it appears in SYS_KEYS and other system tables.
This will read the whole table.
</para>
<para>
Statistics on local as well as remote tables are kept in SYS_COL_STAT.
One may look at this table to see the effects of remote statistics collection.
In special cases, if a special effect is desired or the information is not otherwise available,
as in the case of a very large table on an unsupported type of server, it is possible to manually
update the contents of the table. Shutting down and restarting Virtuoso will force a reload of the statistics information.
</para>
<para>
Presently Oracle, Informix and Virtuoso are supported for direct access to the remote database's
statistics system tables. It is possible to define hook functions for accessing this same information
from any other type of DBMS. Please contact support for instructions on this.
</para>
</sect3>
<!-- ======================================== -->
<sect3 id="VDB_DISTRQUERYOPTM">
<title>Distributed Query Optimization</title>
<para>
When a query contains mixes of tables from different sources, the compiler must decide on an efficient
execution plan that minimizes the number of round trips to remote servers and evaluates query conditions
close to the data when possible. Additionally, any normal query optimization considerations such as choice
of join order and join type apply. See the section on SQL optimization and optimizer hints for more on this.
Additionally, the SQL optimizer uses round trip time statistics for the servers involved in the query.
</para>
<para>
In the following examples, we use the tables r1..t1, r2..t1 and t1, of which r1..t1 is on a server close by,
r2..t1 on a server farther away and t1 on the local server. The column row_no is a unique key and the string1
column is in indexed with 300 distinct values, the column fs1 has 3 distinct values. The tables all have 100000
rows. A round trip to r1 takes 10 ms and a round trip to r2 takes 100 ms.
</para>
<para>
Consider
</para>
<programlisting>
select * from r1..t1 a, t1 b where a.row_no = b.row_no and a.fs1 = 'value1';
</programlisting>
<para>
The compiler notices that 33333 rows will be selected from r1..t1 based on fs1. It will decide to read these
into a hash table, causing one linear scan of r1..t1 with relatively few round trips. Then it will read t1
locally and select the rows for which there is a matching entry in the hash. This is slightly faster
than doing 33333 random lookups of t1. If fewer rows were selected from r1..t1, the compiler would do a
loop join with the local t1 as the inner loop.
</para>
<para>
The absolute worst plan would be a loop join with t1 as the outer loop, with 100000 round trips to r1.
</para>
<para>
Now, if many tables are accessed from the same data source, the compiler tries to bundle these together into one statement. Thus, for:
</para>
<programlisting>
select * from r1..t1 a, r1..t1 b, t1 c where c.string1 = '111' and b.row_no = c.row_no and a.row_no = b.row_no + 1;
</programlisting>
<para>
The compiler will probably do the outer loop for t1, which is expected to select 100000/300 rows. Then it will do a
round trip to r1 with the statement.
</para>
<programlisting>
select * from t1 a, t1 b where a.row_no = b.row_no + 1 and a.row_no = ?.
</programlisting>
<para>
This is likely better than doing the remote part as an outer loop, bringing all the approx 100000 results in.
333 round trips selecting 1 row is better than 100000 rows transferred. If the data source were further away,
this could be otherwise, hence the importance of the round trip time statistic.
</para>
<para>
In distributed queries, the compiler will honor the option (order) and the join types e.hg. table option *(hash)
insofar the tables are local.
</para>
<para>
Thus, if we wrote
</para>
<programlisting>
select * from r1..t1 a, t1 b, r1..t1 c where c.string1 = '111' and b.row_no = c.row_no and a.row_no = b.row_no + 1 option (order);
</programlisting>
<para>
the compiler could not merge the two tablesfrom r1 into a single query because the order were given and there is
an intervening table not located on r1.
</para>
</sect3>
<!-- ======================================== -->
<sect3 id="VDB_ARRAYPARAMETERS">
<title>Use of Array Parameters</title>
<para>
ODBC and other data access API's usually offer a mechanism for executing a single parametrized
statement multiple times with a single client-server round trip. This is usually called support of array parameters.
</para>
<para>
Virtuoso can make use of array parameter support in ODBC drivers when available. Consider the statement:
</para>
<programlisting>
insert into r1..t1 select * from t1;
</programlisting>
<para>
Without array parameters, this would make a round trip to r1 for each row inn t1. With array parameters enabled,
with a batch size of 100, this would make only 1000 round trips for 100000 rows, resulting in dramatic increase
in performance. Typically, if the remote server is on the same machine, array parameters make such batch operations
about 3x faster. If the remote is farther away, the gain is greater.
</para>
<para>
Array parameters are used if the remote database and its ODBC driver support them. The feature is globally disabled
in the default configuration because some ODBC drivers falsely claim to support array parameters. To enable this
feature, the the ArrayOptimization entry in the [VDB] section of the ini file to 1. To set the batch size, use the
NumArrayParameters setting. 100 is a reasonable value.
</para>
<para>
Some ODBC drivers also support array parameters for select statements. To enable using this, you can set the
ArrayOptimization setting to 2. This may however not work with some drivers even if DML (insert/update/delete)
statements do work with array parameters.
</para>
</sect3>
<!-- ======================================== -->
<sect3 id="TIMESTAMP_AUTOINCREMENT">
<title>Timestamps & Autoincrement</title>
<para>
A transaction timestamp is not the same across the transaction
if the transaction has branches in different databases.
</para>
<para>
The data type and precision of a time stamp will also vary between
different types of databases.
</para>
<para>
Hence timestamp columns coming from tables on different servers are not
comparable for equality.
</para>
<para>
In inserts and updates of remote tables timestamps are assigned by
the database where the table in question is physically located.
</para>
<para>
Identity or autoincrement columns are likewise handled by the database holding the
remote table.
</para>
<para>
Note that MS SQL Server and Virtuoso describe a timestamp column as a binary column in ODBC
catalog and meta data calls. Thus remote SQL Server or Virtuoso timestamps will not appear as timestamps at all.
</para>
<para>
In the case of a Virtuoso remote database the binary timestamp can be cast into a DATETIME data type
and it will appear as a meaningful datetime.
</para>
</sect3>
<!-- ======================================== -->
<sect3 id="vdbSTOREDPROCS">
<title>VDB Stored Procedures & Functions</title>
<para>These procedures allow you to manually manage remote data sources and their tables.</para>
<itemizedlist>
<listitem><link linkend="fn_vd_remote_data_source">vd_remote_data_source()</link></listitem>
<listitem><link linkend="fn_vd_remote_table">vd_remote_table()</link></listitem>
<listitem><link linkend="fn_rexecute">rexecute()</link></listitem>
<listitem><link linkend="fn_rnext">rnext()</link></listitem>
<listitem><link linkend="fn_rmoreresults">rmoreresults()</link></listitem>
<listitem><link linkend="fn_rclose">rclose()</link></listitem>
<listitem><link linkend="fn_rstmtexec">rstmtexec()</link></listitem>
</itemizedlist>
<para>Functions capable of returning a result-set make use of the
<parameter>results_set</parameter> parameter. To prevent them from returning
a result-set, the <parameter>results_set</parameter> parameter should be set to
'null'. If Virtuoso finds an awaiting parameter to contain <parameter>results_set</parameter>
it will fetch the result set regardless of <parameter>cursor_handle</parameter>
parameter.</para>
<para>Unless explicitly granted, only the DBA group is permitted to use the
<function>rexecute()</function> to maintain security. Caution is required
here since any user granted use of <function>rexecute()</function> has
full control of the remote data source set-up by the DBA, however limited
to the overall abilities of the remote user on the remote data source.
Users can be granted and denied access to this function using the following
commands:</para>
<programlisting><![CDATA[
GRANT REXECUTE ON '<attached_dsn_name>' TO <user_name>
REVOKE REXECUTE ON '<attached_dsn_name>' FROM <user_name>
]]></programlisting>
<para>The following remote catalogue functions help you to obtain information about the
remote datasources that you are using. These could be especially useful in Virtuoso PL
later on if you are not able to know everything about the remote tables ahead of time for
the ATTACH TABLE statement</para>
<itemizedlist>
<listitem><link linkend="fn_sql_data_sources">sql_data_sources()</link></listitem>
<listitem><link linkend="fn_sql_tables">sql_tables()</link></listitem>
<listitem><link linkend="fn_sql_columns">sql_columns()</link></listitem>
<listitem><link linkend="fn_sql_statistics">sql_statistics()</link></listitem>
<listitem><link linkend="fn_sql_primary_keys">sql_primary_keys()</link></listitem>
</itemizedlist>
</sect3>
<!-- ======================================== -->
<sect3 id="MANSETRDS">
<title>Manually Setting Up A Remote Data Source</title>
<para>
Defining a remote table involves declaring the table as a local table
and then defining the data source if not already defined and associating
the new table with the remote data source.
</para>
<para>
The data source on which a table resides is declared at the table
level. This has no connection to the table's qualifier.
</para>
<para>
Assume a remote ODBC data source named test containing a table xyz
declared as follows:
</para>
<example id="VDOCS-TRANS-01">
<title>Example:</title>
<programlisting>
CREATE TABLE XYZ (
A INTEGER,
B INTEGER,
C INTEGER,
PRIMARY KEY (A));
</programlisting>
</example>
<para>
To defined this as a remote table on the data source Virtuoso, first
define the table locally, using the above CREATE TABLE statement above.
</para>
<para>
Then define the data source:
</para>
<programlisting>
DB..vd_remote_data_source ('test', '', 'sa','');
</programlisting>
<para>
And the table:
</para>
<programlisting>
DB..vd_remote_table ('test', 'DB.DBA.XYZ', 'master.dbo.XYZ');
</programlisting>
<para>
This assumes that the remote data source has a login 'sa' with an empty
password and no special connection string options. The table names in
vd_remote_table have to be fully qualified. We here assume that the
Virtuoso table was created by DBA in under the default qualifier DB and
the remote XYZ was created by dbo in master.
</para>
<para>
The vd_remote_table declaration does not affect statements or procedures
compiled prior to the declaration.
</para>
<para>
Additional indices of remote tables may optionally be defined. They do
not affect the operation of the SQL compiler. The remote data source
makes the choice of index based on the order by clause in the statement
passed through.
</para>
</sect3>
<!-- ======================================== -->
<sect3 id="BUGSlimits">
<title>Caveats</title>
<itemizedlist mark="bullet">
<listitem>
<para>
Never attempt to attach a local table as a remote. The server will hang if it tries
to make a remote commit on itself.
</para>
</listitem>
<listitem>
<para>
If the schema of the remote table is changed it will need to be re-attached to Virtuoso.
</para>
</listitem>
<listitem>
<para>
The Virtuoso server treats dots (.) in the double-quotes escaped names as name element separators.
For example : the table name "a.b.c" is treated as "a"."b"."c" .
Because of this remote tables with dots in their table name (like tables from MS Text driver) require the dot
inside the table name to be replaced with the VDB "non-delimiting-dot" (\x0A) and the
vd_attach_table (in dsn varchar, in remote_name varchar, in local_name varchar, in uid varchar, in pwd varchar)
to be used instead of ATTACH TABLE statement.
</para>
<para>
The statement ATTACH TABLE "datafile.txt" as 'test' from 'text' user 'a' password 'b' should become :
</para>
<programlisting>
vd_attach_table ('text', 'datafile\x0Atxt', 'test', 'a', 'b');
</programlisting>
</listitem>
</itemizedlist>
<para>When Virtuoso interacts with a table or view attached from a remote
data source, it must be able to uniquely identify each row of the query.
At the attach time Virtuoso will query remote data source for the tables
primary keys and indices. These will be used to construct a copy of the
table definition in Virtuoso which is then used in reference to the remote
data source. At query time this information is used as much as possible.
This information may need to be supplemented by calls to SQLStatistics() for
further indicies or primary key information, as a last resort Virtuoso will
use SQLColAttribute() to determine which columns are SQL_DESC_SEARCHABLE.
</para>
</sect3>
</sect2>
|