      \documentclass[a4paper,11pt]{article}
\usepackage{hyperref,color,url,alltt,parskip}
%\usepackage{Sweave}
\usepackage{Rd}  % mainly for upright quotes.
\usepackage{graphicx}
%\newcommand{\code}[1]{\texttt{#1}}
%\newcommand{\pkg}[1]{\texttt{#1}}
%\newcommand{\var}[1]{\emph{#1}}
%\newcommand{\link}[1]{#1}
%\newcommand{\sQuote}[1]{`#1'}
\newenvironment{smallexample}{\begin{alltt}\scriptsize}{\end{alltt}}
\newenvironment{example}{\begin{alltt}\small}{\end{alltt}}
\setcounter{topnumber}{2}
\def\topfraction{1.0}
\setcounter{bottomnumber}{1}
\def\bottomfraction{.3}
\setcounter{totalnumber}{3}
\def\textfraction{0}
\def\floatpagefraction{0.8}
\setcounter{dbltopnumber}{2}
\usepackage{color}
\definecolor{Blue}{rgb}{0,0,0.8}
\hypersetup{%
colorlinks,%
plainpages=true,%
linkcolor=black,%
citecolor=black,%
urlcolor=Blue,%
pdfstartview=FitH,%
pdfview={XYZ null null null},%
%pdfpagemode=UseNone,%
pdftitle={ODBC Connectivity for R},%
pdfauthor={B. D. Ripley}%
}
\begin{document}
%\VignetteIndexEntry{ODBC Connectivity}
%\VignetteDepends{RODBC}
%\VignetteKeywords{databases}
%\VignetteKeywords{ODBC}
%\VignettePackage{RODBC}
%\SweaveOpts{engine=R,eps=FALSE}
\title{ODBC Connectivity}
\author{by Brian Ripley\\
  Department of Statistics, University of Oxford\\
\url{ripley@stats.ox.ac.uk}
}
\maketitle
\sloppy
\noindent
Package \pkg{RODBC} implements ODBC database connectivity.  It was
originally written by Michael Lapsley (St George's Medical School,
University of London) in the early days of R (1999), but after he
disappeared in 2002, it was rescued and since much extended by Brian
Ripley.  Version 1.0-1 was released in January 2003, and \pkg{RODBC}
is nowadays a mature and much-used platform for interfacing R to
database systems.
Thanks to Marc Schwartz for contributing some of the experiences
here.  See also the archives of the \code{R-sig-db} mailing list.
\section{ODBC Concepts}
ODBC aims to provide a common API for access to SQL\footnote{SQL is a
  language for querying and managing data in databases---see
  \url{http://en.wikipedia.org/wiki/SQL}.}-based database management
systems (DBMSs) such as MySQL\footnote{and its fork, MariaDB},
PostgreSQL, Microsoft Access and SQL Server, DB2, Oracle and SQLite.
It originated on Windows in the early 1990s, but ODBC \emph{driver
  managers} \code{unixODBC} and \code{iODBC} are nowadays available on
a wide range of platforms (and a version of \code{iODBC} shipped with
versions of Mac OS X prior to 10.9).  The connection to the particular
DBMS needs an \emph{ODBC driver}: these may come with the DBMS or the
ODBC driver manager or be provided separately by the DBMS developers,
and there are third-party\footnote{but there are close links between
  \code{unixODBC} and Easysoft, and \code{iODBC} and OpenLink.}
developers such as Actual Technologies, Easysoft and OpenLink. (This
means that for some DBMSs there are several different ODBC drivers
available, and they can behave differently.)
Microsoft provides drivers on Windows for non-SQL database systems
such as DBase and FoxPro, and even for flat files and Excel
spreadsheets. Actual Technologies sell a driver for Mac OS X that
covers (some) Excel spreadsheets and flat files.
A connection to a specific database is called a \emph{Data Source
  Name} or DSN (see
\url{http://en.wikipedia.org/wiki/Database_Source_Name}). See
Appendix~B for how to set up DSNs on your system.  One of the greatest
advantages of ODBC is that it is a cross-platform client-server
design, so it is common to run R on a personal computer and access
data on a remote server whose OS may not even be known to the end
user.  This does rely on suitable ODBC drivers being available on the
client: they are for the major cross-platform DBMSs, and some vendors
provide `bridge' drivers, so that for example a `bridge' ODBC driver
is run on a Linux client and talks to the Access ODBC driver on a
remote Windows machine.
ODBC provides an abstraction that papers over many of the differences
between DBMSs.  That abstraction has developed over the years, and
\pkg{RODBC} works with ODBC version 3.  This number describes both the
API (most drivers nowadays work with API 3.51 or 3.52) and
capabilities.  The latter allow ODBC drivers to implement newer
features partially or not at all, so some drivers are much more
capable than others: in the main \pkg{RODBC} works with basic
features.  ODBC is a superset of the ISO/IEC 9075-3:1995 SQL/CLI
standard.
A somewhat biased overview of ODBC on Unix-alikes can be found at
\url{http://www.easysoft.com/developer/interfaces/odbc/linux.html}.
\section{Basic Usage}
Two groups of functions are provided in \pkg{RODBC}.  The mainly
internal \code{odbc*} commands implement low-level access to C-level
ODBC functions with similar\footnote{in most cases with prefix
  \code{SQL} replacing \code{odbc}.} names.  The \code{sql*} functions
operate at a higher level to read, save, copy and manipulate data
between data frames and SQL tables.  The two low-level functions in
common use are those which make or break a connection.
\subsection{Making a connection}
ODBC works by setting up a \emph{connection} or \emph{channel} from
the client (here \pkg{RODBC}) to the DBMSs as specified in the DSN.
Such connections are normally used throughout a session, but should be
closed explicitly at the end of the session---however \pkg{RODBC} will
clear up after you if you forget (with a warning that might not be
seen in a GUI environment).  There can be many simultaneous
connections.
The simplest way to make a connection is
\begin{example}
library(RODBC)
ch <- odbcConnect("\var{some\_dsn}")
\end{example}
and when you are done with it,
\begin{example}
close(ch)
# or if you prefer
odbcClose(ch)
\end{example}
The connection object \code{ch} is how you specify one of potentially
many open connections, and is the first argument to all other
\pkg{RODBC} functions.  If you forget the details, printing it will
give some summary information.
If the DBMS user and password are needed and not stored in the DSN,
they can be supplied by e.g.{}
\begin{example}
ch <- odbcConnect("\var{some\_dsn}", uid = "\var{user}", pwd = "\var{****}")
\end{example}
Users of the R GUI under Windows\footnote{This does not work from
  \code{Rterm.exe}.} have another possibility: if an incompletely
specified DSN is given, the driver-specific Data Source dialog box
will pop up to allow it to be completed.
% Something about security here?
More flexibility is available \emph{via} function
\code{odbcDriverConnect}, which works with a \emph{connection string}.
At its simplest it is
\begin{example}
"DSN=\var{dsn};UID=\var{uid};PWD=\var{pwd}"
\end{example}
but it can be constructed without a DSN by specifying a driver
directly \emph{via} \code{DRIVER=}, and more (in some cases many more)
driver-specific parameters can be given.  See the documentation for
the driver (and Appendix~A) for more details.
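For instance, a DSN-less connection might be assembled along these
lines (the keywords accepted, and the driver name, depend on the
driver in use; all the values shown are placeholders):
\begin{example}
ch <- odbcDriverConnect(paste("DRIVER=\var{driver\_name}",
                              "SERVER=\var{host}", "DATABASE=\var{db}",
                              "UID=\var{user}", "PWD=\var{****}",
                              sep = ";"))
\end{example}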
\subsection{Reading from a database}
where `database' can be interpreted very widely, including for example
Excel spreadsheets and directories of flat files.
The simplest and most common use of \pkg{RODBC} is to extract data
from databases held on central database servers.  Such access is
read-only, and this can be enforced by settings in the DSN or
\emph{via} permission settings (also known as \emph{privileges}) on
the database.
To find out what tables are accessible from a connection \code{ch}, use
\begin{example}
sqlTables(ch)
\end{example}
Some drivers will return all visible table-like objects, not just
those owned by you.  In that case you may want to restrict the scope
by e.g.{}
\begin{example}
sqlTables(ch, tableType = "TABLE")
sqlTables(ch, schema = "\var{some\_pattern}")
sqlTables(ch, tableName = "\var{some\_pattern}")
\end{example}
The details are driver-specific but in most cases
\code{\var{some\_pattern}} can use wildcards\footnote{these are the
  SQL wildcards used for example in \code{LIKE} clauses.} with
\emph{underscore} matching a single character and \emph{percent}
matching zero or more characters.  Since underscore is a valid
character in a table name it can be handled literally by preceding it
by a backslash---but it is rarely necessary to do so.
A table can be retrieved as a data frame by
\begin{example}
res <- sqlFetch(ch, "\var{table\_name}")
\end{example}
If it has many rows it can be retrieved in sections by
\begin{example}
res <- sqlFetch(ch, "\var{table\_name}", max = \var{m})
res <- sqlFetchMore(ch, max = \var{m})
\dots
\end{example}
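If the total number of rows is not known in advance, the chunks can be
accumulated in a loop; a minimal sketch (assuming the whole result
fits in memory) is
\begin{example}
res <- sqlFetch(ch, "\var{table\_name}", max = 1000)
repeat \{
    chunk <- sqlFetchMore(ch, max = 1000)
    if (!is.data.frame(chunk)) break   # no more rows to fetch
    res <- rbind(res, chunk)
\}
\end{example}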
It is often necessary to reduce the data to be transferred: we have
seen how to subset rows, but it can be more effective to restrict the
columns or to return only rows meeting some conditions.  To find out
what columns are available, use \code{sqlColumns}, for example
\begin{smallexample}
> sqlColumns(ch, "USArrests")
   TABLE\_CAT TABLE\_SCHEM TABLE\_NAME COLUMN\_NAME DATA\_TYPE TYPE\_NAME COLUMN\_SIZE
1    ripley        <NA>  USArrests       State        12  varchar         255
2    ripley        <NA>  USArrests      Murder         8   double          15
3    ripley        <NA>  USArrests     Assault         4  integer          10
4    ripley        <NA>  USArrests    UrbanPop         4  integer          10
5    ripley        <NA>  USArrests        Rape         8   double          15
\dots
\end{smallexample}
Then an \emph{SQL Query} can be used to return part of the table, for
example (MySQL on Linux)
\begin{example}
> sqlQuery(ch, paste("SELECT State, Murder FROM USArrests",
+                    "WHERE Rape > 30 ORDER BY Murder"))
       State Murder
1   Colorado    7.9
2    Arizona    8.1
3 California    9.0
4     Alaska   10.0
5 New Mexico   11.4
6   Michigan   12.1
7     Nevada   12.2
8    Florida   15.4
\end{example}
Note that although there are standards for SQL, all the major
producers of DBMSs have their own dialects, so for example on the
Oracle and DB2 systems we tested this query had to be given as
\begin{example}
> sqlQuery(ch, paste('SELECT "State", "Murder" FROM "USArrests"',
+                    'WHERE "Rape" > 30 ORDER BY "Murder"'))
\end{example}
or even in upper case.  Describing how to extract data from databases
is the \emph{forte} of the SQL language, and doing so efficiently is
the aim of many of the DBMSs, so this is a very powerful tool.  To
learn SQL it is best to find a tutorial specific to the dialect you
will use; for example Chapter~3 of the MySQL manual is a tutorial.  A
basic tutorial which covers some common dialects\footnote{MySQL,
  Oracle and SQL Server.}  can be found at
\url{http://www.1keydata.com/sql/sql.html}: tutorials on how to
perform common tasks in several commonly used DBMSs are available at
\url{http://sqlzoo.net/}.
% <<>>=
% library(RODBC)
% channel <- odbcConnect("test")
% sqlSave(channel, USArrests, rownames = "State", verbose = TRUE)
% sqlQuery(channel, paste("select State, Murder from USArrests",
%                         "where Rape > 30 order by Murder"))
% sqlFetch(channel, "USArrests", rownames = "State")
% sqlDrop(channel, "USArrests")
% close(channel)
% @
\subsection{Table Names}
SQL-92 expects both table and column names to be alphanumeric plus
underscore, and \pkg{RODBC} does not in general support vendor
extensions (for example Access allows spaces).  There are some
system-specific quoting schemes: Access and Excel allow table names to
be enclosed in \code{[ ]} in SQL queries, MySQL (by default) quotes
\emph{via} backticks, and most other systems use the ANSI SQL standard
of double quotes.
%More recent SQL standards allow \code{\$} and \code{\#} under some
%circumstances.
The \code{odbcConnect} function allows the specification of the
quoting rules for names \pkg{RODBC} itself sends, but sensible
defaults\footnote{backticks for MySQL, \code{[ ]} for the Access and
  Excel convenience wrappers, otherwise ANSI double quotes.} are
selected.  Users do need to be aware of the quoting issue when
writing queries for \code{sqlQuery} themselves.
Note the underscore is a wildcard character in table names for some of
the functions, and so may need to be escaped (by backslash) at times.
Normally table names containing a period are interpreted as references
to another schema (see below): this can be suppressed by opening the
connection with argument \code{interpretDot = FALSE}.
\subsection{Types of table}
The details are somewhat DBMS-specific, but `tables' usually means
`tables, views or similar objects'.
In some systems `tables' are physical objects (files) that actually
store data---Mimer calls these \emph{base tables}.  For these other
`tables' can be derived that present information to the user, usually
called `views'.  The principal distinctions between a (base) table and
a view are
\begin{itemize}
  \item Using \code{DROP} on a table removes the data, whereas using
    it on a view merely removes the convenient access to a
    representation of the data.
  \item The access permission (\emph{privilege}) of a view can be very
    different from those of a table: this is commonly used to hide
    sensitive information.
\end{itemize}
A view can contain a subset of the information available in a single
table or combine information from two or more tables.
Further, some DBMSs distinguish between tables and views generated by
ordinary users and \emph{system tables} used by the DBMS itself.
Where present, this distinction is reflected in the result of
\code{sqlTables()} calls.
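For most drivers the distinction appears in the \code{TABLE\_TYPE}
column of that result, which can be used to pick out the objects of
interest, e.g.{}
\begin{example}
tabs <- sqlTables(ch)
subset(tabs, TABLE_TYPE == "TABLE")   # base tables only
subset(tabs, TABLE_TYPE == "VIEW")    # views only
\end{example}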
Some DBMSs support \emph{synonyms} and/or \emph{aliases} which are
simply alternative names for an existing table/view/synonym, often
those in other schemas (see below).
Typically tables, views, synonyms and aliases share a name space and
so must have a name that is unique (in the enclosing schema where
schemas are implemented).
\section{Writing to a Database}
To create or update a table in a database some more details need to be
considered.  For some systems, all table and column names need to be
lower case (e.g.{} PostgreSQL, MySQL on Windows) or upper case (e.g.{}
some versions of Oracle).  To make this a little easier, the
\code{odbcConnect} function allows a remapping of table names to be
specified, and this happens by default for DBMSs where remapping is
known to be needed.
The main tool to create a table is \code{sqlSave}.  It is safest to
use this after having removed any existing table of the same name,
which can be done by
\begin{example}
sqlDrop(ch, "\var{table\_name}", errors = FALSE)
\end{example}
Then in the simplest usage
\begin{example}
sqlSave(ch, \var{some\_data\_frame})
\end{example}
creates a new table whose name is the name of the data frame (remapped
to upper or lower case as needed) and with first column
\code{rownames} the row names of the data frame, and remaining columns
the columns of the data frame (with names remapped as necessary).
For the many options, see the help page.
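For example (mirroring the tests shipped with the package), the
\code{USArrests} data frame supplied with R can be written out with
its row names in a column \code{State}, and read back:
\begin{example}
sqlDrop(ch, "USArrests", errors = FALSE)   # ignore errors if absent
sqlSave(ch, USArrests, rownames = "State")
res <- sqlFetch(ch, "USArrests", rownames = "State")
\end{example}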
\code{sqlSave} works well when asked to write integer, numeric and
reasonable-length\footnote{which of course depends on the DBMS.
  Almost all have an implementation of \code{varchar} that allows up
  to 255 bytes or characters, and some have much larger limits.
  Calling \code{sqlTypeInfo} will tell you about the data type
  limits.} character strings to the database.  It needs some help with
other types of columns in mapping to the DBMS-specific types of
column.  For some drivers it can do a good job with date and date-time
columns; in others it needs some hints (and e.g.{} for Oracle dates
are stored as date-times).  The files in the \code{RODBC/tests}
directory in the sources and the installed file \code{tests.R} provide
some examples.  One of the options is the \code{fast} argument: the
default is \code{fast = TRUE} which transfers data in binary format: the
alternative is \code{fast = FALSE} which transfers data as character
strings a row at a time---this is slower but can work better with some
drivers (and worse with others).
The other main tool for writing is \code{sqlUpdate} which is used to
change rows in an existing table.  Note that \pkg{RODBC} only does
this in a simple fashion, and on up-market DBMSs it may be better to
set cursors and use direct SQL queries, or at least to control
transactions by calls to \code{odbcSetAutoCommit} and
\code{odbcEndTran}.  The basic operation of \code{sqlUpdate} is to
take a data frame with the same column names (up to remapping) as some
or all of the columns of an existing table: the values in the data
frame are then used either to replace entries or to create new rows in
the table.
Rows in a DBMS table are in principle unordered and so cannot be
referred to by number: the sometimes tricky question is to know what
rows are to be replaced.  We can help the process by giving one or more
\code{index} columns whose values must match: for a data frame the row
names are often a good choice.  If no \code{index} argument is
supplied, a suitable set of columns is chosen based on the properties
of the table.
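A minimal sketch of such an update (the column and value amended are
purely illustrative) is
\begin{example}
dat <- sqlFetch(ch, "USArrests")          # "State" is an ordinary column here
dat$UrbanPop[dat$State == "Iowa"] <- 59   # amend a value locally
sqlUpdate(ch, dat, "USArrests", index = "State")
\end{example}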
\subsection{Primary keys and indices}
When a table is created (or afterwards) it can be given additional
information to enable it to be used effectively or efficiently.
\emph{Primary keys} are one (usually) or more columns that provide a
reliable way to reference rows in the table: values of the primary key
must be unique and not \code{NULL} (SQL parlance for `missing').
Primary keys in one table are also used as \emph{foreign keys} in
another table: this ensures that e.g.{} values of \code{customer\_id}
only take values which are included in the primary key column of that
name in table \code{customers}.  Support of foreign keys is patchy:
some DBMSs (e.g.{} MySQL prior to 6.0) accept specifications but
ignore them.
\pkg{RODBC} allows primary keys to be set as part of the
\code{sqlSave()} function when it creates a table: otherwise they can
be set by \code{sqlQuery()} in DBMS-specific ways (usually by
\code{ALTER TABLE}).
Columns in a table can be declared as \code{UNIQUE}: primary keys and
such columns are usually used as the basis for table indices, but
other indices (sometimes called \emph{secondary indices}) can be
declared by a \code{CREATE INDEX} SQL command.  Whether adding primary
keys or other indices has any effect on performance depends on the
DBMS and the query.
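A sketch of both approaches (\code{addPK} asks \code{sqlSave} to
declare the row-names column as the primary key; the \code{ALTER
  TABLE} and \code{CREATE INDEX} statements use the forms accepted by
e.g.{} MySQL and will differ in detail on other DBMSs) is
\begin{example}
sqlSave(ch, USArrests, rownames = "State", addPK = TRUE)
# or after the event, in a DBMS-specific way:
sqlQuery(ch, "ALTER TABLE USArrests ADD PRIMARY KEY (State)")
sqlQuery(ch, "CREATE INDEX arrests_assault ON USArrests (Assault)")
\end{example}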
\section{Data types}
This can be confusing: R has data types (including \code{character},
\code{double}, \code{integer} and various classes including
\code{Date} and \code{POSIXct}), ODBC has both C and SQL data types,
the SQL standards have data types and so do the various DBMSs
\emph{and they all have different names} and different usages of the
same names.
Double- and single-precision numeric values and 32- and 16-bit
integers (only) are transferred as binary values, and all other types
as character strings.  However, unless \code{as.is = TRUE},
\code{sqlGetResults} (used by all the higher-level functions to return
a data frame) converts character data to a date/date-time class or
\emph{via} \code{type.convert}.
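For example, conversion can be suppressed for all columns or for
selected ones:
\begin{example}
res <- sqlQuery(ch, "SELECT * FROM \var{table\_name}", as.is = TRUE)
res <- sqlFetch(ch, "\var{table\_name}", as.is = 3)   # do not convert column 3
\end{example}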
You can find out the DBMS names for the data types used in the columns
of a table by a call to \code{sqlColumns}, and further information is
given on those types in the result of \code{sqlTypeInfo}.  For example
in MySQL,
\begin{smallexample}
  TABLE_CAT TABLE_SCHEM TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME COLUMN_SIZE
1    ripley        <NA>  USArrests       State        12   varchar         255
2    ripley        <NA>  USArrests      Murder         8    double          15
3    ripley        <NA>  USArrests     Assault         4   integer          10
4    ripley        <NA>  USArrests    UrbanPop         4   integer          10
5    ripley        <NA>  USArrests        Rape         8    double          15
  BUFFER_LENGTH DECIMAL_DIGITS NUM_PREC_RADIX NULLABLE REMARKS COLUMN_DEF
1           255             NA             NA        0                 ''
2             8             NA             NA        1               <NA>
3             4              0             10        1               <NA>
4             4              0             10        1               <NA>
5             8             NA             NA        1               <NA>
  SQL_DATA_TYPE SQL_DATETIME_SUB CHAR_OCTET_LENGTH ORDINAL_POSITION IS_NULLABLE
1            12               NA               255                1          NO
2             8               NA                NA                2         YES
3             4               NA                NA                3         YES
4             4               NA                NA                4         YES
5             8               NA                NA                5         YES
\end{smallexample}
This gives the DBMS data types by name and by number (twice, once the number
used in the DBMS and once that used by SQL---they agree here).  Other
things of interest here are the column size, which gives the maximum
size of the character representation, and the two columns about
`nullable' which indicate if the column is allowed to contain missing
values (SQL \code{NULL}s).
The result of \code{sqlTypeInfo} has 19 columns and in the version of
MySQL used here, 52 types.  We show a small subset of the more common types:
\begin{smallexample}
> sqlTypeInfo(channel)[<...>, c(1:3,7,16)]
     TYPE_NAME DATA_TYPE COLUMN_SIZE NULLABLE SQL_DATATYPE
1         bit        -7           1        1           -7
2     tinyint        -6           3        1           -6
6      bigint        -5          19        1           -5
18       text        -1       65535        1           -1
19 mediumtext        -1    16777215        1           -1
20   longtext        -1  2147483647        1           -1
22       char         1         255        1            1
23    numeric         2          19        1            2
24    decimal         3          19        1            3
25    integer         4          10        1            4
37   smallint         5           5        1            5
41     double         6          15        1            6
43      float         7           7        1            7
45     double         8          15        1            8
47       date        91          10        1            9
48       time        92           8        1            9
49       year         5           4        1            5
50   datetime        93          21        1            9
51  timestamp        93          14        0            9
52    varchar        12         255        1           12
\end{smallexample}
Note that there are both duplicate type names and duplicate type numbers.
Most DBMSs started with their own data types and later mapped the
standard SQL data types on to them, although these may only be
partially implemented.  Some DBMSs allow user-defined data types, for
example enumerations.
Commonly used data types fall into a number of groups:
\begin{description}
\item[Character types] Character types can be classified three ways:
  fixed or variable length, by the maximum size and by the character
  set used.  The most commonly used types\footnote{the SQL names for
  these are \code{CHARACTER VARYING} and \code{CHARACTER}, but these
  are too cumbersome for routine use.} are \code{varchar} for short
  strings of variable length (up to some maximum) and \code{char} for
  short strings of fixed length (usually right-padded with spaces).
  The value of `short' differs by DBMS and is at least 254, often a
  few thousand---often other types will be available for longer
  character strings.  There is a sanity check which will allow only
  strings of up to 65535 bytes when reading: this can be removed by
  recompiling \pkg{RODBC}.
  Many other DBMSs have separate types to hold Unicode character
  strings, often with names like \code{nvarchar} or \code{wvarchar}.
  Note that currently \pkg{RODBC} only uses the current locale for
  character data, which could be UTF-8 (and will be on Mac OS X and in
  many cases on Linux and other Unix-alikes), but is never UCS-2 as
  used on Windows.  So if character data is stored in the database in
  Unicode, it will be translated (with a possible loss of information)
  in non-Unicode locales.  (This may change in future versions of
  \pkg{RODBC}.)
  Some DBMSs such as PostgreSQL and SQL Server allow variable-length
  character strings of length only limited by resources.  These do not
  fit well with the ODBC model that requires buffers to be allocated
  to transfer character data, and so such types may be subjected (by
  the ODBC driver) to a fixed limit or not work at all.
\item[Integer types] Most DBMSs have types for 32-bit (\code{integer},
synonym \code{int}) and 16-bit (\code{smallint}) integers.  Some,
  including MySQL, also have unsigned versions and 1-bit, 8-bit and
  64-bit integer types: these further types would usually be
  transferred as character strings and converted on reading to an
  \code{integer} or \code{double} vector.
  Type names \code{int2}, \code{int4} and \code{int8} are common as
  synonyms for the basic type names.
  The SQL standard does not require \code{integer} and \code{smallint}
  to be binary (rather than decimal) types, but they almost always are
  binary.
  Note that 64-bit integers will be transferred as character strings
  and read by \code{sqlGetResults} as character vectors or (for
  $2^{31} \le |x| < 2^{53}$) as \code{double} vectors.
\item[Floating-point types] The basic SQL floating-point types are 8
  and 7 for double- and single-precision binary types.  The SQL names
  are \code{double precision} and \code{real}, but beware of the
  variety of names.  Type~6 is \code{float} in the standard, but is
  used by some DBMSs\footnote{In Oracle the \code{FLOAT} type is a
    decimal and not a binary type.} for single-precision and by some
  for double-precision: the forms \code{float(24)} and
  \code{float(53)} are also commonly supported.
  You should not assume that these types can store \code{Inf},
  \code{-Inf} or \code{NaN}, but they often can.
\item[Other numeric types] It is common to store decimal quantities in
  databases (e.g.{} currency amounts) and types 2 and 3 are for
  decimals. Some DBMSs have specialized types to handle currencies,
  e.g.{} \code{money} in SQL Server.
  Decimal types have a \emph{precision} (the maximum number of
  significant decimal digits) and \emph{scale} (the position of the
  decimal point).  \code{numeric} and \code{decimal} are usually
  synonymous, but the distinction in the standards is that for
  \code{numeric} the precision is exact whereas for \code{decimal} the
  DBMS can use a larger value than that specified.
  % e.g. Mimer
  Some DBMSs have a type \code{integer(\var{p})} to represent up to
  \code{\var{p}} decimal digits, and this may or may not be distinct
  from \code{decimal(\var{p}, 0)}.
  % 'currently' is from 5.1.x
  DBMSs do not necessarily fully implement decimal types, e.g.{} MySQL
  currently stores them in binary and used to store them as character
  strings.
\item[Dates and times] The handling of dates and times is very much
  specific to the DBMS.  Some allow fractional seconds in date-times,
  and some do not; some store timezones with date-times or always use
  UTC and some do not, and so on.  Usually there are also types for
  time intervals.
  All such types are transferred as character strings in \pkg{RODBC}.
\item[Binary types]
  These are less common, and unsupported by \pkg{RODBC} prior to
  version 1.3-0.  They parallel character types in that they are a
  sequence of bytes of fixed or variable length, sometimes with
  additional types for long sequences: there are separate ODBC types
  for \code{SQL\_BINARY}, \code{SQL\_VARBINARY} and
  \code{SQL\_LONGVARBINARY}.
  Binary types can currently only be read as such, and they are
returned as a column of class \code{"ODBC\_binary"} which is a list of
  raw vectors.
\end{description}
% An example is BOOLEAN in Mimer
It is possible (but rare) for the DBMS to support data types that the
ODBC driver cannot handle.
\subsection{Data types when saving a data frame}
When \code{sqlSave} creates a table, there is some choice as to the
SQL data types used.
The default is to select the SQL data type from the R type via the
\code{typeInfo} argument to \code{sqlSave}.  If this is not supplied
(as is usual) a default mapping is looked up using \code{getSqlTypeInfo()}
or by interrogating \code{\link{sqlTypeInfo()}}.  This will almost
always produce the correct mapping for numeric, integer and character
columns of up to 254 characters (or bytes).  In other cases (including
dates and date-times) the desired SQL type can be specified for each
column \emph{via} the argument \code{varTypes}, a named character
vector with names corresponding to (some of) the names in the data
frame to be saved.
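For instance, something along these lines stores a date column using
an explicit SQL type (the type name, here \code{date}, must be one the
DBMS understands):
\begin{example}
dat <- data.frame(day = Sys.Date() - 0:2, value = c(1.5, 2, 2.5))
sqlSave(ch, dat, "\var{table\_name}", varTypes = c(day = "date"))
\end{example}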
Only a very few DBMSs have a logical data type and the default mapping
is to store R logical vectors as \code{varchar(5)}.  For
other DBMSs \code{BIT}, \code{TINYINT} or an enumeration type could
be used (but the column may need to be converted to and from a
suitable representation).  For example, in MySQL we could use
\code{enum('FALSE', 'TRUE')}, but this is actually stored as
\code{char(5)}.  Note that to represent \code{NA} the SQL data type
chosen needs to be nullable, which \code{BIT} often is not.
(Mimer has a nullable data type \code{BOOLEAN} but this is not
supported by the ODBC client.)
\subsection{SQLite}
SQLite's concept of `data type' is anomalous: version~3 does recognize
types of data (in version~2 everything was a character string), but it
does not have a fixed type for a column in a table (although the type
specified in the \code{CREATE TABLE} statement is a `recommended' type
for the values of that column).  Every value is categorized as null,
integer (of length 1, 2, 3, 4, 6 or 8 bytes), double, text (UTF-8 or
UTF-16) or BLOB (a sequence of bytes).  This does not fit well with
the ODBC interface which pre-determines a type for each column before
reading or writing it: the `SQLite ODBC' driver falls back to a
\code{SQL\_VARCHAR} or \code{SQL\_LONGVARCHAR} type if the column type
is not available.
\subsection{ODBC data types}
ODBC defines two sets of data types: \emph{SQL data types} and \emph{C
  data types}. SQL data types indicate the data types of data stored
at the data source using standard names. C data types indicate the
data types used in the compiled code in the application (here
\pkg{RODBC}) when transferring data and are the same for all drivers.
The ODBC SQL data types are abstractions of the data types discussed
above with names like \code{SQL\_INTEGER}.  They include
\code{SQL\_LONGVARCHAR} for large character types and
\code{SQL\_WVARCHAR} for Unicode character types.  It is usually these
types that are returned (by number) in the \code{SQL\_DATA\_TYPE}
column of the result of \code{sqlColumns} and \code{SQL\_DATATYPE}
column of the result of \code{sqlTypeInfo}.  The mapping from names to
numbers is given in table~1.
\begin{table}[tbp]
\qquad{\small\tt \begin{tabular}{lrlr}
SQL\_CHAR&1&SQL\_LONGVARCHAR&-1\\
SQL\_NUMERIC&2&SQL\_BINARY&-2\\
SQL\_DECIMAL&3&SQL\_VARBINARY&-3\\
SQL\_INTEGER&4&SQL\_LONGVARBINARY&-4\\
SQL\_SMALLINT&5&SQL\_BIGINT&-5\\
SQL\_FLOAT&6&SQL\_TINYINT&-6\\
SQL\_REAL&7&SQL\_BIT&-7\\
SQL\_DOUBLE&8&SQL\_WCHAR&-8\\
SQL\_DATETIME&9&SQL\_WVARCHAR&-9\\
SQL\_INTERVAL&10&SQL\_WLONGVARCHAR&-10\\
SQL\_TIMESTAMP&11&SQL\_GUID&-11\\
SQL\_VARCHAR&12\\
SQL\_TYPE\_DATE&91\\
SQL\_TYPE\_TIME&92\\
SQL\_TYPE\_TIMESTAMP&93
\end{tabular}}
\caption{Mapping between ODBC SQL data type names and numbers.  (GUIDs
  are 16-byte numbers, Microsoft's implementation of UUIDs.)}
\end{table}
The only ODBC C data types currently used by \pkg{RODBC} are
\code{SQL\_C\_DOUBLE}, \code{SQL\_C\_SLONG} (32-bit signed integers)
and \code{SQL\_C\_CHAR} for reading and writing, and
\code{SQL\_C\_FLOAT} (single-precision), \code{SQL\_C\_SSHORT} (16-bit
signed integers) and \code{SQL\_C\_BINARY} for reading from the
database.
\url{http://msdn.microsoft.com/en-us/library/ms713607%28VS.85%29.aspx}
is the definitive source of information about ODBC data types.
\section{Schemas and Catalogs}
This is a more technical section: few users will need to deal with
these concepts.
\sQuote{Schemas}\footnote{which is the usual plural in this technical
  usage, although \emph{schemata} is more usual in English.} are
collections of objects (such as tables and views) within a database
that are supported by some DBMSs: often a separate schema is
associated with each user (and \sQuote{schema} in ODBC~3 replaced
\sQuote{owner} in ODBC~2).  In SQL-92, schemas are collected in a
\sQuote{catalog} which is often implemented as a database.  Where
schemas are implemented, there is a \emph{current schema} used to find
unqualified table names, and tables in other schemas can be referred
to within SQL queries using the \code{\var{schema}.\var{table}}
notation.  You can think of a schema as analogous to a name space; it
allows related objects to be grouped together without worrying about
name clashes with other groups.  (Some DBMSs will search for
unqualified table names in a search path: see the detailed
descriptions below.)
Note that \sQuote{schema} is used in another sense in the database
literature, for the design of a database and in particular of tables,
views and privileges.
Here are some details of various DBMSs' interpretations of
\code{catalog} and \code{schema} current at the time of writing (mid
2009).  (These descriptions are simplistic, and in some cases
experimental observations.)
\begin{itemize}
  \item SQLite uses dotted names for alternative databases that
    are attached by an \code{ATTACH DATABASE} command.\footnote{and
      may be subsequently detached by a \code{DETACH DATABASE} command}
    There is a search path of databases, so it is only necessary to
    use the dotted name notation when there are tables of the same
    name on attached databases.  The initial database is known as
    \code{main} and that used for temporary tables as \code{temp}.
  \item MySQL uses \code{catalog} to refer to a database.  In MySQL's
    parlance, `schema' is a little-used synonym for \sQuote{database}.
  \item PostgreSQL only allows a session to access one database, and
    does not use `catalog' except to refer to the current database.
    Version 7.3 introduced schemas---users can create their own
    schemas with a \code{CREATE SCHEMA} query.  Tables are by default
    in the \code{public} schema, and unqualified table names are
    searched for along a \sQuote{search path} of schemas (by
    default, containing \code{public}).
  \item Oracle uses schemas as synonymous with \sQuote{owner}
    (also known as \sQuote{user}).  There is no way for a user to
    create additional schemas (that is not what \code{CREATE SCHEMA}
    does in Oracle).
  \item IBM DB2 uses schemas as name spaces for objects that may lie on
    different databases: using \emph{aliases} allows objects to be in
    more than one schema.  The initial current schema is named the
    same as the user (\code{SQLID} in DB2 parlance), but users can
    create additional schemas with \code{CREATE SCHEMA} statements.
  \item Microsoft SQL Server 2008 uses both \code{catalog} and
    \code{schema}, \code{catalog} for the database and \code{schema}
    for the type of object, e.g. \code{"sys"} for most of the system
    tables/views and (default) \code{"dbo"} for user tables.  Further
    schemas can be created by users.  The default schema for a user
    can be set when the user is created and changed \emph{via}
    \code{ALTER USER}.
    Prior to SQL Server 2005, `schema' meant `user', and the search
    path for unqualified names was the database user then \code{"dbo"}.
  \item The Microsoft Excel and Access ODBC drivers do not use
    schemas, but do use \code{catalog} to refer to other
    database/spreadsheet files.
  \item Mimer (\url{www.mimer.com}) uses schemas which are normally
    the same as users (which it calls \emph{IDENT}s), but users can
    create additional schemas with \code{CREATE SCHEMA} statements.
    There are also system schemas.  Mimer uses `schemata' as the
    plural of schema.
\end{itemize}
It is often possible to use \code{sqlTables} to list the available
catalogs or schemas: see its help page for the driver-specific
details.
\pkg{RODBC} usually works with tables in the current schema, but unless
the connection was opened with \code{interpretDot = FALSE} most
functions will attempt to interpret the `dotted name' notation.
The interpretation depends on the DBMS: the SQL-92 meaning is
\code{\var{schema}.\var{table}} and this is accepted by PostgreSQL,
Microsoft SQL Server, Oracle, DB2 and Mimer.  However, MySQL uses
\code{\var{database}.\var{table}}, and the functions
try\footnote{currently this is stymied by bugs in the ODBC driver,
  so \code{SQLColumns} is unable to report on tables in specified
  databases.}  that interpretation if they recognize a MySQL driver.
Some DBMSs allow more than two components, but these are not currently
supported by the \pkg{RODBC} functions.
Functions \code{sqlTables}, \code{sqlColumns} and
\code{sqlPrimaryKeys} have arguments \code{catalog} and \code{schema}
which in principle allow tables in other schemas to be listed or
examined: however these are only partially implemented in many current
ODBC drivers.  See the help page for \code{sqlTables} for some further
details.
For other uses, the trick is to select the schema(s) you want to use,
which is done \emph{via} an SQL statement sent by \code{sqlQuery}.
For Oracle you can set the default schema (owner) by
\begin{example}
\code{ALTER SESSION SET CURRENT\_SCHEMA = \var{schema}}
\end{example}
% http://sqlzoo.net/howto/source/z.dir/tip988922/oracle
whereas for PostgreSQL the search path can be changed \emph{via}
\begin{example}
\code{SET search\_path TO \var{schema1},\var{schema2}}.
\end{example}
% see also https://stat.ethz.ch/pipermail/r-help/2008-May/161304.html
In DB2, creating an alias in the current schema can be used to access
tables in other schemas, and a \code{CURRENT SCHEMA} query can be used
to change the current schema.  In MySQL and SQL Server a database can
be selected by a \code{USE \var{database}} query.
\section{Internationalization Issues}
Internationalization issues are made more complex by ODBC being a
client-server system, and the ODBC client (\pkg{RODBC}) and the server
may be running on different machines with different OSes on different
continents. So the client may need some help.
In most cases numeric data are transferred to and from R in binary
form, so the representation of the decimal point is not an issue.  But
in some cases it could be (e.g.{} decimal rather than binary SQL data types
will be transferred as character strings) and then the decimal point
to be used will be taken from \code{options("dec")}: if unset this is
set when \pkg{RODBC} is loaded from the setting of the current locale
on the machine running R (\emph{via} \code{Sys.localeconv}).
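If the locale-derived default is not what the server uses, it can be
overridden for the session, e.g.{}
\begin{example}
options(dec = ",")   # decimal comma for data transferred as character
\end{example}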
Some ODBC drivers (e.g.{} for SQL Server, Oracle) allow the locale (`NLS')
to be used for numeric values to be selected for the connection.
The other internationalization issue is the character encoding used.
When R and the DBMS are running on the same machine this is unlikely
to be an issue, and in many cases the ODBC driver has some options to
translate character sets.  SQL is an ANSI (US) standard, and DBMSs
tended to assume that character data was ASCII or perhaps 8-bit.  More
recently DBMSs have started to (optionally or by default) to store
data in Unicode, which unfortunately means UCS-2 on Windows and UTF-8
elsewhere.  So cross-OS solutions are not guaranteed to work, but most
do.
Encoding issues are best resolved in the ODBC driver or in DBMS
settings.  In the unusual case that this cannot be done, the
\code{DBMSencoding} argument to \code{odbcDriverConnect} allows for
recoding when sending data to or from the ODBC driver and thence the DBMS.
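For instance, a connection to a server known to return Latin-1
character data could be opened with (the encoding name must be one
accepted by \code{iconv}):
\begin{example}
ch <- odbcDriverConnect("DSN=\var{some\_dsn};UID=\var{user};PWD=\var{****}",
                        DBMSencoding = "latin1")
\end{example}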
\section{Excel Drivers}
The Microsoft Excel ODBC drivers (Windows only) have a number of
peculiarities which mean that they should be used with care.
It seems that their concept of a `table' is principally a \emph{named
  range}. They treat worksheets as system tables, and append a dollar
to the worksheet name (making it a non-standard SQL table name: the
quoting convention used is to enclose such names in square brackets).
Column names are taken as the first row of the named range/worksheet.
Non-standard SQL names are allowed here too, but the driver maps
\code{.} to \code{\#} in column names.  Annoyingly, \code{sqlTables}
is allowed to select named ranges only by \code{tableType = "TABLE"}
but not to select only worksheets.
There are at least two known problems with reading columns that do not
have a format set \emph{before} data entry, and so start with format
\sQuote{General}.  First, the driver uses the first few rows to
determine the column type, and is over-fond of declaring
\sQuote{Numeric} even when there are non-numeric entries.  The default
number of rows consulted is 8, but attempts to change this in the DSN
setup are ignored.  Second, if a column is declared as \sQuote{Text},
numeric entries will be read as SQL nulls and hence R \code{NA}s.
Unfortunately, in neither case does reformatting the column help.
% http://support.microsoft.com/kb/141284
The connection is by default read-only.  It is possible to de-select
this in the DSN (and the convenience wrapper \code{odbcConnectExcel}
has a \code{readOnly = FALSE} argument to do so), but this does not
support deletion (including SQL \code{DROP}, \code{DELETE},
\code{UPDATE} and \code{ALTER} statements).  In particular,
\code{\link{sqlDrop}} will remove the data in a worksheet but not the
worksheet itself.
% http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.ii.doc/opt/tlsodb13.htm
% http://www.stata.com/support/faqs/data/odbc_excel.html
The driver does allow a worksheet to be updated by \code{sqlUpdate},
and for a new worksheet (with a different name from existing
worksheets) to be created by \code{sqlSave} (which also creates a
named range).
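Putting these pieces together, a typical read-only session with the
Excel driver looks something like this (the file path and worksheet
name are placeholders):
\begin{example}
ch <- odbcConnectExcel("\var{d:/data/workbook.xls}")
sqlTables(ch)                   # worksheets have a trailing dollar
res <- sqlFetch(ch, "Sheet1")   # read the worksheet named `Sheet1'
close(ch)
\end{example}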
As far as we know, no similar issues affect the Actual Technologies
Mac OS X Excel driver: however, it allows only read-only access to
Excel files and does not support Excel 2007-and-later \code{.xlsx} files.
\section{DBMS-specific tidbits}
This section covers some useful DBMS-specific SQL commands and other
useful details.
Recent versions of several DBMSs have a schema
\code{INFORMATION\_SCHEMA} that holds many predefined system views.
These include MySQL (where it is the name of a database, mainly
populated beginning with MySQL~5.1), SQL Server and Mimer.
\subsection*{MySQL}
Comments about MySQL are mostly applicable to its forks such as MariaDB.
We have already mentioned \code{USE \var{database}} as the way to change
the database in use.  \code{SHOW DATABASES} lists the databases `for
which you have some kind of privilege', and can have a \code{LIKE}
clause to restrict the result to some pattern of database names.
% MySQL 5.1 manual 12.3
The \code{DESCRIBE \var{table}} command is a compact way to get a
description of a table or view, similar to the most useful parts of
the result of a call to \code{sqlColumns}.  (It is also known as
\code{SHOW COLUMNS FROM \var{table}}.)
\code{SHOW TABLES} is the command to produce a table of the
tables/views on the current database, similar to \code{sqlTables}.
For example,
\begin{example}
> sqlQuery(channel, "USE ripley")
[1] "No Data"
> sqlQuery(channel, "SHOW TABLES")
  Tables_in_ripley
1        USArrests
> sqlQuery(channel, "DESCRIBE USArrests")
     Field         Type Null Key Default Extra
1    State varchar(255)   NO PRI      NA    NA
2   Murder       double  YES          NA    NA
3  Assault      int(11)  YES          NA    NA
4 UrbanPop      int(11)  YES          NA    NA
5     Rape       double  YES          NA    NA
\end{example}
\noindent
\code{SHOW FULL TABLES} gives an additional column
\code{Table\_type}, the types of the tables/views.
There is useful information for end users in the
\code{INFORMATION\_SCHEMA} \emph{database}, much more extensively as
from MySQL~5.1.
Some of the non-standard behaviour can be turned off, e.g.{} starting
MySQL with \code{--sql-mode=ANSI} gives closer conformance to the
standard, and this can be set for a single session by
\begin{example}
SET SESSION sql\_mode='ANSI'
\end{example}
To change just the behaviour of quotes (to use double quotes in place
of backticks) replace \code{ANSI} by \code{ANSI\_QUOTES}.
% manual section 10.4
The maximum size of a \code{char} column is 255 characters.  That of a
\code{varchar} column is up to 65535 characters (but there is a limit of
65535 bytes on the total size of a row), and those with a maximum of
255 or less are stored more efficiently.  Types \code{text},
\code{mediumtext} and \code{longtext} can hold more, and are not
subject to the row-size limit (\code{text} has default maximum size
65535, the default \pkg{RODBC} limit on transfers).
There are \code{binary}, \code{varbinary} and \code{blob} types which
are very similar to their character counterparts but with lengths in bytes.
\subsection*{PostgreSQL}
Table \code{pg\_tables} lists all tables in all schemas; you probably
want to filter on \code{tableowner='\var{current\_user}'}, e.g.{}
\begin{smallexample}
> sqlQuery(channel, "select * from pg_tables where tableowner='ripley'")
  schemaname tablename tableowner tablespace hasindexes hasrules hastriggers
1     public     dtest     ripley         NA          0        0           0
\end{smallexample}
There are both ANSI and Unicode versions of the ODBC driver on Windows:
they provide many customizations. One of these is read-only access,
another is whether system tables are reported by \code{sqlTables}.
% manual chapter 8
The default size of a \code{varchar} column is unlimited, but those
with maximum length of 126 bytes or less are stored more efficiently.
However, the ODBC interface has limits, which can be set in the
configuration options.  These include the maximum sizes for
\code{varchar} (default 254) and \code{longvarchar} (default 8190),
and how to handle unknown column sizes (default as the maximum), and
whether `Text' is taken as \code{varchar} or \code{longvarchar} (which
affects the reported maximum size for a \code{varchar} column).
There is a single binary data type, \code{bytea}.
\subsection*{SQLite}
These comments are only about SQLite~3.x.
Table \code{sqlite\_master} lists tables and indices, and the
\code{sql} column gives the SQL command used.  E.g.{}
\begin{example}
> tmp <- sqlQuery(channel, "select * from sqlite_master")
> tmp[, "sql"] <- substr(tmp[, "sql"], 1, 16)
> tmp
   type                         name  tbl_name rootpage              sql
1 table                    USArrests USArrests        2 CREATE TABLE "US
2 index sqlite_autoindex_USArrests_1 USArrests        4             <NA>
\end{example}
My current versions of Christian Werner's SQLite~ODBC driver store
character data in the current locale's charset (e.g.{} UTF-8) on
Unix-alikes and by default in Unicode (UCS-2) on Windows (unless
de-selected in the DSN configuration).
The default collation for text data is byte-by-byte comparisons, so
avoid comparing non-ASCII character data in SQLite.
Actual Technologies sell an SQLite driver for Mac OS X which requires
\code{believeNRows = FALSE} and has a number of other issues including
that it seems not to support dropping tables.  (Christian Werner's
SQLite~ODBC driver was easy to install from the sources and worked
correctly.)
Versions of the SQLite~ODBC driver since 0.87 have segfaulted on
the test suite.
\subsection*{Oracle}
Tables \code{cat}, \code{user\_tables} and \code{user\_catalog} contain
useful information on tables.  Information on columns is in
\code{all\_tab\_columns}, e.g.{}
\begin{example}
> sqlQuery(channel,
           "select * from all\_tab\_columns where table_name='USArrests'")
   OWNER TABLE_NAME COLUMN_NAME DATA_TYPE DATA_TYPE_MOD
1 RIPLEY  USArrests       State  VARCHAR2            NA
2 RIPLEY  USArrests      Murder     FLOAT            NA
3 RIPLEY  USArrests     Assault    NUMBER            NA
4 RIPLEY  USArrests    UrbanPop    NUMBER            NA
5 RIPLEY  USArrests        Rape     FLOAT            NA
...
\end{example}
The Windows ODBC driver we tested had an option for a read-only connection.
% SQL Language Reference Manual chapter 2
Oracle's character data types are \code{CHAR}, \code{VARCHAR2}
(character set specified when the database was created) and
\code{NCHAR}, \code{NVARCHAR2} (Unicode), as well as \code{CLOB} and
\code{NCLOB} for large character strings.  For the non-Unicode types
the units of length are either bytes or characters (set as a default
for the database) but can be overridden by adding a \code{BYTE} or
\code{CHAR} qualifier.  The limits are 4000 bytes, except for
\code{CLOB} and \code{NCLOB}, which have very high limits.
There are \code{RAW} and \code{BLOB} data types.
\subsection*{DB2}
%http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp
Schema \code{syscat} contains many views with information about
tables: for example view \code{syscat.tables} lists all tables, and
\begin{example}
> sqlQuery(channel,
           "select * from syscat.columns where tabname='USArrests'")
  TABSCHEMA   TABNAME  COLNAME COLNO TYPESCHEMA TYPENAME LENGTH SCALE
1  RIPLEY   USArrests    State     0   SYSIBM    VARCHAR    255     0
2  RIPLEY   USArrests   Murder     1   SYSIBM     DOUBLE      8     0
3  RIPLEY   USArrests  Assault     2   SYSIBM    INTEGER      4     0
4  RIPLEY   USArrests UrbanPop     3   SYSIBM    INTEGER      4     0
5  RIPLEY   USArrests     Rape     4   SYSIBM     DOUBLE      8     0
...
\end{example}
%http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0008483.html
The \code{CHAR} type can have size up to 254 bytes: the maximum size
of the \code{VARCHAR} type is 32762 bytes.  For larger character
strings there is the \code{CLOB} type (up to 2Gb).  These types can be
used to store data in a MBCS, including various Unicode encodings.
There are corresponding \code{BINARY}, \code{VARBINARY} and
\code{BLOB} data types.
\subsection*{SQL Server}
There are several hundred views in schemas \code{INFORMATION\_SCHEMA}
and \code{sys} which will be listed by \code{sqlTables} and also by
the stored procedure \code{sp\_tables}.  Another way to list tables is
\begin{example}
SELECT * FROM sysobjects WHERE xtype='U'
\end{example}
where the condition restricts to user tables.
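From R such a query can be sent through an open connection, e.g.{} (a
sketch, assuming \code{channel} is a connection to the database of interest):
\begin{example}
> sqlQuery(channel, "SELECT name FROM sysobjects WHERE xtype='U'")
\end{example}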
\code{USE \var{database}} changes the database in use.
% http://msdn.microsoft.com/en-us/library/ms187752.aspx
Types \code{char} and \code{varchar} have a maximum specified size of
8000 bytes.  It is possible to use \code{varchar(max)} (previously
known as \code{text}) for a limit of 2Gb, but this may not work well
with the ODBC interface.  The Unicode types \code{nchar} and
\code{nvarchar} have a maximum specified size of 4000 characters:
again there is \code{nvarchar(max)} (formerly \code{ntext}).
There are corresponding \code{binary} and \code{varbinary} data types
(with \code{image} as an earlier name for \code{varbinary(max)}).
\subsection*{Mimer}
There are tens of views in schema \code{INFORMATION\_SCHEMA} which can
be read by SQL \code{SELECT} queries of the form
\begin{example}
SELECT \var{column-list}
FROM INFORMATION_SCHEMA.\var{view-name}
WHERE \var{condition}
\end{example}
See the Mimer SQL Reference Manual chapter on Data Dictionary views
for full details: two views are \code{TABLES} and \code{VIEWS}.
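For example, something like the following should list the tables visible
to the user (a sketch: the column name \code{TABLE\_NAME} is the
SQL-standard one and has not been checked against Mimer):
\begin{example}
> sqlQuery(channel,
           "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES")
\end{example}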
A session can be set to be read-only by the SQL command \code{SET
  SESSION READ ONLY}.
Mimer uses Latin-1 for its default character types but Unicode types
(\code{NCHAR} and \code{NVARCHAR}) are also available.  Unsurprisingly
given that the company is Swedish, different collations are allowed
for both Latin-1 and Unicode character types.
% reference manual chapter 6
The \code{char} and \code{varchar} columns have a maximum size of
15000 bytes: the \code{clob} data type is available for larger
character columns.  The \code{nchar} and \code{nvarchar} columns have
a maximum size of 5000 characters: the \code{nclob} data type is
available for larger Unicode columns.
There are corresponding \code{binary}, \code{varbinary} and
\code{blob} binary data types.
\newpage
\appendix
\section{Installation}
\pkg{RODBC} is simple to install, and binary distributions are
available for Mac OS X and Windows from CRAN.
To install from the sources, an \emph{ODBC Driver Manager} is
required.  Windows normally comes with one (it is part of MDAC and can
be installed separately if required).  Mac OS X since 10.2 has shipped
with \code{iODBC} (\url{http://www.iodbc.org}, this is also available
for other Unix-alikes) but from 10.9 the headers are no longer
included in the OS X SDK.  At present the CRAN OS X binary package is
compiled against iODBC.
For other systems the driver manager of choice is likely to be
\code{unixODBC}, part of almost all Linux distributions and with
sources downloadable from \url{http://www.unixODBC.org}.  In Linux
binary distributions it is likely that package \code{unixODBC-devel}
or \code{unixodbc-dev} or some such will be needed.
Both \code{unixODBC} and \code{iODBC} can be installed from the
sources under OS X 10.9: the latter requires configuring with
\code{--disable-gui}.  They need ODBC drivers compiled for the driver
manager in use.
In most cases the package's \code{configure} script will find the
driver manager files, and the package will install with no extra
settings.  However, if further information is required, use
\code{--with-odbc-include} and \code{--with-odbc-lib} or environment
variables \code{ODBC\_INCLUDE} and \code{ODBC\_LIBS} to set the include
and library paths as needed.  A specific ODBC driver manager can be
specified by the \code{--with-odbc-manager} \code{configure} option,
with likely values \code{odbc} or \code{iodbc}: if this is done for
\code{odbc} and the program \code{odbc\_config} is found, it is used as a
last resort to set the library path (it is often wrong) and to add any
additional \code{CFLAGS}.
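For example, a source installation against \code{unixODBC} in a
non-standard location might be attempted from within R by something like
(the paths are purely illustrative):
\begin{example}
> install.packages("RODBC", type = "source",
                   configure.args = c("--with-odbc-manager=odbc",
                                      "--with-odbc-include=/opt/unixODBC/include",
                                      "--with-odbc-lib=/opt/unixODBC/lib"))
\end{example}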
\subsection*{Sources of drivers}
Keeping track of ODBC drivers is a never-ending task, and this section
is no longer actively maintained.  URIs are liable to move or disappear.
A list of drivers for \code{unixODBC} is maintained\footnote{It is
  conspicuous that the author of that list works for Easysoft.} at
\url{http://www.unixodbc.org/drivers.html}.  \code{unixODBC} ships
with a number of drivers (although in most cases the DBMS vendor's
driver is preferred)---these include for MySQL, PostgreSQL, Mimer and
flat files.
MySQL provides drivers under the name `Connector/ODBC' (formerly
`MyODBC') in source form, and binaries for all common 32-bit and most
64-bit R platforms.  These are said to work also with MariaDB.
PostgreSQL has an associated project at
\url{http://pgfoundry.org/projects/psqlodbc/} and another project,
\code{odbcng}, at \url{http://pgfoundry.org/projects/odbcng/}.  (Documentation for
\code{psqlodbc} is currently hard to find, but there is some in the
PostgreSQL 7.2 manual at
\url{http://www.postgresql.org/docs/7.2/static/odbc.html} from before
it was unbundled.)  There are drivers for Unix-alikes and Windows --
64-bit Windows support is available as from PostgreSQL 9.0.
An SQLite ODBC driver for Unix-alikes, including Mac OS X, and (32-
and 64-bit) Windows is available from
\url{http://www.ch-werner.de/sqliteodbc/}.
Oracle provides ODBC drivers as a supplement to its `Instant Client'
for some of its platforms (including 32/64-bit Windows and Linux but
not currently Mac OS X).  See
\url{http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html}
%\url{http://www.oracle.com/technology/software/tech/oci/instantclient/}.
One quirk of the Windows drivers is that the Oracle binaries must be
in the path, so \code{PATH} should include e.g.{}
\verb|c:\Oracle\bin|.
For IBM's DB2, search its site for drivers for `ODBC and CLI'. There
are some notes about using this under Linux at
\url{http://www.unixodbc.org/doc/db2.html}.
% http://developer.mimer.se/odbc/
Mimer (\url{www.mimer.com}) is a cross-platform DBMS with integral
ODBC support, so
\begin{quote}
  `The Mimer SQL setup process automatically installs an ODBC driver
  when the Mimer SQL client is installed on any Windows or UNIX platform.'
\end{quote}
The `HowTos' at \url{http://developer.mimer.se/howto/index.tml}
provide some useful hints.
Some details of the 32-bit Microsoft `ODBC Desktop Database Drivers'
(for Access, Excel, Paradox, dBase and text files on Windows) can be
found at
\url{http://msdn.microsoft.com/en-us/library/ms709326%28VS.85%29.aspx}.
There is also a Visual FoxPro driver and an (outdated) Oracle driver.
32-bit Windows drivers for Access 2007 and Excel 2007 are bundled with
Office 2007 but can be installed separately \emph{via} the installer
\code{AccessDatabaseEngine.exe} available from
%\url{http://www.microsoft.com/downloads/details.aspx?FamilyID=7554f536-8c28-4598-9b72-ef94e038c891&DisplayLang=en}.
\url{http://www.microsoft.com/en-us/download/details.aspx?id=23734}.
The Access/Excel 2010 versions at
%\url{http://www.microsoft.com/downloads/details.aspx?familyid=C06B8369-60DD-4B64-A44B-84B371EDE16D&displaylang=en}
\url{http://www.microsoft.com/en-us/download/details.aspx?id=13255}
have a 64-bit version: however the 64-bit drivers cannot be installed
alongside 32-bit versions of Office (as far as we know, and definitely
not for Office 2007).
For recent versions of Mac OS X, low-cost and easy-to-use \code{iODBC}
drivers are available from
\url{http://www.actualtech.com/products.php}: these cover
MySQL/PostgreSQL/SQLite (one driver), SQL Server/Sybase, Oracle, and a
read-only driver for Access and related formats (including Access 2007
and Excel, but not Excel 2007).  That SQLite driver needs
\code{believeNRows = FALSE} set.  Currently at least, installing those
drivers on 10.9 installs \code{iODBC}.
Mac OS X drivers for the MySQL, PostgreSQL and the major commercial
databases are available from \url{http://uda.openlinksw.com/}.
\subsection*{Specifying ODBC drivers}
The next step is to specify the ODBC drivers to be used for specific
DBMSs.  On Windows installing the drivers will register them
automatically.  This might happen as part of the installation on other
systems, but usually does not.
Both \code{unixODBC} and \code{iODBC} store information on drivers in
configuration files, normally system-wide in \code{/etc/odbcinst.ini}
and per-user in \verb|~/.odbcinst.ini|.  However, the system location
can vary, and on systems with \code{unixODBC} it can be found at the
Unix command line by one of
\begin{example}
\$  odbcinst -j
\$  odbc\_config --odbcinstini
\end{example}
For \code{iODBC} use \code{iodbc\_config}: on Mac OS X the system
location used by Apple was \code{/Library/ODBC/odbcinst.ini}.
\begin{figure}
\begin{example}
\$ cat /etc/odbcinst.ini
[MySQL]
Description    = ODBC 3.51.26 for MySQL
Driver         = /usr/lib64/libmyodbc3.so
FileUsage      = 1
[MySQL ODBC 5.1 Driver]
Description     = ODBC 5.1.05 for MySQL
Driver          = /usr/lib64/libmyodbc5.so
UsageCount      = 1
[PostgreSQL]
Description     = ODBC for PostgreSQL
Driver          = /usr/lib64/psqlodbc.so
FileUsage       = 1
[sqlite3]
Description = sqliteodbc
Driver = /usr/local/lib64/libsqlite3odbc.so
Setup = /usr/local/lib64/libsqlite3odbc.so
FileUsage = 1
\end{example}
\caption{A system ODBC driver file from an \code{x86\_64} Fedora 10 Linux
  system using \code{unixODBC}.}
\label{fig:odbc1}
\end{figure}
The format can be seen from figure~\ref{fig:odbc1}.  (\code{unixODBC}
also accepts a \code{Driver64} field here, to cater for different paths on
32-bit and 64-bit platforms sharing a file system.)  The MySQL and PostgreSQL
drivers were installed from the Fedora RPMs
\code{mysql-connector-odbc} and \code{postgresql-odbc}, and also from
the \code{mysql-connector-odbc} RPM in the MySQL distribution (which
inserted the entry in the driver file).
The MySQL manual gives detailed information (including screenshots) of
installing its drivers and setting up DSNs that may also be
informative to users of other DBMSs.
\clearpage
\section{Specifying DSNs}
The ODBC driver managers have `User DSNs' and `System DSNs': these
differ only in where the information is stored, the first on a
per-user basis and the second for all users of the system.
Windows has a GUI\footnote{Extra care is needed on a 64-bit version of
  Windows, as this GUI shows only 64-bit settings for ODBC, including
  drivers and DSNs.  If you are running 32-bit R (and hence 32-bit
  ODBC) on 64-bit Windows, you need the 32-bit version of the GUI at
  something like
  \texttt{c:\textbackslash{}Windows\textbackslash{}SysWOW64\textbackslash{}odbcad32.exe}
  -- and beware that both 32- and 64-bit versions are called
  \texttt{odbcad32.exe}.} to set up DSNs, called something like `Data
Sources (ODBC)' under `Administrative Tools' in the Control Panel. You
can add, remove and edit (`configure') DSNs there (see figure~\ref{fig:1}).
When adding a DSN, first select the ODBC driver and then complete the
driver-specific dialog box.  There will usually be an option to test
the DSN and it is wise to do so.
\begin{figure}
\centerline{\includegraphics[width=12cm]{winDSN1}}
\bigbreak
\centerline{\includegraphics[width=10cm]{winDSN2}}
\caption{(Top) The main Data Sources (ODBC) dialog box from a Windows XP system.
(Bottom) The dialog box to select a driver that comes up when the
\textsf{Add} button is clicked.}
\label{fig:1}
\end{figure}
If \texttt{Rgui} is to be used on Windows, incomplete DSNs can be
created and the dialog box will be brought up for completion when
\code{odbcConnect} is called---this can be helpful to avoid storing
passwords in the Windows Registry or to allow alternate users or
databases.  On that platform, calling \code{odbcDriverConnect()} with
no arguments will bring up the main ODBC Data Sources dialog box to
allow a DSN to be constructed on the fly.
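A minimal sketch of the two forms (the DSN name is illustrative):
\begin{example}
> channel <- odbcConnect("incompleteDSN")  # dialog box asks for the rest
> channel <- odbcDriverConnect()           # construct a DSN on the fly
\end{example}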
Mac OS X prior to 10.6 came with a very similar GUI (figure~\ref{fig:1b}) found
at \textsf{Applications / Utilities / ODBC Administrator}.
This is now available as a download from
\url{http://support.apple.com/kb/DL895}.  Another GUI for OS X is
available from \url{http://www.odbcmanager.net/index.php}.
\begin{figure}
\centerline{\includegraphics[width=12cm]{macODBC}}
\bigbreak
\centerline{\includegraphics[width=12cm]{macAccess}}
\caption{(Top) The main ODBC Administrator dialog box from a 
  Mac OS X 10.5 system.
(Bottom) A page of the  dialog box to specify a DSN for the Actual
Technologies Access/Excel driver.}
\label{fig:1b}
\end{figure}
% Both \code{unixODBC} and \code{iODBC} provide GUIs (which might be
% packaged separately in binary distributions) to create DSNs, and
% \code{iODBC} also has a web-grounded DSN administrator.  UnixODBC's GUI
% is currently called \texttt{ODBCConfig} (see figure~4), and there is a
% KDE control widget called \texttt{DataManager} to manage both ODBC
% drivers and DSNs.  See the \code{unixODBC} user manual at
% \url{http://www.unixodbc.org/doc/UserManual/}. (On Fedora these are in
% the \code{unixODBC-kde} RPM.  It has been announced that they will
% become separate projects after \code{unixODBC 2.2.14}.)
\begin{figure}
\centerline{\includegraphics[width=12cm]{linuxDSN}}
\bigbreak
\centerline{\includegraphics[width=12cm]{linuxDSNsqlite}}
\caption{The dialog box of \code{ODBCconfig} on Fedora 10 Linux, and
  the \textsf{Configure} screen for the SQLite driver.}
\label{fig:2}
\end{figure}
On Unix-alikes DSNs can also be specified in files (and the graphical
tools just manipulate these files).  The system-wide file is usually
\code{/etc/odbc.ini} and the per-user
file\footnote{\texttt{\textasciitilde/Library/ODBC/odbc.ini} on Mac OS
X.} \verb|~/.odbc.ini|.  Some examples of the format are shown in
figure~\ref{fig:odbc2}.
\begin{figure}
\begin{example}
[test_mysql]
Description     = test MySQL
Driver          = MySQL
Trace           = No
Server          = localhost
Port            = 3306
Database        = test
[test_mysql5]
Description     = myodbc5
Driver          = MySQL ODBC 5.1 Driver
Server          = gannet
Port            = 3306
Database        = ripley
[test_pg]
Description     = test PostgreSQL
Driver          = PostgreSQL
Trace           = No
TraceFile       =
ServerName      = localhost
UserName        = ripley
Port            = 5432
Socket          =
Database        = testdb
ReadOnly        = 0
[test_sqlite3]
Description     = test SQLite3
Driver          = sqlite3
Database        = /tmp/mysqlite3.db
\end{example}
\caption{A personal (\texttt{\textasciitilde/.odbc.ini}) file from a
  Fedora 10 Linux system using \code{unixODBC}.}
\label{fig:odbc2}
\end{figure}
What fields are supported is driver-specific (and it can be hard to
find documentation). There is no clear distinction between fields that
specify the driver and those which specify the DSN, so any parts of
the driver specification which might differ between connections can be
used in the DSN file.
Things that are often set here are whether the connection is read-only
(\code{test\_pg} is \emph{not} read-only) and the character encoding to
be used.
Command-line programs \code{isql} (\code{unixODBC}) and \code{iodbctest}
(\code{iODBC}) can be used to test a DSN that has been created manually in a
file.  The formats are
\begin{example}
\$ isql -v \var{dsn} \var{db_username} \var{db_password}
\$ iodbctest
\end{example}
Both give a command-line SQL interface: use \code{quit} to terminate.
\begin{figure}
\includegraphics[width=6cm]{Access}
\quad
\includegraphics[width=6cm]{MySQL}
\bigbreak
\includegraphics[width=6cm]{Oracle}
\quad
\includegraphics[width=6cm]{SQLserver}
\caption{Parts of the ODBC driver configuration screens on Windows XP
  for Microsoft Access, MySQL Connector/ODBC 5.1, Oracle's ODBC driver
  and Microsoft SQL Server.}
\label{fig:3}
\end{figure}
\clearpage
\section{Internals}
The appendix is in part an \emph{aide memoire} for the maintainer, but
may interest the curious user.
An RODBC connection object is an integer with several attributes:
connections are numbered consecutively in the current session.  For example
\begin{smallexample}
> channel <- odbcConnect("test")
> unclass(channel)
[1] 1
attr(,"connection.string")
[1] "DATABASE=ripley;DESCRIPTION=myodbc;DSN=test;OPTION=0;PORT=3306;SERVER=localhost;"
attr(,"handle\_ptr")
<pointer: 0x233e6c0>
attr(,"case")
[1] "nochange"
attr(,"id")
[1] 11371
attr(,"believeNRows")
[1] TRUE
attr(,"colQuote")
[1] "`"
attr(,"tabQuote")
[1] "`"
attr(,"encoding")
[1] ""
attr(,"rows\_at\_time")
[1] 100
attr(,"isMySQL")
[1] FALSE
\end{smallexample}
Most of the attributes record the arguments of
\code{odbcDriverConnect}.  The \code{"connection.string"} attribute is
as returned by \code{SQLDriverConnect} and lists driver-specific
parameters separated (and perhaps terminated) by a semicolon.  The
\code{"id"} attribute is a random integer used for integrity checks
(and in particular to reject connection objects should they be saved
and restored in a different session).  The \code{"isMySQL"} attribute
is used both to select the default quote character and to determine the
interpretation of \code{qualifier.table} names.
The main structure of the connection is kept as a C \code{struct}, a
pointer to which is passed around as the R external pointer
\code{"handle\_ptr"}.  This has a finalizer that will close the
connection when there is no longer an R object referring to it
(including at the end of the R session), with a warning unless the
connection has already been closed by \code{close} or
\code{odbcClose}.  In addition, a C-level table keeps the pointers of
the first 1000 connections of an R session, to enable
\code{odbcCloseAll} to close them.
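In practice connections are closed explicitly by, for example,
\begin{example}
> odbcClose(channel)   # or close(channel)
> odbcCloseAll()       # close all connections opened in this session
\end{example}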
The \code{struct} is currently defined as
\begin{smallexample}
typedef struct rodbcHandle \{
    SQLHDBC     hDbc;         /* connection handle */
    SQLHSTMT    hStmt;        /* statement handle */
    SQLLEN      nRows;        /* number of rows and columns in result set */
    SQLSMALLINT nColumns;
    int         channel;      /* as stored on the R-level object */
    int         id;           /* ditto */
    int         useNRows;     /* value of believeNRows */
    /* entries used to bind data for result sets and updates */
    COLUMNS     *ColData;
    int         nAllocated;
    SQLUINTEGER	rowsFetched;  /* use to indicate the number of rows fetched */
    SQLUINTEGER	rowArraySize; /* use to indicate the number of rows we expect back */
    SQLUINTEGER	rowsUsed;     /* for when we fetch more than we need */
    SQLMSG      *msglist;     /* root of linked list of messages */
    SEXP        extPtr;       /* the external pointer address */
\} RODBCHandle, *pRODBCHandle;
\end{smallexample}
Most ODBC operations work by sending a query, explicitly or implicitly
via e.g.{} \code{sqlColumns}, and this creates a \emph{result set}
which is transferred to an R data frame by \code{sqlGetResults}.
\code{nRows} and \code{nColumns} indicate the size of the pending result
set, with \code{nColumns = -1} used if there are no pending results.
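The two steps can also be carried out separately at the R level, e.g.{} (a
sketch; the table name may need quoting depending on the DBMS):
\begin{example}
> odbcQuery(channel, "select * from USArrests")  # creates the result set
> res <- sqlGetResults(channel)                  # transfers it to a data frame
\end{example}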
ODBC works with various \emph{handles}.  There is a \code{SQLHENV}
handle for the environment that \code{RODBC} opens when a connection
is first opened or DSNs are listed---its main use is to request ODBC~3
semantics.  Then each connection has a \code{SQLHDBC} handle, and each
query (statement) a \code{SQLHSTMT} handle.  Argument
\code{literal=TRUE} of \code{sqlTables} and \code{sqlColumns} is used
to set the \verb|SQL_ATTR_METADATA_ID| attribute of the statement
handle to be true.
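For example (a sketch):
\begin{example}
> sqlColumns(channel, "USArrests", literal = TRUE)
\end{example}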
All the functions\footnote{ \code{odbcQuery}, \code{sqlColumns},
  \code{sqlPrimaryKeys}, \code{sqlTables} and \code{sqlTypeInfo}.}
that create a result set call C function \code{cachenbind}.  This
allocates buffers under the \code{ColData} pointer and binds the
result set to them by \code{SQLBindCol}.  Then when
\code{sqlGetResults} calls the C function \code{SQLFetch} or
\code{SQLFetchScroll} the results for one or more (up to
\code{MAX\_ROWS\_FETCH = 1024}) rows are loaded into the buffers and
then copied into R vectors.
Prior to \pkg{RODBC} 1.3-0 the default was to fetch a row at a time,
but it is now to fetch up to 100 rows at a time.  Entries
\code{rowArraySize} and \code{rowsFetched} are used to indicate how
many rows were requested and how many were available.  Since e.g.{}
\code{sqlFetch} allows a maximum number of rows to be returned in the
data frame, \code{rowsUsed} indicates how many of the rows last
fetched have so far been returned to R.
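At the R level these correspond to arguments such as \code{rows\_at\_time}
of \code{odbcConnect} and \code{max} of \code{sqlFetch} and
\code{sqlFetchMore}, e.g.{} (a sketch):
\begin{example}
> channel <- odbcConnect("test", rows_at_time = 1000)
> first10 <- sqlFetch(channel, "USArrests", max = 10)  # first 10 rows only
> rest <- sqlFetchMore(channel)                        # the remaining rows
\end{example}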
The buffers are part of the \code{ColData} entry, which is an array of
\code{COLUMNS} structures, one for each column in the result set. These
have the form
\begin{smallexample}
typedef struct cols \{
    SQLCHAR     ColName[256];
    SQLSMALLINT NameLength;
    SQLSMALLINT DataType;
    SQLULEN     ColSize;
    SQLSMALLINT DecimalDigits;
    SQLSMALLINT Nullable;
    char        *pData;
    int         datalen;
    SQLDOUBLE   RData [MAX\_ROWS\_FETCH];
    SQLREAL     R4Data[MAX\_ROWS\_FETCH];
    SQLINTEGER  IData [MAX\_ROWS\_FETCH];
    SQLSMALLINT I2Data[MAX\_ROWS\_FETCH];
    SQLLEN      IndPtr[MAX\_ROWS\_FETCH];
\} COLUMNS;
\end{smallexample}
The first six entries are returned by a call to \code{SQLDescribeCol}:
\code{DataType} is used to select the buffer to use.  There are
separate buffers for double-precision, single-precision, 32-bit and
16-bit integer and character/byte data.  When character or binary data
buffers are allocated, \code{datalen} records the length allocated per row
(which is based on the value returned as \code{ColSize}).  The
\code{IndPtr} value is used to record the actual size of the item
in the current row for variable length character and binary types, and
for all nullable types the special value \code{SQL\_NULL\_DATA} (-1)
indicates an SQL null value.
The other main C-level operation is to send data to the ODBC driver
for \code{sqlSave} and \code{sqlUpdate}.  These use \code{INSERT INTO}
and \code{UPDATE} queries respectively, and for \code{fast = TRUE} use
parametrized queries.  So we have the queries (split across lines for
display)
\begin{smallexample}
> sqlSave(channel, USArrests, rownames = "State", addPK = TRUE, verbose = TRUE)
Query: CREATE TABLE "USArrests"
  ("State" varchar(255) NOT NULL PRIMARY KEY, "Murder" double, "Assault" integer,
  "UrbanPop" integer, "Rape" double)
Query: INSERT INTO "USArrests"
  ( "State", "Murder", "Assault", "UrbanPop", "Rape" ) VALUES ( ?,?,?,?,? )
Binding: 'State' DataType 12, ColSize 255
Binding: 'Murder' DataType 8, ColSize 15
Binding: 'Assault' DataType 4, ColSize 10
Binding: 'UrbanPop' DataType 4, ColSize 10
Binding: 'Rape' DataType 8, ColSize 15
Parameters:
...
> sqlUpdate(channel, foo, "USArrests", verbose=TRUE)
Query: UPDATE "USArrests" SET "Assault"=? WHERE "State"=?
Binding: 'Assault' DataType 4, ColSize 10
Binding: 'State' DataType 12, ColSize 255
Parameters:
...
\end{smallexample}
At C level, this works by calling \code{SQLPrepare} to record the
insert/update query on the statement handle, then calling
\code{SQLBindParameter} to bind a buffer for each column with values
to be sent, and finally in a loop over rows copying the data into the
buffer and calling \code{SQLExecute} on the statement handle.
The same buffer structure is used as when retrieving result sets.
The difference is that the arguments which were outputs from
\code{SQLBindCol} are inputs to \code{SQLBindParameter}, so we need to
use \code{sqlColumns} to retrieve the column characteristics of the
table and pass these down to the C interface.
\end{document}
 