% File: RODBC.Rnw

\documentclass[a4paper,11pt]{article}
\usepackage{hyperref,color,url,alltt,parskip}
%\usepackage{Sweave}
\usepackage{Rd}  % mainly for upright quotes.
\usepackage{graphicx}
%\newcommand{\code}[1]{\texttt{#1}}
%\newcommand{\pkg}[1]{\texttt{#1}}
%\newcommand{\var}[1]{\emph{#1}}
%\newcommand{\link}[1]{#1}
%\newcommand{\sQuote}[1]{`#1'}
\newenvironment{smallexample}{\begin{alltt}\scriptsize}{\end{alltt}}
\newenvironment{example}{\begin{alltt}\small}{\end{alltt}}

\setcounter{topnumber}{2}
\def\topfraction{1.0}
\setcounter{bottomnumber}{1}
\def\bottomfraction{.3}
\setcounter{totalnumber}{3}
\def\textfraction{0}
\def\floatpagefraction{0.8}
\setcounter{dbltopnumber}{2}

\usepackage{color}
\definecolor{Blue}{rgb}{0,0,0.8}
\hypersetup{%
colorlinks,%
plainpages=true,%
linkcolor=black,%
citecolor=black,%
urlcolor=Blue,%
pdfstartview=FitH,%
pdfview={XYZ null null null},%
%pdfpagemode=UseNone,%
pdftitle={ODBC Connectivity for R},%
pdfauthor={B. D. Ripley}%
}

\begin{document}

%\VignetteIndexEntry{ODBC Connectivity}
%\VignetteDepends{RODBC}
%\VignetteKeywords{databases}
%\VignetteKeywords{ODBC}
%\VignettePackage{RODBC}

%\SweaveOpts{engine=R,eps=FALSE}

\title{ODBC Connectivity}
\author{by Brian Ripley\\
  Department of Statistics, University of Oxford\\
\url{ripley@stats.ox.ac.uk}
}
\maketitle
\sloppy

\noindent
Package \pkg{RODBC} implements ODBC database connectivity.  It was
originally written by Michael Lapsley (St George's Medical School,
University of London) in the early days of R (1999), but after he
disappeared in 2002, it was rescued and since much extended by Brian
Ripley.  Version 1.0-1 was released in January 2003, and \pkg{RODBC}
is nowadays a mature and much-used platform for interfacing R to
database systems.

Thanks to Marc Schwartz for contributing some of the experiences
here.  See also the archives of the \code{R-sig-db} mailing list.

\section{ODBC Concepts}

ODBC aims to provide a common API for access to SQL\footnote{SQL is a
  language for querying and managing data in databases---see
  \url{http://en.wikipedia.org/wiki/SQL}.}-based database management
systems (DBMSs) such as MySQL\footnote{and its fork, MariaDB},
PostgreSQL, Microsoft Access and SQL Server, DB2, Oracle and SQLite.
It originated on Windows in the early 1990s, but ODBC \emph{driver
  managers} \code{unixODBC} and \code{iODBC} are nowadays available on
a wide range of platforms (and a version of \code{iODBC} shipped with
versions of Mac OS X prior to 10.9).  The connection to the particular
DBMS needs an \emph{ODBC driver}: these may come with the DBMS or the
ODBC driver manager or be provided separately by the DBMS developers,
and there are third-party\footnote{but there are close links between
  \code{unixODBC} and Easysoft, and \code{iODBC} and OpenLink.}
developers such as Actual Technologies, Easysoft and OpenLink. (This
means that for some DBMSs there are several different ODBC drivers
available, and they can behave differently.)

Microsoft provides drivers on Windows for non-SQL database systems
such as DBase and FoxPro, and even for flat files and Excel
spreadsheets. Actual Technologies sell a driver for Mac OS X that
covers (some) Excel spreadsheets and flat files.

A connection to a specific database is called a \emph{Data Source
  Name} or DSN (see
\url{http://en.wikipedia.org/wiki/Data_source_name}). See
Appendix~B for how to set up DSNs on your system.  One of the greatest
advantages of ODBC is that it is a cross-platform client-server
design, so it is common to run R on a personal computer and access
data on a remote server whose OS may not even be known to the end
user.  This does rely on suitable ODBC drivers being available on the
client: they are for the major cross-platform DBMSs, and some vendors
provide `bridge' drivers, so that for example a `bridge' ODBC driver
is run on a Linux client and talks to the Access ODBC driver on a
remote Windows machine.

ODBC provides an abstraction that papers over many of the differences
between DBMSs.  That abstraction has developed over the years, and
\pkg{RODBC} works with ODBC version 3.  This number describes both the
API (most drivers nowadays work with API 3.51 or 3.52) and
capabilities.  The latter allow ODBC drivers to implement newer
features partially or not at all, so some drivers are much more
capable than others: in the main \pkg{RODBC} works with basic
features.  ODBC is a superset of the ISO/IEC 9075-3:1995 SQL/CLI
standard.

A somewhat biased overview of ODBC on Unix-alikes can be found at
\url{http://www.easysoft.com/developer/interfaces/odbc/linux.html}.

\section{Basic Usage}

Two groups of functions are provided in \pkg{RODBC}.  The mainly
internal \code{odbc*} commands implement low-level access to C-level
ODBC functions with similar\footnote{in most cases with prefix
  \code{SQL} replacing \code{odbc}.} names.  The \code{sql*} functions
operate at a higher level to read, save, copy and manipulate data
between data frames and SQL tables.  The two low-level functions which
are commonly used make or break a connection.

\subsection{Making a connection}

ODBC works by setting up a \emph{connection} or \emph{channel} from
the client (here \pkg{RODBC}) to the DBMSs as specified in the DSN.
Such connections are normally used throughout a session, but should be
closed explicitly at the end of the session---however \pkg{RODBC} will
clear up after you if you forget (with a warning that might not be
seen in a GUI environment).  There can be many simultaneous
connections.

The simplest way to make a connection is
\begin{example}
library(RODBC)
ch <- odbcConnect("\var{some\_dsn}")
\end{example}
and when you are done with it,
\begin{example}
close(ch)
# or if you prefer
odbcClose(ch)
\end{example}
The connection object \code{ch} is how you specify one of potentially
many open connections, and is the first argument to all other
\pkg{RODBC} functions.  If you forget the details, printing it will
give some summary information.

If the DBMS user and password are needed and not stored in the DSN,
they can be supplied by e.g.{}
\begin{example}
ch <- odbcConnect("\var{some\_dsn}", uid = "\var{user}", pwd = "\var{****}")
\end{example}
Users of the R GUI under Windows\footnote{This does not work from
  \code{Rterm.exe}.} have another possibility: if an incompletely
specified DSN is given, the driver-specific Data Source dialog box
will pop up to allow it to be completed.
% Something about security here?

More flexibility is available \emph{via} function
\code{odbcDriverConnect}, which works with a \emph{connection string}.
At its simplest it is
\begin{example}
"DSN=\var{dsn};UID=\var{uid};PWD=\var{pwd}"
\end{example}
but it can be constructed without a DSN by specifying a driver
directly \emph{via} \code{DRIVER=}, and more (in some cases many more)
driver-specific parameters can be given.  See the documentation for
the driver (and Appendix~A) for more details.
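
For instance, a DSN-less connection to a MySQL server might be made
along the following lines (the driver name, host and database shown
here are purely illustrative and will vary by installation):
\begin{example}
ch <- odbcDriverConnect(paste0("DRIVER={MySQL ODBC Driver};",
                               "SERVER=\var{host};DATABASE=\var{db};",
                               "UID=\var{user};PWD=\var{****}"))
\end{example}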

\subsection{Reading from a database}

where `database' can be interpreted very widely, including for example
Excel spreadsheets and directories of flat files.

The simplest and most common use of \pkg{RODBC} is to extract data
from databases held on central database servers.  Such access is
read-only, and this can be enforced by settings in the DSN or
\emph{via} permission settings (also known as \emph{privileges}) on
the database.

To find out what tables are accessible from a connection \code{ch}, use
\begin{example}
sqlTables(ch)
\end{example}
Some drivers will return all visible table-like objects, not just
those owned by you.  In that case you may want to restrict the scope
by e.g.{}
\begin{example}
sqlTables(ch, tableType = "TABLE")
sqlTables(ch, schema = "\var{some\_pattern}")
sqlTables(ch, tableName = "\var{some\_pattern}")
\end{example}
The details are driver-specific but in most cases
\code{\var{some\_pattern}} can use wildcards\footnote{these are the
  SQL wildcards used for example in \code{LIKE} clauses.} with
\emph{underscore} matching a single character and \emph{percent}
matching zero or more characters.  Since underscore is a valid
character in a table name it can be handled literally by preceding it
by a backslash---but it is rarely necessary to do so.

A table can be retrieved as a data frame by
\begin{example}
res <- sqlFetch(ch, "\var{table\_name}")
\end{example}
If it has many rows it can be retrieved in sections by
\begin{example}
res <- sqlFetch(ch, "\var{table\_name}", max = \var{m})
res <- sqlFetchMore(ch, max = \var{m})
\dots
\end{example}

It is often necessary to reduce the data to be transferred: we have
seen how to subset rows, but it can be more effective to restrict the
columns or to return only rows meeting some conditions.  To find out
what columns are available, use \code{sqlColumns}, for example
\begin{smallexample}
> sqlColumns(ch, "USArrests")
   TABLE\_CAT TABLE\_SCHEM TABLE\_NAME COLUMN\_NAME DATA\_TYPE TYPE\_NAME COLUMN\_SIZE
1    ripley        <NA>  USArrests       State        12  varchar         255
2    ripley        <NA>  USArrests      Murder         8   double          15
3    ripley        <NA>  USArrests     Assault         4  integer          10
4    ripley        <NA>  USArrests    UrbanPop         4  integer          10
5    ripley        <NA>  USArrests        Rape         8   double          15
\dots
\end{smallexample}
Then an \emph{SQL Query} can be used to return part of the table, for
example (MySQL on Linux)
\begin{example}
> sqlQuery(ch, paste("SELECT State, Murder FROM USArrests",
+                    "WHERE Rape > 30 ORDER BY Murder"))
       State Murder
1   Colorado    7.9
2    Arizona    8.1
3 California    9.0
4     Alaska   10.0
5 New Mexico   11.4
6   Michigan   12.1
7     Nevada   12.2
8    Florida   15.4
\end{example}
Note that although there are standards for SQL, all the major
producers of DBMSs have their own dialects, so for example on the
Oracle and DB2 systems we tested this query had to be given as
\begin{example}
> sqlQuery(ch, paste('SELECT "State", "Murder" FROM "USArrests"',
+                    'WHERE "Rape" > 30 ORDER BY "Murder"'))
\end{example}
or even in upper case.  Describing how to extract data from databases
is the \emph{forte} of the SQL language, and doing so efficiently is
the aim of many of the DBMSs, so this is a very powerful tool.  To
learn SQL it is best to find a tutorial specific to the dialect you
will use; for example Chapter~3 of the MySQL manual is a tutorial.  A
basic tutorial which covers some common dialects\footnote{MySQL,
  Oracle and SQL Server.}  can be found at
\url{http://www.1keydata.com/sql/sql.html}: tutorials on how to
perform common tasks in several commonly used DBMSs are available at
\url{http://sqlzoo.net/}.

% <<>>=
% library(RODBC)
% channel <- odbcConnect("test")
% sqlSave(channel, USArrests, rownames = "State", verbose = TRUE)
% sqlQuery(channel, paste("select State, Murder from USArrests",
%                         "where Rape > 30 order by Murder"))
% sqlFetch(channel, "USArrests", rownames = "State")
% sqlDrop(channel, "USArrests")
% close(channel)
% @

\subsection{Table Names}

SQL-92 expects both table and column names to be alphanumeric plus
underscore, and \pkg{RODBC} does not in general support vendor
extensions (for example Access allows spaces).  There are some
system-specific quoting schemes: Access and Excel allow table names to
be enclosed in \code{[ ]} in SQL queries, MySQL (by default) quotes
\emph{via} backticks, and most other systems use the ANSI SQL standard
of double quotes.

%More recent SQL standards allow \code{\$} and \code{\#} under some
%circumstances.

The \code{odbcConnect} function allows the specification of the
quoting rules for names \pkg{RODBC} itself sends, but sensible
defaults\footnote{backticks for MySQL, \code{[ ]} for the Access and
  Excel convenience wrappers, otherwise ANSI double quotes.} are
selected.  Users do need to be aware of the quoting issue when
writing queries for \code{sqlQuery} themselves.

Note the underscore is a wildcard character in table names for some of
the functions, and so may need to be escaped (by backslash) at times.

Normally table names containing a period are interpreted as references
to another schema (see below): this can be suppressed by opening the
connection with argument \code{interpretDot = FALSE}.
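
As an illustration of the quoting rules (the table and column names
here are only examples), the same query might be written for MySQL
and for an ANSI-quoting DBMS as
\begin{example}
sqlQuery(ch, "SELECT `State`, `Murder` FROM `USArrests`")    # MySQL backticks
sqlQuery(ch, 'SELECT "State", "Murder" FROM "USArrests"')    # ANSI double quotes
\end{example}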


\subsection{Types of  table}

The details are somewhat DBMS-specific, but `tables' usually means
`tables, views or similar objects'.

In some systems `tables' are physical objects (files) that actually
store data---Mimer calls these \emph{base tables}.  For these other
`tables' can be derived that present information to the user, usually
called `views'.  The principal distinctions between a (base) table and
a view are
\begin{itemize}
  \item Using \code{DROP} on a table removes the data, whereas using
    it on a view merely removes the convenient access to a
    representation of the data.
  \item The access permissions (\emph{privileges}) of a view can be very
    different from those of a table: this is commonly used to hide
    sensitive information.
\end{itemize}
A view can contain a subset of the information available in a single
table or combine information from two or more tables.

Further, some DBMSs distinguish between tables and views generated by
ordinary users and \emph{system tables} used by the DBMS itself.
Where present, this distinction is reflected in the result of
\code{sqlTables()} calls.

Some DBMSs support \emph{synonyms} and/or \emph{aliases} which are
simply alternative names for an existing table/view/synonym, often
those in other schemas (see below).

Typically tables, views, synonyms and aliases share a name space and
so must have a name that is unique (in the enclosing schema where
schemas are implemented).


\section{Writing to a Database}

To create or update a table in a database some more details need to be
considered.  For some systems, all table and column names need to be
lower case (e.g.{} PostgreSQL, MySQL on Windows) or upper case (e.g.{}
some versions of Oracle).  To make this a little easier, the
\code{odbcConnect} function allows a remapping of table names to be
specified, and this happens by default for DBMSs where remapping is
known to be needed.

The main tool to create a table is \code{sqlSave}.  It is safest to
use this after having removed any existing table of the same name,
which can be done by
\begin{example}
sqlDrop(ch, "\var{table\_name}", errors = FALSE)
\end{example}
Then in the simplest usage
\begin{example}
sqlSave(ch, \var{some\_data\_frame})
\end{example}
creates a new table whose name is the name of the data frame (remapped
to upper or lower case as needed) and with first column
\code{rownames} the row names of the data frame, and remaining columns
the columns of the data frame (with names remapped as necessary).
For the many options, see the help page.

\code{sqlSave} works well when asked to write integer, numeric and
reasonable-length\footnote{which of course depends on the DBMS.
  Almost all have an implementation of \code{varchar} that allows up
  to 255 bytes or characters, and some have much larger limits.
  Calling \code{sqlTypeInfo} will tell you about the data type
  limits.} character strings to the database.  It needs some help with
other types of columns in mapping to the DBMS-specific types of
column.  For some drivers it can do a good job with date and date-time
columns; in others it needs some hints (and e.g.{} for Oracle dates
are stored as date-times).  The files in the \code{RODBC/tests}
directory in the sources and the installed file \code{tests.R} provide
some examples.  One of the options is the \code{fast} argument: the
default, \code{fast = TRUE}, transfers data in binary format; the
alternative, \code{fast = FALSE}, transfers data as character
strings a row at a time---this is slower but can work better with some
drivers (and worse with others).
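
For example, \code{sqlSave} can be given a hint about a date column
\emph{via} its \code{varTypes} argument (the column name below is
illustrative, and valid type names are DBMS-specific---they can be
found in the output of \code{sqlTypeInfo}):
\begin{example}
sqlSave(ch, \var{some\_data\_frame}, varTypes = c(\var{date\_col} = "DATE"))
\end{example}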

The other main tool for writing is \code{sqlUpdate} which is used to
change rows in an existing table.  Note that \pkg{RODBC} only does
this in a simple fashion, and on up-market DBMSs it may be better to
set cursors and use direct SQL queries, or at least to control
transactions by calls to \code{odbcSetAutoCommit} and
\code{odbcEndTran}.  The basic operation of \code{sqlUpdate} is to
take a data frame with the same column names (up to remapping) as some
or all of the columns of an existing table: the values in the data
frame are then used either to replace entries or to create new rows in
the table.

Rows in a DBMS table are in principle unordered and so cannot be
referred to by number: the sometimes tricky question is to know which
rows are to be replaced.  We can help the process by giving one or more
\code{index} columns whose values must match: for a data frame the row
names are often a good choice.  If no \code{index} argument is
supplied, a suitable set of columns is chosen based on the properties
of the table.
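
A typical call, matching rows on the data frame's row names (assuming
the table was created by \code{sqlSave} with a \code{rownames} column;
the names here are illustrative), is
\begin{example}
sqlUpdate(ch, \var{updated\_rows}, "\var{table\_name}", index = "rownames")
\end{example}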

\subsection{Primary keys and indices}

When a table is created (or afterwards) it can be given additional
information to enable it to be used effectively or efficiently.

\emph{Primary keys} are one (usually) or more columns that provide a
reliable way to reference rows in the table: values of the primary key
must be unique and not \code{NULL} (SQL parlance for `missing').
Primary keys in one table are also used as \emph{foreign keys} in
another table: this ensures that e.g.{} values of \code{customer\_id}
only take values which are included in the primary key column of that
name in table \code{customers}.  Support of foreign keys is patchy:
some DBMSs (e.g.{} MySQL prior to 6.0) accept specifications but
ignore them.

\pkg{RODBC} allows primary keys to be set as part of the
\code{sqlSave()} function when it creates a table: otherwise they can
be set by \code{sqlQuery()} in DBMS-specific ways (usually by
\code{ALTER TABLE}).

Columns in a table can be declared as \code{UNIQUE}: primary keys and
such columns are usually used as the basis for table indices, but
other indices (sometimes called \emph{secondary indices}) can be
declared by a \code{CREATE INDEX} SQL command.  Whether adding primary
keys or other indices has any effect on performance depends on the
DBMS and the query.
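
For example, a primary key can be requested when \code{sqlSave}
creates a table, and a secondary index added afterwards in
DBMS-specific SQL (the index, table and column names here are
illustrative):
\begin{example}
sqlSave(ch, \var{some\_data\_frame}, addPK = TRUE)
sqlQuery(ch, "CREATE INDEX \var{idx\_name} ON \var{table\_name}(\var{col\_name})")
\end{example}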

\section{Data types}

This can be confusing: R has data types (including \code{character},
\code{double}, \code{integer} and various classes including
\code{Date} and \code{POSIXct}), ODBC has both C and SQL data types,
the SQL standards have data types and so do the various DBMSs
\emph{and they all have different names} and different usages of the
same names.

Double- and single-precision numeric values and 32- and 16-bit
integers (only) are transferred as binary values, and all other types
as character strings.  However, unless \code{as.is = TRUE},
\code{sqlGetResults} (used by all the higher-level functions to return
a data frame) converts character data to a date/date-time class or
\emph{via} \code{type.convert}.
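
To suppress these conversions and see exactly what was transferred,
the result can be requested as character data, e.g.{}
\begin{example}
res <- sqlQuery(ch, "SELECT * FROM \var{table\_name}", as.is = TRUE)
\end{example}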

You can find out the DBMS names for the data types used in the columns
of a table by a call to \code{sqlColumns}, and further information is
given on those types in the result of \code{sqlTypeInfo}.  For example
in MySQL,
\begin{smallexample}
  TABLE_CAT TABLE_SCHEM TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME COLUMN_SIZE
1    ripley        <NA>  USArrests       State        12   varchar         255
2    ripley        <NA>  USArrests      Murder         8    double          15
3    ripley        <NA>  USArrests     Assault         4   integer          10
4    ripley        <NA>  USArrests    UrbanPop         4   integer          10
5    ripley        <NA>  USArrests        Rape         8    double          15
  BUFFER_LENGTH DECIMAL_DIGITS NUM_PREC_RADIX NULLABLE REMARKS COLUMN_DEF
1           255             NA             NA        0                 ''
2             8             NA             NA        1               <NA>
3             4              0             10        1               <NA>
4             4              0             10        1               <NA>
5             8             NA             NA        1               <NA>
  SQL_DATA_TYPE SQL_DATETIME_SUB CHAR_OCTET_LENGTH ORDINAL_POSITION IS_NULLABLE
1            12               NA               255                1          NO
2             8               NA                NA                2         YES
3             4               NA                NA                3         YES
4             4               NA                NA                4         YES
5             8               NA                NA                5         YES
\end{smallexample}
This gives the DBMS data type by name and by number (twice, once the number
used in the DBMS and once that used by SQL---they agree here).  Other
things of interest here are the column size, which gives the maximum
size of the character representation, and the two columns about
`nullable' which indicate if the column is allowed to contain missing
values (SQL \code{NULL}s).

The result of \code{sqlTypeInfo} has 19 columns and in the version of
MySQL used here, 52 types.  We show a small subset of the more common types:
\begin{smallexample}
> sqlTypeInfo(channel)[<...>, c(1:3,7,16)]
     TYPE_NAME DATA_TYPE COLUMN_SIZE NULLABLE SQL_DATATYPE
1         bit        -7           1        1           -7
2     tinyint        -6           3        1           -6
6      bigint        -5          19        1           -5
18       text        -1       65535        1           -1
19 mediumtext        -1    16777215        1           -1
20   longtext        -1  2147483647        1           -1
22       char         1         255        1            1
23    numeric         2          19        1            2
24    decimal         3          19        1            3
25    integer         4          10        1            4
37   smallint         5           5        1            5
41     double         6          15        1            6
43      float         7           7        1            7
45     double         8          15        1            8
47       date        91          10        1            9
48       time        92           8        1            9
49       year         5           4        1            5
50   datetime        93          21        1            9
51  timestamp        93          14        0            9
52    varchar        12         255        1           12
\end{smallexample}
Note that there are both duplicate type names and duplicate type numbers.

Most DBMSs started with their own data types and later mapped the
standard SQL data types on to them, although these may only be
partially implemented.  Some DBMSs allow user-defined data types, for
example enumerations.

Commonly used data types fall into a number of groups:
\begin{description}
\item[Character types] Character types can be classified three ways:
  fixed or variable length, by the maximum size and by the character
  set used.  The most commonly used types\footnote{the SQL names for
  these are \code{CHARACTER VARYING} and \code{CHARACTER}, but these
  are too cumbersome for routine use.} are \code{varchar} for short
  strings of variable length (up to some maximum) and \code{char} for
  short strings of fixed length (usually right-padded with spaces).
  The value of `short' differs by DBMS and is at least 254, often a
  few thousand---often other types will be available for longer
  character strings.  There is a sanity check which will allow only
  strings of up to 65535 bytes when reading: this can be removed by
  recompiling \pkg{RODBC}.

  Many other DBMSs have separate types to hold Unicode character
  strings, often with names like \code{nvarchar} or \code{wvarchar}.
  Note that currently \pkg{RODBC} only uses the current locale for
  character data, which could be UTF-8 (and will be on Mac OS X and in
  many cases on Linux and other Unix-alikes), but is never UCS-2 as
  used on Windows.  So if character data is stored in the database in
  Unicode, it will be translated (with a possible loss of information)
  in non-Unicode locales.  (This may change in future versions of
  \pkg{RODBC}.)

  Some DBMSs such as PostgreSQL and SQL Server allow variable-length
  character strings of length only limited by resources.  These do not
  fit well with the ODBC model that requires buffers to be allocated
  to transfer character data, and so such types may be subjected (by
  the ODBC driver) to a fixed limit or not work at all.

\item[Integer types] Most DBMSs have types for 32-bit (\code{integer},
  synonym \code{int}) and 16-bit (\code{smallint}) integers.  Some,
  including MySQL, also have unsigned versions and 1-bit, 8-bit and
  64-bit integer types: these further types would usually be
  transferred as character strings and converted on reading to an
  \code{integer} or \code{double} vector.

  Type names \code{int2}, \code{int4} and \code{int8} are common as
  synonyms for the basic type names.

  The SQL standard does not require \code{integer} and \code{smallint}
  to be binary (rather than decimal) types, but they almost always are
  binary.

  Note that 64-bit integers will be transferred as character strings
  and read by \code{sqlGetResults} as character vectors or (for
  $2^{31} \le |x| < 2^{53}$) as \code{double} vectors.

\item[Floating-point types] The basic SQL floating-point types are
  numbers 8 and 7, for double- and single-precision binary types
  respectively.  The SQL names
  are \code{double precision} and \code{real}, but beware of the
  variety of names.  Type~6 is \code{float} in the standard, but is
  used by some DBMSs\footnote{In Oracle the \code{FLOAT} type is a
    decimal and not a binary type.} for single-precision and by some
  for double-precision: the forms \code{float(24)} and
  \code{float(53)} are also commonly supported.

  You should not assume that these types can store \code{Inf},
  \code{-Inf} or \code{NaN}, but they often can.

\item[Other numeric types] It is common to store decimal quantities in
  databases (e.g.{} currency amounts) and types 2 and 3 are for
  decimals. Some DBMSs have specialized types to handle currencies,
  e.g.{} \code{money} in SQL Server.

  Decimal types have a \emph{precision} (the maximum number of
  significant decimal digits) and \emph{scale} (the position of the
  decimal point).  \code{numeric} and \code{decimal} are usually
  synonymous, but the distinction in the standards is that for
  \code{numeric} the precision is exact whereas for \code{decimal} the
  DBMS can use a larger value than that specified.

  % e.g. Mimer
  Some DBMSs have a type \code{integer(\var{p})} to represent up to
  \code{\var{p}} decimal digits, and this may or may not be distinct
  from \code{decimal(\var{p}, 0)}.

  % 'currently' is from 5.1.x
  DBMSs do not necessarily fully implement decimal types, e.g.{} MySQL
  currently stores them in binary and used to store them as character
  strings.

\item[Dates and times] The handling of dates and times is very much
  specific to the DBMS.  Some allow fractional seconds in date-times,
  and some do not; some store timezones with date-times or always use
  UTC and some do not, and so on.  Usually there are also types for
  time intervals.

  All such types are transferred as character strings in \pkg{RODBC}.

\item[Binary types]
  These are less common, and unsupported by \pkg{RODBC} prior to
  version 1.3-0.  They parallel character types in that they are a
  sequence of bytes of fixed or variable length, sometimes with
  additional types for long sequences: there are separate ODBC types
  for \code{SQL\_BINARY}, \code{SQL\_VARBINARY} and
  \code{SQL\_LONGVARBINARY}.

  Binary types can currently only be read as such, and they are
  returned as a column of class \code{"ODBC\_binary"}, which is a list of
  raw vectors.
\end{description}
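For example, assuming a table \code{blobs} with an integer column
\code{id} and a variable-length binary column \code{payload} (both
names hypothetical), the raw bytes of the first row's binary value
could be extracted by
\begin{example}
res <- sqlQuery(channel, "SELECT id, payload FROM blobs")
res$payload[[1]]   # a raw vector
\end{example}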

% An example is BOOLEAN in Mimer
It is possible (but rare) for the DBMS to support data types that the
ODBC driver cannot handle.


\subsection{Data types when saving a data frame}

When \code{sqlSave} creates a table, there is some choice as to the
SQL data types used.

The default is to select the SQL data type from the R type via the
\code{typeInfo} argument to \code{sqlSave}.  If this is not supplied
(the usual case) a default mapping is looked up using
\code{getSqlTypeInfo()} or by interrogating \code{sqlTypeInfo()}.
This will almost always produce the correct mapping for numeric,
integer and character columns of up to 254 characters (or bytes).  In
other cases (including dates and date-times) the desired SQL type can
be specified for each
column \emph{via} the argument \code{varTypes}, a named character
vector with names corresponding to (some of) the names in the data
frame to be saved.
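For example, to save a data frame \code{mydata} (a hypothetical name)
whose character column \code{when} holds dates, one could use
something like
\begin{example}
sqlSave(channel, mydata, varTypes = c(when = "date"))
\end{example}
\noindent
where the SQL type name (here \code{"date"}) must be one the DBMS
accepts.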

Only a very few DBMSs have a logical data type, and the default mapping
is to store R logical vectors as \code{varchar(5)}.  For
other DBMSs \code{BIT}, \code{TINYINT} or an enumeration type could
be used (but the column may need to be converted to and from a
suitable representation).  For example, in MySQL we could use
\code{enum('FALSE', 'TRUE')}, but this is actually stored as
\code{char(5)}.  Note that to represent \code{NA} the SQL data type
chosen needs to be nullable, which \code{BIT} often is not.
(Mimer has a nullable data type \code{BOOLEAN} but this is not
supported by the ODBC client.)
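A simple workaround is to convert logical columns to and from a
character representation explicitly, e.g.{} (with hypothetical names)
\begin{example}
mydata$flag <- as.character(mydata$flag)
sqlSave(channel, mydata)
res <- sqlQuery(channel, "SELECT * FROM mydata")
res$flag <- as.logical(res$flag)
\end{example}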

\subsection{SQLite}

SQLite's concept of `data type' is anomalous: version~3 does recognize
types of data (in version~2 everything was a character string), but it
does not have a fixed type for a column in a table (although the type
specified in the \code{CREATE TABLE} statement is a `recommended' type
for the values of that column).  Every value is categorized as null,
integer (of length 1, 2, 3, 4, 6 or 8 bytes), double, text (UTF-8 or
UTF-16) or BLOB (a sequence of bytes).  This does not fit well with
the ODBC interface which pre-determines a type for each column before
reading or writing it: the `SQLite ODBC' driver falls back to a
\code{SQL\_VARCHAR} or \code{SQL\_LONGVARCHAR} type if the column type
is not available.


\subsection{ODBC data types}

ODBC defines two sets of data types: \emph{SQL data types} and \emph{C
  data types}. SQL data types indicate the data types of data stored
at the data source using standard names. C data types indicate the
data types used in the compiled code in the application (here
\pkg{RODBC}) when transferring data and are the same for all drivers.

The ODBC SQL data types are abstractions of the data types discussed
above with names like \code{SQL\_INTEGER}.  They include
\code{SQL\_LONGVARCHAR} for large character types and
\code{SQL\_WVARCHAR} for Unicode character types.  It is usually these
types that are returned (by number) in the \code{SQL\_DATA\_TYPE}
column of the result of \code{sqlColumns} and \code{SQL\_DATATYPE}
column of the result of \code{sqlTypeInfo}.  The mapping from names to
numbers is given in table~1.

\begin{table}[tbp]
\qquad{\small\tt \begin{tabular}{lrlr}
SQL\_CHAR&1&SQL\_LONGVARCHAR&-1\\
SQL\_NUMERIC&2&SQL\_BINARY&-2\\
SQL\_DECIMAL&3&SQL\_VARBINARY&-3\\
SQL\_INTEGER&4&SQL\_LONGVARBINARY&-4\\
SQL\_SMALLINT&5&SQL\_BIGINT&-5\\
SQL\_FLOAT&6&SQL\_TINYINT&-6\\
SQL\_REAL&7&SQL\_BIT&-7\\
SQL\_DOUBLE&8&SQL\_WCHAR&-8\\
SQL\_DATETIME&9&SQL\_WVARCHAR&-9\\
SQL\_INTERVAL&10&SQL\_WLONGVARCHAR&-10\\
SQL\_TIMESTAMP&11&SQL\_GUID&-11\\
SQL\_VARCHAR&12\\
SQL\_TYPE\_DATE&91\\
SQL\_TYPE\_TIME&92\\
SQL\_TYPE\_TIMESTAMP&93
\end{tabular}}
\caption{Mapping between ODBC SQL data type names and numbers.  (GUIDs
  are 16-byte numbers, Microsoft's implementation of UUIDs.)}
\end{table}

The only ODBC C data types currently used by \pkg{RODBC} are
\code{SQL\_C\_DOUBLE}, \code{SQL\_C\_SLONG} (32-bit signed integers)
and \code{SQL\_C\_CHAR} for reading and writing, and
\code{SQL\_C\_FLOAT} (single-precision), \code{SQL\_C\_SSHORT} (16-bit
signed integers) and \code{SQL\_C\_BINARY} for reading from the
database.

\url{http://msdn.microsoft.com/en-us/library/ms713607%28VS.85%29.aspx}
is the definitive source of information about ODBC data types.


\section{Schemas and Catalogs}

This is a more technical section: few users will need to deal with
these concepts.

\sQuote{Schemas}\footnote{which is the usual plural in this technical
  usage, although \emph{schemata} is more usual in English.} are
collections of objects (such as tables and views) within a database
that are supported by some DBMSs: often a separate schema is
associated with each user (and \sQuote{schema} in ODBC~3 replaced
\sQuote{owner} in ODBC~2).  In SQL-92, schemas are collected in a
\sQuote{catalog} which is often implemented as a database.  Where
schemas are implemented, there is a \emph{current schema} used to find
unqualified table names, and tables in other schemas can be referred
to within SQL queries using the \code{\var{schema}.\var{table}}
notation.  You can think of a schema as analogous to a name space; it
allows related objects to be grouped together without worrying about
name clashes with other groups.  (Some DBMSs will search for
unqualified table names in a search path: see the detailed
descriptions below.)
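For example, a table \code{mytab} in another schema \code{other} (both
names hypothetical) could be read, on a DBMS using the SQL-92
convention, by
\begin{example}
sqlQuery(channel, "SELECT * FROM other.mytab")
\end{example}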

Note that \sQuote{schema} is used in another sense in the database
literature, for the design of a database and in particular of tables,
views and privileges.

Here are some details of various DBMSs' interpretations of
\code{catalog} and \code{schema} current at the time of writing (mid
2009).  (These descriptions are simplistic, and in some cases
experimental observations.)

\begin{itemize}
  \item SQLite uses dotted names for alternative databases that
    are attached by an \code{ATTACH DATABASE} command.\footnote{and
      may be subsequently detached by a \code{DETACH DATABASE} command}
    There is a search path of databases, so it is only necessary to
    use the dotted name notation when there are tables of the same
    name on attached databases.  The initial database is known as
    \code{main} and that used for temporary tables as \code{temp}.

  \item MySQL uses \code{catalog} to refer to a database.  In MySQL's
    parlance, `schema' is a little-used synonym for \sQuote{database}.

  \item PostgreSQL only allows a session to access one database, and
    does not use `catalog' except to refer to the current database.
    Version 7.3 introduced schemas---users can create their own
    schemas with a \code{CREATE SCHEMA} query.  Tables are by default
    in the \code{public} schema, and unqualified table names are
    searched for along a \sQuote{search path} of schemas (by
    default, containing \code{public}).

  \item Oracle uses schemas as synonymous with \sQuote{owner}
    (also known as \sQuote{user}).  There is no way for a user to
    create additional schemas (that is not what \code{CREATE SCHEMA}
    does in Oracle).

  \item IBM DB2 uses schemas as name spaces for objects that may lie on
    different databases: using \emph{aliases} allows objects to be in
    more than one schema.  The initial current schema is named the
    same as the user (\code{SQLID} in DB2 parlance), but users can
    create additional schemas with \code{CREATE SCHEMA} statements.

  \item Microsoft SQL Server 2008 uses both \code{catalog} and
    \code{schema}, \code{catalog} for the database and \code{schema}
    for the type of object, e.g. \code{"sys"} for most of the system
    tables/views and (default) \code{"dbo"} for user tables.  Further
    schemas can be created by users.  The default schema for a user
    can be set when the user is created and changed \emph{via}
    \code{ALTER USER}.

    Prior to SQL Server 2005, `schema' meant `user', and the search
    path for unqualified names was the database user then \code{"dbo"}.

  \item The Microsoft Excel and Access ODBC drivers do not use
    schemas, but do use \code{catalog} to refer to other
    database/spreadsheet files.

  \item Mimer (\url{www.mimer.com}) uses schemas which are normally
    the same as users (which it calls \emph{IDENT}s), but users can
    create additional schemas with \code{CREATE SCHEMA} statements.
    There are also system schemas.  Mimer uses `schemata' as the
    plural of schema.
\end{itemize}

It is often possible to use \code{sqlTables} to list the available
catalogs or schemas: see its help page for the driver-specific
details.
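With many drivers the ODBC convention of `wildcard' arguments can be
used, for example
\begin{example}
sqlTables(channel, catalog = "%", schema = "", tableName = "")
\end{example}
\noindent
to list the available catalogs (this follows the ODBC specification
for \code{SQLTables} and may not be implemented by all drivers).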

\pkg{RODBC} usually works with tables in the current schema, but unless
the connection was opened with \code{interpretDot = FALSE} most
functions will attempt to interpret the `dotted name' notation.
The interpretation depends on the DBMS: the SQL-92 meaning is
\code{\var{schema}.\var{table}} and this is accepted by PostgreSQL,
Microsoft SQL Server, Oracle, DB2 and Mimer.  However, MySQL uses
\code{\var{database}.\var{table}}, and the functions
try\footnote{currently this is stymied by bugs in the ODBC driver,
  so \code{SQLColumns} is unable to report on tables in specified
  databases.}  that interpretation if they recognize a MySQL driver.
Some DBMSs allow more than two components, but these are not currently
supported by the \pkg{RODBC} functions.

Functions \code{sqlTables}, \code{sqlColumns} and
\code{sqlPrimaryKeys} have arguments \code{catalog} and \code{schema}
which in principle allow tables in other schemas to be listed or
examined: however these are only partially implemented in many current
ODBC drivers.  See the help page for \code{sqlTables} for some further
details.

For other uses, the trick is to select the schema(s) you want to use,
which is done \emph{via} an SQL statement sent by \code{sqlQuery}.
For Oracle you can set the default schema (owner) by
\begin{example}
\code{ALTER SESSION SET CURRENT\_SCHEMA = \var{schema}}
\end{example}
% http://sqlzoo.net/howto/source/z.dir/tip988922/oracle
whereas for PostgreSQL the search path can be changed \emph{via}
\begin{example}
\code{SET search\_path TO \var{schema1},\var{schema2}}.
\end{example}
% see also https://stat.ethz.ch/pipermail/r-help/2008-May/161304.html
In DB2, creating an alias in the current schema can be used to access
tables in other schemas, and a \code{CURRENT SCHEMA} query can be used
to change the current schema.  In MySQL and SQL Server a database can
be selected by a \code{USE \var{database}} query.

\section{Internationalization Issues}

Internationalization issues are made more complex by ODBC being a
client-server system, and the ODBC client (\pkg{RODBC}) and the server
may be running on different machines with different OSes on different
continents. So the client may need some help.

In most cases numeric data are transferred to and from R in binary
form, so the representation of the decimal point is not an issue.  But
in some cases it could be (e.g.{} decimal rather than binary SQL data types
will be transferred as character strings) and then the decimal point
to be used will be taken from \code{options("dec")}: if unset this is
set when \pkg{RODBC} is loaded from the setting of the current locale
on the machine running R (\emph{via} \code{Sys.localeconv}).
Some ODBC drivers (e.g.{} for SQL Server, Oracle) allow the locale (`NLS')
to be used for numeric values to be selected for the connection.
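The decimal point assumed by \pkg{RODBC} can be inspected or
overridden directly, e.g.{}
\begin{example}
getOption("dec")
options(dec = ".")   # force a period regardless of the locale
\end{example}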

The other internationalization issue is the character encoding used.
When R and the DBMS are running on the same machine this is unlikely
to be an issue, and in many cases the ODBC driver has some options to
translate character sets.  SQL is an ANSI (US) standard, and DBMSs
tended to assume that character data was ASCII or perhaps 8-bit.  More
recently DBMSs have started (optionally or by default) to store
data in Unicode, which unfortunately means UCS-2 on Windows and UTF-8
elsewhere.  So cross-OS solutions are not guaranteed to work, but most
do.

Encoding issues are best resolved in the ODBC driver or in DBMS
settings.  In the unusual case that this cannot be done, the
\code{DBMSencoding} argument to \code{odbcDriverConnect} allows for
recoding when sending data to or from the ODBC driver and thence the DBMS.
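For example, if the DBMS stores character data in Latin-1 but R is
running in a UTF-8 locale, the connection could be opened by something
like (with a hypothetical DSN)
\begin{example}
channel <- odbcDriverConnect("DSN=mydsn", DBMSencoding = "latin1")
\end{example}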

\section{Excel Drivers}

The Microsoft Excel ODBC driver (Windows only) has a number of
peculiarities which mean that it should be used with care.

It seems that its concept of a `table' is principally a \emph{named
  range}. It treats worksheets as system tables, and appends a dollar
to their name (making them non-standard SQL table names: the quoting
convention used is to enclose such names in square brackets).

Column names are taken as the first row of the named range/worksheet.
Non-standard SQL names are allowed here too, but the driver maps
\code{.} to \code{\#} in column names.  Annoyingly, \code{sqlTables}
can select just the named ranges \emph{via} \code{tableType = "TABLE"},
but there is no way to select just the worksheets.
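For example, to list only the named ranges in a workbook (with a
hypothetical file name):
\begin{example}
channel <- odbcConnectExcel("c:/data/mydata.xls")
sqlTables(channel, tableType = "TABLE")
\end{example}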

There are at least two known problems with reading columns that do not
have a format set \emph{before} data entry, and so start with format
\sQuote{General}.  First, the driver uses the first few rows to
  determine the column type, and is over-fond of declaring
\sQuote{Numeric} even when there are non-numeric entries.  The default
number of rows consulted is 8, but attempts to change this in the DSN
setup are ignored.  Second, if a column is declared as \sQuote{Text},
numeric entries will be read as SQL nulls and hence R \code{NA}s.
Unfortunately, in neither case does reformatting the column help.
% http://support.microsoft.com/kb/141284

The connection is by default read-only.  It is possible to de-select
this in the DSN (and the convenience wrapper \code{odbcConnectExcel}
has a \code{readOnly = FALSE} argument to do so), but even then the
driver does not support deletion, including SQL \code{DROP},
\code{DELETE}, \code{UPDATE} and \code{ALTER} statements.  In
particular, \code{sqlDrop} will remove the data in a worksheet but not
the worksheet itself.
% http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.ii.doc/opt/tlsodb13.htm
% http://www.stata.com/support/faqs/data/odbc_excel.html
The driver does allow a worksheet to be updated by \code{sqlUpdate},
and for a new worksheet (with a different name from existing
worksheets) to be created by \code{sqlSave} (which also creates a
named range).

As far as we know, no similar issues affect the Actual Technologies
Mac OS X Excel driver: however, it allows only read-only access to
Excel files and does not support Excel 2007-and-later \code{.xlsx} files.


\section{DBMS-specific tidbits}

This section covers some useful DBMS-specific SQL commands and other
useful details.

Recent versions of several DBMSs have a schema
\code{INFORMATION\_SCHEMA} that holds many predefined system views.
These include MySQL (where \code{INFORMATION\_SCHEMA} is the name of a
database, mainly populated from MySQL~5.1 onwards), SQL Server and Mimer.

\subsection*{MySQL}

Comments about MySQL are mostly applicable to its forks such as MariaDB.

We have already mentioned \code{USE \var{database}} as the way to change
the database in use.  \code{SHOW DATABASES} lists the databases `for
which you have some kind of privilege', and can have a \code{LIKE}
clause to restrict the result to some pattern of database names.

% MySQL 5.1 manual 12.3
The \code{DESCRIBE \var{table}} command is a compact way to get a
description of a table or view, similar to the most useful parts of
the result of a call to \code{sqlColumns}.  (It is also known as
\code{SHOW COLUMNS FROM \var{table}}.)

\code{SHOW TABLES} is the command to produce a table of the
tables/views on the current database, similar to \code{sqlTables}.
For example,
\begin{example}
> sqlQuery(channel, "USE ripley")
[1] "No Data"
> sqlQuery(channel, "SHOW TABLES")
  Tables_in_ripley
1        USArrests
> sqlQuery(channel, "DESCRIBE USArrests")
     Field         Type Null Key Default Extra
1    State varchar(255)   NO PRI      NA    NA
2   Murder       double  YES          NA    NA
3  Assault      int(11)  YES          NA    NA
4 UrbanPop      int(11)  YES          NA    NA
5     Rape       double  YES          NA    NA
\end{example}

\noindent
\code{SHOW FULL TABLES} gives an additional column
\code{Table\_type}, the types of the tables/views.

There is useful information for end users in the
\code{INFORMATION\_SCHEMA} \emph{database}, much more extensively as
from MySQL~5.1.

Some of the non-standard behaviour can be turned off, e.g.{} starting
MySQL with \code{--sql-mode=ANSI} gives closer conformance to the
standard, and this can be set for a single session by
\begin{example}
SET SESSION sql\_mode='ANSI'
\end{example}
To change just the behaviour of quotes (to use double quotes in place
of backticks) replace \code{ANSI} by \code{ANSI\_QUOTES}.

% manual section 10.4
The maximum size of a \code{char} column is 255 characters.  That of a
\code{varchar} column is up to 65535 characters (but there is a limit of
65535 bytes on the total size of a row), and those with a maximum of
255 or less are stored more efficiently.  Types \code{text},
\code{mediumtext} and \code{longtext} can hold more, and are not
subject to the row-size limit (\code{text} has default maximum size
65535, the default \pkg{RODBC} limit on transfers).

There are \code{binary}, \code{varbinary} and \code{blob} types which
are very similar to their character counterparts but with lengths in bytes.

\subsection*{PostgreSQL}

Table \code{pg\_tables} lists all tables in all schemas; you probably
want to filter on \code{tableowner='\var{current\_user}'}, e.g.{}
\begin{smallexample}
> sqlQuery(channel, "select * from pg_tables where tableowner='ripley'")
  schemaname tablename tableowner tablespace hasindexes hasrules hastriggers
1     public     dtest     ripley         NA          0        0           0
\end{smallexample}

There are both ANSI and Unicode versions of the ODBC driver on Windows:
they provide many customizations.  One of these is read-only access;
another is whether system tables are reported by \code{sqlTables}.

% manual chapter 8
The default size of a \code{varchar} column is unlimited, but those
with maximum length of 126 bytes or less are stored more efficiently.
However, the ODBC interface has limits, which can be set in the
configuration options.  These include the maximum sizes for
\code{varchar} (default 254) and \code{longvarchar} (default 8190),
and how to handle unknown column sizes (default as the maximum), and
whether `Text' is taken as \code{varchar} or \code{longvarchar} (which
affects the reported maximum size for a \code{varchar} column).

There is a single binary data type, \code{bytea}.


\subsection*{SQLite}

These comments are only about SQLite~3.x.

Table \code{sqlite\_master} lists tables and indices, and the
\code{sql} column gives the SQL command used.  E.g.{}
\begin{example}
> tmp <- sqlQuery(channel, "select * from sqlite_master")
> tmp[, "sql"] <- substr(tmp[, "sql"], 1, 16)
> tmp
   type                         name  tbl_name rootpage              sql
1 table                    USArrests USArrests        2 CREATE TABLE "US
2 index sqlite_autoindex_USArrests_1 USArrests        4             <NA>
\end{example}

My current versions of Christian Werner's SQLite~ODBC driver store
character data in the current locale's charset (e.g.{} UTF-8) on
Unix-alikes and by default in Unicode (UCS-2) on Windows (unless
de-selected in the DSN configuration).

The default collation for text data is byte-by-byte comparisons, so
avoid comparing non-ASCII character data in SQLite.

Actual Technologies sell an SQLite driver for Mac OS X which requires
\code{believeNRows = FALSE} and has a number of other issues including
that it seems not to support dropping tables.  (Christian Werner's
SQLite~ODBC driver was easy to install from the sources and worked
correctly.)

Versions of the SQLite~ODBC driver since 0.87 have segfaulted on
the test suite.

\subsection*{Oracle}

Tables \code{cat}, \code{user\_tables} and \code{user\_catalog} contain
useful information on tables.  Information on columns is in
\code{all\_tab\_columns}, e.g.{}
\begin{example}
> sqlQuery(channel,
           "select * from all\_tab\_columns where table_name='USArrests'")
   OWNER TABLE_NAME COLUMN_NAME DATA_TYPE DATA_TYPE_MOD
1 RIPLEY  USArrests       State  VARCHAR2            NA
2 RIPLEY  USArrests      Murder     FLOAT            NA
3 RIPLEY  USArrests     Assault    NUMBER            NA
4 RIPLEY  USArrests    UrbanPop    NUMBER            NA
5 RIPLEY  USArrests        Rape     FLOAT            NA
...
\end{example}

The Windows ODBC driver we tested had an option for a read-only connection.

% SQL Language Reference Manual chapter 2
Oracle's character data types are \code{CHAR}, \code{VARCHAR2}
(character set specified when the database was created) and
\code{NCHAR}, \code{NVARCHAR2} (Unicode), as well as \code{CLOB} and
\code{NCLOB} for large character strings.  For the non-Unicode types
the units of length are either bytes or characters (set as a default
for the database) but can be overridden by adding a \code{BYTE} or
\code{CHAR} qualifier.  The limits are 4000 bytes, except for
\code{CLOB} and \code{NCLOB}, which have very high limits.

There are \code{RAW} and \code{BLOB} data types.

\subsection*{DB2}
%http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp

Schema \code{syscat} contains many views with information about
tables: for example view \code{syscat.tables} lists all tables, and
\begin{example}
> sqlQuery(channel,
           "select * from syscat.columns where tabname='USArrests'")
  TABSCHEMA   TABNAME  COLNAME COLNO TYPESCHEMA TYPENAME LENGTH SCALE
1  RIPLEY   USArrests    State     0   SYSIBM    VARCHAR    255     0
2  RIPLEY   USArrests   Murder     1   SYSIBM     DOUBLE      8     0
3  RIPLEY   USArrests  Assault     2   SYSIBM    INTEGER      4     0
4  RIPLEY   USArrests UrbanPop     3   SYSIBM    INTEGER      4     0
5  RIPLEY   USArrests     Rape     4   SYSIBM     DOUBLE      8     0
...
\end{example}

%http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0008483.html
The \code{CHAR} type can have size up to 254 bytes: the maximum size
of the \code{VARCHAR} type is 32762 bytes.  For larger character
strings there is the \code{CLOB} type (up to 2Gb).  These types can be
used to store data in a MBCS, including various Unicode encodings.

There are corresponding \code{BINARY}, \code{VARBINARY} and
\code{BLOB} data types.

\subsection*{SQL Server}

There are several hundred views in schemas \code{INFORMATION\_SCHEMA}
and \code{sys} which will be listed by \code{sqlTables} and also by
the stored procedure \code{sp\_tables}.  Another way to list tables is
\begin{example}
SELECT * FROM sysobjects WHERE xtype='U'
\end{example}
where the condition restricts to user tables.

\code{USE \var{database}} changes the database in use.

% http://msdn.microsoft.com/en-us/library/ms187752.aspx
Types \code{char} and \code{varchar} have a maximum specified size of
8000 bytes.  It is possible to use \code{varchar(max)} (previously
known as \code{text}) for a limit of 2Gb, but this may not work well
with the ODBC interface.  The Unicode types \code{nchar} and
\code{nvarchar} have a maximum specified size of 4000 characters:
again there is \code{nvarchar(max)} (formerly \code{ntext}).

There are corresponding \code{binary} and \code{varbinary} data types
(with \code{image} as an earlier name for \code{varbinary(max)}).

\subsection*{Mimer}

There are tens of views in schema \code{INFORMATION\_SCHEMA} which can
be read by SQL \code{SELECT} queries of the form
\begin{example}
SELECT \var{column-list}
FROM INFORMATION_SCHEMA.\var{view-name}
WHERE \var{condition}
\end{example}
See the Mimer SQL Reference Manual chapter on Data Dictionary views
for full details: two views are \code{TABLES} and \code{VIEWS}.
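For example, to list the table names recorded in the data dictionary
(using the standard \code{TABLE\_NAME} column of the \code{TABLES}
view):
\begin{example}
sqlQuery(channel, "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES")
\end{example}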

A session can be set to be read-only by the SQL command \code{SET
  SESSION READ ONLY}.

Mimer uses Latin-1 for its default character types but Unicode types
(\code{NCHAR} and \code{NVARCHAR}) are also available.  Unsurprisingly
given that the company is Swedish, different collations are allowed
for both Latin-1 and Unicode character types.

% reference manual chapter 6
The \code{char} and \code{varchar} columns have a maximum size of
15000 bytes: the \code{clob} data type is available for larger
character columns.  The \code{nchar} and \code{nvarchar} columns have
a maximum size of 5000 characters: the \code{nclob} data type is
available for larger Unicode columns.

There are corresponding \code{binary}, \code{varbinary} and
\code{blob} binary data types.

\newpage
\appendix
\section{Installation}

\pkg{RODBC} is simple to install, and binary distributions are
available for Mac OS X and Windows from CRAN.

To install from the sources, an \emph{ODBC Driver Manager} is
required.  Windows normally comes with one (it is part of MDAC and can
be installed separately if required).  Mac OS X since 10.2 has shipped
with \code{iODBC} (\url{http://www.iodbc.org}, this is also available
for other Unix-alikes) but from 10.9 the headers are no longer
included in the OS X SDK.  At present the CRAN OS X binary package is
compiled against iODBC.

For other systems the driver manager of choice is likely to be
\code{unixODBC}, part of almost all Linux distributions and with
sources downloadable from \url{http://www.unixODBC.org}.  In Linux
binary distributions it is likely that package \code{unixODBC-devel}
or \code{unixodbc-dev} or some such will be needed.

Both \code{unixODBC} and \code{iODBC} can be installed from the
sources under OS X 10.9: the latter requires configuring with
\code{--disable-gui}.  They need ODBC drivers compiled for the driver
manager in use.

In most cases the package's \code{configure} script will find the
driver manager files, and the package will install with no extra
settings.  However, if further information is required, use
\code{--with-odbc-include} and \code{--with-odbc-lib} or environment
variables \code{ODBC\_INCLUDE} and \code{ODBC\_LIBS} to set the include
and library paths as needed.  A specific ODBC driver manager can be
specified by the \code{--with-odbc-manager} \code{configure} option,
with likely values \code{odbc} or \code{iodbc}: if this is done for
\code{odbc} and the program \code{odbc\_config} is found, it is used to
set the libpath as a last resort (it is often wrong), and to add any
additional \code{CFLAGS}.
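For example, to force the use of \code{unixODBC} one could install the
package by something like
\begin{example}
R CMD INSTALL --configure-args='--with-odbc-manager=odbc' RODBC
\end{example}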


\subsection*{Sources of drivers}

Keeping track of ODBC drivers is a never-ending task, and this section
is no longer actively maintained.  URIs are liable to move or disappear.

A list of drivers for \code{unixODBC} is maintained\footnote{the fact
  that the author works for Easysoft is conspicuous.} at
\url{http://www.unixodbc.org/drivers.html}.  \code{unixODBC} ships
with a number of drivers (although in most cases the DBMS vendor's
driver is preferred)---these include for MySQL, PostgreSQL, Mimer and
flat files.

MySQL provides drivers under the name `Connector/ODBC' (formerly
`MyODBC') in source form, and binaries for all common 32-bit and most
64-bit R platforms.  These are said to work also with MariaDB.

PostgreSQL has an associated project at
\url{http://pgfoundry.org/projects/psqlodbc/} and another project at
\url{http://pgfoundry.org/projects/odbcng/}.  (Documentation for
\code{psqlodbc} is currently hard to find, but there is some in the
PostgreSQL 7.2 manual at
\url{http://www.postgresql.org/docs/7.2/static/odbc.html} from before
it was unbundled.)  There are drivers for Unix-alikes and Windows --
64-bit Windows support is available as from PostgreSQL 9.0.

An SQLite ODBC driver for Unix-alikes, including Mac OS X, and (32-
and 64-bit) Windows is available from
\url{http://www.ch-werner.de/sqliteodbc/}.

Oracle provides ODBC drivers as a supplement to its `Instant Client'
for some of its platforms (including 32/64-bit Windows and Linux but
not currently Mac OS X).  See
\url{http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html}
%\url{http://www.oracle.com/technology/software/tech/oci/instantclient/}.
One quirk of the Windows drivers is that the Oracle binaries must be
in the path, so \code{PATH} should include e.g.{}
\verb|c:\Oracle\bin|.

For IBM's DB2, search its site for drivers for `ODBC and CLI'. There
are some notes about using this under Linux at
\url{http://www.unixodbc.org/doc/db2.html}.

% http://developer.mimer.se/odbc/
Mimer (\url{www.mimer.com}) is a cross-platform DBMS with integral
ODBC support, so
\begin{quote}
  `The Mimer SQL setup process automatically installs an ODBC driver
  when the Mimer SQL client is installed on any Windows or UNIX platform.'
\end{quote}
The `HowTos' at \url{http://developer.mimer.se/howto/index.tml}
provide some useful hints.

Some details of the 32-bit Microsoft `ODBC Desktop Database Drivers'
(for Access, Excel, Paradox, dBase and text files on Windows) can be
found at
\url{http://msdn.microsoft.com/en-us/library/ms709326%28VS.85%29.aspx}.
There is also a Visual FoxPro driver and an (outdated) Oracle driver.

32-bit Windows drivers for Access 2007 and Excel 2007 are bundled with
Office 2007 but can be installed separately \emph{via} the installer
\code{AccessDatabaseEngine.exe} available from
%\url{http://www.microsoft.com/downloads/details.aspx?FamilyID=7554f536-8c28-4598-9b72-ef94e038c891&DisplayLang=en}.
\url{http://www.microsoft.com/en-us/download/details.aspx?id=23734}.

The Access/Excel 2010 versions at
%\url{http://www.microsoft.com/downloads/details.aspx?familyid=C06B8369-60DD-4B64-A44B-84B371EDE16D&displaylang=en}
\url{http://www.microsoft.com/en-us/download/details.aspx?id=13255}
have a 64-bit version: however the 64-bit drivers cannot be installed
alongside 32-bit versions of Office (as far as we know, and definitely
not for Office 2007).

For recent versions of Mac OS X, low-cost and easy-to-use \code{iODBC}
drivers are available from
\url{http://www.actualtech.com/products.php}: these cover
MySQL/PostgreSQL/SQLite (one driver), SQL Server/Sybase, Oracle, and a
read-only driver for Access and related formats (including Access 2007
and Excel, but not Excel 2007).  That SQLite driver needs
\code{believeNRows = FALSE} set.  Currently at least, installing those
drivers on 10.9 installs \code{iODBC}.

Mac OS X drivers for the MySQL, PostgreSQL and the major commercial
databases are available from \url{http://uda.openlinksw.com/}.


\subsection*{Specifying ODBC drivers}

The next step is to specify the ODBC drivers to be used for specific
DBMSs.  On Windows installing the drivers will register them
automatically.  This might happen as part of the installation on other
systems, but usually does not.

Both \code{unixODBC} and \code{iODBC} store information on drivers in
configuration files, normally system-wide in \code{/etc/odbcinst.ini}
and per-user in \verb|~/.odbcinst.ini|.  However, the system location
can vary, and on systems with \code{unixODBC} can be found at the
Unix command line by one of
\begin{example}
\$  odbcinst -j
\$  odbc\_config --odbcinstini
\end{example}
For \code{iODBC} use \code{iodbc\_config}: on Mac OS X the system
location used by Apple was \code{/Library/ODBC/odbcinst.ini}.

\begin{figure}
\begin{example}
\$ cat /etc/odbcinst.ini
[MySQL]
Description    = ODBC 3.51.26 for MySQL
Driver         = /usr/lib64/libmyodbc3.so
FileUsage      = 1

[MySQL ODBC 5.1 Driver]
Description     = ODBC 5.1.05 for MySQL
Driver          = /usr/lib64/libmyodbc5.so
UsageCount      = 1

[PostgreSQL]
Description     = ODBC for PostgreSQL
Driver          = /usr/lib64/psqlodbc.so
FileUsage       = 1

[sqlite3]
Description = sqliteodbc
Driver = /usr/local/lib64/libsqlite3odbc.so
Setup = /usr/local/lib64/libsqlite3odbc.so
FileUsage = 1
\end{example}
\caption{A system ODBC driver file from an \code{x86\_64} Fedora 10 Linux
  system using \code{unixODBC}.}
\label{fig:odbc1}
\end{figure}
The format can be seen from figure~\ref{fig:odbc1}.  (\code{unixODBC}
also accepts \code{Driver64} here, to allow for different paths on
32-bit and 64-bit platforms sharing a file system.)  The MySQL and PostgreSQL
drivers were installed from the Fedora RPMs
\code{mysql-connector-odbc} and \code{postgresql-odbc}, and also from
the \code{mysql-connector-odbc} RPM in the MySQL distribution (which
inserted the entry in the driver file).

The MySQL manual gives detailed information (including screenshots) of
installing its drivers and setting up DSNs that may also be
informative to users of other DBMSs.

\clearpage
\section{Specifying DSNs}

The ODBC driver managers have `User DSNs' and `System DSNs': these
differ only in where the information is stored, the first on a
per-user basis and the second for all users of the system.

Windows has a GUI\footnote{Extra care is needed on a 64-bit version of
  Windows, as this GUI shows only 64-bit settings for ODBC, including
  drivers and DSNs.  If you are running 32-bit R (and hence 32-bit
  ODBC) on 64-bit Windows, you need the 32-bit version of the GUI at
  something like
  \texttt{c:\textbackslash{}Windows\textbackslash{}SysWOW64\textbackslash{}odbcad32.exe}
  -- and beware that both 32- and 64-bit versions are called
  \texttt{odbcad32.exe}.} to set up DSNs, called something like `Data
Sources (ODBC)' under `Administrative Tools' in the Control Panel. You
can add, remove and edit (`configure') DSNs there (see figure~\ref{fig:1}).
When adding a DSN, first select the ODBC driver and then complete the
driver-specific dialog box.  There will usually be an option to test
the DSN and it is wise to do so.

\begin{figure}
\centerline{\includegraphics[width=12cm]{winDSN1}}
\bigbreak
\centerline{\includegraphics[width=10cm]{winDSN2}}
\caption{(Top) The main Data Sources (ODBC) dialog box from a Windows XP system.
(Bottom) The dialog box to select a driver that comes up when the
\textsf{Add} button is clicked.}
\label{fig:1}
\end{figure}

If \texttt{Rgui} is to be used on Windows, incomplete DSNs can be
created and the dialog box will be brought up for completion when
\code{odbcConnect} is called---this can be helpful to avoid storing
passwords in the Windows Registry or to allow alternate users or
databases.  On that platform, calling \code{odbcDriverConnect()} with
no arguments will bring up the main ODBC Data Sources dialog box to
allow a DSN to be constructed on the fly.

Mac OS X prior to 10.6 came with a very similar GUI (figure~\ref{fig:1b})
found at \textsf{Applications / Utilities / ODBC Administrator}.
This is now available as a download from
\url{http://support.apple.com/kb/DL895}.  Another GUI for OS X is
available from \url{http://www.odbcmanager.net/index.php}.

\begin{figure}
\centerline{\includegraphics[width=12cm]{macODBC}}
\bigbreak
\centerline{\includegraphics[width=12cm]{macAccess}}
\caption{(Top) The main ODBC Administrator dialog box from a 
  Mac OS X 10.5 system.
(Bottom) A page of the  dialog box to specify a DSN for the Actual
Technologies Access/Excel driver.}
\label{fig:1b}
\end{figure}


% Both \code{unixODBC} and \code{iODBC} provide GUIs (which might be
% packaged separately in binary distributions) to create DSNs, and
% \code{iODBC} also has a web-grounded DSN administrator.  UnixODBC's GUI
% is currently called \texttt{ODBCConfig} (see figure~4), and there is a
% KDE control widget called \texttt{DataManager} to manage both ODBC
% drivers and DSNs.  See the \code{unixODBC} user manual at
% \url{http://www.unixodbc.org/doc/UserManual/}. (On Fedora these are in
% the \code{unixODBC-kde} RPM.  It has been announced that they will
% become separate projects after \code{unixODBC 2.2.14}.)

\begin{figure}
\centerline{\includegraphics[width=12cm]{linuxDSN}}
\bigbreak
\centerline{\includegraphics[width=12cm]{linuxDSNsqlite}}
\caption{The dialog box of \code{ODBCconfig} on Fedora 10 Linux, and
  the \textsf{Configure} screen for the SQLite driver.}
\label{fig:2}
\end{figure}


On Unix-alikes DSNs can also be specified in files (and the graphical
tools just manipulate these files).  The system-wide file is usually
\code{/etc/odbc.ini} and the per-user
file\footnote{\texttt{\textasciitilde/Library/ODBC/odbc.ini} on Mac OS
  X.} \verb|~/.odbc.ini|.  Some examples of the format are shown in
figure~\ref{fig:odbc2}.

\begin{figure}
\begin{example}
[test_mysql]
Description     = test MySQL
Driver          = MySQL
Trace           = No
Server          = localhost
Port            = 3306
Database        = test

[test_mysql5]
Description     = myodbc5
Driver          = MySQL ODBC 5.1 Driver
Server          = gannet
Port            = 3306
Database        = ripley

[test_pg]
Description     = test PostgreSQL
Driver          = PostgreSQL
Trace           = No
TraceFile       =
ServerName      = localhost
UserName        = ripley
Port            = 5432
Socket          =
Database        = testdb
ReadOnly        = 0

[test_sqlite3]
Description     = test SQLite3
Driver          = sqlite3
Database        = /tmp/mysqlite3.db
\end{example}
\caption{A personal (\texttt{\textasciitilde/.odbc.ini}) file from a
  Fedora 10 Linux system using \code{unixODBC}.}
\label{fig:odbc2}
\end{figure}

Which fields are supported is driver-specific (and documentation can
be hard to find).  There is no clear distinction between fields that
specify the driver and those that specify the DSN, so any parts of the
driver specification which might differ between connections can be
used in the DSN file.

Settings often made here include whether the connection is read-only
(\code{test\_pg} is \emph{not} read-only) and the character encoding
to be used.

Command-line programs \code{isql} (\code{unixODBC}) and \code{iodbctest}
(\code{iODBC}) can be used to test a DSN that has been created manually in a
file.  The formats are
\begin{example}
\$ isql -v \var{dsn} \var{db_username} \var{db_password}
\$ iodbctest
\end{example}
Both give a command-line SQL interface: use \code{quit} to terminate.

\begin{figure}
\includegraphics[width=6cm]{Access}
\quad
\includegraphics[width=6cm]{MySQL}
\bigbreak
\includegraphics[width=6cm]{Oracle}
\quad
\includegraphics[width=6cm]{SQLserver}
\caption{Parts of the ODBC driver configuration screens on Windows XP
  for Microsoft Access, MySQL Connector/ODBC 5.1, Oracle's ODBC driver
  and Microsoft SQL Server.}
\label{fig:3}
\end{figure}

\clearpage
\section{Internals}

The appendix is in part an \emph{aide memoire} for the maintainer, but
may interest the curious user.

An RODBC connection object is an integer with several attributes:
connections are numbered consecutively in the current session.  For example
\begin{smallexample}
> channel <- odbcConnect("test")
> unclass(channel)
[1] 1
attr(,"connection.string")
[1] "DATABASE=ripley;DESCRIPTION=myodbc;DSN=test;OPTION=0;PORT=3306;SERVER=localhost;"
attr(,"handle\_ptr")
<pointer: 0x233e6c0>
attr(,"case")
[1] "nochange"
attr(,"id")
[1] 11371
attr(,"believeNRows")
[1] TRUE
attr(,"colQuote")
[1] "`"
attr(,"tabQuote")
[1] "`"
attr(,"encoding")
[1] ""
attr(,"rows\_at\_time")
[1] 100
attr(,"isMySQL")
[1] FALSE
\end{smallexample}
Most of the attributes record the arguments of
\code{odbcDriverConnect}.  The \code{"connection.string"} attribute is
as returned by \code{SQLDriverConnect} and lists driver-specific
parameters separated (and perhaps terminated) by a semicolon.  The
\code{"id"} attribute is a random integer used for integrity checks
(and in particular to reject connection objects should they be saved
and restored in a different session).  The \code{"isMySQL"} attribute
is used both to select the default quote character and to determine
the interpretation of \code{qualifier.table} names.

The main structure of the connection is kept as a C \code{struct}, a
pointer to which is passed around as the R external pointer
\code{"handle\_ptr"}.  This has a finalizer that will close the
connection when there is no longer an R object referring to it
(including at the end of the R session), with a warning unless the
connection has already been closed by \code{close} or
\code{odbcClose}.  In addition, a C-level table keeps the pointers of
the first 1000 connections of an R session, to enable
\code{odbcCloseAll} to close them.

The \code{struct} is currently defined as
\begin{smallexample}
typedef struct rodbcHandle \{
    SQLHDBC     hDbc;         /* connection handle */
    SQLHSTMT    hStmt;        /* statement handle */
    SQLLEN      nRows;        /* number of rows and columns in result set */
    SQLSMALLINT nColumns;
    int         channel;      /* as stored on the R-level object */
    int         id;           /* ditto */
    int         useNRows;     /* value of believeNRows */

    /* entries used to bind data for result sets and updates */
    COLUMNS     *ColData;
    int         nAllocated;
    SQLUINTEGER	rowsFetched;  /* use to indicate the number of rows fetched */
    SQLUINTEGER	rowArraySize; /* use to indicate the number of rows we expect back */
    SQLUINTEGER	rowsUsed;     /* for when we fetch more than we need */

    SQLMSG      *msglist;     /* root of linked list of messages */
    SEXP        extPtr;       /* the external pointer address */
\} RODBCHandle, *pRODBCHandle;
\end{smallexample}
Most ODBC operations work by sending a query, explicitly or implicitly
via e.g.{} \code{sqlColumns}, and this creates a \emph{result set}
which is transferred to an R data frame by \code{sqlGetResults}.
\code{nRows} and \code{nColumns} indicate the size of the pending result
set, with \code{nColumns = -1} used if there are no pending results.

ODBC works with various \emph{handles}.  There is a \code{SQLHENV}
handle for the environment that \code{RODBC} opens when a connection
is first opened or DSNs are listed---its main use is to request ODBC~3
semantics.  Then each connection has a \code{SQLHDBC} handle, and each
query (statement) a \code{SQLHSTMT} handle.  Argument
\code{literal=TRUE} of \code{sqlTables} and \code{sqlColumns} is used
to set the \verb|SQL_ATTR_METADATA_ID| attribute of the statement
handle to be true.

All the functions\footnote{ \code{odbcQuery}, \code{sqlColumns},
  \code{sqlPrimaryKeys}, \code{sqlTables} and \code{sqlTypeInfo}.}
that create a result set call C function \code{cachenbind}.  This
allocates buffers under the \code{ColData} pointer and binds the
result set to them by \code{SQLBindCol}.  Then when
\code{sqlGetResults} calls the C function \code{SQLFetch} or
\code{SQLFetchScroll} the results for one or more (up to
\code{MAX\_ROWS\_FETCH = 1024}) rows are loaded into the buffers and
then copied into R vectors.

Prior to \pkg{RODBC} 1.3-0 the default was to fetch a row at a time,
but it is now to fetch up to 100 rows at a time.  Entries
\code{rowArraySize} and \code{rowsFetched} are used to indicate how
many rows were requested and how many were available.  Since e.g.{}
\code{sqlFetch} allows a maximum number of rows to be returned in the
data frame, \code{rowsUsed} indicates how many of the rows last
fetched have so far been returned to R.

The buffers are part of the \code{ColData} entry, which is an array of
\code{COLUMNS} structures, one for each column in the result set.  These
have the form
\begin{smallexample}
typedef struct cols \{
    SQLCHAR     ColName[256];
    SQLSMALLINT NameLength;
    SQLSMALLINT DataType;
    SQLULEN     ColSize;
    SQLSMALLINT DecimalDigits;
    SQLSMALLINT Nullable;
    char        *pData;
    int         datalen;
    SQLDOUBLE   RData [MAX\_ROWS\_FETCH];
    SQLREAL     R4Data[MAX\_ROWS\_FETCH];
    SQLINTEGER  IData [MAX\_ROWS\_FETCH];
    SQLSMALLINT I2Data[MAX\_ROWS\_FETCH];
    SQLLEN      IndPtr[MAX\_ROWS\_FETCH];
\} COLUMNS;
\end{smallexample}
The first six entries are returned by a call to \code{SQLDescribeCol}:
\code{DataType} is used to select the buffer to use.  There are
separate buffers for double-precision, single-precision, 32-bit and
16-bit integer and character/byte data.  When character/byte buffers
are allocated, \code{datalen} records the length allocated per row
(which is based on the value returned as \code{ColSize}).  The
\code{IndPtr} value is used to record the actual size of the item
in the current row for variable length character and binary types, and
for all nullable types the special value \code{SQL\_NULL\_DATA} (-1)
indicates an SQL null value.

The other main C-level operation is to send data to the ODBC driver
for \code{sqlSave} and \code{sqlUpdate}.  These use \code{INSERT INTO}
and \code{UPDATE} queries respectively, and for \code{fast = TRUE} use
parametrized queries.  So we have the queries (split across lines for
display)
\begin{smallexample}
> sqlSave(channel, USArrests, rownames = "State", addPK = TRUE, verbose = TRUE)
Query: CREATE TABLE "USArrests"
  ("State" varchar(255) NOT NULL PRIMARY KEY, "Murder" double, "Assault" integer,
  "UrbanPop" integer, "Rape" double)
Query: INSERT INTO "USArrests"
  ( "State", "Murder", "Assault", "UrbanPop", "Rape" ) VALUES ( ?,?,?,?,? )
Binding: 'State' DataType 12, ColSize 255
Binding: 'Murder' DataType 8, ColSize 15
Binding: 'Assault' DataType 4, ColSize 10
Binding: 'UrbanPop' DataType 4, ColSize 10
Binding: 'Rape' DataType 8, ColSize 15
Parameters:
...

> sqlUpdate(channel, foo, "USArrests", verbose=TRUE)
Query: UPDATE "USArrests" SET "Assault"=? WHERE "State"=?
Binding: 'Assault' DataType 4, ColSize 10
Binding: 'State' DataType 12, ColSize 255
Parameters:
...
\end{smallexample}
At C level, this works by calling \code{SQLPrepare} to record the
insert/update query on the statement handle, then calling
\code{SQLBindParameter} to bind a buffer for each column with values
to be sent, and finally in a loop over rows copying the data into the
buffer and calling \code{SQLExecute} on the statement handle.

The same buffer structure is used as when retrieving result sets.
The difference is that arguments which were outputs from
\code{SQLBindCol} are inputs to \code{SQLBindParameter}, so we need to
use \code{sqlColumns} to retrieve the column characteristics of the
table and pass these down to the C interface.
\end{document}