<?xml version="1.0" encoding='UTF-8'?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
<sect1 id="threads">
<title>Using MySQL++ in a Multithreaded Program</title>
<para>MySQL++ is not “thread safe” in any
meaningful sense. MySQL++ contains very little code that
actively prevents trouble with threads, and all of it is
optional. We have done some work in MySQL++ to make thread
safety <emphasis>achievable</emphasis>, but it doesn’t come
for free.</para>
<para>The main reason for this is that MySQL++ is
generally I/O-bound, not processor-bound. That is, if
your program’s bottleneck is MySQL++, the ultimate
cause is usually the I/O overhead of using a client-server
database. Doubling the number of threads will just let your
program get back to waiting for I/O twice as fast. Since <ulink
url="http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf">threads
are evil</ulink> and generally can’t help MySQL++, the only
optional thread awareness features we turn on in the shipping
version of MySQL++ are those few that have no practical negative
consequences. Everything else is up to you, the programmer, to
evaluate and enable as and when you need it.</para>
<para>We’re going to assume that you either agree with these
views but find yourself needing to use threads for some other
reason, or are foolishly disregarding these facts and are going to
use threads anyway. Our purpose here is limited to setting down
the rules for avoiding problems with MySQL++ in a multi-threaded
program. We won’t go into the broader issues of thread safety
outside the scope of MySQL++. You will need a grounding in threads
in general to get the full value of this advice.</para>
<sect2 id="thread-build">
<title>Build Issues</title>
<para>Before you can safely use MySQL++ with threads, there are
several things you must do to get a thread-aware build:</para>
<orderedlist>
<listitem>
<para><emphasis>Build MySQL++ itself with thread awareness
turned on.</emphasis></para>
<para>On Linux, Cygwin and Unix (OS X, *BSD, Solaris...),
pass the <computeroutput>--enable-thread-check</computeroutput>
flag to the <filename>configure</filename> script. Beware, this
is only a request to the <filename>configure</filename> script
to look for thread support on your system, not a requirement
to do or die: if the script doesn’t find what it needs
to do threading, MySQL++ will just get built without thread
support. See <filename>README-Unix.txt</filename> for more
details.</para>
<para>On Windows, if you use the Visual C++ project files or
the MinGW Makefile that comes with the MySQL++ distribution,
threading is always turned on, due to the nature of
Windows.</para>
<para>If you build MySQL++ in some other way, such as with
Dev-Cpp (based on MinGW), you’re on your own to enable
thread awareness.</para>
</listitem>
<listitem>
<para><emphasis>Link your program to a thread-aware build of the
MySQL C API library.</emphasis></para>
<para>If you use a binary distribution of MySQL on Unixy
systems, you usually get two different versions of the MySQL
C API library, one with thread support and one without. These
are typically called <filename>libmysqlclient</filename> and
<filename>libmysqlclient_r</filename>, the latter being the
thread-safe one. (The “<filename>_r</filename>”
means reentrant.)</para>
<para>If you’re using the Windows binary distribution of
MySQL, there are two versions of the client library, but both
are thread aware. One just has debugging symbols, and the other
doesn’t. See <filename>README-Visual-C++.txt</filename>
or <filename>README-MinGW.txt</filename> for details.</para>
<para>If you build MySQL from source, you might only get
one version of the MySQL C API library, and it can have
thread awareness or not, depending on your configuration
choices. This is the case with Cygwin, where you currently
have no choice but to build the C API library from source. (See
<filename>README-Cygwin.txt</filename>.)</para>
</listitem>
<listitem>
<para><emphasis>Enable threading in your program’s build
options.</emphasis></para>
<para>This is different for every platform, but it’s
usually the case that you don’t get thread-aware builds
by default. Depending on the platform, you might need to change
compiler options, linker options, or both. See your development
environment’s documentation, or study how MySQL++ itself
turns on thread-aware build options when requested.</para>
</listitem>
</orderedlist>
</sect2>
<sect2 id="thread-conn-mgmt">
<title>Connection Management</title>
<para>The MySQL C API underpinning MySQL++ does not allow multiple
concurrent queries on a single connection. You can run into this
problem in a single-threaded program, too, which is why we cover the
details elsewhere, in <xref linkend="concurrentqueries"/>.
It’s a thornier problem when using threads, though.</para>
<para>The simple fix is to just create a separate <ulink
url="Connection" type="classref"/> object for each thread
that needs to make database queries. This works well if you
have a small number of threads that need to make queries, and
each thread uses its connection often enough that the server
doesn’t time out waiting for queries.<footnote><para>By
default, current MySQL servers have an 8 hour idle timeout on
connections. It’s a configuration option, though, so your
server may be set differently.</para></footnote></para>
<para>If you have lots of threads or the frequency of queries is
low, the connection management overhead will be excessive. To avoid
that, we created the <ulink url="ConnectionPool" type="classref"/>
class. It manages a pool of <classname>Connection</classname>
objects like library books: a thread checks one out, uses it,
and then returns it to the pool as soon as it’s done with
it. This keeps the number of active connections low.</para>
<para><classname>ConnectionPool</classname> has three
methods that you need to override in a subclass to
make it concrete: <methodname>create()</methodname>,
<methodname>destroy()</methodname>, and
<methodname>max_idle_time()</methodname>. These overrides let
the base class delegate operations it can’t successfully do
itself to its subclass. The <classname>ConnectionPool</classname>
can’t know how to <methodname>create()</methodname>
the <classname>Connection</classname> objects, because that
depends on how your program gets login parameters, server
information, etc. <classname>ConnectionPool</classname>
also makes the subclass <methodname>destroy()</methodname>
the <classname>Connection</classname> objects it created; it
could assume that they’re simply allocated on the heap
with <methodname>new</methodname>, but it can’t be sure,
so the base class delegates destruction, too. Finally, the base
class can’t know which connection idle timeout policy
would make the most sense for the client, so it asks its subclass
via the <methodname>max_idle_time()</methodname> method.</para>
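<para>The division of labor described above can be sketched without
MySQL++ at all. The following is a minimal illustration, not the
library’s implementation: <classname>SketchPool</classname>,
<classname>SimpleSketchPool</classname> and
<classname>FakeConnection</classname> are hypothetical stand-ins
invented for this sketch, and idle-timeout eviction is left out. It
only shows a base class doing the checkout bookkeeping while its
subclass supplies the three hooks.</para>
<programlisting><![CDATA[
```cpp
#include <list>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a database connection. This is not
// mysqlpp::Connection; it only carries the data the sketch needs.
struct FakeConnection {
    std::string server;
};

// Stand-in base class mirroring the division of labor described above:
// it does the checkout bookkeeping and delegates the rest to its subclass.
class SketchPool {
public:
    virtual ~SketchPool() = default;

    // Check out a connection, reusing an idle one when available.
    FakeConnection* grab() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (idle_.empty()) {
            return create();
        }
        FakeConnection* conn = idle_.front();
        idle_.pop_front();
        return conn;
    }

    // Hand a connection back so that another caller can reuse it.
    void release(FakeConnection* conn) {
        std::lock_guard<std::mutex> lock(mutex_);
        idle_.push_back(conn);
    }

protected:
    // Destroy all idle connections. The subclass destructor must call this,
    // because the overrides are no longer reachable from ~SketchPool().
    void clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (FakeConnection* conn : idle_) {
            destroy(conn);
        }
        idle_.clear();
    }

    virtual FakeConnection* create() = 0;
    virtual void destroy(FakeConnection* conn) = 0;
    // Declared to show the third hook; idle-time eviction is omitted here.
    virtual unsigned int max_idle_time() = 0;

private:
    std::mutex mutex_;
    std::list<FakeConnection*> idle_;
};

// Concrete pool: only this class knows how connections are made and freed.
class SimpleSketchPool : public SketchPool {
public:
    explicit SimpleSketchPool(std::string server) : server_(std::move(server)) {}
    ~SimpleSketchPool() override { clear(); }

    int created() const { return created_; }
    int destroyed() const { return destroyed_; }

protected:
    FakeConnection* create() override {
        ++created_;
        return new FakeConnection{server_};
    }
    void destroy(FakeConnection* conn) override {
        ++destroyed_;
        delete conn;
    }
    unsigned int max_idle_time() override { return 3; }

private:
    std::string server_;
    int created_ = 0;
    int destroyed_ = 0;
};
```
]]></programlisting>
<para>In this sketch, a connection that is grabbed, released and
grabbed again is the same object both times, so
<methodname>create()</methodname> runs only once. A connection that
is still checked out when the pool is destroyed is not cleaned up,
which is one reason to return connections promptly.</para>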
<para><classname>ConnectionPool</classname> also allows you to
override <methodname>release()</methodname>, if needed. For simple
uses, it’s not necessary to override this.</para>
<para>In designing your <classname>ConnectionPool</classname>
derivative, you might consider making it a Singleton (see Gamma
et al.), since there should only be one pool in a program.</para>
<para>Here is an example showing how to use connection pools with
threads:</para>
<programlisting><xi:include href="cpool.txt" parse="text"
xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting>
<para>The example works with both Windows native
threads and with POSIX threads.<footnote><para>The file
<filename>examples/threads.h</filename> contains a few macros and
such to abstract away the differences between the two threading
models.</para></footnote> Because thread-enabled builds are only
the default on Windows, it’s quite possible for this program
to do nothing on other platforms. See above for instructions on
enabling a thread-aware build.</para>
<para>If you write your code without the kind of checks for thread
support that you see in the code above and link it to a build of MySQL++
that isn’t thread-aware, it will still try to run. The
threading mechanisms fall back to a single-threaded mode when
threads aren’t available. A particular danger is that the
mutex lock mechanism used to keep the pool’s internal data
consistent while multiple threads access it will just quietly
become a no-op if MySQL++ is built without thread support. We do
it this way because we don’t want to make thread support
a MySQL++ prerequisite. And, although it would be of limited
value, this lets you use <classname>ConnectionPool</classname>
in single-threaded programs.</para>
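<para>To make the danger concrete, here is a rough sketch of how a
lock can silently turn into a no-op at build time. It is not taken
from MySQL++: the macro <literal>SKETCH_HAVE_THREADS</literal> and the
names <classname>SketchMutex</classname>,
<classname>SketchLock</classname> and
<function>guarded_increment()</function> are assumptions made up for
the illustration, and the library’s real build switches are
different.</para>
<programlisting><![CDATA[
```cpp
// SKETCH_HAVE_THREADS is a hypothetical macro for this sketch only; it is
// not one of MySQL++'s real build settings.
#if defined(SKETCH_HAVE_THREADS)
#include <mutex>

// Thread-aware variant: forwards to a real mutex.
class SketchMutex {
public:
    void lock() { impl_.lock(); }
    void unlock() { impl_.unlock(); }
    static bool is_real() { return true; }
private:
    std::mutex impl_;
};
#else
// Same interface, but every call does nothing: code that uses it still
// compiles and runs, with no protection at all.
class SketchMutex {
public:
    void lock() {}
    void unlock() {}
    static bool is_real() { return false; }
};
#endif

// RAII helper that works with either variant.
class SketchLock {
public:
    explicit SketchLock(SketchMutex& m) : mutex_(m) { mutex_.lock(); }
    ~SketchLock() { mutex_.unlock(); }
    SketchLock(const SketchLock&) = delete;
    SketchLock& operator=(const SketchLock&) = delete;
private:
    SketchMutex& mutex_;
};

// Callers cannot tell from the call site whether the lock protects anything.
inline int guarded_increment(SketchMutex& m, int& counter) {
    SketchLock lock(m);
    return ++counter;
}
```
]]></programlisting>
<para>The calling code is identical in both builds, which is exactly
why the problem is easy to miss: nothing fails to compile, and nothing
warns you at run time unless you check for thread support
yourself.</para>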
<para>You might wonder why we don’t just work around
this weakness in the C API transparently in MySQL++ instead of
suggesting design guidelines to avoid it. We’d like to do
just that, but how?</para>
<para>If you consider just the threaded case, you could argue for
the use of mutexes to protect a connection from trying to execute
two queries at once. The cure is worse than the disease: it turns a
design error into a performance sap, as the second thread is blocked
indefinitely waiting for the connection to free up. Much better to
let the program get the “Commands out of sync” error,
which will guide you to this section of the manual, which tells you
how to avoid the error with a better design.</para>
<para>Another option would be to bury
<classname>ConnectionPool</classname> functionality within MySQL++
itself, so the library could create new connections at need.
That’s no good because the above example is the most complex
in MySQL++, so if it were mandatory to use connection pools, the
whole library would be that much more complex to use. The whole
point of MySQL++ is to make using the database easier. MySQL++
offers the connection pool mechanism for those that really need it,
but an option it must remain.</para>
</sect2>
<sect2 id="thread-helpers">
<title>Helper Functions</title>
<para><classname>Connection</classname> has several thread-related
static methods you might care about when using MySQL++ with
threads.</para>
<para>You can call
<methodname>Connection::thread_aware()</methodname> to
determine whether MySQL++ and the underlying C API library
were both built to be thread-aware. Again, I stress that thread
<emphasis>awareness</emphasis> is not the same thing as thread
<emphasis>safety</emphasis>: it’s still up to you to
make your code thread-safe. If this method returns true, it
just means it’s <emphasis>possible</emphasis> to achieve
thread-safety.</para>
<para>If your program’s connection-management strategy allows
a thread to use a <classname>Connection</classname> object that
another thread created before it creates a connection of its own,
you must call <methodname>Connection::thread_start()</methodname>
from that thread before it does anything with MySQL++. If a
thread creates a new connection before it uses a connection
created by another thread, though, it doesn’t need to call
<methodname>Connection::thread_start()</methodname> because the
per-thread resources this allocates are implicitly created upon
creation of a connection if necessary.</para>
<para>This is why the simple
<classname>Connection</classname>-per-thread strategy
works: each thread that uses MySQL++ creates a connection
in that thread, implicitly allocating the per-thread
resources at the same time. You never need to call
<methodname>Connection::thread_start()</methodname> in this
instance. It’s not harmful to call this function, just
unnecessary.</para>
<para>A good counterexample is using
<classname>ConnectionPool</classname>: you probably do need
to call <methodname>Connection::thread_start()</methodname>
at the start of each worker thread because you can’t
usually tell whether you’re getting a new connection
from the pool, or reusing one that another thread returned
to the pool after allocating it. It’s possible to
conceive of situations where you can guarantee that each pool
user always allocates a fresh connection the first time it
calls <methodname>ConnectionPool::grab()</methodname>,
but thread programming is complex enough that
it’s best to take the safe path and always call
<methodname>Connection::thread_start()</methodname> early in each
worker thread.</para>
<para>Finally, there’s the complementary method,
<methodname>Connection::thread_end()</methodname>. Strictly
speaking, it’s not <emphasis>necessary</emphasis> to call
this. The per-thread memory allocated by the C API is small,
it doesn’t grow over time, and a typical thread is going
to need this memory for its entire run time. Memory debuggers
aren’t smart enough to know all this, though, so they will
gripe about a memory leak unless you call this from each thread
that uses MySQL++ before that thread exits.</para>
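<para>One way to keep the two calls paired is a small scope guard.
The sketch below is an illustration under stated assumptions rather
than part of MySQL++: <classname>ThreadScope</classname>,
<classname>FakeApi</classname> and <function>worker()</function> are
hypothetical names, and the guard assumes that
<methodname>thread_start()</methodname> returns a success flag
convertible to <literal>bool</literal>.
<classname>FakeApi</classname> merely stands in for
<classname>Connection</classname> so that the sketch compiles without
the library.</para>
<programlisting><![CDATA[
```cpp
// Calls Api::thread_start() on construction and Api::thread_end() on
// destruction, so the cleanup also runs on early return or exception.
// Assumption: Api::thread_start() returns something convertible to bool.
template <class Api>
class ThreadScope {
public:
    ThreadScope() : ok_(Api::thread_start()) {}
    ~ThreadScope() {
        if (ok_) {
            Api::thread_end();
        }
    }
    ThreadScope(const ThreadScope&) = delete;
    ThreadScope& operator=(const ThreadScope&) = delete;

    bool ok() const { return ok_; }

private:
    bool ok_;
};

// Hypothetical stand-in used only to exercise the guard without a
// database. It counts unmatched thread_start() calls and is not
// itself thread-safe.
struct FakeApi {
    static int& balance() {
        static int unmatched = 0;
        return unmatched;
    }
    static bool thread_start() {
        ++balance();
        return true;
    }
    static void thread_end() { --balance(); }
};

// Shape of a worker thread body. The guard comes first, so per-thread
// setup happens before any connection is touched.
template <class Api>
bool worker() {
    ThreadScope<Api> scope;
    if (!scope.ok()) {
        return false;
    }
    // ... grab a connection from the pool, run queries, release it ...
    return true;
}
```
]]></programlisting>
<para>In a real program you would presumably instantiate the guard as
<literal>ThreadScope&lt;mysqlpp::Connection&gt;</literal> at the top
of each worker thread function. Because the destructor runs on every
exit path, a memory debugger should then see the per-thread resources
released even when the thread bails out early.</para>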
<para>Although its name suggests otherwise,
<methodname>Connection::thread_id()</methodname> has nothing to
do with anything in this chapter.</para>
</sect2>
<sect2 id="thread-data-sharing">
<title>Sharing MySQL++ Data Structures</title>
<para>We’re in the process of making it safer to share
MySQL++’s data structures across threads.</para>
<para>By way of illustration, let me explain a problem we had up
until MySQL++ v3.0. When you issue a database query that returns
rows, you also get information about the columns in each row. Since
the column information is the same for each row in the result set,
older versions of MySQL++ kept this information in the result set
object, and each <ulink url="Row" type="classref"/> kept a pointer
back to the result set object that created it so it could access
this common data at need. This was fine as long as each result set
object outlived the <classname>Row</classname> objects it returned.
It required uncommon usage patterns to run into trouble in this area
in a single-threaded program, but in a multi-threaded program it was
easy. For example, there’s frequently a desire to let one
thread do the queries, and other threads process the results.
You can see how avoiding lifetime problems here would require a
careful locking strategy.</para>
<para>We got around this in MySQL++ v3.0 by giving these shared data
structures a lifetime independent of the result set object that
initially creates them. These shared data structures stick around
until the last object needing them gets destroyed.</para>
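<para>The idea can be illustrated with
<classname>std::shared_ptr</classname>. This is only a sketch of
shared ownership, not MySQL++’s own code, which may well use a
different mechanism; <classname>SketchResult</classname>,
<classname>SketchRow</classname> and <literal>FieldNames</literal> are
names made up for the example.</para>
<programlisting><![CDATA[
```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Column metadata shared by every row of one result set.
using FieldNames = std::vector<std::string>;

class SketchRow {
public:
    SketchRow(std::vector<std::string> values,
              std::shared_ptr<const FieldNames> names)
        : values_(std::move(values)), names_(std::move(names)) {}

    // Look a value up by column name; returns an empty string if absent.
    std::string at(const std::string& column) const {
        for (std::size_t i = 0; i < names_->size() && i < values_.size(); ++i) {
            if ((*names_)[i] == column) {
                return values_[i];
            }
        }
        return std::string();
    }

private:
    std::vector<std::string> values_;
    // Shared ownership instead of a pointer back to the result set.
    std::shared_ptr<const FieldNames> names_;
};

class SketchResult {
public:
    explicit SketchResult(FieldNames names)
        : names_(std::make_shared<FieldNames>(std::move(names))) {}

    // Each row gets its own reference to the shared column names.
    SketchRow make_row(std::vector<std::string> values) const {
        return SketchRow(std::move(values), names_);
    }

private:
    std::shared_ptr<const FieldNames> names_;
};
```
]]></programlisting>
<para>With this arrangement a row remains usable after the result set
that produced it has been destroyed, because the column names live
until the last row referring to them goes away. Note that this only
addresses lifetime; it does not make concurrent modification of shared
data safe.</para>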
<para>Although this is now a solved problem, I bring it up because
there are likely other similar lifetime and sequencing problems
waiting to be discovered inside MySQL++. If you would like to help
us find these, by all means, share data between threads willy-nilly.
We welcome your crash reports on the MySQL++ mailing list. But if
you’d prefer to avoid problems, it’s better to keep all
data about a query within a single thread. Between this and the
previous section’s advice, you should be able to use threads
with MySQL++ without trouble.</para>
</sect2>
</sect1>