File: threads.tex

package info (click to toggle)
haskell98-report 20030706-3
  • links: PTS
  • area: main
  • in suites: etch, etch-m68k
  • size: 1,888 kB
  • ctags: 77
  • sloc: haskell: 3,809; makefile: 326; sh: 4
file content (326 lines) | stat: -rw-r--r-- 11,042 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
\documentclass{article}

\usepackage{proof}
\usepackage{code}

\sloppy
\setlength{\parskip}{0.5\baselineskip plus 0.2\baselineskip minus 0.1\baselineskip}
\setlength{\parsep}{\parskip}
\setlength{\topsep}{0cm}
\setlength{\parindent}{0cm}
\renewcommand{\textfraction}{0}
\renewcommand{\topfraction}{1}
\renewcommand{\floatpagefraction}{0.8}
\renewcommand{\dblfloatpagefraction}{0.8}


\newcommand{\NS}{{\cal N}}
%       NS: set of native threads
\newcommand{\HS}{{\cal H}}
%       HS: set of Haskell threads
\newcommand{\hcall}{H}
\newcommand{\fcall}[2]{F^{#1}~#2}
\newcommand{\ret}[1]{RET~#1}

\newcommand{\bound}[1]{B(#1)}
\newcommand{\forkio}[1]{ForkIO(#1)}
\begin{document}


\title{A semantics for foreign threads}

\author{The Haskell FFI team}
\maketitle

\makeatactive

\section{Introduction}

Threads created using OS primitives (for example pthreads on POSIX systems) can
have thread-local state. Several libraries make use of this thread-local state.
Most Haskell systems use just one OS thread to execute all foreign code.
With GHC's threaded RTS, this is not even necessarily the same OS thread all the
time.
Therefore it is not possible to use foreign libraries that depend on thread-local
state reliably.

The following foreign libraries are known to be affected:
\begin{itemize}
\item OpenGL
\item Carbon and Cocoa (Apple's GUI toolkits)
\end{itemize}

\section{Requirements}

\begin{itemize}
\item It should be possible for Haskell code to arrange that a sequence of
calls to a given library are performed by the same native thread and
that if an external library calls into Haskell, then any outgoing calls
from Haskell are performed by the same native thread.

\item The specification should be implementable in a way that allows a lot
of foreign calls to be made with no additional overhead with respect to
GHC's current ``unsafe'' foreign calls.

\item The good performance of the existing lightweight ``green'' threads in
GHC should not be sacrificed. Performance should still OK when using
the new features with only a few threads (i.e. not more than commonly
used from multithreaded C programs).

\item The specification shouldn't explicitly require lightweight ``green''
threads to exist. The specification should be implementable in a simple
and obvious way in haskell systems that always use a 1:1 correspondence
between Haskell threads and OS threads.

\item The specification shouldn't specify which particular OS thread
should be used to execute Haskell code. It should be possible to
implement it with e.g. a Haskell interpreter running in one OS thread
that just uses other OS threads for foreign calls.

\item There should be no unexpected blocking. Safe Foreign calls (i.e. calls not
marked as unsafe) should not cause other threads to block.
\end{itemize}

\newpage
\section{Informal semantics}

Here's the basic idea:
\begin{description}
\item[Haskell threads and OS threads.] \mbox{}\\
\begin{itemize}
\item Every Haskell thread is \emph{either} unbound, \emph{or} bound to a exactly one OS thread.  

\item At most one Haskell thread may be bound to one OS thread.
In particular, @forkIO@ forks a new unbound Haskell thread.

\item A Haskell thread, bound to a new OS thread, can be created with @forkOS@.

\end{itemize}

\item[Foreign interface.] \mbox{}\\
\begin{itemize}
\item No @safe@ vs @threadsafe@ distinction\footnote{``@threadsafe@'' has already
been removed from the current Release Candidate of the FFI addendum}. But we retain
the @safe@/@unsafe@ distinction.
\item A foreign call made by a Haskell thread is (guaranteed to be) made by its bound OS thread, if
any.

\item If a @safe@ foreign call blocks, then no Haskell threads block.  (Remember, every OS thread
has at most one Haskell thread bound to it.)

\item A foreign call \emph{into Haskell} (via @foreign export@ or @foreign import wrapper@) is 
run by a Haskell thread bound to the OS thread that made the call.
\end{itemize}


\item[Open questions and notes.] \mbox{}\\
\begin{itemize}
\item Notice that, there \emph{can} be a 1-1 mapping between Haskell threads
and OS threads.  Furthermore, we can run efficiently on an SMP.
\end{itemize}
\end{description}


\section{Formal semantics}

The syntax of a native thread is this:
$$
\begin{array}{lrcll}
\mbox{Native thread} &  t & ::= & N[S] \\
\\
\mbox{Native thread stack} &  S & ::= & \epsilon & \mbox{Empty}\\
        & & | & \hcall : S  & \mbox{Executing Haskell} \\
        & & | & \fcall{si}{a_{bt}} : S & \mbox{Executing foreign code} \\
        & & | & \bullet & \mbox{Unknown}\\
\\
\mbox{Safety indicator} &  si & ::= & u & \mbox{Unsafe} \\
        & & | & s & \mbox{Safe}
\end{array}
$$
A native thread of form $N[S]$ has thread-id $N$, while $S$ is
an abstraction of its call stack.  If $\hcall$ is on top of the stack,
the thread is willing to execute a Haskell thread. 
If $\fcall{si}{h}$ is
on top of the stack, the thread is in the process of dealing with a call
to a foreign function, which will return its result to the Haskell thread
$h$.  The safety-indicator $si$ is from the FFI spec.

A native thread of form $N[H]$ has a stack that exists only to serve Haskell 
threads, and so can safely block inside a foreign call without mucking anything
else up.  We might call them ``worker threads''.

The syntax of a Haskell thread is this:
$$
\begin{array}{lrcll}
\mbox{Haskell thread} &  h & ::= & (a)_{bt} \\
\\
\mbox{Bound thread id} & bt & ::= & \epsilon & \mbox{Not bound} \\
        & & | & N & \mbox{Bound to native thread N} \\
\\
\mbox{Haskell action} &  a & ::= & p ~@>>@~ a  & \mbox{Sequence} \\
        & & | & \ret  & \mbox{Return from a call into Haskell} \\
\\
\mbox{Primitive action} &  p & ::= & \tau & \mbox{Internal action} \\
        & & | & @forkIO@~a & \mbox{Fork a thread} \\
        & & | & @forkOS@~a & \mbox{Fork a native thread} \\
        & & | & \fcall{si}{f} & \mbox{Foreign call} 
\end{array}
$$
A Haskell thread $h$ of form $(a)_{N}$ has action $a$.  The indicator
$N$ identifies the native thread $N$ to which the Haskell thread is \emph{bound}.

An action $a$ is a sequence of primitive actions, finishing with a 
return of some kind.  A primitive action is either some internal Haskell
thing (such as performing a bit of evaluation, or operating on an @MVar@),
or else it is a call to a foreign function $f$.

We do not model the data passed to, or returned from, a foreign call, nor
any details of what ``internal Haskell'' operations are.  

\subsection{Evolution}

We describe how the system evolves in a very standard way, using 
transition rules, of form
$$
\NS ; \HS ~\Rightarrow~ \NS' ; \HS'
$$
The structural rules are as before.
$$
\begin{array}{rcll}
N[\hcall:S]; (\tau~@>>@~a)_{bt} 
        & \Rightarrow 
        & N[\hcall:S]; (a)_{M} & (INT) \\
\\
N[\hcall:S]; (@forkIO@~b~@>>@~a)_{bt} 
        & \Rightarrow 
        & N[\hcall:S]; (a)_{bt}, (b)_\epsilon & (FORKIO) \\
N[\hcall:S]; (@forkOS@~b~@>>@~a)_{bt}
        & \Rightarrow 
        & N[\hcall:S], M[\hcall,\bullet]; (a)_{M}, (b)_{bt} & (FORKOS) \\
\\
N[\hcall:S]; (\fcall{si}{f}~@>>@~a)_N 
        & \Rightarrow 
        & N[\fcall{si}{a_N}:\hcall:S];  & (FCALL1) \\
N[\hcall]; (\fcall{si}{f}~@>>@~a)_\epsilon 
        & \Rightarrow 
        & N[\fcall{si}{a_\epsilon}:\hcall:S];  & (FCALL2) \\
g\\
N[\fcall{si}{a_{bt}}:S];  
        & \Rightarrow 
        & N[S]; a_{bt} & (FRET) \\
\\
N[\bullet];
        & \Rightarrow 
        & N[\hcall:\bullet];  (f ~@>>@~ \ret{})_{N} & (HCALL1) \\
\\
N[\fcall{s}{a} : S]; 
        & \Rightarrow 
        & N[\hcall : \fcall{s}{a} : S]; ~ (f ~@>>@~ \ret{})_{N} & (HCALL2) \\
 \\
N[\hcall : S]; (\ret{})_N
        & \Rightarrow 
        & N[S]; & (HRET) \\
\\
(nothing)
        & \Rightarrow 
        & N[\hcall]; & (WKR) \\
        & \multicolumn{2}{l}{\mbox{where $N$ is fresh}} \\
\\
(nothing)
        & \Rightarrow 
        & N[\bullet]; & (EXT) \\
        & \multicolumn{2}{l}{\mbox{where $N$ is fresh}} \\
\end{array}
$$

\begin{description}
\item[FORKOS.]  Note that we spawn a new OS thread $M[H,\bullet]$.  The $\bullet$ prevents it
participating in (FCALL2), which might block $M$ inside a foreign call; instead, $M$ must
remain available to participate in (FCALL1), since no other OS thread can do so.

\item[WKR.] This rule models the birth of new worker OS threads, in case they should
all be blocked in a foreign call.
\end{description}

\section{Primitives}

\subsection{forkIO and forkOS}

\begin{quote}
\begin{verbatim}
forkIO :: IO () -> IO ThreadId
forkOS :: IO () -> IO ThreadId
\end{verbatim}
\end{quote}

As described in the formal semantics above.

This document does not specify the meaning of the @ThreadId@ return value. The
definition of what constitutes one Haskell thread used for the return value
need not agree with the definition used for describing the formal semantics.
Questions like ``are thread ids preserved across foreign calls and call-backs''
are outside the scope of this document.

\subsection{isCurrentThreadBound}

\begin{quote}
\begin{verbatim}
isCurrentThreadBound :: IO Bool
\end{verbatim}
\end{quote}

... should return @True@ if and only if it is safe to use foreign calls that
rely on thread-local state. That means it will return True when executed from a
bound Haskell thread. It may also return @True@ for threads that are not bound
according to the above semantics if the run time system is implemented in such
a way that thread-local-state can be used from all threads, e.g. using a 1-1
relationship between Haskell threads and OS threads.

This primitive is intended to make use of forkOS unnecessary when a bound
thread is already available; consider the following utility function:

\begin{quote}
\begin{verbatim}
runInBoundThread :: IO a -> IO a

runInBoundThread action = do
    bound <- isCurrentThreadBound
    if bound
        then action
        else do
            mv <- newEmptyMVar
            forkOS (action >>= putMVar mv)
            takeMVar mv
\end{verbatim}
\end{quote}

\subsection{forkProcess}

\begin{quote}
\begin{verbatim}
forkProcess :: IO (Maybe ProcessID)
\end{verbatim}
\end{quote}

The primitive @forkProcess@ is available in the module @System.Posix.Process@
on Posix platforms only.
It's semantics are anologous to POSIX @fork@; it returns a process
id to the parent, and @Nothing@ to the child. Only the Haskell thread that
@forkProcess@ was called from is duplicated in the child process; it
will be the only thread in the child process.

If @forkProcess@ is called from a bound Haskell thread, the duplicated Haskell
thread in the child process will again be bound and all the thread-local
state will be preserved.

If @forkProcess@ is called from an unbound Haskell thread, the duplicated
Haskell thread in the child process will be unbound.

\paragraph{Issues:}
\begin{itemize}
\item Under what circumstances should the forked process terminate?
\item The action @forkProcessAll@ from the same module should be removed, as
      it seems to be unimplementable in the presence of multiple OS threads.
\end{itemize}

\end{document}