1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326
|
\documentclass{article}
\usepackage{proof}
\usepackage{code}
\sloppy
\setlength{\parskip}{0.5\baselineskip plus 0.2\baselineskip minus 0.1\baselineskip}
\setlength{\parsep}{\parskip}
\setlength{\topsep}{0cm}
\setlength{\parindent}{0cm}
\renewcommand{\textfraction}{0}
\renewcommand{\topfraction}{1}
\renewcommand{\floatpagefraction}{0.8}
\renewcommand{\dblfloatpagefraction}{0.8}
\newcommand{\NS}{{\cal N}}
% NS: set of native threads
\newcommand{\HS}{{\cal H}}
% HS: set of Haskell threads
\newcommand{\hcall}{H}
\newcommand{\fcall}[2]{F^{#1}~#2}
\newcommand{\ret}[1]{RET~#1}
\newcommand{\bound}[1]{B(#1)}
\newcommand{\forkio}[1]{ForkIO(#1)}
\begin{document}
\title{A semantics for foreign threads}
\author{The Haskell FFI team}
\maketitle
\makeatactive
\section{Introduction}
Threads created using OS primitives (for example pthreads on POSIX systems) can
have thread-local state. Several libraries make use of this thread-local state.
Most Haskell systems use just one OS thread to execute all foreign code.
With GHC's threaded RTS, this is not even necessarily the same OS thread all the
time.
Therefore it is not possible to use foreign libraries that depend on thread-local
state reliably.
The following foreign libraries are known to be affected:
\begin{itemize}
\item OpenGL
\item Carbon and Cocoa (Apple's GUI toolkits)
\end{itemize}
\section{Requirements}
\begin{itemize}
\item It should be possible for Haskell code to arrange that a sequence of
calls to a given library are performed by the same native thread and
that if an external library calls into Haskell, then any outgoing calls
from Haskell are performed by the same native thread.
\item The specification should be implementable in a way that allows a lot
of foreign calls to be made with no additional overhead with respect to
GHC's current ``unsafe'' foreign calls.
\item The good performance of the existing lightweight ``green'' threads in
GHC should not be sacrificed. Performance should still OK when using
the new features with only a few threads (i.e. not more than commonly
used from multithreaded C programs).
\item The specification shouldn't explicitly require lightweight ``green''
threads to exist. The specification should be implementable in a simple
and obvious way in haskell systems that always use a 1:1 correspondence
between Haskell threads and OS threads.
\item The specification shouldn't specify which particular OS thread
should be used to execute Haskell code. It should be possible to
implement it with e.g. a Haskell interpreter running in one OS thread
that just uses other OS threads for foreign calls.
\item There should be no unexpected blocking. Safe Foreign calls (i.e. calls not
marked as unsafe) should not cause other threads to block.
\end{itemize}
\newpage
\section{Informal semantics}
Here's the basic idea:
\begin{description}
\item[Haskell threads and OS threads.] \mbox{}\\
\begin{itemize}
\item Every Haskell thread is \emph{either} unbound, \emph{or} bound to a exactly one OS thread.
\item At most one Haskell thread may be bound to one OS thread.
In particular, @forkIO@ forks a new unbound Haskell thread.
\item A Haskell thread, bound to a new OS thread, can be created with @forkOS@.
\end{itemize}
\item[Foreign interface.] \mbox{}\\
\begin{itemize}
\item No @safe@ vs @threadsafe@ distinction\footnote{``@threadsafe@'' has already
been removed from the current Release Candidate of the FFI addendum}. But we retain
the @safe@/@unsafe@ distinction.
\item A foreign call made by a Haskell thread is (guaranteed to be) made by its bound OS thread, if
any.
\item If a @safe@ foreign call blocks, then no Haskell threads block. (Remember, every OS thread
has at most one Haskell thread bound to it.)
\item A foreign call \emph{into Haskell} (via @foreign export@ or @foreign import wrapper@) is
run by a Haskell thread bound to the OS thread that made the call.
\end{itemize}
\item[Open questions and notes.] \mbox{}\\
\begin{itemize}
\item Notice that, there \emph{can} be a 1-1 mapping between Haskell threads
and OS threads. Furthermore, we can run efficiently on an SMP.
\end{itemize}
\end{description}
\section{Formal semantics}
The syntax of a native thread is this:
$$
\begin{array}{lrcll}
\mbox{Native thread} & t & ::= & N[S] \\
\\
\mbox{Native thread stack} & S & ::= & \epsilon & \mbox{Empty}\\
& & | & \hcall : S & \mbox{Executing Haskell} \\
& & | & \fcall{si}{a_{bt}} : S & \mbox{Executing foreign code} \\
& & | & \bullet & \mbox{Unknown}\\
\\
\mbox{Safety indicator} & si & ::= & u & \mbox{Unsafe} \\
& & | & s & \mbox{Safe}
\end{array}
$$
A native thread of form $N[S]$ has thread-id $N$, while $S$ is
an abstraction of its call stack. If $\hcall$ is on top of the stack,
the thread is willing to execute a Haskell thread.
If $\fcall{si}{h}$ is
on top of the stack, the thread is in the process of dealing with a call
to a foreign function, which will return its result to the Haskell thread
$h$. The safety-indicator $si$ is from the FFI spec.
A native thread of form $N[H]$ has a stack that exists only to serve Haskell
threads, and so can safely block inside a foreign call without mucking anything
else up. We might call them ``worker threads''.
The syntax of a Haskell thread is this:
$$
\begin{array}{lrcll}
\mbox{Haskell thread} & h & ::= & (a)_{bt} \\
\\
\mbox{Bound thread id} & bt & ::= & \epsilon & \mbox{Not bound} \\
& & | & N & \mbox{Bound to native thread N} \\
\\
\mbox{Haskell action} & a & ::= & p ~@>>@~ a & \mbox{Sequence} \\
& & | & \ret & \mbox{Return from a call into Haskell} \\
\\
\mbox{Primitive action} & p & ::= & \tau & \mbox{Internal action} \\
& & | & @forkIO@~a & \mbox{Fork a thread} \\
& & | & @forkOS@~a & \mbox{Fork a native thread} \\
& & | & \fcall{si}{f} & \mbox{Foreign call}
\end{array}
$$
A Haskell thread $h$ of form $(a)_{N}$ has action $a$. The indicator
$N$ identifies the native thread $N$ to which the Haskell thread is \emph{bound}.
An action $a$ is a sequence of primitive actions, finishing with a
return of some kind. A primitive action is either some internal Haskell
thing (such as performing a bit of evaluation, or operating on an @MVar@),
or else it is a call to a foreign function $f$.
We do not model the data passed to, or returned from, a foreign call, nor
any details of what ``internal Haskell'' operations are.
\subsection{Evolution}
We describe how the system evolves in a very standard way, using
transition rules, of form
$$
\NS ; \HS ~\Rightarrow~ \NS' ; \HS'
$$
The structural rules are as before.
$$
\begin{array}{rcll}
N[\hcall:S]; (\tau~@>>@~a)_{bt}
& \Rightarrow
& N[\hcall:S]; (a)_{M} & (INT) \\
\\
N[\hcall:S]; (@forkIO@~b~@>>@~a)_{bt}
& \Rightarrow
& N[\hcall:S]; (a)_{bt}, (b)_\epsilon & (FORKIO) \\
N[\hcall:S]; (@forkOS@~b~@>>@~a)_{bt}
& \Rightarrow
& N[\hcall:S], M[\hcall,\bullet]; (a)_{M}, (b)_{bt} & (FORKOS) \\
\\
N[\hcall:S]; (\fcall{si}{f}~@>>@~a)_N
& \Rightarrow
& N[\fcall{si}{a_N}:\hcall:S]; & (FCALL1) \\
N[\hcall]; (\fcall{si}{f}~@>>@~a)_\epsilon
& \Rightarrow
& N[\fcall{si}{a_\epsilon}:\hcall:S]; & (FCALL2) \\
g\\
N[\fcall{si}{a_{bt}}:S];
& \Rightarrow
& N[S]; a_{bt} & (FRET) \\
\\
N[\bullet];
& \Rightarrow
& N[\hcall:\bullet]; (f ~@>>@~ \ret{})_{N} & (HCALL1) \\
\\
N[\fcall{s}{a} : S];
& \Rightarrow
& N[\hcall : \fcall{s}{a} : S]; ~ (f ~@>>@~ \ret{})_{N} & (HCALL2) \\
\\
N[\hcall : S]; (\ret{})_N
& \Rightarrow
& N[S]; & (HRET) \\
\\
(nothing)
& \Rightarrow
& N[\hcall]; & (WKR) \\
& \multicolumn{2}{l}{\mbox{where $N$ is fresh}} \\
\\
(nothing)
& \Rightarrow
& N[\bullet]; & (EXT) \\
& \multicolumn{2}{l}{\mbox{where $N$ is fresh}} \\
\end{array}
$$
\begin{description}
\item[FORKOS.] Note that we spawn a new OS thread $M[H,\bullet]$. The $\bullet$ prevents it
participating in (FCALL2), which might block $M$ inside a foreign call; instead, $M$ must
remain available to participate in (FCALL1), since no other OS thread can do so.
\item[WKR.] This rule models the birth of new worker OS threads, in case they should
all be blocked in a foreign call.
\end{description}
\section{Primitives}
\subsection{forkIO and forkOS}
\begin{quote}
\begin{verbatim}
forkIO :: IO () -> IO ThreadId
forkOS :: IO () -> IO ThreadId
\end{verbatim}
\end{quote}
As described in the formal semantics above.
This document does not specify the meaning of the @ThreadId@ return value. The
definition of what constitutes one Haskell thread used for the return value
need not agree with the definition used for describing the formal semantics.
Questions like ``are thread ids preserved across foreign calls and call-backs''
are outside the scope of this document.
\subsection{isCurrentThreadBound}
\begin{quote}
\begin{verbatim}
isCurrentThreadBound :: IO Bool
\end{verbatim}
\end{quote}
... should return @True@ if and only if it is safe to use foreign calls that
rely on thread-local state. That means it will return True when executed from a
bound Haskell thread. It may also return @True@ for threads that are not bound
according to the above semantics if the run time system is implemented in such
a way that thread-local-state can be used from all threads, e.g. using a 1-1
relationship between Haskell threads and OS threads.
This primitive is intended to make use of forkOS unnecessary when a bound
thread is already available; consider the following utility function:
\begin{quote}
\begin{verbatim}
runInBoundThread :: IO a -> IO a
runInBoundThread action = do
bound <- isCurrentThreadBound
if bound
then action
else do
mv <- newEmptyMVar
forkOS (action >>= putMVar mv)
takeMVar mv
\end{verbatim}
\end{quote}
\subsection{forkProcess}
\begin{quote}
\begin{verbatim}
forkProcess :: IO (Maybe ProcessID)
\end{verbatim}
\end{quote}
The primitive @forkProcess@ is available in the module @System.Posix.Process@
on Posix platforms only.
It's semantics are anologous to POSIX @fork@; it returns a process
id to the parent, and @Nothing@ to the child. Only the Haskell thread that
@forkProcess@ was called from is duplicated in the child process; it
will be the only thread in the child process.
If @forkProcess@ is called from a bound Haskell thread, the duplicated Haskell
thread in the child process will again be bound and all the thread-local
state will be preserved.
If @forkProcess@ is called from an unbound Haskell thread, the duplicated
Haskell thread in the child process will be unbound.
\paragraph{Issues:}
\begin{itemize}
\item Under what circumstances should the forked process terminate?
\item The action @forkProcessAll@ from the same module should be removed, as
it seems to be unimplementable in the presence of multiple OS threads.
\end{itemize}
\end{document}
|