File: threads.tex

package info (click to toggle)
haskell98-report 20030706-3
links: PTS
area: main
in suites: etch, etch-m68k
size: 1,888 kB
ctags: 77
sloc: haskell: 3,809; makefile: 326; sh: 4
file content (326 lines) | stat: -rw-r--r-- 11,042 bytes
parent folder | download | duplicates (2)
\documentclass{article}

\usepackage{proof}
\usepackage{code}

\sloppy
\setlength{\parskip}{0.5\baselineskip plus 0.2\baselineskip minus 0.1\baselineskip}
\setlength{\parsep}{\parskip}
\setlength{\topsep}{0cm}
\setlength{\parindent}{0cm}
\renewcommand{\textfraction}{0}
\renewcommand{\topfraction}{1}
\renewcommand{\floatpagefraction}{0.8}
\renewcommand{\dblfloatpagefraction}{0.8}


\newcommand{\NS}{{\cal N}}
%       NS: set of native threads
\newcommand{\HS}{{\cal H}}
%       HS: set of Haskell threads
\newcommand{\hcall}{H}
\newcommand{\fcall}[2]{F^{#1}~#2}
\newcommand{\ret}[1]{RET~#1}

\newcommand{\bound}[1]{B(#1)}
\newcommand{\forkio}[1]{ForkIO(#1)}
\begin{document}


\title{A semantics for foreign threads}

\author{The Haskell FFI team}
\maketitle

\makeatactive

\section{Introduction}

Threads created using OS primitives (for example pthreads on POSIX systems) can
have thread-local state. Several libraries make use of this thread-local state.
Most Haskell systems use just one OS thread to execute all foreign code.
With GHC's threaded RTS, this is not even necessarily the same OS thread all the
time.
Therefore it is not possible to use foreign libraries that depend on thread-local
state reliably.

The following foreign libraries are known to be affected:
\begin{itemize}
\item OpenGL
\item Carbon and Cocoa (Apple's GUI toolkits)
\end{itemize}

\section{Requirements}

\begin{itemize}
\item It should be possible for Haskell code to arrange that a sequence of
calls to a given library are performed by the same native thread and
that if an external library calls into Haskell, then any outgoing calls
from Haskell are performed by the same native thread.

\item The specification should be implementable in a way that allows a lot
of foreign calls to be made with no additional overhead with respect to
GHC's current ``unsafe'' foreign calls.

\item The good performance of the existing lightweight ``green'' threads in
GHC should not be sacrificed. Performance should still OK when using
the new features with only a few threads (i.e. not more than commonly
used from multithreaded C programs).

\item The specification shouldn't explicitly require lightweight ``green''
threads to exist. The specification should be implementable in a simple
and obvious way in haskell systems that always use a 1:1 correspondence
between Haskell threads and OS threads.

\item The specification shouldn't specify which particular OS thread
should be used to execute Haskell code. It should be possible to
implement it with e.g. a Haskell interpreter running in one OS thread
that just uses other OS threads for foreign calls.

\item There should be no unexpected blocking. Safe Foreign calls (i.e. calls not
marked as unsafe) should not cause other threads to block.
\end{itemize}

\newpage
\section{Informal semantics}

Here's the basic idea:
\begin{description}
\item[Haskell threads and OS threads.] \mbox{}\\
\begin{itemize}
\item Every Haskell thread is \emph{either} unbound, \emph{or} bound to a exactly one OS thread.  

\item At most one Haskell thread may be bound to one OS thread.
In particular, @forkIO@ forks a new unbound Haskell thread.

\item A Haskell thread, bound to a new OS thread, can be created with @forkOS@.

\end{itemize}

\item[Foreign interface.] \mbox{}\\
\begin{itemize}
\item No @safe@ vs @threadsafe@ distinction\footnote{``@threadsafe@'' has already
been removed from the current Release Candidate of the FFI addendum}. But we retain
the @safe@/@unsafe@ distinction.
\item A foreign call made by a Haskell thread is (guaranteed to be) made by its bound OS thread, if
any.

\item If a @safe@ foreign call blocks, then no Haskell threads block.  (Remember, every OS thread
has at most one Haskell thread bound to it.)

\item A foreign call \emph{into Haskell} (via @foreign export@ or @foreign import wrapper@) is 
run by a Haskell thread bound to the OS thread that made the call.
\end{itemize}


\item[Open questions and notes.] \mbox{}\\
\begin{itemize}
\item Notice that, there \emph{can} be a 1-1 mapping between Haskell threads
and OS threads.  Furthermore, we can run efficiently on an SMP.
\end{itemize}
\end{description}


\section{Formal semantics}

The syntax of a native thread is this:
$$
\begin{array}{lrcll}
\mbox{Native thread} &  t & ::= & N[S] \\
\\
\mbox{Native thread stack} &  S & ::= & \epsilon & \mbox{Empty}\\
        & & | & \hcall : S  & \mbox{Executing Haskell} \\
        & & | & \fcall{si}{a_{bt}} : S & \mbox{Executing foreign code} \\
        & & | & \bullet & \mbox{Unknown}\\
\\
\mbox{Safety indicator} &  si & ::= & u & \mbox{Unsafe} \\
        & & | & s & \mbox{Safe}
\end{array}
$$
A native thread of form $N[S]$ has thread-id $N$, while $S$ is
an abstraction of its call stack.  If $\hcall$ is on top of the stack,
the thread is willing to execute a Haskell thread. 
If $\fcall{si}{h}$ is
on top of the stack, the thread is in the process of dealing with a call
to a foreign function, which will return its result to the Haskell thread
$h$.  The safety-indicator $si$ is from the FFI spec.

A native thread of form $N[H]$ has a stack that exists only to serve Haskell 
threads, and so can safely block inside a foreign call without mucking anything
else up.  We might call them ``worker threads''.

The syntax of a Haskell thread is this:
$$
\begin{array}{lrcll}
\mbox{Haskell thread} &  h & ::= & (a)_{bt} \\
\\
\mbox{Bound thread id} & bt & ::= & \epsilon & \mbox{Not bound} \\
        & & | & N & \mbox{Bound to native thread N} \\
\\
\mbox{Haskell action} &  a & ::= & p ~@>>@~ a  & \mbox{Sequence} \\
        & & | & \ret  & \mbox{Return from a call into Haskell} \\
\\
\mbox{Primitive action} &  p & ::= & \tau & \mbox{Internal action} \\
        & & | & @forkIO@~a & \mbox{Fork a thread} \\
        & & | & @forkOS@~a & \mbox{Fork a native thread} \\
        & & | & \fcall{si}{f} & \mbox{Foreign call} 
\end{array}
$$
A Haskell thread $h$ of form $(a)_{N}$ has action $a$.  The indicator
$N$ identifies the native thread $N$ to which the Haskell thread is \emph{bound}.

An action $a$ is a sequence of primitive actions, finishing with a 
return of some kind.  A primitive action is either some internal Haskell
thing (such as performing a bit of evaluation, or operating on an @MVar@),
or else it is a call to a foreign function $f$.

We do not model the data passed to, or returned from, a foreign call, nor
any details of what ``internal Haskell'' operations are.  

\subsection{Evolution}

We describe how the system evolves in a very standard way, using 
transition rules, of form
$$
\NS ; \HS ~\Rightarrow~ \NS' ; \HS'
$$
The structural rules are as before.
$$
\begin{array}{rcll}
N[\hcall:S]; (\tau~@>>@~a)_{bt} 
        & \Rightarrow 
        & N[\hcall:S]; (a)_{M} & (INT) \\
\\
N[\hcall:S]; (@forkIO@~b~@>>@~a)_{bt} 
        & \Rightarrow 
        & N[\hcall:S]; (a)_{bt}, (b)_\epsilon & (FORKIO) \\
N[\hcall:S]; (@forkOS@~b~@>>@~a)_{bt}
        & \Rightarrow 
        & N[\hcall:S], M[\hcall,\bullet]; (a)_{M}, (b)_{bt} & (FORKOS) \\
\\
N[\hcall:S]; (\fcall{si}{f}~@>>@~a)_N 
        & \Rightarrow 
        & N[\fcall{si}{a_N}:\hcall:S];  & (FCALL1) \\
N[\hcall]; (\fcall{si}{f}~@>>@~a)_\epsilon 
        & \Rightarrow 
        & N[\fcall{si}{a_\epsilon}:\hcall:S];  & (FCALL2) \\
g\\
N[\fcall{si}{a_{bt}}:S];  
        & \Rightarrow 
        & N[S]; a_{bt} & (FRET) \\
\\
N[\bullet];
        & \Rightarrow 
        & N[\hcall:\bullet];  (f ~@>>@~ \ret{})_{N} & (HCALL1) \\
\\
N[\fcall{s}{a} : S]; 
        & \Rightarrow 
        & N[\hcall : \fcall{s}{a} : S]; ~ (f ~@>>@~ \ret{})_{N} & (HCALL2) \\
 \\
N[\hcall : S]; (\ret{})_N
        & \Rightarrow 
        & N[S]; & (HRET) \\
\\
(nothing)
        & \Rightarrow 
        & N[\hcall]; & (WKR) \\
        & \multicolumn{2}{l}{\mbox{where $N$ is fresh}} \\
\\
(nothing)
        & \Rightarrow 
        & N[\bullet]; & (EXT) \\
        & \multicolumn{2}{l}{\mbox{where $N$ is fresh}} \\
\end{array}
$$

\begin{description}
\item[FORKOS.]  Note that we spawn a new OS thread $M[H,\bullet]$.  The $\bullet$ prevents it
participating in (FCALL2), which might block $M$ inside a foreign call; instead, $M$ must
remain available to participate in (FCALL1), since no other OS thread can do so.

\item[WKR.] This rule models the birth of new worker OS threads, in case they should
all be blocked in a foreign call.
\end{description}

\section{Primitives}

\subsection{forkIO and forkOS}

\begin{quote}
\begin{verbatim}
forkIO :: IO () -> IO ThreadId
forkOS :: IO () -> IO ThreadId
\end{verbatim}
\end{quote}

As described in the formal semantics above.

This document does not specify the meaning of the @ThreadId@ return value. The
definition of what constitutes one Haskell thread used for the return value
need not agree with the definition used for describing the formal semantics.
Questions like ``are thread ids preserved across foreign calls and call-backs''
are outside the scope of this document.

\subsection{isCurrentThreadBound}

\begin{quote}
\begin{verbatim}
isCurrentThreadBound :: IO Bool
\end{verbatim}
\end{quote}

... should return @True@ if and only if it is safe to use foreign calls that
rely on thread-local state. That means it will return True when executed from a
bound Haskell thread. It may also return @True@ for threads that are not bound
according to the above semantics if the run time system is implemented in such
a way that thread-local-state can be used from all threads, e.g. using a 1-1
relationship between Haskell threads and OS threads.

This primitive is intended to make use of forkOS unnecessary when a bound
thread is already available; consider the following utility function:

\begin{quote}
\begin{verbatim}
runInBoundThread :: IO a -> IO a

runInBoundThread action = do
    bound <- isCurrentThreadBound
    if bound
        then action
        else do
            mv <- newEmptyMVar
            forkOS (action >>= putMVar mv)
            takeMVar mv
\end{verbatim}
\end{quote}

\subsection{forkProcess}

\begin{quote}
\begin{verbatim}
forkProcess :: IO (Maybe ProcessID)
\end{verbatim}
\end{quote}

The primitive @forkProcess@ is available in the module @System.Posix.Process@
on Posix platforms only.
It's semantics are anologous to POSIX @fork@; it returns a process
id to the parent, and @Nothing@ to the child. Only the Haskell thread that
@forkProcess@ was called from is duplicated in the child process; it
will be the only thread in the child process.

If @forkProcess@ is called from a bound Haskell thread, the duplicated Haskell
thread in the child process will again be bound and all the thread-local
state will be preserved.

If @forkProcess@ is called from an unbound Haskell thread, the duplicated
Haskell thread in the child process will be unbound.

\paragraph{Issues:}
\begin{itemize}
\item Under what circumstances should the forked process terminate?
\item The action @forkProcessAll@ from the same module should be removed, as
      it seems to be unimplementable in the presence of multiple OS threads.
\end{itemize}

\end{document}