\achapter{Extraction of programs in Objective Caml and Haskell}
\label{Extraction}
\aauthor{Jean-Christophe Filli\^atre and Pierre Letouzey}
\index{Extraction}
We present here the \Coq\ extraction commands, used to build certified
and relatively efficient functional programs, extracting them from
either \Coq\ functions or \Coq\ proofs of specifications. The
functional languages available as output are currently \ocaml{},
\textsc{Haskell} and \textsc{Scheme}. In the following, ``ML'' will
be used (abusively) to refer to any of the three.
\paragraph{Differences with old versions.}
The current extraction mechanism is new for version 7.0 of {\Coq}.
In particular, the \FW\ toplevel used as an intermediate step between
\Coq\ and ML has been withdrawn. It is also no longer possible
to import ML objects in this \FW\ toplevel.
The current mechanism also differs from
the one in previous versions of \Coq: there is no longer
an explicit toplevel for the language (formerly called \textsc{Fml}).
\asection{Generating ML code}
\comindex{Extraction}
\comindex{Recursive Extraction}
\comindex{Separate Extraction}
\comindex{Extraction Library}
\comindex{Recursive Extraction Library}
The next two commands are meant to be used for rapid preview of
extraction. They both display extracted term(s) inside \Coq.
\begin{description}
\item {\tt Extraction \qualid.} ~\par
Extracts one constant or module in the \Coq\ toplevel.
\item {\tt Recursive Extraction \qualid$_1$ \dots\ \qualid$_n$.} ~\par
Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\
\qualid$_n$ and all their dependencies in the \Coq\ toplevel.
\end{description}
%% TODO error messages
All the following commands produce real ML files. The user can choose to
produce one monolithic file or one file per \Coq\ library.
\begin{description}
\item {\tt Extraction "{\em file}"}
\qualid$_1$ \dots\ \qualid$_n$. ~\par
Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\
\qualid$_n$ and all their dependencies in one monolithic file {\em file}.
Global and local identifiers are renamed according to the chosen ML
language to fulfill its syntactic conventions, keeping original
names as much as possible.
\item {\tt Extraction Library} \ident. ~\par
Extraction of the whole \Coq\ library {\tt\ident.v} to an ML module
{\tt\ident.ml}. In case of name clash, identifiers are here renamed
using prefixes \verb!coq_! or \verb!Coq_! to ensure a
session-independent renaming.
\item {\tt Recursive Extraction Library} \ident. ~\par
Extraction of the \Coq\ library {\tt\ident.v} and all other modules
{\tt\ident.v} depends on.
\item {\tt Separate Extraction}
\qualid$_1$ \dots\ \qualid$_n$. ~\par
Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\
\qualid$_n$ and all their dependencies, just as {\tt
Extraction "{\em file}"}, but instead of producing one monolithic
file, this command splits the produced code in separate ML files, one per
corresponding Coq {\tt .v} file. This command is hence quite similar
to {\tt Recursive Extraction Library}, except that only the needed
parts of Coq libraries are extracted instead of the whole. The
naming convention in case of name clash is the same as for
{\tt Extraction Library}: identifiers are renamed
using prefixes \verb!coq_! or \verb!Coq_!.
\end{description}
The list of globals \qualid$_i$ does not need to be
exhaustive: it is automatically completed into a complete and minimal
environment.
\asection{Extraction options}
\asubsection{Setting the target language}
\comindex{Extraction Language}
The ability to set the target language is the first and most important
of the extraction options. The default is Ocaml.
\begin{description}
\item {\tt Extraction Language Ocaml}.
\item {\tt Extraction Language Haskell}.
\item {\tt Extraction Language Scheme}.
\end{description}
\asubsection{Inlining and optimizations}
Since Objective Caml is a strict language, the extracted
code has to be optimized in order to be efficient (for instance, when
using induction principles we do not want to compute all the recursive
calls but only the needed ones). So the extraction mechanism provides
an automatic optimization routine that is
called each time the user wants to generate Ocaml programs. Essentially,
it performs constant inlining and reductions. Therefore some
constants may not appear in the resulting monolithic Ocaml program.
In the case of modular extraction, even if some inlining is done, the
inlined constants are nevertheless printed, to ensure
session-independent programs.
Concerning Haskell, such optimizations are less useful because of
laziness. We still make some optimizations, for example in order to
produce more readable code.
All these optimizations are controlled by the following \Coq\ options:
\begin{description}
\item \comindex{Set Extraction Optimize}
{\tt Set Extraction Optimize.}
\item \comindex{Unset Extraction Optimize}
{\tt Unset Extraction Optimize.}
Default is Set. This controls all optimizations made on the ML terms
(mostly reduction of dummy beta/iota redexes, but also simplifications on
Cases, etc). Set this option to Unset if you want an ML term as close as
possible to the Coq term.
\item \comindex{Set Extraction KeepSingleton}
{\tt Set Extraction KeepSingleton.}
\item \comindex{Unset Extraction KeepSingleton}
{\tt Unset Extraction KeepSingleton.}
Default is Unset. Normally, when the extraction of an inductive type
produces a singleton type (i.e. a type with only one constructor, and
only one argument to this constructor), the inductive structure is
removed and this type is seen as an alias to the inner type.
The typical example is {\tt sig}. This option disables this
optimization when one wishes to preserve the inductive structure of types.
\item \comindex{Set Extraction AutoInline}
{\tt Set Extraction AutoInline.}
\item \comindex{Unset Extraction AutoInline}
{\tt Unset Extraction AutoInline.}
Default is Set, so by default the extraction mechanism feels free to
inline the bodies of some defined constants, according to some heuristics
like the size of bodies, the usefulness of some arguments, etc. These
heuristics are not always perfect; to disable this feature, use Unset.
\item \comindex{Extraction Inline}
{\tt Extraction Inline} \qualid$_1$ \dots\ \qualid$_n$.
\item \comindex{Extraction NoInline}
{\tt Extraction NoInline} \qualid$_1$ \dots\ \qualid$_n$.
In addition to the automatic inline feature, you can request the inlining
of specific additional constants with the {\tt Extraction Inline} command.
Conversely, you can forbid the automatic inlining of some specific
constants with the {\tt Extraction NoInline} command.
Together, these two commands give precise control over what is inlined
and what is not.
\item \comindex{Print Extraction Inline}
{\tt Print Extraction Inline}.
Prints the current state of the table recording the custom inlinings
declared by the two previous commands.
\item \comindex{Reset Extraction Inline}
{\tt Reset Extraction Inline}.
Puts the table recording the custom inlinings back to empty.
\end{description}
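
To make the effect of {\tt Extraction Optimize} more concrete, here is a
small hand-written \ocaml\ sketch of the kind of dummy beta and iota
redexes mentioned above, before and after reduction. It is only an
illustration and not actual extraction output; the names {\tt raw} and
{\tt optimized} are hypothetical.

\begin{verbatim}
type nat = O | S of nat

(* Unreduced form: a beta redex (a function applied on the spot)
   wrapped around an iota redex (a match on a known constructor). *)
let raw m =
  (fun n -> match S n with O -> O | S p -> S (S p)) m

(* After reducing both redexes, the same function is simply: *)
let optimized m = S (S m)
\end{verbatim}

Both definitions compute the same function; the second one is what one
would hope to read in generated code.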
\paragraph{Inlining and printing of a constant declaration.}
A user can explicitly ask for a constant to be extracted by two means:
\begin{itemize}
\item by mentioning it on the extraction command line
\item by extracting the whole \Coq\ module of this constant.
\end{itemize}
In both cases, the declaration of this constant will be present in the
produced file.
But this same constant may or may not be inlined in the following
terms, depending on the automatic/custom inlining mechanism.
For constants that are not explicitly requested but are needed for
dependency reasons, there are two cases:
\begin{itemize}
\item If an inlining decision is taken, whether automatically or not,
all occurrences of this constant are replaced by its extracted body, and
this constant is not declared in the generated file.
\item If no inlining decision is taken, the constant is normally
declared in the produced file.
\end{itemize}
\asubsection{Extra elimination of useless arguments}
\begin{description}
\item \comindex{Extraction Implicit}
{\tt Extraction Implicit} \qualid\ [ \ident$_1$ \dots\ \ident$_n$ ].
This experimental command declares some arguments of
\qualid\ as implicit, i.e. useless in extracted code and hence to
be removed by extraction. Here \qualid\ can be any function or
inductive constructor, and \ident$_i$ are the names of the concerned
arguments. An argument can also be referred to by a number
indicating its position, starting from 1. When an actual extraction
takes place, an error is raised if the {\tt Extraction Implicit}
declarations cannot be honored, that is if any of the implicit
variables still occur in the final code. This declaration of useless
arguments is independent of, but complementary to, the main elimination
principles of extraction (logical parts and types).
\end{description}
\asubsection{Realizing axioms}\label{extraction:axioms}
Extraction will fail if it encounters an informative
axiom that has not been realized.
A warning will be issued if it encounters a logical axiom, to remind
the user that inconsistent logical axioms may lead to incorrect or
non-terminating extracted terms.
It is possible to assume some axioms while developing a proof. Since
these axioms can be any kind of proposition or object or type, they may
perfectly well have some computational content. But a program must be
a closed term, and of course the system cannot guess the program which
realizes an axiom. Therefore, it is possible to tell the system
what ML term corresponds to a given axiom.
\comindex{Extract Constant}
\begin{description}
\item{\tt Extract Constant \qualid\ => \str.} ~\par
Give an ML extraction for the given constant.
The \str\ may be an identifier or a quoted string.
\item{\tt Extract Inlined Constant \qualid\ => \str.} ~\par
Same as the previous one, except that the given ML terms will
be inlined everywhere instead of being declared via a let.
\end{description}
Note that the {\tt Extract Inlined Constant} command is sugar
for an {\tt Extract Constant} followed by an {\tt Extraction Inline}.
Hence a {\tt Reset Extraction Inline} will have an effect on the
realized and inlined axiom.
Of course, it is the responsibility of the user to ensure that the ML
terms given to realize the axioms do have the expected types. In
fact, the strings containing realizing code are just copied into the
extracted files. The extraction recognizes whether the realized axiom
should become an ML type constant or an ML object declaration.
\Example
\begin{coq_example}
Axiom X:Set.
Axiom x:X.
Extract Constant X => "int".
Extract Constant x => "0".
\end{coq_example}
Notice that in the case of a type scheme axiom (i.e. one whose type is an
arity, that is a sequence of products ending with a sort), some type
variables have to be given. The syntax is then:
\begin{description}
\item{\tt Extract Constant \qualid\ \str$_1$ \ldots \str$_n$ => \str.} ~\par
\end{description}
The number of type variables is checked by the system.
\Example
\begin{coq_example}
Axiom Y : Set -> Set -> Set.
Extract Constant Y "'a" "'b" => " 'a*'b ".
\end{coq_example}
Realizing an axiom via {\tt Extract Constant} is only useful in the
case of an informative axiom (of sort Type or Set). A logical axiom
has no computational content and hence will not appear in extracted
terms. But a warning is nonetheless issued if extraction encounters a
logical axiom. This warning reminds the user that inconsistent logical
axioms may lead to incorrect or non-terminating extracted terms.
If an informative axiom has not been realized before an extraction, a
warning is also issued and the definition of the axiom is filled with
an exception labeled {\tt AXIOM TO BE REALIZED}. The user must then
search for these exceptions inside the extracted file and replace them
with real code.
\comindex{Extract Inductive}
The system also provides a mechanism to specify ML terms for inductive
types and constructors. For instance, the user may want to use the ML
native boolean type instead of the \Coq\ one. The syntax is the following:
\begin{description}
\item{\tt Extract Inductive \qualid\ => \str\ [ \str\ \dots \str\ ]\
{\it optstring}.} ~\par
Give an ML extraction for the given inductive type. You must specify
extractions for the type itself (first \str) and all its
constructors (between square brackets). If given, the final optional
string should contain a function emulating pattern-matching over this
inductive type. If this optional string is not given, the ML
extraction must be an ML inductive datatype, and the native
pattern-matching of the language will be used.
\end{description}
For an inductive type with $k$ constructors, the function used to
emulate the match should expect $(k+1)$ arguments, first the $k$
branches in functional form, and then the inductive element to
destruct. For instance, the match branch \verb$| S n => foo$ gives the
functional form \verb$(fun n -> foo)$. Note that a constructor with no
argument is considered to have one unit argument, in order to block
early evaluation of the branch: \verb$| O => bar$ leads to the functional
form \verb$(fun () -> bar)$. For instance, when extracting {\tt nat}
into {\tt int}, the code to provide has type:
{\tt (unit->'a)->(int->'a)->int->'a}.
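
As a minimal sketch in \ocaml, the function {\tt nat\_case} below has
exactly this type, and the two definitions after it show how a \Coq\ match
on {\tt nat} would be rendered through it. The names {\tt nat\_case},
{\tt is\_zero} and {\tt pred\_nat} are hypothetical and chosen only for
illustration.

\begin{verbatim}
(* Emulates pattern-matching on nat when nat is represented by int.
   The branches come first, in constructor order (O, then S); the
   value to destruct comes last. *)
let nat_case (f_o : unit -> 'a) (f_s : int -> 'a) (n : int) : 'a =
  if n = 0 then f_o () else f_s (n - 1)

(* Coq: match n with O => true | S _ => false end *)
let is_zero n = nat_case (fun () -> true) (fun _ -> false) n

(* Coq: match n with O => O | S p => p end *)
let pred_nat n = nat_case (fun () -> 0) (fun p -> p) n
\end{verbatim}

Because the branch for {\tt O} is wrapped in {\tt fun () -> ...}, its body
is evaluated only when that branch is actually selected.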
Like {\tt Extract Constant}, the {\tt Extract Inductive} command should
be used with care:
\begin{itemize}
\item The ML code provided by the user is currently \emph{not} checked at all by
extraction, even for syntax errors.
\item Extracting an inductive type to a pre-existing ML inductive type
is quite sound. But extracting to a general type (by providing an
ad-hoc pattern-matching) will often \emph{not} be fully
correct. For instance, when extracting {\tt nat} to Ocaml's {\tt
int}, it is theoretically possible to build {\tt nat} values that are
larger than Ocaml's {\tt max\_int}. It is the user's responsibility to
ensure that no overflow or other bad events occur in practice.
\item Translating an inductive type to an ML type does \emph{not}
magically improve the asymptotic complexity of functions, even if the
ML type is an efficient representation. For instance, when extracting
{\tt nat} to Ocaml's {\tt int}, the function {\tt mult} stays
quadratic. It might be interesting to associate this translation with
some specific {\tt Extract Constant} when primitive counterparts exist.
\end{itemize}
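
The last point can be illustrated by a small \ocaml\ sketch. It is
hand-written, not actual extraction output: {\tt nat\_case}, {\tt add},
{\tt mult} and the counter {\tt steps} are hypothetical names, and
{\tt add} and {\tt mult} are assumed to follow the usual structural
recursion on their first argument.

\begin{verbatim}
(* Number of successor branches taken, for illustration only. *)
let steps = ref 0

(* Hypothetical match emulation for nat represented by int. *)
let nat_case f_o f_s n = if n = 0 then f_o () else f_s (n - 1)

(* Structural recursion on the first argument, as on unary nat. *)
let rec add n m =
  nat_case (fun () -> m) (fun p -> incr steps; succ (add p m)) n

let rec mult n m =
  nat_case (fun () -> 0) (fun p -> incr steps; add m (mult p m)) n

let () =
  let r = mult 30 40 in
  (* Same result as the primitive product, at the cost of one
     recursive call per unit of the arguments. *)
  Printf.printf "%d = %d, computed in %d steps\n" r (30 * 40) !steps
\end{verbatim}

The number of steps grows with the product of the two arguments even
though the representation is a machine integer, whereas the primitive
{\tt ( * )} of \ocaml\ computes the same result in a single operation.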
\Example
Typical examples are the following:
\begin{coq_example}
Extract Inductive unit => "unit" [ "()" ].
Extract Inductive bool => "bool" [ "true" "false" ].
Extract Inductive sumbool => "bool" [ "true" "false" ].
\end{coq_example}
If an inductive constructor or type has arity 2 and the corresponding
string is enclosed in parentheses, then the rest of the string is used
as an infix constructor or type.
\begin{coq_example}
Extract Inductive list => "list" [ "[]" "(::)" ].
Extract Inductive prod => "(*)" [ "(,)" ].
\end{coq_example}
As an example of translation to a non-inductive datatype, let's turn
{\tt nat} into Ocaml's {\tt int} (see caveat above):
\begin{coq_example}
Extract Inductive nat => int [ "0" "succ" ]
"(fun fO fS n -> if n=0 then fO () else fS (n-1))".
\end{coq_example}
\asubsection{Avoiding conflicts with existing filenames}
\comindex{Extraction Blacklist}
When using {\tt Extraction Library}, the names of the extracted files
depend directly on the names of the \Coq\ files. It may happen that
these filenames are in conflict with already existing files,
either in the standard library of the target language or in other
code that is meant to be linked with the extracted code.
For instance the module {\tt List} exists both in \Coq\ and in Ocaml.
It is possible to instruct the extraction not to use particular filenames.
\begin{description}
\item{\tt Extraction Blacklist \ident \ldots \ident.} ~\par
Instruct the extraction to avoid using these names as filenames
for extracted code.
\item{\tt Print Extraction Blacklist.} ~\par
Show the current list of filenames the extraction should avoid.
\item{\tt Reset Extraction Blacklist.} ~\par
Allow the extraction to use any filename.
\end{description}
For Ocaml, a typical use of these commands is
{\tt Extraction Blacklist String List}.
\asection{Differences between \Coq\ and ML type systems}
Due to differences between \Coq\ and ML type systems,
some extracted programs are not directly typable in ML.
We now solve this problem (at least in Ocaml) by adding,
when needed, some unsafe casts {\tt Obj.magic}, which give
a generic type {\tt 'a} to any term.
For example, here are two kinds of problem that can occur:
\begin{itemize}
\item If some part of the program is {\em very} polymorphic, there
may be no ML type for it. In that case the extraction to ML works
all right but the generated code may be refused by the ML
type-checker. A very well known example is the {\em distr-pair}
function:
\begin{verbatim}
Definition dp :=
fun (A B:Set)(x:A)(y:B)(f:forall C:Set, C->C) => (f A x, f B y).
\end{verbatim}
In Ocaml, for instance, the direct extracted term would be:
\begin{verbatim}
let dp x y f = Pair((f () x),(f () y))
\end{verbatim}
and would have type:
\begin{verbatim}
dp : 'a -> 'a -> (unit -> 'a -> 'b) -> ('b,'b) prod
\end{verbatim}
which is not its original type, but a restriction.
We now produce the following correct version:
\begin{verbatim}
let dp x y f = Pair ((Obj.magic f () x), (Obj.magic f () y))
\end{verbatim}
\item Some definitions of \Coq\ may have no counterpart in ML. This
happens when there is a quantification over types inside the type
of a constructor; for example:
\begin{verbatim}
Inductive anything : Set := dummy : forall A:Set, A -> anything.
\end{verbatim}
which corresponds to the definition of an ML dynamic type.
In Ocaml, we must cast any argument of the constructor dummy.
\end{itemize}
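
Both situations can be reproduced in a few lines of \ocaml. The following
is a hand-written sketch rather than actual extraction output: it uses
native pairs instead of the extracted {\tt prod} type, and the names
{\tt id\_fun}, {\tt Dummy}, {\tt pack} and {\tt unpack\_int} are
hypothetical.

\begin{verbatim}
(* distr-pair: Obj.magic lets f be used at two different types. *)
let dp x y f = ((Obj.magic f () x), (Obj.magic f () y))

(* Plays the role of an argument of type forall C:Set, C->C;
   the unit argument stands for the erased type C. *)
let id_fun () z = z

let (a, b) : int * string = dp 1 "one" id_fun

(* A dynamic type: the constructor argument is stored at type Obj.t. *)
type anything = Dummy of Obj.t

let pack (v : 'a) : anything = Dummy (Obj.repr v)

(* Only sound if the stored value really is an int. *)
let unpack_int (Dummy v) : int = Obj.obj v

let () = Printf.printf "%d %s %d\n" a b (unpack_int (pack 42))
\end{verbatim}

The type annotation on {\tt (a, b)} is needed in this hand-written
setting because, once {\tt Obj.magic} is involved, the type-checker can
no longer infer the result types by itself.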
Even with those unsafe casts, you should never get errors like
``segmentation fault''. In fact, even if your program may seem
ill-typed to the Ocaml type-checker, it cannot go wrong: it comes
from a well-typed Coq term, so for example inductives will always
have the correct number of arguments, etc.
More details about the correctness of the extracted programs can be
found in \cite{Let02}.
We have to say, though, that in most ``realistic'' programs, these
problems do not occur. For example all the programs of the Coq library are
accepted by the Ocaml type-checker without any {\tt Obj.magic} (see examples below).
\asection{Some examples}
We present here two examples of extractions, taken from the
\Coq\ Standard Library. We choose \ocaml\ as target language,
but all can be done in the other dialects with slight modifications.
We then indicate where to find other examples and tests of Extraction.
\asubsection{A detailed example: Euclidean division}
The file {\tt Euclid} contains the proof of Euclidean division
(theorem {\tt eucl\_dev}). The natural numbers defined in the example
files are unary integers defined by two constructors $O$ and $S$:
\begin{coq_example*}
Inductive nat : Set :=
| O : nat
| S : nat -> nat.
\end{coq_example*}
This module contains a theorem {\tt eucl\_dev}, whose type is:
\begin{verbatim}
forall b:nat, b > 0 -> forall a:nat, diveucl a b
\end{verbatim}
where {\tt diveucl} is a type for the pair of the quotient and the
modulo, plus some logical assertions that disappear during extraction.
We can now extract this program to \ocaml:
\begin{coq_eval}
Reset Initial.
\end{coq_eval}
\begin{coq_example}
Require Import Euclid Wf_nat.
Extraction Inline gt_wf_rec lt_wf_rec induction_ltof2.
Recursive Extraction eucl_dev.
\end{coq_example}
The inlining of {\tt gt\_wf\_rec} and others is not
mandatory. It only enhances readability of extracted code.
You can then copy-paste the output to a file {\tt euclid.ml} or let
\Coq\ do it for you with the following command:
\begin{verbatim}
Extraction "euclid" eucl_dev.
\end{verbatim}
Let us run the resulting program:
\begin{verbatim}
# #use "euclid.ml";;
type nat = O | S of nat
type sumbool = Left | Right
val minus : nat -> nat -> nat = <fun>
val le_lt_dec : nat -> nat -> sumbool = <fun>
val le_gt_dec : nat -> nat -> sumbool = <fun>
type diveucl = Divex of nat * nat
val eucl_dev : nat -> nat -> diveucl = <fun>
# eucl_dev (S (S O)) (S (S (S (S (S O)))));;
- : diveucl = Divex (S (S O), S O)
\end{verbatim}
It is easier to test on \ocaml\ integers:
\begin{verbatim}
# let rec nat_of_int = function 0 -> O | n -> S (nat_of_int (n-1));;
val nat_of_int : int -> nat = <fun>
# let rec int_of_nat = function O -> 0 | S p -> 1+(int_of_nat p);;
val int_of_nat : nat -> int = <fun>
# let div a b =
let Divex (q,r) = eucl_dev (nat_of_int b) (nat_of_int a)
in (int_of_nat q, int_of_nat r);;
val div : int -> int -> int * int = <fun>
# div 173 15;;
- : int * int = (11, 8)
\end{verbatim}
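
For readers without a \Coq\ installation at hand, the session above can be
approximated by the following self-contained sketch. It is hand-written to
match the signatures printed above and is \emph{not} the code produced by
extraction; in particular the bodies of {\tt minus}, {\tt le\_gt\_dec} and
{\tt eucl\_dev} are assumptions chosen for readability.

\begin{verbatim}
type nat = O | S of nat
type sumbool = Left | Right
type diveucl = Divex of nat * nat

(* Truncated subtraction on unary naturals. *)
let rec minus n m =
  match n, m with
  | O, _ -> O
  | _, O -> n
  | S p, S q -> minus p q

(* Left when n <= m, Right when n > m. *)
let rec le_gt_dec n m =
  match n, m with
  | O, _ -> Left
  | S _, O -> Right
  | S p, S q -> le_gt_dec p q

(* Division of a by b by repeated subtraction. b must be non-zero,
   which is what the erased proof of b > 0 guarantees in Coq. *)
let rec eucl_dev b a =
  match le_gt_dec b a with
  | Right -> Divex (O, a)  (* b > a: quotient 0, remainder a *)
  | Left ->
    let Divex (q, r) = eucl_dev b (minus a b) in
    Divex (S q, r)

let rec nat_of_int = function 0 -> O | n -> S (nat_of_int (n - 1))
let rec int_of_nat = function O -> 0 | S p -> 1 + int_of_nat p

let div a b =
  let Divex (q, r) = eucl_dev (nat_of_int b) (nat_of_int a) in
  (int_of_nat q, int_of_nat r)

let () =
  let (q, r) = div 173 15 in
  Printf.printf "173 = 15 * %d + %d\n" q r
\end{verbatim}

As in the session above, the divisor is the first argument of
{\tt eucl\_dev}, which is why {\tt div} swaps its arguments.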
Note that these {\tt nat\_of\_int} and {\tt int\_of\_nat} are now
available via a mere {\tt Require Import ExtrOcamlIntConv} and then
adding these functions to the list of functions to extract. This file
{\tt ExtrOcamlIntConv.v} and some others in {\tt plugins/extraction/}
are meant to help build concrete programs via extraction.
\asubsection{Extraction's horror museum}
Some pathological examples of extraction are grouped in the file
{\tt test-suite/success/extraction.v} of the sources of \Coq.
\asubsection{Users' Contributions}
Several of the \Coq\ Users' Contributions use extraction to produce
certified programs. In particular the following ones have an automatic
extraction test (just run {\tt make} in those directories):
\begin{itemize}
\item Bordeaux/Additions
\item Bordeaux/EXCEPTIONS
\item Bordeaux/SearchTrees
\item Dyade/BDDS
\item Lannion
\item Lyon/CIRCUITS
\item Lyon/FIRING-SQUAD
\item Marseille/CIRCUITS
\item Muenchen/Higman
\item Nancy/FOUnify
\item Rocq/ARITH/Chinese
\item Rocq/COC
\item Rocq/GRAPHS
\item Rocq/HIGMAN
\item Sophia-Antipolis/Stalmarck
\item Suresnes/BDD
\end{itemize}
Lannion, Rocq/HIGMAN and Lyon/CIRCUITS are a bit particular. They are
examples of developments where {\tt Obj.magic} is needed.
This is probably due to a heavy use of impredicativity.
After compilation these examples run nonetheless,
thanks to the correctness of the extraction~\cite{Let02}.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "Reference-Manual"
%%% End: