Copyright (c) 2001, 2002, Lucent Technologies, Bell Laboratories
author: Matthias Blume (blume@research.bell-labs.com)
This directory contains ML-NLFFI-Gen, a glue-code generator for
the new "NLFFI" foreign function interface. The generator reads
C source code and emits ML code along with a description file for CM.
Compiling this generator requires the C-Kit ($/ckit-lib.cm) to be
installed.
---------------------------------------------------------------------
February 21, 2002: Major changes:
I reworked the glue code generator in a way that lets generated code
scale better -- at the expense of some (mostly academic) generality.
Changes involve the following:
1. The functorization is gone.
2. Every top-level C declaration results in a separate top-level
ML equivalent (implemented by its own ML source file).
3. Incomplete pointer types are treated just like their complete
versions -- the only difference being that no RTTI will be
available for them. In the "light" interface, this rules out
precisely those operations over them that C would disallow.
4. All related C sources must be supplied to ml-nlffigen together.
Types incomplete in one source but complete in another get
automatically completed in a cross-file fashion.
5. The handle for the shared library to link to is now abstracted as
a function closure. Moreover, it must be supplied as a top-level
variable (by the programmer). For this purpose, ml-nlffigen has
corresponding command-line options.
These changes mean that even libraries with a very large number of
exported definitions, such as GTK, can now be handled gracefully
without reaching the limits of the ML compiler's abilities.
[The example of GTK -- for which ml-nlffigen creates several thousand (!)
separate ML source files -- puts an unusual burden on CM, though.
However, aside from running a bit longer than usual, CM handles loads
of this magnitude just fine. Stabilizing the resulting library solves
the problem entirely as far as later clients are concerned.]
Sketch of translation- (and naming-) scheme:
  struct foo { ... }
    --> structure ST_foo in st-foo.sml (not exported)
          basic type info (name, size)
      & structure S_foo in s-foo.sml
          abstract interface to the type
            field accessors f_xxx (unless -light)
                        and f_xxx' (unless -heavy)
            field types t_f_xxx
            field RTTI typ_f_xxx
      & (unless "-nosucvt" was set)
        structures IS_foo in <a>/is-foo.sml
          (see discussion of struct *foo below)
  union foo { ... }
    --> structure UT_foo in ut-foo.sml (not exported)
          basic type info (name, size)
      & structure U_foo in u-foo.sml
          abstract interface to the type
            field accessors f_xxx (unless -light)
                        and f_xxx' (unless -heavy)
            field types t_f_xxx
            field RTTI typ_f_xxx
      & (unless "-nosucvt" was set)
        structures IU_foo in <a>/iu-foo.sml
          (see discussion of union *foo below)
  struct { ... }
    like struct <n> { ... }, where <n> is a fresh integer, or 'bar
    (the typedef name prefixed with an apostrophe) if 'struct { ... }'
    occurs in the context of a 'typedef struct { ... } bar'
  union { ... }
    like union <n> { ... }, where <n> is a fresh integer, or 'bar
    (the typedef name prefixed with an apostrophe) if 'union { ... }'
    occurs in the context of a 'typedef union { ... } bar'
  enum foo { ... }
    --> structure E_foo in e-foo.sml
          external type mlrep with
            enum constants e_xxx
            conversion functions between tag enum and mlrep
                             and between mlrep and sint
            access functions (get/set) that operate on mlrep
              (as an alternative to C.Get.enum/C.Set.enum, which
              operate on sint)
    If the command-line option "-ec" ("-enum-constructors") was set
    and the values of all enum constants are different from each
    other, then mlrep will be a datatype (thus making it possible
    to pattern-match).
  enum { ... }
    If this construct appears in the context of a surrounding
    (non-anonymous) struct or union or typedef, the enumeration gets
    assigned an artificial tag (just like similar structs and unions,
    see above).
    Unless the command-line option "-nocollect" was specified, all
    constants in other (truly) unnamed enumerations will be
    collected into a single enumeration represented by structure E_'.
    This single enumeration is then treated like a regular enumeration
    (including handling of "-ec" -- see above).
    With "-nocollect", each such enumeration is instead assigned a
    fresh integer tag (again, just like in the struct/union case).
  T foo (T, ..., T)   (global function/function prototype)
    --> structure F_foo in f-foo.sml
          containing three or four members:
            typ  : RTTI
            fptr : thunkified fptr representing the C function
            maybe f' : light-weight function wrapper around fptr
                       Turned off by -heavy (see below).
            maybe f  : heavy-weight function wrapper around fptr
                       Turned off by -light (see below).
  T foo;   (global variable)
    --> structure G_foo in g-foo.sml
          containing three members:
            t   : type
            typ : RTTI
            obj : thunkified object representing the C variable
  struct foo *   (without existing definition of struct foo; incomplete type)
    --> an internal structure ST_foo with a type "tag" (just like in
        the struct foo { ... } case)
        The difference is that no structure S_foo will be generated,
        so there is no field-access interface and no RTTI (size or typ)
        for this type. All "light-weight" functions referring to this
        pointer type will be generated; heavy-weight functions will
        be generated only if they do not require access to RTTI.
        If "-heavy" was specified but a heavy interface function
        cannot be generated because of incomplete types, then its
        light counterpart will be generated anyway.
  union foo *    Same as with struct foo *, but replace S_foo with U_foo
                 and ST_foo with UT_foo.
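For the "-ec" option of the enum foo entry above, a hand-written SML sketch shows why the values must be pairwise distinct. It is not actual ml-nlffigen output: the structure Color and the functions m2i, i2m, and name are hypothetical. They are chosen for the C declaration enum color { RED, GREEN = 5, BLUE }, whose constants have the values 0, 5, and 6.

```sml
(* Hand-written illustration only; NOT the code ml-nlffigen generates.
 * Assumed C source: enum color { RED, GREEN = 5, BLUE };
 * so RED = 0, GREEN = 5, BLUE = 6 (all distinct). *)
structure Color = struct
    datatype mlrep = e_RED | e_GREEN | e_BLUE

    (* datatype -> C integer value *)
    fun m2i e_RED = 0
      | m2i e_GREEN = 5
      | m2i e_BLUE = 6

    (* C integer value -> datatype; well-defined only because the
     * three values are pairwise distinct *)
    fun i2m 0 = e_RED
      | i2m 5 = e_GREEN
      | i2m 6 = e_BLUE
      | i2m n = raise Fail ("not a color value: " ^ Int.toString n)

    (* pattern matching is what the datatype representation buys *)
    fun name e_RED = "red"
      | name e_GREEN = "green"
      | name e_BLUE = "blue"
end
```

Because 0, 5, and 6 are distinct, i2m is a proper inverse of m2i. For something like enum { A = 1, B = 1 }, no such inverse exists, and the fallback representation (MLRep.Signed.int) applies.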
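The rules that decide which function wrappers appear can be summarized by a small decision function. These rules cover the default, -light, -heavy, and the fallback for incomplete types described under struct foo *. This SML sketch models the text above and is not code taken from ml-nlffigen. The names mode, wrappers, and heavyPossible are hypothetical; heavyPossible stands for "the heavy wrapper needs no RTTI from an incomplete type".

```sml
(* Hypothetical model of which wrappers are emitted for one C function. *)
datatype mode = Both | LightOnly | HeavyOnly   (* default, -light, -heavy *)

(* Returns (emit light wrapper f', emit heavy wrapper f). *)
fun wrappers (mode, heavyPossible : bool) =
    case mode of
        Both => (true, heavyPossible)
      | LightOnly => (true, false)
      | HeavyOnly =>
          (* -heavy falls back to the light wrapper when the heavy one
           * would need RTTI of an incomplete type *)
          if heavyPossible then (false, true) else (true, false)
```

For example, under -heavy a function whose heavy wrapper is impossible still gets its light wrapper f'.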
Additional files for implementing function entry sequences are created
and used internally. They do not contribute exports, though.
Command-line options for ml-nlffigen:
General syntax: ml-nlffigen <option> ... [--] <C-file> ...
Environment variables:
  Ml-nlffigen looks at the environment variable FFIGEN_CPP to obtain
  the template string for the cpp command line. If FFIGEN_CPP is not
  set, the template defaults to "gcc -E -U__GNUC__ %o %s > %t".
  The actual command line is obtained by substituting occurrences of
  %o with the list of cpp options (see -cppopt, -U, -D, and -I below),
  %s with the name of the source, and %t with the name of a temporary
  file holding the pre-processed code.
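The substitution can be sketched in SML as follows. This is a minimal model, not ml-nlffigen's implementation. The names defaultTemplate, cppTemplate, and expand are hypothetical, and joining the cpp options with single spaces is an assumption, since the text does not say how the option list is rendered.

```sml
val defaultTemplate = "gcc -E -U__GNUC__ %o %s > %t"

(* FFIGEN_CPP overrides the default template when it is set *)
fun cppTemplate () =
    getOpt (OS.Process.getEnv "FFIGEN_CPP", defaultTemplate)

(* Replace %o, %s, and %t; every other character is copied unchanged. *)
fun expand { template : string, opts : string list,
             src : string, tmp : string } : string =
    let
        val optStr = String.concatWith " " opts   (* assumed separator *)
        fun go [] = []
          | go (#"%" :: #"o" :: rest) = explode optStr @ go rest
          | go (#"%" :: #"s" :: rest) = explode src @ go rest
          | go (#"%" :: #"t" :: rest) = explode tmp @ go rest
          | go (c :: rest) = c :: go rest
    in
        implode (go (explode template))
    end
```

With the default template, options ["-DFOO", "-I/usr/local/include"], source foo.h, and temporary file /tmp/foo.i, this yields "gcc -E -U__GNUC__ -DFOO -I/usr/local/include foo.h > /tmp/foo.i".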
Options:
  -dir <dir>        output directory where all generated files are placed
  -d <dir>          default: "NLFFI-Generated"
  -allSU            instructs ml-nlffigen to include all structs and unions,
                    even those that are defined in included files (as opposed
                    to files explicitly listed as arguments)
                    default: off
  -width <w>        sets output line width (just a guess) to <w>
  -w <w>            default: 75
  -smloption <x>    instructs ml-nlffigen to add <x> to the list of options
                    with which .sml entries in the generated .cm file are
                    annotated. By default, the list consists of just "noguid".
  -guid             Removes the default "noguid" from the list of sml options.
                    (This re-enables strict handling of type- and object-identity
                    but can have a negative impact on CM cutoff recompilation
                    performance if the programmer routinely removes the entire
                    tree of ml-nlffigen-generated files during development.)
(*
  -lambdasplit <x>  instructs ml-nlffigen to generate "lambdasplit"
  -ls <x>           options for all ML files (see CM manual for what this means;
                    it does not currently work anyway because cross-module
                    inlining is broken).
                    default: nothing
*)
  -target <t>       Sets the target to <t> (which must be one of "sparc-unix",
  -t <t>            "x86-unix", or "x86-win32").
                    default: current architecture
  -light            suppress "heavy" versions of function wrappers and
  -l                field accessors; also resets any earlier -heavy to default
                    default: not suppressed
  -heavy            suppress "light" versions of function wrappers and
  -h                field accessors; also resets any earlier -light to default
                    default: not suppressed
  -namedargs        instruct ml-nlffigen to generate function wrappers that
  -na               use named arguments (ML records) instead of tuples if
                    there is enough information for this in the C source
                    (this is not always very useful)
                    default: off
  -nocollect        Do not collect enum constants from truly unnamed
                    enumerations (those without tags that occur at toplevel
                    or in an unnamed context, i.e., not in a typedef or
                    another named struct or union) into a single artificial
                    enumeration tagged by ' (a single apostrophe), whose
                    ML-side representative would be a structure named E_'.
  -enum-constructors
  -ec               When possible (i.e., if all values of a given enumeration
                    are different from each other), make the ML representation
                    type of the enumeration a datatype. The default (and
                    fallback) is to make that type the same as MLRep.Signed.int.
  -libhandle <h>    Use the variable <h> to refer to the handle to the
  -lh <h>           shared library object. Given the constraints of CM, <h>
                    must have the form of a long ML identifier, e.g.,
                    MyLibrary.libhandle.
                    default: Library.libh
  -include <f>      Mention file <f> in the generated .cm file. This option
  -add <f>          is necessary at least once for providing the library handle.
                    It can be used arbitrarily many times, resulting in more
                    than one such programmer-supplied file being mentioned.
                    If <f> is relative, then it must be relative to the directory
                    specified in the -dir <dir> option.
  -cmfile <f>       Specify the name of the generated .cm file, relative to
  -cm <f>           the directory specified by the -dir <dir> option.
                    default: nlffi-generated.cm
  -cppopt <o>       The string <o> gets added to the list of options to be
                    passed to cpp (the C preprocessor). The list of options
                    gets substituted for %o in the cpp command line template.
  -U<x>             The string -U<x> gets added to the list of cpp options.
  -D<x>             The string -D<x> gets added to the list of cpp options.
  -I<x>             The string -I<x> gets added to the list of cpp options.
  -version          Write the version number of ml-nlffigen to standard
                    output and then quit.
  -match <r>        Normally ml-nlffigen will include ML definitions for a C
  -m <r>            declaration if the C declaration textually appears in
                    one of the files specified at the command line. Definitions
                    in #include-d files will normally not appear (unless
                    their absence would lead to inconsistencies).
                    By specifying -match <r>, ml-nlffigen will also include
                    definitions that occur in recursively #include-d files
                    for which the AWK-style regular expression <r> matches
                    their names.
  -prefix <p>       Generated ML structure names will all have prefix <p>
  -p <p>            (in addition to the usual "S_" or "U_" or "F_" ...)
  -gensym <g>       Names "gensym-ed" by ml-nlffigen (for anonymous struct/union/
  -g <g>            enums) will get an additional suffix _<g>. (This should
                    be used if output from several independent runs of
                    ml-nlffigen is to coexist in the same ML program.)
  --                Terminate processing of options; remaining arguments are
                    taken to be C sources.
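Since -light and -heavy each reset the other to the default, the last of them on the command line determines the outcome. The following hypothetical SML model folds over the argument list; the names mode, step, and modeOf are illustrative only. It ignores all other options. Unlike a real option parser, it would misread an option argument that happens to be spelled -l or -h.

```sml
datatype mode = Both | LightOnly | HeavyOnly   (* hypothetical names *)

(* Process one argument; everything except the four flags is ignored. *)
fun step ("-light", _) = LightOnly
  | step ("-l", _) = LightOnly
  | step ("-heavy", _) = HeavyOnly
  | step ("-h", _) = HeavyOnly
  | step (_, m) = m

(* Start from the default (both wrapper kinds) and scan left to right. *)
fun modeOf (args : string list) : mode = foldl step Both args
```

So "-heavy -light" yields light wrappers only, and "-light -heavy" yields heavy wrappers only.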
----------------------------------------------------------------------
Sample usage:
Suppose we have a C interface defined in foo.h.
1. Running ml-nlffigen:
It is best to let a tool such as Unix' "make" handle the invocation of
ml-nlffigen. The following "Makefile" can be used as a template for
other projects:
+----------------------------------------------------------
|FILES = foo.h
|H = FooH.libh
|D = FFI
|HF = ../foo-h.sml
|CF = foo.cm
|
|$(D)/$(CF): $(FILES)
| ml-nlffigen -include $(HF) -libhandle $(H) -dir $(D) -cmfile $(CF) $^
+----------------------------------------------------------
Suppose the above file is stored as "foo.make". Running
$ make -f foo.make
will generate a subdirectory "FFI" full of ML files corresponding to
the definitions in foo.h. Access to the generated ML code is gained
by referring to the CM library FFI/foo.cm; the .cm-file (foo.cm) is
also produced by ml-nlffigen.
2. The ML code uses the library handle specified in the command line
(here: FooH.libh) for dynamic linking. The type of FooH.libh must
be:
FooH.libh : string -> unit -> CMemory.addr
That is, FooH.libh takes the name of a symbol and produces that
symbol's suspended address.
The code that implements FooH.libh must be provided by the programmer.
In the above example, we assume that it is stored in file foo-h.sml.
The name of that file must appear in the generated .cm-file, hence the
"-include" command-line argument.
Notice that the name provided to ml-nlffigen must be relative to the
output directory. Therefore, in our case it is "../foo-h.sml" and not
just "foo-h.sml" (relative to FFI, the former resolves to FFI/../foo-h.sml,
which is foo-h.sml in the current directory).
3. To actually implement FooH.libh, use the "DynLinkage" module.
Suppose the shared library's name is "/usr/lib/foo.so". Here is
the corresponding contents of foo-h.sml:
+-------------------------------------------------------------
|structure FooH = struct
|    local
|        val lh = DynLinkage.open_lib
|                     { name = "/usr/lib/foo.so", global = true, lazy = true }
|    in
|        fun libh s = let
|            val sh = DynLinkage.lib_symbol (lh, s)
|        in
|            fn () => DynLinkage.addr sh
|        end
|    end
|end
+-------------------------------------------------------------
If all the symbols you are linking to are already available within
the ML runtime system, then you don't need to open a new shared
object. As a result, your FooH implementation would look like this:
+-------------------------------------------------------------
|structure FooH = struct
|    fun libh s = let
|        val sh = DynLinkage.lib_symbol (DynLinkage.main_lib, s)
|    in
|        fn () => DynLinkage.addr sh
|    end
|end
+-------------------------------------------------------------
If the symbols you are accessing are strewn across several separate
shared objects, then there are two possible solutions:
a) Open several shared libraries and perform a trial-and-error search
for every symbol you are looking up. (The DynLinkage module raises
an exception (DynLinkError of string) if the lookup fails. This
could be used to daisy-chain lookup operations.)
[Be careful: Sometimes there are non-obvious inter-dependencies
between shared libraries. Consider using DynLinkage.open_lib'
to express those.]
b) A simpler and more robust way of accessing several shared libraries
is to create a new "summary" library object at the OS level.
Suppose you are trying to access /usr/lib/foo.so and /usr/lib/bar.so.
The solution is to make a "foobar.so" object by saying:
$ ld -shared -o foobar.so /usr/lib/foo.so /usr/lib/bar.so
The ML code then refers to foobar.so and the Linux dynamic loader
does the rest.
4. To put it all together, let's wrap it up in a .cm-file. For example,
if we simply want to directly make the ml-nlffigen-generated definitions
available to the "end user", we could write this wrapper .cm-file
(let's call it foo.cm):
+-------------------------------------------------------------
|library
| library(FFI/foo.cm)
|is
| $/basis.cm
| $/c.cm
| FFI/foo.cm : make (-f foo.make)
+-------------------------------------------------------------
Now, saying
$ sml -m foo.cm
is all one needs to do in order to compile. (CM will automatically
invoke "make", so you don't have to run "make" separately.)
If the goal is not to export the "raw" ml-nlffigen-generated stuff
but rather something more nicely "wrapped", consider writing wrapper
ML code. Suppose you have wrapper definitions for structure Foo_a
and structure Foo_b with code for those in wrap-foo-a.sml and
wrap-foo-b.sml. In this case the corresponding .cm-file would
look like the following:
+-------------------------------------------------------------
|library
| structure Foo_a
| structure Foo_b
|is
| $/basis.cm
| $/c.cm
| FFI/foo.cm : make (-f foo.make)
| wrap-foo-a.sml
| wrap-foo-b.sml
+-------------------------------------------------------------
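Step 2 requires the name given to -include to be relative to the -dir directory. The SML Basis OS.Path functions can confirm where such a name ends up. The function resolveInclude below is a hypothetical helper. It assumes, as in the Makefile, that the -dir value is itself relative to the current directory, and it models the rule stated in the text rather than CM's own lookup code.

```sml
(* Where does an -include'd file live, given the -dir output directory? *)
fun resolveInclude (dir : string, f : string) : string =
    if OS.Path.isAbsolute f then f   (* absolute names are used as they are *)
    else OS.Path.mkCanonical (OS.Path.concat (dir, f))
```

So "../foo-h.sml" names foo-h.sml next to foo.make, whereas a plain "foo-h.sml" would be looked up inside the generated FFI directory.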
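Option a) of step 3 (symbols spread over several shared objects) could look roughly like the SML sketch below. It uses only the DynLinkage operations shown in step 3 (lib_symbol and addr) plus the DynLinkError exception mentioned there. The function name chainLibh is hypothetical. The text does not say whether the actual symbol lookup happens in lib_symbol or in addr. For that reason, the sketch forces the address inside the handler, so that a missing symbol is caught in either case.

```sml
(* Hypothetical helper: try each library handle in order and use the first
 * one that defines s.  Partially applied to a list of handles, it has the
 * shape required of the library handle: string -> unit -> CMemory.addr *)
fun chainLibh libs s = let
    fun search [] =
            raise DynLinkage.DynLinkError ("symbol not found: " ^ s)
      | search (lh :: rest) =
            (let val sh = DynLinkage.lib_symbol (lh, s)
             in ignore (DynLinkage.addr sh); sh   (* force the lookup now *)
             end)
            handle DynLinkage.DynLinkError _ => search rest
    val sh = search libs
in
    fn () => DynLinkage.addr sh
end
```

A library-handle structure could then define val libh = chainLibh [lh1, lh2], where lh1 and lh2 are results of DynLinkage.open_lib. Earlier libraries in the list take precedence. The search runs as soon as chainLibh is applied to a symbol name. The caveat about inter-library dependencies from option a) still applies.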