- The Design Space
- GHCJSโs FFI
- Lightweight safety checks
- Returning multiple values
- Changes in the FFI System for the JS Backend
Users of GHCJS enjoyed a rich FFI system for foreign JavaScript imports. However, this has changed during our adaptation of GHCJS to GHC 9.x. This short post goes over the GHCJS FFI system, the motivation for these changes and what the changes are. First, we must consider the design space of an FFI system.
The Design Space
FFI code is typically employed in high performance scenarios. Additionally, users of the FFI do not want to deal with the object language the compiler is compiling to. Instead, users want a simple way to call functions from the object language and use them in their own code as normal Haskell functions. However, users of the FFI system do tend to be power users, and so as a design principle we want to expose the tools they need to achieve their performance needs, whatever those needs may be. We can summarize these constraints as follows:
- The FFI must abstract the JavaScript backendโs infidelities away as much as
possible. That is, users of the FFI should need to worry about the
Int64#
representation, but should also be able to simply follow standard patterns we have written inbase
. - The FFI must provide tools to achieve high performance code, even if those tools require up front knowledge of the runtime system to use. However, these tools should not be in the path of least resistance to use the FFI system.
- The FFI must provide a lightweight specification that userโs program against for the JS backend to optimize the imported function and for good error messages for users.
GHCJSโs FFI sets a high (qualitative) benchmark on these three constraints. Letโs inspect them each in detail, in no particular order.
GHCJSโs FFI
In GHCJS, a user could take advantage of JavaScript functions in their Haskell
code using the GHCJSโs FFI. However, the syntax was unique to GHCJS with place
holder variables like one might see in perl, nix, or bash. For example, here is
a foreign import from the base
library for st_size
:
-- base/System/Posix/Internal.hs
-- the JS FFI version
foreign import javascript unsafe "$r1 = h$base_st_size($1_1,$1_2); $r2 = h$ret1;"
st_size :: Ptr CStat -> IO Int64
The syntax is different from what we know and love in the normal Haskell world
but the grammar is straightforward. We declare a foreign import
from javascript
,
state that the import is unsafe
or interruptible
and then provide a string,
h$base_fstat(...)
for the code generator to use when compiling. Compare this
with the C version:
-- base/System/Posix/Internal.hs
-- the C FFI version
foreign import ccall unsafe "HsBase.h __hscore_st_size"
st_size :: Ptr CStat -> IO Int64
And we see that they are similar. The only difference is the strange $n
symbols in the referrent string. Contrast this with the C version, which simply
declares a name.
These symbols are place holder variables with special meaning in GHCJS. There
are two intractable reasons for the placeholder patterns. First, we require
these patterns to work around the limitations of JavaScript as a backend (1).
For example, consider the case where we need to return an Int64#
from an
imported foreign function. In C and Haskell this is not a problem because both
can represent Int64#
natively, however JavaScript only has native support for
32-bit values. Thus, to be able to return an Int64#
we need to have a method to
return two 32-bit numbers. Similarly, in order to apply a function to an Int64#
that function must take at least two arguments, one for the high bits and one
for the low. Second, the referrent string is untyped and can contain arbritrary
JavaScript code. So placeholder patterns provide a simply and lightweight way
for safety checks and eliminate classes of untyped, hard to understand errors.
For example, consider an arity mismatch error between a function definition and
call site. When this happens JavaScript happily continues processing with the
return value from the function application defined as NaN
(of course). Such
arity conflicts can easily occur, especially when dealing with 64-bit values
which require function arity assumptions.
Lightweight safety checks
Lightweight safety checks (3) are done by GHCJS by parsing the names of the place holder variables; each of which follows a specific naming convention. This convention is:
- Argument types:
$n
: Used for unary arguments, i.e., arguments which require only a single register.$n_n
: Used for binary arguments, i.e., arguments which require two registers.$c
: A continuation argument, only valid forinterruptible
foreign functions.
- Return types:
$r
: a unary return$r1
,$r2
: a binary return$r1
,$r2
,$r3_1
,$r3_2
: unboxed tuple return
- Top level patterns:
"&value"
: simply emitted asvalue
by the code generator"someFunction"
: emitted asret = someFunction(...)
, i.e., map the FFI to the result of the function call."$r = $1.f($2)"
: emitted asr1 = a1.f(a2)
, i.e., a combination of a function call and a property access.
With this standard GHCJS then parses the FFI referrent string to ensure that it
conforms to this standard. If not then GHCJS can at least respond to the user
with an ill-formatted FFI message and say precisely where the issue is. For
example, it could respond that only half of an Int64#
is returned based on the
referrent string and the function type.
Returning multiple values
But what of performant code? GHCJS achieves performant FFI by not trying to
abstract away from the runtime system. Instead, an advantage of GHCJSโs FFI is
that we can specify exactly which registers the foreign function should dump its
results or even arbitrary global variables. This places more burden on the user
of the FFI in specific scenarios, but crucially allows the FFI system to get out
of the way of the user. The FFI system also exploits this capability to return
multiple values from a single function call, which is a common need when
compiling to JavaScript. For example, in the above code st_size
is declared to
return an IO Int64
, the JavaScript handler h$base_st_size
returns the Int64
using two registers $r1
and $r2
, but does so through the use of a special
purpose global variable called h$ret1
:
function h$base_st_size(stat, stat_off) {
h$ret1 = (stat.i3[(stat_off>>2)+2]);
return (stat.i3[(stat_off>>2)+1]);
}
The function inputs a pointer and an offset. Pointers in GHCJS are simply
pointers to ByteArrays so the function indexes into the ByteArray and retrieves
and stores the lower 32-bits in h$ret1
, then returns the higher 32-bits
directly. These results are picked up by the FFI code, which performs assignment
to set $r1
to the result of the function call (the higher 32-bits), and set $r2
to the value of h$ret1
(the lower 32-bits). Crucially, the runtime system needs
to do nothing. The registers are already handled ready to be consumed by
whatever the caller of the foreign function will do.
One might consider using a simpler design, which trades register juggling for a
more straightforward representation such as a ByteArray which stores the Int64#
.
However, such a design would trade speed for implementation simplicity. If we
passed ByteArrays then each foreign function would spend time wrapping and
unwrapping the array to get the payload; clearly an undesirable outcome for high
performance code.
Changes in the FFI System for the JS Backend
So we see that GHCJSโs FFI system actually performs quite well in the design
space. Power users are well supported and can leverage enough unsafety to bind
global variables like h$ret1
and specific registers such as $r1
. The system
provides some lightweight checking through parsing. The nuances of the
JavaScript platform are generally abstracted over and the FFI system is tuned
for performance critical scenarios. So why change it?
The short answer is to hit deadlines. By skipping the FFI parsing the JS Backend team was able to produce a working (can output โHello World!โ, and compile GHCโs boot libraries), integrated, JS backend in GHC faster than had we finished the FFI system.
For the time being, we have opted to replaced each foreign function call with a JavaScript fat arrow, for example:
foreign import javascript unsafe "(($1_1,$1_2) => { return h$base_st_size($1_1,$1_2); })"
st_size :: Ptr CStat -> IO Int64
Of course, this situation is untenable, as argued above, FFI code is assumed to be used in performance critical code, and thus any extra overhead, such as a function closure and consequent indirection, must be avoided. But fear not! In the near future weโll be overhauling the FFI system and returning it to its former glory.