· 8 min read
  1. The Design Space
  2. GHCJS's FFI
  3. Lightweight safety checks
  4. Returning multiple values
  5. Changes in the FFI System for the JS Backend

Users of GHCJS enjoyed a rich FFI system for foreign JavaScript imports. However, this has changed during our adaptation of GHCJS to GHC 9.x. This short post goes over the GHCJS FFI system, the motivation for these changes and what the changes are. First, we must consider the design space of an FFI system.

The Design Space

FFI code is typically employed in high performance scenarios. Additionally, users of the FFI do not want to deal with the object language the compiler is compiling to. Instead, users want a simple way to call functions from the object language and use them in their own code as normal Haskell functions. However, users of the FFI system do tend to be power users, and so as a design principle we want to expose the tools they need to achieve their performance needs, whatever those needs may be. We can summarize these constraints as follows:

  1. The FFI must abstract the JavaScript backend's infidelities away as much as possible. That is, users of the FFI should not need to worry about the Int64# representation, but should instead be able to simply follow the standard patterns we have written in base.
  2. The FFI must provide tools to achieve high performance code, even if those tools require up front knowledge of the runtime system to use. However, these tools should not be in the path of least resistance to use the FFI system.
  3. The FFI must provide a lightweight specification that users program against, both so that the JS backend can optimize the imported function and so that users get good error messages.

GHCJS's FFI sets a high (qualitative) benchmark on these three constraints. Let's inspect them each in detail, in no particular order.

GHCJS's FFI

In GHCJS, a user could take advantage of JavaScript functions in their Haskell code using GHCJS's FFI. However, the syntax was unique to GHCJS, with placeholder variables like one might see in Perl, Nix, or Bash. For example, here is a foreign import from the base library for st_size:

-- base/System/Posix/Internal.hs
-- the JS FFI version
foreign import javascript unsafe "$r1 = h$base_st_size($1_1,$1_2); $r2 = h$ret1;"
  st_size :: Ptr CStat -> IO Int64

The syntax is different from what we know and love in the normal Haskell world, but the grammar is straightforward. We declare a foreign import from javascript, state that the import is unsafe or interruptible, and then provide a string, h$base_st_size(...), for the code generator to use when compiling. Compare this with the C version:

-- base/System/Posix/Internal.hs
-- the C FFI version
foreign import ccall unsafe "HsBase.h __hscore_st_size"
  st_size :: Ptr CStat -> IO Int64

And we see that they are similar. The only difference is the strange $n symbols in the referent string. Contrast this with the C version, which simply declares a name.

These symbols are placeholder variables with special meaning in GHCJS. There are two reasons for the placeholder patterns. First, we require these patterns to work around the limitations of JavaScript as a backend (1). For example, consider the case where we need to return an Int64# from an imported foreign function. In C and Haskell this is not a problem because both can represent Int64# natively; JavaScript, however, only has native support for 32-bit values. Thus, to be able to return an Int64# we need a method to return two 32-bit numbers. Similarly, in order to apply a function to an Int64#, that function must take at least two arguments, one for the high bits and one for the low. Second, the referent string is untyped and can contain arbitrary JavaScript code, so the placeholder patterns provide a simple and lightweight way to perform safety checks and eliminate classes of untyped, hard-to-understand errors. For example, consider an arity mismatch between a function definition and a call site. When this happens, JavaScript happily continues processing with the return value of the function application defined as NaN (of course). Such arity conflicts can easily occur, especially when dealing with 64-bit values, which require assumptions about function arity.
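To make this failure mode concrete, here is a minimal JavaScript sketch (the function name is made up for illustration) of the kind of silent arity mismatch the placeholder checks are meant to catch:

// A 64-bit addition helper expects four 32-bit arguments (high/low pairs).
function addInt64(h1, l1, h2, l2) {
  return l1 + l2; // real carry and high-word handling elided in this sketch
}

addInt64(1, 2); // h2 and l2 are undefined, so the result is NaN, not an error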

Lightweight safety checks

Lightweight safety checks (3) are done by GHCJS by parsing the names of the placeholder variables, each of which follows a specific naming convention. This convention is:

  • Argument types:
    • $n: Used for unary arguments, i.e., arguments which require only a single register.
    • $n_n: Used for binary arguments, i.e., arguments which require two registers.
    • $c: A continuation argument, only valid for interruptible foreign functions.
  • Return types:
    • $r: a unary return
    • $r1, $r2: a binary return
    • $r1, $r2, $r3_1, $r3_2: unboxed tuple return
  • Top level patterns:
    • "&value": simply emitted as value by the code generator
    • "someFunction": emitted as ret = someFunction(...), i.e., map the FFI to the result of the function call.
    • "$r = $1.f($2)": emitted as r1 = a1.f(a2), i.e., a combination of a function call and a property access.

With this standard, GHCJS then parses the FFI referent string to ensure that it conforms. If it does not, then GHCJS can at least respond to the user with an ill-formatted-FFI message and say precisely where the issue is. For example, it could report that only half of an Int64# is returned, based on the referent string and the function type.

Returning multiple values

But what of performant code? GHCJS achieves a performant FFI by not trying to abstract away from the runtime system. Instead, an advantage of GHCJS's FFI is that we can specify exactly which registers the foreign function should dump its results into, or even use arbitrary global variables. This places more burden on the user of the FFI in specific scenarios, but crucially allows the FFI system to get out of the user's way. The FFI system also exploits this capability to return multiple values from a single function call, which is a common need when compiling to JavaScript. For example, in the above code st_size is declared to return an IO Int64. The JavaScript handler h$base_st_size returns the Int64 using two registers, $r1 and $r2, but does so through the use of a special-purpose global variable called h$ret1:

function h$base_st_size(stat, stat_off) {
  h$ret1 = (stat.i3[(stat_off>>2)+2]);
  return (stat.i3[(stat_off>>2)+1]);
}

The function takes a pointer and an offset. Pointers in GHCJS are simply pointers into ByteArrays, so the function indexes into the ByteArray, stores the lower 32 bits in h$ret1, and returns the higher 32 bits directly. These results are picked up by the FFI code, which sets $r1 to the result of the function call (the higher 32 bits) and $r2 to the value of h$ret1 (the lower 32 bits). Crucially, the runtime system needs to do nothing. The registers are already set, ready to be consumed by whatever the caller of the foreign function will do.
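As a rough sketch (the register names here are illustrative; the actual generated code differs in detail), the placeholder string from the st_size import above expands to something like:

// "$r1 = h$base_st_size($1_1,$1_2); $r2 = h$ret1;" becomes, roughly:
h$r1 = h$base_st_size(a1_1, a1_2); // higher 32 bits via the return value
h$r2 = h$ret1;                     // lower 32 bits via the global side channel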

One might consider a simpler design, one which trades register juggling for a more straightforward representation, such as a ByteArray# which stores the Int64#. However, such a design would trade speed for implementation simplicity. If we passed ByteArrays, then each foreign function would spend time wrapping and unwrapping the array to get the payload, which is clearly undesirable for high-performance code.
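A minimal sketch of that rejected alternative (the helper name is hypothetical and not part of the RTS):

// Boxing an Int64# into a fresh buffer on every FFI crossing:
function h$boxInt64(high, low) {
  var view = new Int32Array(2); // one allocation per call
  view[0] = low;
  view[1] = high;
  return view;
}
// Every import and export would then pay for this allocation, plus the reads
// needed to unwrap the payload again on the other side.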

Changes in the FFI System for the JS Backend

So we see that GHCJS's FFI system actually performs quite well in the design space. Power users are well supported and can leverage enough unsafety to bind global variables like h$ret1 and specific registers such as $r1. The system provides some lightweight checking through parsing. The nuances of the JavaScript platform are generally abstracted over, and the FFI system is tuned for performance-critical scenarios. So why change it?

The short answer is: to hit deadlines. By skipping the FFI parsing, the JS backend team was able to produce a working (it can output "Hello World!" and compile GHC's boot libraries), integrated JS backend in GHC faster than if we had first finished the FFI system.

For the time being, we have opted to wrap each foreign import in a JavaScript arrow function, for example:

foreign import javascript unsafe "(($1_1,$1_2) => { return h$base_st_size($1_1,$1_2); })"
  st_size :: Ptr CStat -> IO Int64

Of course, this situation is untenable: as argued above, FFI code is assumed to be used in performance-critical code, and thus any extra overhead, such as a function closure and the consequent indirection, must be avoided. But fear not! In the near future we'll be overhauling the FFI system and returning it to its former glory.

· 11 min read
  1. GHC Primitives
    1. The Easy Cases
    2. ByteArray#, MutableByteArray#, SmallArray#, MutableSmallArray#
    3. Addr# and StablePtr#
    4. Numbers: The Involved Case
      1. Working with 64-bit Types
      2. Unwrapped Number Optimization
    5. But what about the other stuff!

One of the key challenges in any novel backend is representing GHC primitive types in the new backend. For JavaScript this is especially tricky, as JavaScript has only a handful of primitive types, and some of those types, such as number, do not directly map to any Haskell primitive type, such as Int8#. This post walks through the most important GHC primitives and describes our implementation for each in the JavaScript backend. It is intended to be an explanation-oriented post: light on details, but with just enough to understand how the system works.

GHC Primitives

There are 36 primtypes that GHC defines in primops.txt.pp:

  1. Char#
  2. Int8#, Int16#, Int32#, Int64#, Int#
  3. Word8#, Word16#, Word32#, Word64#, Word#
  4. Double#, Float#
  5. Array#, MutableArray#, SmallArray#, SmallMutableArray#
  6. ByteArray#, MutableByteArray#
  7. Addr#
  8. MutVar#, TVar#, MVar#
  9. IOPort#, State#, RealWorld, ThreadId#
  10. Weak#, StablePtr#, StableName#, Compact#, BCO
  11. Fun, Proxy#
  12. StackSnapshot#
  13. VECTOR

Some of these are unsupported in the JS backend, such as VECTOR, or are lower priority, such as StackSnapshot#. We'll begin with the easy cases.

The Easy Cases

The easy cases are those implemented as JavaScript objects. In general, this is the big hammer used when nothing else will do. We'll expand on the use of objects, especially for representing heap objects, in a future post, but for the majority of cases we mimic the STG-machine behavior for GHC heap objects using JavaScript heap objects. For example,

var someConstructor =
  { f:  ...  // entry function of the data constructor worker
  , m:  0    // garbage collector mark
  , d1: ...  // first data field of the constructor
  , d2: ...  // if arity == 2: the second field
             // if arity > 2: an object { d1, d2, ... } holding the remaining
             //               fields (its numbering restarts, so its d1 is x2!)
  }

This is the general recipe: we define a JavaScript object whose properties correspond to what the heap object needs; in this case that is the entry function f of the constructor, some metadata for garbage collection m, and pointers to the fields of the constructor or whatever else the heap object might need. Using JavaScript objects allows straightforward translations of several GHC types, for example TVars and MVars:

// stg.js.pp
/** @constructor */
function h$TVar(v) {
  TRACE_STM("creating TVar, value: " + h$collectProps(v));
  this.val = v;               // current value
  this.blocked = new h$Set(); // threads that get woken up if this TVar is updated
  this.invariants = null;     // invariants that use this TVar (h$Set)
  this.m = 0;                 // gc mark
  this._key = ++h$TVarN;      // for storing in h$Map/h$Set
#ifdef GHCJS_DEBUG_ALLOC
  h$debugAlloc_notifyAlloc(this);
#endif
}

// stm.js.pp
function h$MVar() {
  TRACE_SCHEDULER("h$MVar constructor");
  this.val = null;
  this.readers = new h$Queue();
  this.writers = new h$Queue();
  this.waiters = null;  // waiting for a value in the MVar with readMVar
  this.m = 0;           // gc mark
  this.id = ++h$mvarId;
#ifdef GHCJS_DEBUG_ALLOC
  h$debugAlloc_notifyAlloc(this);
#endif
}

Notice that both implementations define properties specific to the semantics of their Haskell type. The JavaScript functions which create these objects follow the naming convention h$<something> and reside in shim files. Shim files are JavaScript files that the JS backend links against, written in pure JavaScript. This allows us to save some compile time by not generating code which doesn't change, and to decompose the backend into JavaScript modules.

This strategy is also how functions are implemented in the JS backend. Function objects are generated by StgToJS.Expr.genExpr and StgToJS.Apply.genApp, and follow this recipe:

var myFUN =
  { f:  <the function itself>
  , m:  <garbage collector mark>
  , d1: <free variable 1>
  , d2: <free variable 2>
  }

To summarize: for most cases we write custom JavaScript objects which hold whatever machinery is needed as properties to satisfy the expected semantics of the Haskell type. This is the strategy that implements TVar, MVar, MutVar, and Fun.

ByteArray#, MutableByteArray#, SmallArray#, MutableSmallArray#

ByteArray# and friends map to JavaScript's ArrayBuffer object. The ArrayBuffer object provides a fixed-length, raw binary data buffer. To index into the ArrayBuffer we need to know the type of data the buffer is expected to hold, so we make an engineering tradeoff: we allocate typed views of the buffer payload once, at buffer allocation time. This avoids allocating views later, when we might be handling the buffer in a hot loop, at the cost of slower initialization. For example, consider the mem.js.pp shim, which defines ByteArray#:

// mem.js.pp
function h$newByteArray(len) {
  var len0 = Math.max(h$roundUpToMultipleOf(len, 8), 8);
  var buf = new ArrayBuffer(len0);
  return { buf: buf
         , len: len
         , i3: new Int32Array(buf)
         , u8: new Uint8Array(buf)
         , u1: new Uint16Array(buf)
         , f3: new Float32Array(buf)
         , f6: new Float64Array(buf)
         , dv: new DataView(buf)
         , m: 0
         }
}

buf is the payload of the ByteArray# and len is its length. i3 through dv are the views of the payload; each view is an object which interprets the raw data in buf differently according to its type. For example, i3 interprets buf as holding Int32 values, while dv interprets buf as a DataView, and so on. The final property, m, is the garbage collector marker.
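As a small usage sketch (not code from the RTS), the preallocated views let generated code read and write the same buffer at different types without any further allocation:

var ba = h$newByteArray(16);    // a 16-byte buffer plus its typed views
ba.i3[0] = 42;                  // write a 32-bit int at byte offset 0
ba.dv.setFloat64(8, 1.5, true); // write a little-endian 64-bit float at byte offset 8
ba.i3[0];                       // 42
ba.dv.getFloat64(8, true);      // 1.5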

Addr# and StablePtr#

Addr# and StablePtr# are implemented as a pair of a ByteArray# and an Int# offset into that array. We'll focus on Addr# because StablePtr# has the same implementation, with the exception that StablePtr#s are tracked in the global variable h$stablePtrBuf. Addr#s do not have an explicit constructor; rather, they are implicitly constructed. For example, consider h$rts_mkPtr, which creates a Ptr that contains an Addr#:

function h$rts_mkPtr(x) {
  var buf, off = 0;
  if(typeof x == 'string') {
    // a string: encode it to UTF-8 into a fresh buffer
    buf = h$encodeUtf8(x);
    off = 0;
  } else if(typeof x == 'object' &&
            typeof x.len == 'number' &&
            x.buf instanceof ArrayBuffer) {
    // already a wrapped buffer: use it as-is
    buf = x;
    off = 0;
  } else if(x.isView) {
    // a view over a buffer: wrap the underlying buffer, keep the view's offset
    buf = h$wrapBuffer(x.buffer, true, 0, x.buffer.byteLength);
    off = x.byteOffset;
  } else {
    // a raw ArrayBuffer: wrap it directly
    buf = h$wrapBuffer(x, true, 0, x.byteLength);
    off = 0;
  }
  return (h$c2(h$baseZCGHCziPtrziPtr_con_e, (buf), (off)));
}

The function does some type inspection to check for the special case of a string. If we do not have a string, then a Ptr, which contains an Addr#, is returned. The Addr# is implicitly constructed from a buffer and an offset into that buffer. The object case is an idempotence check: if the input is already such a wrapped buffer, it is returned as-is. The cases which do the work are the ones that call h$wrapBuffer:

// mem.js.pp
function h$wrapBuffer(buf, unalignedOk, offset, length) {
  if(!unalignedOk && offset && offset % 8 !== 0) {
    throw ("h$wrapBuffer: offset not aligned:" + offset);
  }
  if(!buf || !(buf instanceof ArrayBuffer))
    throw "h$wrapBuffer: not an ArrayBuffer"
  if(!offset) { offset = 0; }
  if(!length || length < 0) { length = buf.byteLength - offset; }
  return { buf: buf
         , len: length
         , i3: (offset%4) ? null : new Int32Array(buf, offset, length >> 2)
         , u8: new Uint8Array(buf, offset, length)
         , u1: (offset%2) ? null : new Uint16Array(buf, offset, length >> 1)
         , f3: (offset%4) ? null : new Float32Array(buf, offset, length >> 2)
         , f6: (offset%8) ? null : new Float64Array(buf, offset, length >> 3)
         , dv: new DataView(buf, offset, length)
         };
}
}

h$wrapBuffer is a utility function that does some offset checks and performs the allocation for the typed views as described above.
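A small usage sketch (again, not RTS code): wrapping an 8-byte-aligned region yields all of the views, while an unaligned offset leaves the stricter views as null:

var raw = new ArrayBuffer(16);
var w = h$wrapBuffer(raw, false, 0, 16); // aligned: i3, f6, etc. are all available
w.i3[0] = 7;
var v = h$wrapBuffer(raw, true, 2, 8);   // unaligned offset: only u1, u8 and dv exist
// v.i3 === null, v.f6 === null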

Numbers: The Involved Case

Translating numbers has three issues. First, JavaScript has no concept of fixed-precision 64-bit types such as Int64# and Word64#. Second, JavaScript bitwise operators only support signed 32-bit values (except the unsigned right shift operator, of course). Third, numbers are atomic types and do not require any special properties for correct semantics, so wrapping them in objects gains us nothing and costs an indirection.

Working with 64-bit Types

To express 64-bit numerics, we simply use two 32-bit numbers, one for the high bits and one for the low bits. For example, consider comparing two Int64#:

// arith.js.pp
function h$hs_ltInt64(h1,l1,h2,l2) {
  if(h1 === h2) {
    var l1s = l1 >>> 1;
    var l2s = l2 >>> 1;
    return (l1s < l2s || (l1s === l2s && ((l1&1) < (l2&1)))) ? 1 : 0;
  } else {
    return (h1 < h2) ? 1 : 0;
  }
}

The less-than comparison function expects four inputs, two for each Int64# in Haskell. The first number is represented by h1 and l1 (high and low), and similarly the second number is represented by h2 and l2. The comparison is straightforward: we check the high bits for equality, and if they are equal we compare the lower bits, taking care with signedness. No surprises here.
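Two quick examples of the calling convention (the values are chosen purely for illustration):

// 0xFFFFFFFF (h=0, l=-1 as a signed 32-bit word) versus 0x100000000 (h=1, l=0):
// the high words differ, so only they are compared.
h$hs_ltInt64(0, -1, 1, 0); // 1, i.e. 4294967295 < 4294967296
// With equal high words, the low words are compared as unsigned values:
h$hs_ltInt64(0, -1, 0, 1); // 0, i.e. 4294967295 is not less than 1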

For the bitwise operators we store both Word32# and Word# as 32-bit signed values, mapping any value greater than or equal to 2^31 to a negative value. This way we stay within the signed 32-bit range, even though in Haskell these types only support nonnegative values.
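A minimal sketch of the conversion (hypothetical helper names, relying only on standard JavaScript coercions):

function toSigned(w)   { return w | 0;   } // values in 2^31 .. 2^32-1 become negative
function toUnsigned(s) { return s >>> 0; } // back to the range 0 .. 2^32-1
toSigned(0x80000000);    // -2147483648
toUnsigned(-2147483648); //  2147483648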

Unwrapped Number Optimization

The JS backend uses JavaScript values to represent both Haskell heap objects and unboxed values (note that this isn't the only possible implementation; see 1). As such, it doesn't require that all heap objects have the same representation (e.g., a JavaScript object with a "tag" field indicating its type), because we can rely on JS introspection for the same purpose (especially typeof). Hence this optimization consists of using a more efficient JavaScript type to represent heap objects when possible, and falling back on the generic representation otherwise.

This optimization particularly applies to boxed numeric values (Int, Word, Int8, etc.), which can be directly represented with a JavaScript number, similarly to how the unboxed Int#, Word#, Int8#, etc. values are represented.

Pros:

  • Fewer allocations and indirections: instead of one JavaScript object with a field containing a number value, we directly have the number value.

Cons:

  • More complex code to deal with heap objects that can have different representations

The optimization is applicable when:

  1. We have a single data type with a single data constructor.
  2. The constructor holds a single field that can only be a particular type.

If these invariants hold, then we can remove the wrapping object and instead refer to the value held by the constructor directly. Int8 is the simplest case for this optimization. In Haskell we have:

data Int8 = Int8 Int8#

Notice that this definition satisfies the requirements. A direct translation in the JS backend would be:

// An Int8 thunk represented as an object with an entry function, f,
// and payload, d1.
var anInt8 = { d1: <Int8# payload>
             , f:  <entry function which would scrutinize the payload>
             }

We can operationally distinguish between a Thunk and an Int8 because the two have distinct types during the StgToJS pass in GHC and distinct representations (object vs. number) at runtime. In contrast, in Haskell an Int8 may actually be a Thunk until it is scrutinized, at which point it becomes the Int8 payload (i.e., call-by-need). This means that we will always know when we have an Int8 rather than a Thunk, and therefore we can omit the wrapper object and convert this code to just:

// no wrapper object, just the payload
var anInt8 = <Int8# payload>
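Concretely, generated code can tell the two representations apart with a typeof check; here is a minimal sketch (the helper name is hypothetical):

function h$isUnwrappedInt8(x) {
  // an unwrapped Int8 is a plain JavaScript number; anything else is a heap
  // object (e.g. an unevaluated thunk) with f/m/d1 properties
  return typeof x === 'number';
}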

For the interested reader, this optimization takes place in the JavaScript code generator module GHC.StgToJS.Arg, specifically the functions allocConStatic, isUnboxableCon, and primRepVt.

But what about the other stuff!

  • Char#: represented by a number, i.e., the code point.
  • Float#/Double#: both represented as a JavaScript double. This means that Float# has excess precision, and thus we do not generate exactly the same answers as other platforms which are IEEE 754 compliant. Full emulation of single-precision floats does not seem to be worth the effort as of writing. When stored in a ByteArray#, however, each Float# takes 4 bytes, so there its precision is reduced to that of a 32-bit float.

  1. An alternative approach would be to use JS ArrayBuffers as memory blocks into which Haskell values and heap objects would be allocated. As an example, this is the approach used by the Asterius compiler. The RTS would then need to be much more similar to the C RTS, and the optimization presented in this section wouldn't apply because we couldn't rely on introspection of JS values.