Introductionโ
I recently gave a short presentation on the topic of stacks in the GHC JavaScript backend to the GHC team at IOG. This blog post is a summary of the content.
In the context of a program produced by the GHC JavaScript backend, two different types of stack exist: The JavaScript call stack and Haskell lightweight stacks. In this post we will focus mostly on the lightweight stacks.
First we will see why using only the JavaScript call stack is not suitable for running compiled Haskell code. Then we will introduce the calling convention we use for Haskell and see how the lightweight stacks are used for making calls and passing around data. After this, we will explore in more detail how they are used for exception handling and multithreading.
Lightweight Stacksโ
GHC's JavaScript backend translates Haskell, via the intermediate language STG, to JavaScript functions and data. When compiled Haskell code needs to run a computation from a different part of the code, it has to call into the corresponding (JavaScript) function. The code relies on tail calls a lot, and for this reason we cannot use a regular JavaScript function call: JavaScript does not support tail-call optimization and we would run out of stack space quickly.
Instead, we use a technique known as "trampolining" to avoid using space on the JavaScript call stack.
Trampoliningโ
When using trampolining to make a tail call, we don't directly make a function call (which would use JavaScript stack space), instead we return the function itself, and let a "scheduler" do the call for us. This is easiest to show with an example:
// example without trampolining
function example1_direct() {
var arg1 = ... // compute arg1
// make a tail call to xyz (using space on the JavaScript call stack)
return xyz(arg1);
}
example1_direct
above uses a direct call to xyz
. If we change it to use trampolining to call xyz
it looks as follows:
// simplified scheduler function
function scheduler(c) {
// scheduler trampolining loop
while(true) {
c = c();
}
}
// example1 is called from the scheduler loop
function example1_trampoline() {
var arg1 = ... // compute arg1
// use global "register" variable h$r1 for the argument
h$r1 = arg1;
// return a reference to the function to make the call
return xyz;
}
example1_trampoline
only works if it's called from the scheduler loop. The scheduler loop keeps calling functions (c = c();
), so all calls are made directly from the scheduler
function, no growing JavaScript stack.
The scheduler loop cannot deal with function arguments, it calls each c
without any arguments. There's no way to efficiently return both a function and arguments to scheduler
in JavaScript. Allocating an object wrapping the function and arguments for each call would be prohibitively expensive. Therefore we rely on global variables h$r1, h$r2, h$r3, ...
, which we refer to as "registers", to pass the arguments to xyz
.
Continuationsโ
Other situations are a bit more complicated, for example if our code needs to do something with the result of the call, like pattern matching on a data constructor or primitive value. Let's extend our example a little bit:
function abc_direct(arg1) {
var result = ... // compute result
return result;
}
function example2_direct() {
var arg1 = ... // compute arg1
var r = abc_direct(arg1);
if(r > 0) {
return 1;
} else {
return 0;
}
}
Here the call to abc_direct
is not a tail call; example2_direct
has to do something with the result of the call. To make a trampolining version out of example2
, we need to consider the following:
abc
has to return a value to the function that called itexample2
needs to inspect the value returned byabc
.
Returning the result value directly from abc
is not possible: It would end up in the c = c();
loop in scheduler
. We need to return a value, but we can only return a function to make a tail call. Where do we get this function?
This is where the lightweight Haskell stack comes into play. We use the convention that when we are done computing, we "return" the result by making a tail call to a continuation (function) at the top of the lightweight stack, passing the result as an argument.
The stack is stored in the global variable h$stack
, which is a JavaScript array, and we have a global integer variable h$sp
which represents the index of the top of the stack. This means that our continuation is found at h$stack[h$sp]
, the top of the stack. We call it the usual way: Returning the function to the trampoline and using registers (h$r1, h$r2, ...
) for the arguments.
Using this convention, example2
becomes:
function abc_trampoline() {
var arg1 = h$r1;
var result = ... // compute result
// call the continuation on the stack with the result
h$r1 = result;
return h$stack[h$sp];
}
function example2_trampoline() {
var arg1 = ... // compute arg1
// push our continuation
h$sp++;
h$stack[h$sp] = example2_cont;
// tail-call abc through trampoline
h$r1 = arg1;
return abc_trampoline;
}
function example2_cont() {
// pop this continuation from the stack
h$sp--;
// the result is the first register argument
var r = h$r1;
if(r > 0) {
h$r1 = 1;
} else {
h$r1 = 0;
}
// make the tail call with the result in h$r1
return h$stack[h$sp];
}
Having a stack of continuations allows calls of arbitrary depth to be composed.
Stack Framesโ
Sometimes just storing continuations isn't enough. Suppose that we need to use some data in our continuation, for example:
function example3_direct() {
var arg1 = ... // compute arg1
var a = ... // compute a
var b = ... // compute b
var r = abc_direct(arg1);
if(r > 0) {
return a;
} else {
return b;
}
}
Here we need to make sure that a
and b
are available in the continuation. To do so, we save them on the stack as follows:
function example3_trampoline() {
var arg1 = ... // compute arg1
var a = ... // compute a
var b = ... // compute b
// push our continuation, saving a and b
h$sp += 3;
h$stack[h$sp-2] = a;
h$stack[h$sp-1] = b;
h$stack[h$sp] = example3_cont;
// tail-call abc through trampoline
h$r1 = arg1;
return abc_trampoline;
}
function example3_cont {
var r = h$r1;
// restore a and b from stack
var a = h$stack[h$sp-2];
var b = h$stack[h$sp-1];
// pop the stack frame (3 slots: a, b, example3_cont)
h$sp -= 3;
// return our result through the trampoline
if(r > 0) {
h$r1 = a;
} else {
h$r1 = b;
}
return h$stack[h$sp];
}
Now the stack consists of JavaScript functions (continuations), each followed by zero or more slots of data. We refer to the JavaScript function as the header and the data as the payload. The header and payload combined are called a stack frame.
Here is an example of what h$stack
might look like during the execution of example3_trampoline
:
Slot | Value | Frame | Type / Description |
---|---|---|---|
0 | h$done | 1 | function , "finished" frame |
1 | 0 | 2 | number , flags |
2 | h$...zireportError | 2 | object , exception handler |
3 | h$catch_e | 2 | function , exception catch frame |
4 | 4 | 3 | value of a (in example3 above) |
5 | 5 | 3 | value of b (in example3 above) |
6 | example3_cont | 3 | function , example3_cont frame |
Frame 3
is the example3_cont
frame with two slots of payload, pushed by our example. Frames 2
and 1
are added by the runtime system. We will discuss them in more detail later.
Stack Frame Metadataโ
During normal execution of a program, the code that manipulates the stack has knowledge of the specific stack frame it's working with: It knows the size of the payload and which values are stored in each stack slot. However, there are also operations that require dealing with all kinds of unknown stack frames. Exceptions and software transactional memory are the most important ones.
These operations deal with unknown stack frames and sometimes need information about the frame, for example the frame size. Where do we store it?
Every JavaScript function is also an object. This means that we can store arbitrary data in the function's properties. For example for any function f
we can do f.x = 5;
. This is what we use to store the metadata for stack frames.
Stack frames contain at least the following metadata in the header:
Property | Description |
---|---|
f.size | integer, number of payload slots in stack frame |
f.a | integer, number of register variables used for arguments |
example3_cont.size
would be 2
, since it has two slots of payload. The size of the full example3_cont
frame is three slots.
There is also a small number of frames (defined by the runtime system) with the value f.size < 0
. These have the stack frame size stored inside the payload. h$ap_gen
is another special case, where the stack frame payload encodes both frame size and register information. The runtime system function h$stackFrameSize
computes the size of a stack frame:
// compute the size of stack frame h$stack[h$sp] with header f
function h$stackFrameSize(f) {
if(f === h$ap_gen) {
return (h$stack[h$sp - 1] >> 8) + 2;
} else {
var tag = f.size;
if(tag < 0) {
return h$stack[h$sp-1];
} else {
return (tag & 0xff) + 1;
}
}
Here is an example of a stack with an h$ap_gen
stack frame, the payload is stored in slots 4
and 5
, while slot 6
contains the metadata:
Slot | Value | Frame | Type / Description |
---|---|---|---|
0 | h$done | 1 | function , "finished" frame |
1 | 0 | 2 | number , flags |
2 | h$...zireportError | 2 | object , exception handler |
3 | h$catch_e | 2 | function , exception catch frame |
4 | 4 | 3 | payload |
5 | 3 | 3 | payload |
6 | 513 | 3 | number , stack frame metadata (payload size and registers) |
7 | h$ap_gen | 3 | function , h$ap_gen frame |
Exception Handlingโ
Haskell allows exceptions to be thrown within threads and between threads as an alternate way of returning a value. The throw operation transfers control to exception handler in the next catch
frame on the stack.
The catch
frame has two words of payload:
Slot | Value | Frame | Type / Description |
---|---|---|---|
0 | 0 | 1 | number , flags |
1 | h$...zireportError | 1 | object , exception handler |
2 | h$catch_e | 1 | function , exception catch frame |
The code for h$catch_e
is straightforward. It just pops the stack frame and returns to the next frame. This is what happens if no exception has occurred; the program just skips past the exception handler:
function h$catch_e() {
h$sp -= 3;
return h$stack[h$sp];
}
An exception is thrown by the h$throw
function, which unwinds the stack. Its implementation in simplified form looks like this:
// throw exception e
function h$throw(e) {
...
while(h$sp > 0) {
f = h$stack[h$sp];
...
// check for stack frames that need to be handled
if(f === h$catch_e) break;
if(f === h$atomically_e) { ... }
if(f === h$catchStm_e) break;
if(f === h$upd_frame) { /* handle black hole */ }
h$sp -= h$stackFrameSize(f);
}
if(h$sp > 0) {
var maskStatus = h$stack[h$p - 2];
var handler = h$stack[h$sp - 1];
// jump to handler
...
} else {
// no exception handler found, report error
...
}
}
h$throw
keeps removing stack frames from the stack until some frame of interest is found, using h$stackFrameSize
to determine the size of each frame. Eventually, it transfers control to an exception handler or it reports an error if no exception handling frame could be found.
Threadsโ
Concurrent Haskell supports multiple Haskell threads. These Haskell threads are run by one or more system threads. The JavaScript "system" is single-threaded (ignoring Web Workers, which have limited memory sharing), so we have to use a single system thread to run everything. It turns out to be quite straightforward to support multithreading if we use the trampolining calling convention with lightweight stacks. Each Haskell thread gets its own stack and stack pointer.
Each Haskell thread has a thread state object, t
of type h$Thread
. This object contains the stack (t.stack
, Array
) and stack pointer (t.sp
, number
) for the thread, and also keeps track of the thread status, for example whether the thread is finished or masking asynchronous exceptions.
When a thread is created, the h$Thread
object is initialized as follows:
/** @constructor */
function h$Thread() {
// get a unique thread id
this.tid = ++h$threadIdN;
this.status = THREAD_RUNNING;
// some initial error handling frame on the stack
this.stack = [h$done
, 0
, h$baseZCGHCziConcziSynczireportError
, h$catch_e
];
this.sp = 3;
this.excep = []; // waiting async exceptions
this.mask = 0; // async exceptions masked
// (0 unmasked, 1: uninterruptible, 2: interruptible)
this.interruptible = false; // currently in an interruptible operation
...
}
The initial stack contains two stack frames. The top three slots contain a catch
frame with the h$catch_e
header, the h$baseZCGHCziConcziSynczireportError
exception handler and 0
, for the mask state. The last slot of the stack is for h$done
frame, which only has a header and no payload.
Scheduling and Suspending a Threadโ
Now we have potentially multiple h$Thread
objects, each with their own stack. But how do we run the threads?
The trampolining calling convention introduced earlier expects the stack to be stored in h$stack
, with h$sp
the index of the topmost used slot. This means that if we want to run a thread t
, we need to set h$stack
and h$sp
to the values from the h$Thread
object t
:
// scheduling a thread t
function __scheduleThread(t) {
h$currentThread = t;
h$stack = t.stack;
h$sp = t.sp;
...
}
We also store t
itself in the global h$currentThread
variable to keep track of the currently running thread.
While the thread is running, h$stack
and h$sp
are constantly updated by the compiled Haskell code, so they go out of sync with values saved in the h$Thread
object. This means that when we suspend a thread, these need to be copied back into the h$Thread
as follows:
// suspending the current thread
function __suspendCurrentThread() {
// save the global h$stack and h$sp back into the thread state object
h$currentThread.stack = h$stack;
h$currentThread.sp = h$sp;
// set h$currentThread to indicate that no thread is running.
h$currentThread = null;
...
}
Having multiple threads, the trampoline loop looks a bit different. A thread returns a special continuation h$reschedule
to indicate that another thread may now be scheduled. In simplified form, the scheduler looks as follows:
function scheduler() {
while(true) {
// fetch the next thread to run and schedule it
var t = getNextThread();
// if there are no more threads then we're done
if(t === null) return;
__scheduleThread(t);
// start by executing the continuation at the top of the stack
var c = h$stack[h$sp];
// run until the thread indicates that another may now be scheduled
while(c !== h$reschedule) {
c = c();
}
// suspend the thread again
__suspendCurrentThread();
}
}
In practice, the scheduler is quite a bit more complicated. For example it also uses time-based switching, changing to different thread even when h$reschedule
is not returned by the current thread. In that case, the scheduler takes care of saving the thread state onto the stack, using metadata from the continuation.
A discussion of all the ins and outs of the scheduler is beyond the scope of this blog post. But it could be the topic of a follow-up post.
Threads and Asynchronous Exceptionsโ
Haskell exceptions come in two flavours: synchronous exceptions and asynchronous exceptions. Synchronous exceptions always come from the code itself, for example a pattern match failure, a call to error
or undefined
. Running the code again with the same input always produces the same result.
Asynchronous exceptions come from the outside. They can come from other threads (but don't need to) or from the runtime system. Typical reasons for asynchronous exceptions are timeouts and resource exhaustion. It's perfectly possible for the same function with the same input to be aborted with an asynchronous exception the first run, while running to completion the second time. This means that we must be careful preserving any partially completed computation.
Asynchronous exceptions can happen at any time, which can make them quite tricky to deal with. They could leave the program in an inconsistent state if they occur at the wrong time. Therefore, threads can temporarily block asynchronous exceptions, a process called masking. Different masking states are used for indicating whether exceptions are still masked when the thread is performing an interruptible operation.
Here is a comparison of the main differences between synchronous and asynchronous exceptions:
Synchronous | Asynchronous | |
---|---|---|
Thrown using primop | raise# , raiseIO# | killThread# |
Computation on the stack | thunks updated to immediately raise the same exception again | stack captured in heap objects so the computation can be resumed |
Throw to other thread | impossible | push h$raiseAsync_frame frame onto receiving thread's stack (unless masked) |
Throw to current thread | use h$throw immediately | use h$throw immediately (no masking) |
Masking | no masking, thrown immediately | exception saved in h$Thread.excep of receiving thread if masked, the sending thread is blocked until the exception is delivered |
We can see that while on the surface, synchronous and asynchronous exceptions look similar, there are many differences under the hood. Masking requires some additional machinery for thread synchronization and storing exceptions that cannot be delivered yet.
Property | Description |
---|---|
h$Thread.mask | number , mask status indicating unmasked / masked uninterruptible / masked interruptible |
h$Thread.excep | Array , list of unposted asynchronous exceptions and their posting h$Thread objects. When the receiving thread unmasks, the scheduler posts the exceptions to its h$Thread.stack with h$raiseAsync_frame and unblocks the sending threads |
h$Thread.interruptible | boolean , interruptible state of the thread |
Important Functions and Valuesโ
We have seen various global names and code examples so far, but some example code was simplified a little for clarity. This section offers some pointers to get started with exploring the actual code.
Global Variablesโ
Name | Description |
---|---|
h$stack | Array , the stack of the currently running thread |
h$sp | integer , index of the top slot of the stack |
h$r1 , h$r2 , ..., h$r32 | Any , "registers", function arguments (first 32) |
h$regs | Array , function arguments when more than 32 are needed |
h$currentThread | h$Thread , object of currently running Haskell thread. null if no Haskell thread is running |
Running Threadsโ
These functions start a new thread given an IO
action. They also start the main loop of the scheduler.
Name | Description |
---|---|
h$run(c) | Run IO action c in a new lightweight thread |
h$main(c) | Run IO action c in a new lightweight thread, flush buffers and exit the program when finished |
Some Useful Internal Functionsโ
Name | Description |
---|---|
h$stackFrameSize(f) | Compute the size of the stack frame f |
h$throw(e, async) | Throw exception e (object ), async (boolean ) determines whether the exception is asynchronous. |
h$mainloop | The entry point for running Haskell code |
h$runThreadSlice | The actual trampolining loop (for asynchronous threads) |
h$scheduler | Picks the next thread to schedule, schedules it and returns the continuation to call |
h$suspendCurrentThread | Saves the thread state registers onto the stack to allow another thread to be run, used in conjunction with h$restoreThread |
h$reschedule | Tells the scheduler to schedule another thread |
Some Common Stack Framesโ
Name | Description |
---|---|
h$done | Finish the current thread |
h$return | Load a value from the stack into h$r1 |
h$restoreThread | Load values from stack frame into registers, used to restore the thread state |
h$catch_e | Catch an exception |
h$ap_n_m_e | Apply a function to m arguments taking n slots on the stack |
h$raiseAsync_frame | Raise an asynchronous exception received by this thread |
h$ap_gen | Generic apply a function, the number of arguments is in the stack frame |
Conclusionโ
We have introduced the trampolining calling convention used by the JavaScript backend for GHC and the structure of the stacks used by it.
We have seen that stacks of Haskell lightweight threads are represented by JavaScript arrays with the JavaScript backend. The contents on the stack consist of stack frames with a header and a payload. The header of each stack frame contains some metadata so that code for exception handling can traverse the stack and transfer control to an exception handler.