Skip to main content

· 3 min read

This is the May 2022 monthly update from the GHC DevX team at IOG.

JavaScript Backend for GHC

For a few months we have been merging GHCJS (Haskell to JavaScript compiler) into GHC. We set our first milestone to be the ability to compile and to run the usual "Hello World" program. It turned out to be much more involved than we initially thought (requiring FFI support, etc.), but we should be getting there soon.

This month we have made the following progress:

  • RTS: we have modified Hadrian and rts.cabal in order to build a valid native rts unit that GHC can use, in particular containing appropriate header files.

  • linker: the JS linker has been hooked up with GHC's driver. We fixed several panics in the linker due to erroneous symbol generation code. These bugs were introduced while porting the code from the old 8.10 pretty-printing infrastructure to the newer one.

  • boot libraries: the JS backend can now build and link all the boot libraries. Note that we are not claiming that they are all usable yet. In particular complete FFI support is lacking, but the JS backend Hadrian build completes and so we can start using the produced JS cross-compiler.

  • levity polymorphism: building ghc-prim uncovered a lurking bug related to levity polymorphism. It wasn't noticed in GHCJS 8.10 because it is also related to the BoxedRep proposal that introduced a constructor application in a commonly used RuntimeRep.

  • sized literals: support for new sized literals have been added to the code generator.

Now that have achieved a build process that actually produces a JS cross compiler, we are confronting and fixing issues in the produced JavaScript code, such as adding, managing, and debugging CPP conditional compilation blocks in JS shim files. You can follow our progress on our development branch here.

External Static Plugins

GHC doesn't support plugins in cross-compilers #14335. Some time ago, we came up with a solution called "external static plugins" !7377. These are plugins that are directly loaded from shared libaries, bypassing the issue with usual plugins.

Our colleague Shea Levy confirmed that the approach works, backported it to GHC 8.10, and has been working on making it work in stage1 cross-compilers for Windows. Kudos for this work, Shea.

As the current user-interface based on environment variables isn't convenient, we have been working on adding new command-line flags to GHC instead. We expect to propose this for integration into GHC when the new interface will be fully implemented.

Blog posts

Inspired by our friends and colleagues at Well-Typed and Tweag, we have been starting to write blog posts for IOG's engineering blog. They will mostly be about stuff we are working on or that we are interested in. Feel free to send us feedback about these posts and to send us topics you would be interested to read about.

Haskell Optimization Handbook

The "Haskell Optimization Handbook" is an accepted proposal of the Haskell Foundation. Jeff has been working behind the scene to make this proposal concrete. More about this in the upcoming months.

· 11 min read

As part of the integration of GHCJS into GHC as a cross-compilation backend, we've converted the binary serialisation that GHCJS previously used, which was via its Objectable typeclass, into GHC's internal Binary typeclass representation. In doing this, we gain access to instances for serialising many of GHC's internal data types, and, importantly, we can reuse GHC's mechanism for serialising its Name and FastString types, which are written to lookup tables in order to maintain identity, as well as allowing for space savings on disk.

In this post, we will explain how the GHC Binary and GHCJS Objectable approaches work, and compare their tradeoffs.

How GHC Binary Works

Internally, GHC uses the Name data type to track the uniqueness of objects during compilation. Amongst information relating to the definition of a Name within the Haskell source, a Name also contains a Unique integer (the value of which is provided by the complation environment monad). Using this Unique integer, which is unpacked in Name's definition, we can make O(1) equality comparisons without following further memory references - allowing for this operation to be very quick, which will have a large effect on compilation performance given how often it is used.

FastString is used within GHC to store short, string-like data, and, similarly to Name, FastString uses a unique integer to allow for very fast equality comparisons. Primarily, FastString is used to represent variables and other definitions, and is used both in Name as the string-representation of a name with extra information attached, as well as directly, representing names that don't require this extra information, such as local variables.

In GHC's .hi interface files, Name and FastString are serialised differently compared to other data structures. They are written in the main data structure payload as indicies of a table, and these tables contain the actual string-like data of these types. So, an interface file might resemble:

  • Header
    • Magic number for recognising interface files
    • Pointer to Name symbol table
    • Pointer to FastString dictionary
  • Main data structure payload
  • Name symbol table
  • FastString dictionary

Importantly, the FastString dictionary must be written after the Name symbol table, because Names contain FastStrings, so writing the symbol table will expand the dictionary. Additionally, because we only have one buffer, and we don't know the size of the payload until it's written, the tables cannot be written in the header, and instead serialisation code must reserve space for the table pointers and jump back to write the pointers once the table locations are known.

During serialisation, GHC uses mutable data structures to store both the serialised binary buffer, as well as these tables:

data BinHandle
= BinMem { -- binary data stored in an unboxed array
bh_usr :: UserData, -- sigh, need parameterized modules :-)
_off_r :: !FastMutInt, -- the current offset
_sz_r :: !FastMutInt, -- size of the array (cached)
_arr_r :: !(IORef BinArray) -- the array (bounds: (0,size-1))

data UserData =
UserData {
-- for *deserialising* only:
ud_get_name :: BinHandle -> IO Name,
ud_get_fs :: BinHandle -> IO FastString,

-- for *serialising* only:
ud_put_nonbinding_name :: BinHandle -> Name -> IO (),
-- ^ serialize a non-binding 'Name' (e.g. a reference to another
-- binding).
ud_put_binding_name :: BinHandle -> Name -> IO (),
-- ^ serialize a binding 'Name' (e.g. the name of an IfaceDecl)
ud_put_fs :: BinHandle -> FastString -> IO ()

Here, we see that various functions are stored in the handle structure, to be later referenced by their respective types in their GHC.Utils.Binary.Binary typeclass instances. Notice that the instance of Binary Name references ud_put_nonbinding_name and ud_get_name. Similarly, the Binary FastString instance uses ud_put_fs and ud_get_fs.

class Binary a where
put_ :: BinHandle -> a -> IO ()
put :: BinHandle -> a -> IO (Bin a)
get :: BinHandle -> IO a

instance Binary FastString where
put_ bh f =
case getUserData bh of
UserData { ud_put_fs = put_fs } -> put_fs bh f

get bh =
case getUserData bh of
UserData { ud_get_fs = get_fs } -> get_fs bh

instance Binary Name where
put_ bh name =
case getUserData bh of
UserData{ ud_put_nonbinding_name = put_name } -> put_name bh name

get bh =
case getUserData bh of
UserData { ud_get_name = get_name } -> get_name bh

In GHC.Iface.Binary, helper types and functions are defined to store the Name symbol table and FastString dictionary in a mutable data structure. Here, putFastString is intended to be partially applied - passing it an appropriately initialised BinDictionary so that the resulting function can be stored in the us_put_fs field of the UserData. allocateFastString does the low-level work here, incrementing the index and modifying the mutable map (stored as a UniqFM, which is map keyed on types that contain Uniques - recalling that these are used for fast equality comparisons):

data BinDictionary = BinDictionary {
bin_dict_next :: !FastMutInt, -- The next index to use
bin_dict_map :: !(IORef (UniqFM FastString (Int,FastString)))
-- indexed by FastString

putFastString :: BinDictionary -> BinHandle -> FastString -> IO ()
putFastString dict bh fs = allocateFastString dict fs >>= put_ bh

allocateFastString :: BinDictionary -> FastString -> IO Word32
allocateFastString BinDictionary { bin_dict_next = j_r,
bin_dict_map = out_r} f = do
out <- readIORef out_r
let !uniq = getUnique f
case lookupUFM_Directly out uniq of
Just (j, _) -> return (fromIntegral j :: Word32)
Nothing -> do
j <- readFastMutInt j_r
writeFastMutInt j_r (j + 1)
writeIORef out_r $! addToUFM_Directly out uniq (j, f)
return (fromIntegral j :: Word32)

Later, in GHC.Iface.Binary, getWithUserData and putWithUserData will structure the header, and initialise the UserData functions to write to/read from mutable tables. Notice that we must first reserve header space for pointers to the lookup tables, as well as initialise the mutable tables, write these initialised structures to the UserData (for example, we see the previous putFastString partially applied here), then write the main payload, then write the lookup tables (Name symbol table first, because writing this can add to the FastString dictionary), and finally jump back to fill in the pointers to these tables:

putWithUserData :: Binary a => TraceBinIFace -> BinHandle -> a -> IO ()
putWithUserData traceBinIface bh payload = do
-- Remember where the dictionary pointer will go
dict_p_p <- tellBin bh
-- Placeholder for ptr to dictionary
put_ bh dict_p_p

-- Remember where the symbol table pointer will go
symtab_p_p <- tellBin bh
put_ bh symtab_p_p
-- Make some initial state
symtab_next <- newFastMutInt 0
symtab_map <- newIORef emptyUFM
let bin_symtab = BinSymbolTable {
bin_symtab_next = symtab_next,
bin_symtab_map = symtab_map }
dict_next_ref <- newFastMutInt 0
dict_map_ref <- newIORef emptyUFM
let bin_dict = BinDictionary {
bin_dict_next = dict_next_ref,
bin_dict_map = dict_map_ref }

-- Put the main thing,
bh <- return $ setUserData bh $ newWriteState (putName bin_dict bin_symtab)
(putName bin_dict bin_symtab)
(putFastString bin_dict)
put_ bh payload

-- Write the symtab pointer at the front of the file
symtab_p <- tellBin bh -- This is where the symtab will start
putAt bh symtab_p_p symtab_p -- Fill in the placeholder
seekBin bh symtab_p -- Seek back to the end of the file

-- Write the symbol table itself
symtab_next <- readFastMutInt symtab_next
symtab_map <- readIORef symtab_map
putSymbolTable bh symtab_next symtab_map
case traceBinIface of
QuietBinIFace -> return ()
TraceBinIFace printer ->
printer (text "writeBinIface:" <+> int symtab_next
<+> text "Names")

-- NB. write the dictionary after the symbol table, because
-- writing the symbol table may create more dictionary entries.

-- Write the dictionary pointer at the front of the file
dict_p <- tellBin bh -- This is where the dictionary will start
putAt bh dict_p_p dict_p -- Fill in the placeholder
seekBin bh dict_p -- Seek back to the end of the file

-- Write the dictionary itself
dict_next <- readFastMutInt dict_next_ref
dict_map <- readIORef dict_map_ref
putDictionary bh dict_next dict_map
case traceBinIface of
QuietBinIFace -> return ()
TraceBinIFace printer ->
printer (text "writeBinIface:" <+> int dict_next
<+> text "dict entries")

In summary, we see a number of structural characteristics of code using GHC's Binary implementation:

  • Use of a single buffer means that the lookup tables can't be written in the header, so we have to reserve space for table pointers in the header, and jump back once we know where they will be located in order to write the pointers to the buffer. Essentially, an ordering of file sections is enforced by the data dependencies of the payload containing Names and FastStrings, and Names containing FastStrings - which means these must be written in this order, but reading must be done in the reverse order, causing the need for pointers in the header.
  • Jumping around in binary buffers results in weakly enforced types and fiddly, code that Haskell's type system isn't able to help us debug
  • Must carry about read/write functions for the lookup table types (Name and FastString), which are undefined during the opposite serialisation stage, and are hard-coded into the handle, reducing extensibility.

How Objectable Works

In comparison, GHCJS previously involved using instances of the Objectable typeclass to serialise its interface files:

import qualified Data.Binary as DB

data SymbolTable
= SymbolTable !Int !(Map ShortText Int)
deriving (Show)

type PutSM = St.StateT SymbolTable DB.PutM -- FIXME: StateT isn't strict enough apparently
type PutS = PutSM ()
type GetS = ReaderT ObjEnv DB.Get

class Objectable a where
put :: a -> PutS
get :: GetS a
putList :: [a] -> PutS
putList = putListOf put
getList :: GetS [a]
getList = getListOf get

Here we see that GHCJS has opted for a different approach that avoids the mutable buffer by instead using Data.Binary instances that work via concatenating lazy ByteStrings. Additionally, the mutable tables are replaced with a State monad that holds the symbol table as a Map structure.

Because Data.Binary forms lazy ByteStrings, it's trivial to serialise the individual parts of the interface file and later concatenate these using ByteString's monoid instance - allowing for all of the sections of the file to be defined declaratively at the top-level of the function in order of their appearance within the file.

:: ModuleName -- ^ module
-> SymbolTable -- ^ final symbol table
-> Deps -- ^ dependencies
-> [([ShortText],ByteString)] -- ^ serialized units and their exported symbols, the first unit is module-global
-> ByteString
object' mod_name st0 deps0 os = hdr <> symbs <> deps1 <> idx <> mconcat (map snd os)
hdr = putHeader (Header (moduleNameTag mod_name) (bl symbs) (bl deps1) (bl idx))
bl = fromIntegral . B.length
deps1 = putDepsSection deps0
(sti, idx) = putIndex st0 os
symbs = putSymbolTable sti

In summary, the use of multiple ByteString sections that are later concatenated offer several different structural characteristics compared to the use of a single mutable buffer:

  • The final ordering of the sections is flexible, because they are serialsied separately, so any data dependencies don't introduce ordering in the file - which we see in the where clause of object'
  • Types are more strongly enforced because imperative seekBin instructions aren't required. However, each section is still deserialised by taking a substring of the file to be read as that section type. Of course, all serialisation eventually results in raw binary, so the simplification of concatenating the sections into the final file without jumping around limits the places that bugs can hide
  • Visually, the ordering of the sections within the final file is very clear - we see in object' that every section is simply listed in order on one line, concatenated together.


Making use of GHC's existing infrastructure lets the GHCJS backend to make use of the FastString and Name data types, as well as allowing for the removal of a significant amount of now-redundant code.

Additionally, interface file generation using GHC's Binary appears to be very fast - for example, attempts to hide the handle behind a reader monad significantly reduce the compiler's performance as measured by CI. Speculatively, looking at the generated core, this could be because the optimiser has a much better time with the style of IO code that is used - rather than being a limitation of more abstacted approaches.

The comparison provided the GHCJS's old approach makes it clear that GHC's Binary implementation, while very useful, has potential to be improved in both readability and extensiblity. However, because CI has shown that serialisation performance has a significant effect on overall compilation performance, this tradeoff must be considered when making any changes. Potentially, these readability shortfalls in GHC's implementation might just be the result of legacy code, and so benchmarks of other approaches, such as Data.Binary, should be used to guide future work in improving the readability and flexibility of GHC's serialisation without sacrificing performance.

· 7 min read


At IOG DevX we have been working on integrating various bits of GHCJS into GHC, with the goal of having a fully working JavaScript backend for the 9.6 release. For some parts this has mostly consisted of an update of the code to use the newer GHC API and dependencies. Other bits, like the Template Haskell runner, need more work.

This post gives an overview of the existing approaches for running Template Haskell in GHC based cross compilers and our plan for the JavaScript backend. Hopefully we can revisit this topic once all the work has been done, and see what exactly we ended up with.

The GHCJS Template Haskell Runner

When I first worked on Template Haskell (TH) support for GHCJS, there was no mechanism to combine Template Haskell with cross compilation in GHC.

Normally, Template Haskell is run by loading library code directly into the GHC process and using the bytecode interpreter for the current module. Template Haskell can directly access GHC data structures through the Q monad. Clearly this would not be possible for GHCJS: We only have JavaScript code available for the libraries and the organization of the JavaScript data structures is very different from what GHC uses internally.

So I had to look for an alternative. Running Template Haskell consists of two parts:

  1. loading/executing the TH code
  2. handling compiler queries from the TH code, for example looking up names or types

Running the TH code can be done by first compiling the Haskell to JavaScript and then using the JavaScript eval feature.

Template Haskell code can query the compiler using the Quasi typeclass. I noticed that none of the methods required passing around functions or complicated data structures, so it would be possible to serialize each request and response and send it to another process.

So I went ahead and implemented this approach with a script thrunner.js to load and start the code in a node.js server, a message type with serialization, and a new instance of the Quasi typeclass to handle the communication with the compiler via the messages. This is still what's in use by GHCJS to this day. Every time GHCJS encounters Template Haskell, it starts a thrunner process and the compiler communicates with it over a pipe.

After starting thrunner.js GHCJS sends the Haskell parts of the Template Haskell runnner to the script. This includes the runtime system and the implementation of the Quasi typeclass and communication protocol. After that, the TH session starts. A typical TH session looks as follows:

RunTH THExp <js code> <source location>
LookupName (Just <name-string>)
LookupName' (Just <name>)
Reify <name>
Reify' <name-info>
RunTH' <result>
RunTH THDec <js code> <source location>
AddTopDecls <declarations>
RunTH' <result>
FinishTH True
FinishTH' <memory-consumption>

Each message is followed up by a corresponding reply. For example, a LookupName' response follows a LookupName request and a RunTH message will eventually generate a RunTH' result. The first RunTH message contains the compiled JavaScript for the Template Haskell code, along with its dependencies. Each subsequent RunTH only includes dependencies that have not already been sent.

The thrunner process stays alive during the compilation of at least an entire module, allowing for persistent state (putQ/getQ).

The GHC External Interpreter

If we build a Haskell program with (cost centre) profiling, the layout of our data structures changes to include bookkeeping of cost centre information. This means that we need a special profiling runtime system to run this code.

What can we do if we want to run our profiled build in GHCi or Template Haskell? We cannot load compiled profiling libraries into GHC directly; its runtime system expects non-profiled code. We could use a profiled version of the compiler itself, but this would make all compilation very slow. Or we could somehow separate the profiled code of our own program from the non-profiled code in the compiler.

This was Simon Marlow's motivation for adapting the GHCJS thrunner approach, integrating in GHC and extending it it to support GHCi and bytecode. This functionality can be activated with the -fexternal-interpreter flag and has been available since GHC version 8.0.1. When the external interpreter is activated, GHC starts a separate process, iserv (customizable with the -pgmi flag) which has the role analogous to the thrunner script for GHCJS.

Over time, the iserv code has evolved with GHC and has been extended to include more operations. By now, there are quite a few differences in features:

Template Haskell supportyesyes
Object codethrough pipefrom file
Object code linkingcompileriserv process

thrunner is not quite as complete as iserv: It lacks GHCi and the debugger, and there is no bytecode support. But these features are not essential for basic Template Haskell.

Proxies and Bytecodes

We have now seen two systems for running Template Haskell code outside the compiler process: The original GHCJS thrunner and the extended GHC iserv.

Clearly it isn't ideal to have multiple "external interpreter" systems in GHC, therefore we plan to switch from thrunner to iserv for the upcoming JavaScript GHC backend. We don't need the debugger or GHCi support yet, but we do need to adapt to other changes in the infrastructure. So what does this mean in practice?

The biggest change is that we have to rework the linker: thrunner does not contain any linking logic by itself: GHCJS compiles everything to JavaScript and sends compiled code to the thrunner process, ready to be executed. In contrast, iserv has a loader for object and archive files. When dependencies need to be loaded into the interpreter, GHC just gives it the file name.

Another change is using the updated message types. In the thrunner session example above we could see that each message is paired with a response. For example a RunTH' response always follows a RunTH message, with possibly other messages in between. iserv has an interesting approach for the Message datatype: Instead of having pairs of data constructors for each message and its response, iserv has a GADT Message a, where the a type parameter indicates the expected response payload for each data constructor.

During development of the thrunner program it turned out to be very useful to save and replay Template Haskell sessions for debugging purposes. We'd like to do this again, but now saving the message in a readable/writable format. Since we're dealing with JavaScript, JSON appears to be the obvious choice.

Our plan is to have an iserv implementation that consists of a JavaScript part that runs in node.js and a proxy process to handle communication with GHC. The proxy process converts the messages between GHC's own (binary based) serialization format and JSON. The proxy process is relatively simple, but it does reveal one downside of the new GADT based message types: A proxy is stateful. We must always know which message we have sent to convert the response back from JSON to binary.

It's not yet known whether we will implement a full bytecode interpreter. We expect it to become clear during implementation whether we can get away without one early on.


We have seen how Template Haskell and GHCi code can be run outside the GHC process for profiling or cross compiling, with both the thrunner approach in GHCJS and the newer iserv in GHC.

We at IOG DevX are working on switching to the iserv infrastructure for the upcoming GHC JavaScript backend, which involves a substantial rewrite, mainly because of differences in linking. This is a work in progress, and we intend to revisit this topic in another blog post once the final design has been implemented.

· 2 min read

Welcome to the (rather late) April 2022 monthly update from the GHC DevX team at IOG. Since the last update we've continued work on the upcoming JavaScript backend for GHC. Unfortunately, we have nothing to show quite yet but that doesn't mean nothing has happened! On the contrary, we've made great progress and are close to that crucial first milestone hello world. Besides our work on the JavaScript backend, we were pleased to finally push through the Modularizing GHC paper that Sylvain has been working on for 2+ years! It causes quite the splash on the Haskell discourse and reddit, we recommend reading it if you haven't already (links below). Alright, enough introduction let's get into the update.

JavaScript Backend

We have made the following progresses in the implementation of a JavaScript backend for GHC (adapted from GHCJS):

  • linker: ported GHCJS's linker code into GHC. A lot of code was duplicated from GHC and slightly modified for GHCJS's needs, making the process far from trivial.

  • testsuite: fixed Hadrian to run GHC's testsuite with cross-compilers !7850. There are remaining issues though (see #21292).

  • build system: fixes for GHC's configure script were ported (e.g. support for the "ghcjs" target in config.sub). GHCJS's custom build script was integrated into We can now configure the build with: ./configure --target=js-unknown-ghcjs

  • TH: we have conducted some experiments to find the best way to bridge GHCJS's TH runner and GHC's external interpreter. This will be described in details in a future blog post.

  • FFI: basic support for JavaScript FFI has been ported from GHCJS to GHC. We haven't ported the JavaScript parser, so we have dropped the fancy import syntax (e.g. "$"). It should be enough to build boot libraries and we will add JS parsing support later.

At this stage, we are working on building boot libraries and on supporting linking with the JS RTS.

Development happens in the following branch:

Modularity paper

Sylvain, Jeffrey, and John Ericson (from Obsidian Systems) wrote a paper about "modularizing GHC" using domain-driven design.

We've got a lot of great feedback about it (expect a first revision soon). We also got a GHC contribution directly inspired by the paper (see !8160) which was very welcome!

· 18 min read

Table of Contents

Haskell is a great language camouflaged by lackluster tooling. This situation has led to well-known problems (who could forget Cabal hell?). A less discussed problem is what I will call the Black-box syndrome: It is hard to know exactly what the memory representation and runtime performance of my Haskell programs are1. Now black-box syndrome is not only a problem, it is also one of the nice features in the language since like all good abstractions it elides things Id rather not care about, at least most of the time. In other words, I am happy I dont have to do manual memory manipulation!

However, when I have my optimization hat on, I run face first into black-box syndrome. The crux of the problem is a tension between the need for observation during performance engineering and optimization, and the need to ship fast code. During development we want to be able to open up a system, see exactly how it is working, make tweaks, package it back up and test again. I want to be able to answer questions like Why is my executable this size?, Which code is a hot loop?, or When does my code do direct, known or unknown function calls?.

In order to answer these questions we need the ability to observe every part of that system as the machine experiences it, without this ability we have no way to make progress other than test, change some code, compile and test again in an ad-hoc manner. And therein lies the problem, most Haskell tooling is insufficient to provide the observability that we would like, instead the tooling often expects and requires us to make source code changes to our program or even recompile all of our libraries and code for a profiling way. This leads to the idea and the expectation in the Haskell community that Haskell programs are hard to optimize because the barrier to entry for optimization has artificially increased.

Csaba Hruska has recently been making headway in this area with his work on the GRIN compiler and an external STG interpreter. His STG interpreter (and patched ghc) exactly solve these problems and he has demonstrated dumping the entire call graph of large Haskell projects, filter to hot loops and finding unknown function calls in these graphs. If you havent seen his demo be sure to watch it, it is well worth your time.

This post is the first in a new blog series. In this blog series were going to kick the tires on the external STG interpreter see what it can do, and what we can uncover in some popular libraries by using it. In particular, Im interested in running it on projects Ive previously optimizedsuch as ghc itself, containers, unordered-containersusing the standard methods: ticky-ticky profiling, prof, flamegraphs, heap profiling, ghc-debug, cachegrind etc. This post, however, will be focused on setting up the patched ghc and interpreter on a NixOS system. My goals are threefold:

  1. Give an overview of the project and project layout to lower barrier to entry for the system.
  2. Give step by step instructions on setting up the interpreter on a nix-based system and provide a forked github repo for nix users. This should allow nix users to just git clone foo and nix-build (spoiler: it wont be that easy but still not hard.)
  3. Popularize Csabas project! It is a refreshing take on Haskell optimization and compilation.

Making sense of the project

The external STG interpreter is part of the GRIN compiler project. We are not doing anything with the GRIN compiler (yet!) and so we are only interested in The GHC whole compiler project. The whole-compiler-project has several sub-projects that well be building and using directly:

  • external-stg: This subproject provides utilites well be using, in particular mkfullpak
  • external-stg-interpreter: This is the actual STG interpreter. The good news is that this is independent of the rest of the project and can be built just like a regular Haskell executable
  • ghc-wpc: This is a fork of ghc-8.10.x (Im not sure exactly which version it forks to be honest) which we must build in order to use the external STG interpreter. Ghc-wpc serves as a frontend for the external-stg-interpreter.

Building a working external STG interpreter

The external STG interpreter can be built like any regular haskell executable. But in order to use the interpreter we have to build ghc-wpc. ghc-wpc is necessary because it serves as a frontend for the STG interpreter. It compiles a Haskell program like normal and then dumps an enriched STG IR to file. This file is then run through a utility gen-exe (gen-exe is an executable built in the external-stg-compiler sub-project) which picks up the compilation pipeline from the STG IR and creates an executable like we would expect from a normal compilation pipeline.

The major difference between this process and the usual compiler pipeline is that ghc-wpc leaves enough compiler information on disk for the rest of the tooling to consume, namely, in files with a *.o_stgbin (this is STG IR generated at compile time), and *.o_stgapp (project linker and dependency information) extension. Thus, once we build this custom ghc version we can use it to build the source code we wish to analyze and begin our optimization work.

For the rest of this tutorial Ill be referencing my fork of the ghc-whole-compiler-project that includes everything you need if you want to follow along, including .nix files for creating a nix-shell which will prepare a suitable environment to run the entire toolchain.


The usual way to build ghc using a nix based system is with the ghc.nix project. Ghc.nix provides a default.nix with a suitable environment to run hadrian and build ghc. For ghc-wpc well need some special packages, and we need our boot compiler to be exactly ghc-8.3.3. The custom ghc.nix file is included in my fork, Ive taken the liberty to pin the nixpkgs to the right version for ghc-8.3.3. So lets begin:

Clone the forked repo:

$ git clone

$ cd ghc-whole-program-compiler-project

$ tree -L 1
├── dist-newstyle
├── external-stg
├── external-stg-compiler
├── external-stg-interpreter
├── ghc.nix.wpc
├── ghc-wpc
├── lambda
├── mod-pak
├── shell.nix
├── stack.yaml
└── stack.yaml.lock

Youll find the patched ghc.nix included (ghc.nix.wpc) and a shell.nix for a nix-shell. The shell.nix file simply references ghc.nix.wpc/default.nix with the appropriate options:

$ cat shell.nix
import (./ghc.nix.wpc/default.nix) {
useClang = true;
withHadrianDeps = true;
withIde = false;
withLlvm = true;

Building ghc-wpc

Now we can enter a nix-shell and build ghc-wpc:

$ pwd

$ nix-shell shell.nix # or just nix-shell
trace: checking if /home/doyougnu/programming/haskell/ghc-whole-program-compiler-project/hadrian/hadrian.cabal is present: no
Recommended ./configure arguments (found in $CONFIGURE_ARGS:
or use the configure_ghc command):



Now we need to cd into ghc-wpc and tweak the hadrian build.

MAJOR CONSTRAINT: You must build ghc-wpc with hadrian/build-stack, if you build in any other way youll run into shared object errors, see this ticket for details.

So in order to build ghc-wpc with stack well have to tweak the stack.yaml file. You must do this since it is not included in the fork:

Quick side note: To make the formatting nicer I truncate nix-shell:~/foo/bar/baz/ghc-whole-program-compiler-project to just ..., so nix-shell:.../ghc-wpc is equivalent to ~/path/to/ghc-whole-compiler-project/ghc-wpc.

[nix-shell:...]$ cd ghc-wpc/hadrian/

[nix-shell:.../ghc-wpc/hadrian]$ cat stack.yaml
resolver: lts-15.5

- '.'
- 'GHC-Cabal'

system-ghc: true

enable: true
shell-file: ../../shell.nix

The changes are: (1) tell stack we are using nix, and (2) reference the shell.nix file which points to ghc.wpc.nix at the root of the project, i.e., ghc-whole-program-compiler-project/shell.nix.

Now we should be able to begin our build, return to the root of ghc-wpc and run the following:

[nix-shell:.../ghc-wpc/hadrian]$ cd ..

[nix-shell:.../ghc-wpc]$ ./boot && ./configure

[nix-shell:.../ghc-wpc]$ hadrian/build-stack -j

and go get some coffee since this will take some time. Once it finishes you should have the ghc-wpc binary in _build/stage1/bin

[nix-shell:.../ghc-wpc]$ ls -l _build/stage1/bin/
total 8592
-rwxr-xr-x 1 doyougnu users 1843752 Apr 29 23:01 ghc
-rw-r--r-- 1 doyougnu users 11082 Apr 29 23:01 ghc.dyn_o_ghc_stgapp
-rwxr-xr-x 1 doyougnu users 660128 Apr 29 22:50 ghc-pkg
-rw-r--r-- 1 doyougnu users 9977 Apr 29 22:50 ghc-pkg.dyn_o_ghc_stgapp
-rwxr-xr-x 1 doyougnu users 4624680 Apr 29 23:01 haddock
-rw-r--r-- 1 doyougnu users 16883 Apr 29 23:01 haddock.dyn_o_ghc_stgapp
-rwxr-xr-x 1 doyougnu users 49344 Apr 29 22:25 hp2ps
-rw-r--r-- 1 doyougnu users 2504 Apr 29 22:25 hp2ps.dyn_o_ghc_stgapp
-rwxr-xr-x 1 doyougnu users 716440 Apr 29 22:35 hpc
-rw-r--r-- 1 doyougnu users 9959 Apr 29 22:35 hpc.dyn_o_ghc_stgapp
-rwxr-xr-x 1 doyougnu users 738544 Apr 29 22:35 hsc2hs
-rw-r--r-- 1 doyougnu users 10264 Apr 29 22:35 hsc2hs.dyn_o_ghc_stgapp
-rwxr-xr-x 1 doyougnu users 58384 Apr 29 22:34 runghc
-rw-r--r-- 1 doyougnu users 8864 Apr 29 22:34 runghc.dyn_o_ghc_stgapp

Notice that this build dumped *.<way>_o_ghc_stgapp files!

Building the stg tooling

Now that we have a working ghc-wpc we need to build the rest of the project by pointing stack to the ghc-wpc binary in ghc-wpc/_build/stage1/bin. That is, we must change the ghc-whole-program-compiler-project/stack.yaml file:

[nix-shell:~/programming/haskell/ghc-whole-program-compiler-project]$ cat stack.yaml
resolver: lts-16.13

allow-newer: true

- 'external-stg-compiler'
- 'external-stg'

"$everything": -fno-stgbin -fno-stgapp -optcxx-std=c++17

- async-pool-0.9.1@sha256:4015140f896c3f1652b06a679b0ade2717d05557970c283ea2c372a71be2a6a1,1605
- souffle-haskell-1.1.0
- zip-1.7.0

# use custom ext-stg whole program compiler GHC
compiler: ghc-8.11.0
skip-ghc-check: true

enable: false

# use local GHC (for development)
system-ghc: true
- /home/doyougnu/programming/haskell/ghc-whole-program-compiler-project/ghc-wpc/_build/stage1/bin

#dump-logs: all
# keep-tmp-files: true
# cabal-verbose: true

The changes are: (1) set compiler: ghc-8.11.0 (the ghc-wpc fork), (2) set skip-ghc-check: true so that stack doesnt complain about the ghc version, (3) set nix.enable: false, confusingly if you leave this as true then stack will try to use nixpkgs to get a ghc binary, but we want it to use our local binary so we disable this even though well still be in our original nix-shell (4) set system-path: true to tell stack we will be using a ghc we have on our system, and finally (5) set extra-path: <path-to-ghc-wpc-binary>.

Now we can run stack and install the stg tooling:

[nix-shell:...]$ stack --stack-root `pwd`/.stack-root install
Trouble loading CompilerPaths cache: UnliftIO.Exception.throwString called with:

Compiler file metadata mismatch, ignoring cache
Called from:
throwString (src/Stack/Storage/User.hs:277:8 in stack-2.7.5-9Yv1tjrmAU3JiZWCo86ldN:Stack.Storage.User)

WARNING: Ignoring tagged's bounds on template-haskell (>=2.8 && <2.17); using template-haskell-
Reason: allow-newer enabled.
WARNING: Ignoring aeson's bounds on template-haskell (>= && <2.17); using template-haskell-
Reason: allow-newer enabled.
WARNING: Ignoring th-abstraction's bounds on template-haskell (>=2.5 && <2.17); using template-haskell-
Reason: allow-newer enabled.
WARNING: Ignoring unliftio-core's bounds on base (>=4.5 && <4.14); using base-
Reason: allow-newer enabled.
WARNING: Ignoring souffle-haskell's bounds on megaparsec (>=7.0.5 && <8); using megaparsec-8.0.0.
stack --stack-root `pwd`/.stack-root install
... # bunch of output
Copied executables to /home/doyougnu/.local/bin:
- dce-fullpak
- ext-stg
- fullpak
- gen-exe
- gen-exe2
- gen-obj
- gen-obj2
- mkfullpak
- show-ghc-stg

Warning: Installation path /home/doyougnu/.local/bin not found on the PATH environment variable.

You can add ~/.local/bin to your PATH if you want, Ill just be directly referencing these binaries as we go.

Building the external-stg-interpreter

We are almost all done, all that is left is to build the external-stg-interpreter and run a small script that links everything together into a shared object for the interpreter. So:

[nix-shell:...]$ cd external-stg-interpreter/

[nix-shell:.../external-stg-interpreter]$ stack install
... # bunch of output
Copied executables to /home/doyougnu/.local/bin:
- ext-stg
- ext-stg-interpreter
- fullpak
- mkfullpak

Warning: Installation path /home/doyougnu/.local/bin not found on the PATH environment variable.

Now we have our ext-stg-interpreter built! There are a few caveats I want to point out here. Ive modified ghc-whole-program-compiler-project/external-stg-interpreter/stack.yaml to load the right packages and use nix:

[nix-shell:.../external-stg-interpreter]$ cat stack.yaml
resolver: lts-16.13

- '.'
- 'external-stg'

- souffle-haskell-2.1.0
- primitive-
- zip-1.7.0

enable: true
packages: [ zlib, libffi, pkg-config, bzip2 ]

Notice the nix: block. We could have just as easily built this using nix directly or using our shell.nix file.

Linking the external-stg-interpreter

The only task left is to link into a shared object library called To do that we need to use the script called, c, in ghc-whole-program-compiler-project/external-stg-interpreter/data. This script is a bit of a hack, it generates the shared object file so that we can link the symbols requested by the C FFI in base, but it populates those functions with our replacements, which do absolutely nothing. For example, we supply a fake garbage collect:

// in .../external-stg-interpreter/data/
void performGC(void) {

void performMajorGC(void) {

This works because we won't be using the runtime system at all, we'll be using the external STG interpreter instead, however we still need to provide these symbols in order to link. **MAJOR NOTE: this file must be next to any *.fullpak file youll be running the interpreter on** or else youll get an undefined symbol error during linking, for example:

[nix-shell:.../external-stg-interpreter/data]$ ls ghc-rts-base.fullpak minigame-strict.fullpak

### notice no .so file
[nix-shell:.../external-stg-interpreter/data]$ ~/.local/bin/ext-stg-interpreter ghc-rts-base.fullpak
ext-stg-interpreter: user error (dlopen: ./ cannot open shared object file: No such file or directory)

## we error'd out because it was missing, also
## if you get this error then you have an old file and need to rerun the c script
[nix-shell:.../external-stg-interpreter/data]$ ~/.local/bin/ext-stg-interpreter ghc-rts-base.fullpak
ext-stg-interpreter: user error (dlopen: ./ undefined symbol: getProcessElapsedTime)

To link the interpreter we need to run c in the data/ sub-folder:

[nix-shell:.../external-stg-interpreter]$ cd data/

[nix-shell:.../external-stg-interpreter/data/]$ ls
ar c cbits-rts.dyn_o c-src stub-base.dyn_o

[nix-shell:.../external-stg-interpreter/data/]$ ./c
++ ls ar/libHSbase- ar/libHSbindings-GLFW- ar/libHSbytestring- ar/libHScriterion-measurement- ar/libHSghc- ar/libHSGLUT- ar/libHSGLUT- ar/libHShashable- ar/libHSinteger-gmp- ar/libHSlambdacube-quake3-engine- ar/libHSmersenne-random-pure64- ar/libHSOpenGLRaw- ar/libHSprimitive- ar/libHSprocess- ar/libHStext- ar/libHSunix- ar/libHSunix- ar/libHSzlib-
++ ls stub-base.dyn_o/Blank_stub.dyn_o stub-base.dyn_o/ClockGetTime_stub.dyn_o stub-base.dyn_o/Internals_stub.dyn_o stub-base.dyn_o/RUsage_stub.dyn_o
++ ls cbits-rts.dyn_o/StgPrimFloat.dyn_o cbits-rts.dyn_o/TTY.dyn_o
++ ls c-src/fake_rts.c c-src/hack.c c-src/hschooks.c
+ gcc -o -shared -Wl,--whole-archive ar/libHSbase- ar/libHSbindings-GLFW- ar/libHSbytestring- ar/libHScriterion-measurement- ar/libHSghc- ar/libHSGLUT- ar/libHSGLUT- ar/libHShashable- ar/libHSinteger-gmp- ar/libHSlambdacube-quake3-engine- ar/libHSmersenne-random-pure64- ar/libHSOpenGLRaw- ar/libHSprimitive- ar/libHSprocess- ar/libHStext- ar/libHSunix- ar/libHSunix- ar/libHSzlib- -Wl,--no-whole-archive stub-base.dyn_o/Blank_stub.dyn_o stub-base.dyn_o/ClockGetTime_stub.dyn_o stub-base.dyn_o/Internals_stub.dyn_o stub-base.dyn_o/RUsage_stub.dyn_o cbits-rts.dyn_o/StgPrimFloat.dyn_o cbits-rts.dyn_o/TTY.dyn_o -fPIC c-src/fake_rts.c c-src/hack.c c-src/hschooks.c -lm -lgmp -ltinfo -lGL -lX11 -lXi -lXrandr -lXxf86vm -lXcursor -lXinerama -lpthread

This will produce in the immediate directory:

[nix-shell:.../external-stg-interpreter/data/]$ ls -l
total 984
drwxr-xr-x 2 doyougnu users 4096 Apr 27 14:10 ar
-rwxr-xr-x 1 doyougnu users 300 Apr 27 14:10 c
drwxr-xr-x 2 doyougnu users 4096 Apr 27 14:10 cbits-rts.dyn_o
drwxr-xr-x 2 doyougnu users 4096 Apr 27 14:10 c-src
-rwxr-xr-x 1 doyougnu users 986008 Apr 30 11:50 ## <----- new
drwxr-xr-x 2 doyougnu users 4096 Apr 27 14:10 stub-base.dyn_o

Now we can test our interpreter by running it on the *.fullpak files in external-stg-interpreter/data:

[nix-shell:.../external-stg-interpreter/data/]$ cd ..

[nix-shell:.../external-stg-interpreter/data]$ ls ghc-rts-base-call-graph-summary ghc-rts-base-call-graph.tsv ghc-rts-base.fullpak minigame-strict.fullpak

## remove the old .so file
[nix-shell:.../external-stg-interpreter/data]$ rm

## soft-link to the one we just built
[nix-shell:.../external-stg-interpreter/data]$ ln -s

[nix-shell:.../external-stg-interpreter/data]$ ls -l
total 79220
drwxr-xr-x 6 doyougnu users 4096 Apr 30 11:50
-rw-r--r-- 1 doyougnu users 48 Apr 30 11:47 ghc-rts-base-call-graph-summary
-rw-r--r-- 1 doyougnu users 28238 Apr 30 11:47 ghc-rts-base-call-graph.tsv
-rw-r--r-- 1 doyougnu users 22450708 Apr 27 14:10 ghc-rts-base.fullpak
lrwxrwxrwx 1 doyougnu users 43 Apr 30 11:55 -> ### <---- new
-rw-r--r-- 1 doyougnu users 58630129 Apr 27 14:10 minigame-strict.fullpak

[nix-shell:.../external-stg-interpreter/data]$ ~/.local/bin/ext-stg-interpreter ghc-rts-base.fullpak
ssHeapStartAddress: 53522
ssTotalLNECount: 69
ssClosureCallCounter: 360
executed closure id count: 114
call graph size: 150

[nix-shell:.../external-stg-interpreter/data]$ ls -l
total 79220
drwxr-xr-x 6 doyougnu users 4096 Apr 30 11:50
-rw-r--r-- 1 doyougnu users 48 Apr 30 11:56 ghc-rts-base-call-graph-summary ### <---- interpreter output
-rw-r--r-- 1 doyougnu users 28238 Apr 30 11:56 ghc-rts-base-call-graph.tsv ### <---- interpreter output
-rw-r--r-- 1 doyougnu users 22450708 Apr 27 14:10 ghc-rts-base.fullpak
lrwxrwxrwx 1 doyougnu users 43 Apr 30 11:55 ->
-rw-r--r-- 1 doyougnu users 58630129 Apr 27 14:10 minigame-strict.fullpak

And it works, we have two new files, <foo>-call-graph-summary and <foo>-call-graph.tsv which we can analyze to inspect the behavior of our program (more on this later).

The whole setup process on a demo

That was a rather involved example, to make clear the dependencies and steps required to run this on your own code the rest of this tutorial will run the interpreter on two of Csabas demos from his skillshare talk. First lets grab the code:

$ pwd

$ git clone

$ ls
ext-stg-interpreter-presentation-demos ghc-whole-program-compiler-project ..

Now well run the first demo which is a simply fold over a list:

$ nix-shell ghc-whole-program-compiler-project/shell.nix
trace: checking if /home/doyougnu/programming/haskell/hadrian/hadrian.cabal is present: no
Recommended ./configure arguments (found in $CONFIGURE_ARGS:
or use the configure_ghc command):


[nix-shell:~/programming/haskell]$ cd ext-stg-interpreter-presentation-demos/demo-01-tsumupto/

[nix-shell:~/programming/haskell/ext-stg-interpreter-presentation-demos/demo-01-tsumupto]$ ../../ghc-whole-program-compiler-project/ghc-wpc/_build/stage1/bin/ghc -O2 tsumupto.hs
[1 of 1] Compiling Main ( tsumupto.hs, tsumupto.o )
Linking tsumupto ...
$ cd ext-stg-interpreter-presentation-demos/demo-01-tsumupto

$ ls
tsumupto tsumupto.hi tsumupto.hs tsumupto.o tsumupto.o_ghc_stgapp tsumupto.o_modpak

Note, that we have two new files: *.o_ghc_stgapp and .o_modpak as a result of building with ghc-wpc. If you try to run this from outside the nix-shell youll get an error about missing mkmodpak:

$ ../../ghc-whole-program-compiler-project/ghc-wpc/_build/stage1/bin/ghc -O2 tsumupto.hs
[1 of 1] Compiling Main ( tsumupto.hs, tsumupto.o )
ghc: could not execute: mkmodpak

Now that we have those files we can run the interpreter, but first though we need to make a *.fullpak file from the *.o_ghc_stgapp file and create a symbolic link to

## make the fullpack file
$ ~/.local/bin/mkfullpak tsumupto.o_ghc_stgapp
all modules: 259
app modules: 113
app dependencies:
... # bunch of output
main Main
creating tsumupto.fullpak

## create the link to the shared object file
$ ln -s ../../ghc-whole-program-compiler-project/external-stg-interpreter/data/

## the final directory should look like this
$ ls tsumupto tsumupto.fullpak tsumupto.hi tsumupto.hs tsumupto.o tsumupto.o_ghc_stgapp tsumupto.o_modpak

And now we can run the interpreter:

$ ~/.local/bin/ext-stg-interpreter tsumupto.fullpak
ssHeapStartAddress: 44082
ssTotalLNECount: 43
ssClosureCallCounter: 30275
executed closure id count: 112
call graph size: 146

The first line is the output of the program and the rest are diagnostics that the interpreter outputs. More importantly we should have a tab-separated csv file and call graph file in our local directory after running the interpreter:

$ ls -l
total 23876
lrwxrwxrwx 1 doyougnu users 114 Apr 30 12:21 -> ../../ghc-whole-program-compiler-project/external-stg-interpreter/data/
-rwxr-xr-x 1 doyougnu users 9442648 Apr 30 12:12 tsumupto
-rw-r--r-- 1 doyougnu users 53 Apr 30 12:23 tsumupto-call-graph-summary ### <---- interpreter output
-rw-r--r-- 1 doyougnu users 27490 Apr 30 12:23 tsumupto-call-graph.tsv ### <---- interpreter output
-rw------- 1 doyougnu users 14922366 Apr 30 12:19 tsumupto.fullpak
-rw-r--r-- 1 doyougnu users 1769 Apr 30 12:12 tsumupto.hi
-rw-r--r-- 1 doyougnu users 207 Apr 28 22:56 tsumupto.hs
-rw-r--r-- 1 doyougnu users 4488 Apr 30 12:12 tsumupto.o
-rw-r--r-- 1 doyougnu users 8817 Apr 30 12:12 tsumupto.o_ghc_stgapp
-rw------- 1 doyougnu users 9803 Apr 30 12:12 tsumupto.o_modpak

Which can be loaded into gephi for closer inspection of the call graph of our program. Be sure to watch the rest of the demo in Csabas talk for this part! For now well be going over using gephi and these files in our next blog post in this series, stay tuned!


File Descriptions

  • foo.modpak: A zip file which contains the Core, STG, CMM, source code, and assembly for the module foo
  • foo.fullpak: A zip file which contains the same information as modpack but for every module of the program rather than just module foo.
  • foo.o_ghc_stgapp: a yaml like file that contains:
    • the modules dependencies including package dependencies
    • a bunch of file paths for shared objects of the libraries
    • the flags the module was built with
  • shared object file created by ext-stg-interpreter/data/ Required to be in the same directory as ext-stg-interpreter will be invoked.

Step-by-Step guide for running the interpreter on your code

  1. Build your project with ghc-wpc/_build/stage1/bin by directly invoking that ghc (as I did in the demo-01 project) or by pointing stack to it with system-ghc and extra-path in stack.yaml, or by passing -w <path-to-ghc-wpc-binary with cabal.
  2. Generate the foo.fullpak file with mkfullpak foo.o_ghc_stgapp
  3. Soft-link to in the directory you will run the interpreter in. This file must be present when you run the interpreter!
  4. Now run the interpreter on project.fullpak
  5. Analyze foo-call-graph-summary and foo-call-graph.tsv with whatever tools make sense to you


  1. This isnt completely true, there is the RuntimeRep type controls exactly this and the levity polymorphism work by Richard Eisenberg. See this video for examples on using these features. We do plan to include a more thorough and real world example on using levity polymorphism for better performance in the haskell optimization handbook.

· 5 min read

In this post I discuss the inlining of Integer and Natural operations in Haskell. It’s a promising performance work I’ve been conducting six months ago, which was blocked by an independent issue, but that I will likely resume soon as the issue has been fixed in the meantime.

To follow this post, you must know that Natural numbers are represented as follows in ghc-bignum:

-- | Natural number
-- Invariant: numbers <= WORD_MAXBOUND use the `NS` constructor
data Natural
= NS !Word#
| NB !BigNat#

Small naturals are represented with a Word# and large ones with a BigNat# (a ByteArray#).

Now consider the following simple example using Natural:

-- | Add 2 to a Word. Use Natural to avoid Word overflow
foo :: Word -> Natural
foo x = fromIntegral x + 2

There are only small naturals involved: fromIntegral x is small because x is a Word, and 2 is small. We could hope that GHC would use Word# primops to implement this and would allocate a Natural heap object for the result only. However it’s not what happens currently, even in GHC HEAD. In the following STG dump, we can see that a Natural heap object is allocated for x before calling naturalAdd (let bindings in STG reflect heap allocations):

foo1 = NS! [2##];

foo =
\r [x_sXn]
case x_sXn of {
W# x#_sXp ->
let { sat_sXq = NS! [x#_sXp]; } in naturalAdd sat_sXq foo1;

Let’s look at naturalAdd:

-- | Add two naturals
naturalAdd :: Natural -> Natural -> Natural
{-# NOINLINE naturalAdd #-}
naturalAdd (NS x) (NB y) = NB (bigNatAddWord# y x)
naturalAdd (NB x) (NS y) = NB (bigNatAddWord# x y)
naturalAdd (NB x) (NB y) = NB (bigNatAdd x y)
naturalAdd (NS x) (NS y) =
case addWordC# x y of
(# l,0# #) -> NS l
(# l,c #) -> NB (bigNatFromWord2# (int2Word# c) l)

We are clearly in the last case where both arguments are small. It seems beneficial to allow this function to be inlined. If we did we would get:

foo =
\r [x_s158]
case x_s158 of {
W# x#_s15a ->
case addWordC# [x#_s15a 2##] of {
(#,#) l_s15c ds_s15d ->
case ds_s15d<TagProper> of ds1_s15e {
case int2Word# [ds1_s15e] of sat_s15f {
case bigNatFromWord2# sat_s15f l_s15c of ds2_s15g {
__DEFAULT -> NB [ds2_s15g];
0# -> NS [l_s15c];

which produces much better assembly code, especially if there is no carry:

    addq $2,%rax       ; add 2 to a machine word
setc %bl ; test the carry.
movzbl %bl,%ebx ; it could be done
testq %rbx,%rbx ; more efficiently
jne _blk_c17c ; with "jc"
movq $NS_con_info,-8(%r12) ; alloc NS datacon value
movq %rax,(%r12) ; with the addition result as payload.
leaq -7(%r12),%rbx ; make it the first argument
addq $8,%rbp ; and then
jmp *(%rbp) ; call continuation

So why aren’t we always inlining naturalAdd? We even explicitly disallow it with a NOINLINE pragma. The reason is that naturalAdd and friends are involved in constant-folding rules.

For example, consider:

bar :: Natural -> Natural
bar x = x + 2

baz = bar 0x12345678913245678912345679123456798

Currently we get the following Core:

bar1 = NS 2##

bar = \ x_aHU -> naturalAdd x_aHU bar1

baz = NB 99114423092485377935703335253042771879834

You can see that baz is a constant thanks to constant-folding.

However if we let naturalAdd inline we get:

= case bigNatAddWord# 99114423092485377935703335253042771879832 2##
of ds_d11H
{ __DEFAULT ->
NB ds_d11H

baz is no longer a constant.

A solution would be to add constant-folding rules for BigNat# functions, such as bigNatAddWord#. This is exactly what we have started doing in #20361. Our new plan is:

  • Make BigNat# operation NOINLINE and add constant-folding rules for them
  • Make Integer/Natural operations INLINEABLE (expose their unfolding)
  • Hence rely on constant-folding for Word#/Int#/BigNat# to provide constant folding for Integer and Natural

The good consequences of this plan are:

  • Less allocations when bignum operations are inlined and some of the arguments are known to be small/big or fully known (constant).
  • Integer and Natural are less magical: you can implement your own similar types and expect the same performance without having to add new rewrite rules

There were some unforeseen difficulties with this plan though:

  1. Some of the rewrite rules we need involve unboxed values such as BigNat# and Word# and the weren’t supported. Luckily, this has been recently fixed (#19313) by removing the “app invariant” (#20554). Thanks Joachim! That’s the reason why we could resume this work now.
  2. Some unfoldings (RHSs) become bigger due to the inlining of bignum operations. Hence they may not themselves be inlined further due to inlining thresholds even if it would be beneficial. A better inlining heuristic would fix this (see #20516). It will likely be the topic of the next post.

· 3 min read

JS Backend

In March the team focused on porting more GHCJS code to GHC head.

  • Most of us are new to GHCJS’s codebase so we are taking some time to better understand it and to better document it as code gets integrated into GHC head.
  • Development process: initially we had planned to integrate features one after the others into GHC head. However it was finally decided that features would be merged into a wip/javascript-backend branch first and then later merged into GHC head. After trying this approach we decided to work directly into another branch: wip/js-staging . Opening merge requests that can’t be tested against a branch that isn’t GHC head didn’t bring any benefit and slowed us too much.
  • Documentation: we wrote a document comparing the different approaches to target JavaScript/WebAssembly
  • RTS: some parts of GHCJS’s RTS are generated from Haskell code, similarly to code generated with the genapply program in the C RTS. This code has been ported to GHC head. As JS linking---especially linking with the RTS---will only be performed by GHC in the short term, we plan to make it generate this code dynamically at link time.
  • Linker: most of GHCJS’s linker code has been adapted to GHC head. Because of the lack of modularity of GHC, a lot of GHC code was duplicated into GHCJS and slightly modified. Now that both codes have diverged we need to spend some time making them converge again, probably by making the Linker code in GHC more modular.
  • Adaptation to GHC head: some work is underway to replace GHCJS’s Objectable type-class with GHC’s Binary type-class which serves the same purpose. Similarly a lot of uses of Text have been replaced with GHC’s ShortText or FastString.
  • Template Haskell: GHCJS has its own TH runner which inspired GHC’s external interpreter (“Iserv”) programs. We have been exploring options to port TH runner code as an Iserv implementation. The Iserv protocol uses GADTs to represent its messages which requires more boilerplate code to convert them into JS because we can’t automatically derive aeson instances for them.
  • Plugins: we have an MR adding support for “external static plugins” to GHC !7377. Currently it only supports configuring plugins via environment variables. We have been working on adding support for command-line flags instead.
  • Testsuite: we have fixed GHC’s build system so that it can run GHC’s testsuite when GHC is built as a cross-compiler (!7850). There is still some work to do (tracked in #21292) to somehow support tests that run compiled programs: with cross-compilers, target programs can’t be directly executed by the host architecture.


  • Performance book: some time was spent on the infrastructure (CI) and on switching the format of the book to ReStructured Text
  • Modularity: some time was spent discussing GHC’s design and refactoring (c.f. !7442 and #20927).

· 2 min read


  • To cross compile Haskell code for windows a wine process must be used to evaluate Template Haskell code at compile time. Some times this code needs DLLs to be present for the Template Haskell code to run. We had been maintaining a list of DLLs manually (#1400 for instance added secp256k1). A more general solution (#1405) was found that uses the pkgsHostTarget environment variable to obtain a list of all the packages dependencies. Then the DLLs from the are made available to the wine process running the Template Haskell code. This should make more libraries build correctly while reducing unnecessary dependencies.
  • The way Haskell.nix cleans source trees has changed with #1403, #1409 and #1418. When using Nix >=2.4 source in the store is now filtered in the same way it is locally. This has a couple of key advantages:
    • It makes it less likely that results on CI systems (where the source is likely to be in the store) will differ from results for local builds (where the source is in a cloned git repository).
    • Potential for reducing load on CI. Although more work may be needed, this kind of filtering combined with the experimental content addressing features of Nix reduce the required rebuilds.
  • In the past rather cryptic error messages were given when an attempt was made to use an old version of GHC on a platform Haskell.nix did not support it. In some cases Haskell.nix would even attempt to build GHC and only fail after some time. Better error messages are now given right away when an attempt is made to use a GHC version that is not supported for a particular platform #1411

Version Updates

  • GHC 9.2.2 was added #1394

Bug fixes

  • gitMinimal replaces git to reduce the dependency tree of cabalProject functions #1387
  • Less used of allowSubstitutes=false #1389
  • Fixed aarch64-linux builds by using correct boot compiler #1390
  • icu-i18n package mapping added to make text-icu build #1395
  • Fixes needed for newer nixpkgs versions
    • Use list for configureFlags #1396
    • The spdx json file is in a .json output #1397
    • gdk_pixbuf is now gdk-pixbuf #1398
  • Replaced deprecated NixOS binary cache settings in docs #1410
  • Enable static build of secp256k1 on musl #1413

Finally, we’d like to thank all the awesome contributors, who make haskell.nix a thriving open source project! ❤️

· 2 min read

JS backend

This month we worked on adapting code from GHCJS to merge into GHC head. We also started discussing the implementation process publicly and especially with our colleagues at Well-Typed.

  • Ticket about adapting GHCJS’ code into a proper JS backend for GHC has been opened [#21078]. Feedback was very positive!
  • There were discussions about the process and an agreement to target GHC 9.6 release [email on ghc-devs, wiki page]
  • deriveConstants is a program used to generate some header file included in the rts package. While it is mainly useful for native targets, we had to make it support Javascript targets [!7585]
  • Javascript is going to be the first official target platform supported by GHC that has its own notion of managed heap objects. Hence we may need a new RuntimeRep to represent these values for Haskell codes interacting with JS codes via FFI. We opened !7577 into which we tried to make this new RuntimeRep non JS specific so that it could be reused for future backends targeting other managed platforms (e.g. CLR, JVM). It triggered a lot of discussions summarized in #21142.
  • GHCJS’s code generator was ported to GHC head [!7573]. In its current state, we can generate Javascript unoptimised code -- the optimiser hasn’t been ported yet -- by compiling a module with -c -fjavascript. It required many changes, not only to adapt to changes between GHC 8.10 and GHC head but also to avoid adding new package dependencies. It was also an opportunity to refactor and to document the code, which is still a work in progress.
  • GHC doesn’t use any lens library, hence to port the code generator we had to replace lenses with usual record accessors. It turned out that case alternatives in STG lacked them because they were represented with a triple. We took the opportunity to introduce a proper record type for them !7652

Plutus-apps JS demo

  • We improved the proof of concept JavaScript library for generating Plutus transactions with a given set of constraints and lookups, exposing functionality from the plutus-ledger-constraints package. [Report]


· 9 min read

IOG is committed to improving Haskell developer experience, both by sponsoring the Haskell Foundation and by directly founding a team committed to this task: the Haskell DX team.

The team now tries to provide regular (monthly) updates about its work. This post is a bit longer because it covers all of 2021 which has not been covered anywhere else.

Code generation

  • Added a new backend for AArch64 architectures, especially to support Apple’s M1. Previously AArch64 was only supported via the LLVM based backend which is much slower. [!5884]
  • Added support for Apple’s M1 calling convention. In GHC 9.2.1 it implied making lifted sized types (e.g. Word8, Int16...) use their unlifted counterparts (e.g. Word8#, Int16#...); in GHC 8.10.7 – a minor release –  a less invasive but more fragile solution was implemented [commit].
  • Fixed a very old GHC issue [#1257] by making GHCi support unboxed values [!4412]: ByteCode is now generated from STG instead of directly from Core. It allows more Haskell codes to be supported by HLS and it even allows GHC code to be loaded into GHCi [link].
  • Fixed a bug in the Cmm sinking pass that led to register corruption at runtime with the C backend. Even if we don’t use the C backend, fixing this avoided spurious errors in CI jobs using it [#19237,!5755]
  • Fixed a register clobbering issue for 64-bit comparisons generated with the 32-bit x86 NCG backend [commit].
  • Fixed generation of switches on sized literals in StgToCmm [!6211]
  • Fixed LLVM shifts [#19215,!4822]


  • Fixed an off-by-one error in the MachO (Darwin) linker [!6041]. The fix is simple but the debugging session was epic!
  • Fix to avoid linking plugin units unconditionally with target code, which is wrong in general but even more so when GHC is used as a cross-compiler: plugins and target code aren’t for the same platform [#20218,!6496]


  • With John Ericson (Obsidian Systems) we finally made GHC independent of its target [!6791,!6539]. It means that there is no need to rebuild GHC to make it target another platform, so it now becomes possible to add support for a --target=... command-line flag [#11470]. It also means that a cross-compiling GHC could build plugins for its host platform in addition to building code for its target platform.
  • A side-effect of the previous bullet is that primops’ types are now platform independent. Previously some of them would use Word64 on 32-bit architectures and Word on 64-bit architectures: now Word64 is used on every platform. A side-effect of this side-effect is that we had to make Word64 as efficient as Word: it now benefits from the same optimizations (constant folding #19024, etc.). On 32-bit platforms, it reduced allocations by a fair amount in some cases: e.g. -25.8% in T9203 test and -11.5% when running haddock on base library [!6167]. We hope it will benefit other 32-bit architectures such as JavaScript or WebAssembly.
  • GHC built as a cross-compiler doesn’t support compiler plugins [#14335]. We have been working on refactoring GHC to make it support two separate environments in a given compiler session – one for target code and another for the plugin/compiler code. The implementation in [!6748] conflicts quite a lot with the support of multiple home-units that was added at about the same time. GHC needs to be refactored a lot more to correctly support this approach, so instead we implemented a different approach to load plugins which is more low-level and bypasses the issue [#20964, !7377].
  • We made GHC consider the target platform instead of the host platform in guessOutputFile [!6116]
  • Use target platform instead of host platform to detect literal overflows [#17336,!4986]


  • We updated GHCJS to use GHC 8.10.7 [branch]
  • We worked on making GHCJS’s codebase more suitable for integration into GHC: reducing the number of dependencies, avoiding the use of Template Haskell, reusing GHC’s build system, etc. There is now a GHCJS integrated into a GHC 8.10.7 fork [branch].
  • This experience led us to plan the realization of a JS backend into GHC head based on GHCJS. More information about this topic in our next report.
  • We worked on making GHC’s testsuite pass with GHCJS, triaging tests that legitimately fail on a JS platform from tests revealing real GHCJS issues. [LINK]


  • We seemed to be the first to try to build GHC on Windows with the updated GNU autotools 2.70 and this release made a breaking change to the way auxiliary files (config.guess, config.sub) were handled, breaking GHC’s build (#19189). The root cause of the issue couldn’t be easily solved so we modified GHC’s build system to avoid the use of these auxiliary files, bypassing the issue. Most GHC devs won’t ever notice that something was broken to begin with when they will update their GNU toolchain on Windows. [!4768,!4987,!5065]
  • Fixed cross-compilation of GHC from Linux to Windows using Hadrian [#20657,!6945,!6958]


  • Fixed Natural to Float/Double conversions to align with the method used for Integer to Float/Double and added missing rewrite rules [!6004]
  • Made most bignum literals be desugared into their final form in HsToCore stage instead of CoreToStg stage to ensure that Core optimizations were applied correctly to them [#20245,!6376]
  • Some constant folding rules were missing and were added:
  • Allowed some ghc-bignum operations to inline to get better performance, while still managing to keep constant-folding working [#19641,!6677,!6696,!6306]. There is some work left to do (cf #20361) but it is blocked by #19313 which in turn is blocked by #20554 which should be fixed soon (!6865, thanks Joachim!).
  • The ubiquitous fromIntegral function used to have many associated rewrite rules to make it fast (avoiding heap allocation of a passthrough Integer when possible) that were difficult to manage due to the combinatorial number of needed rules (#19907, #20062). We found a way to remove all these rules (!5862).

Technical debt & modularity

  • Made several component of the compiler independent of DynFlags (parsed command-line flags):
  • Made the handling of “package imports” less fragile [!6586] and refactored some code related to dependencies and recompilation avoidance [!6528,!6346].
  • Abstracted plugin related fields from HscEnv [!7175]
  • Made a home-unit optional in several places [!7013]: the home-unit should only be required when compiling code, not when loading code (e.g. when loading plugins in cross-compilers #14335).
  • Made GHC no longer expose the (wrong) selected ghc-bignum backend with ghc --info. ghc-bignum now exposes a backendName function for this purpose [#20495,!6903]
  • Moved tmpDir from Settings to DynFlags [!6297]
  • Removed use of unsafePerfomIO in getProgName [!6137]
  • Refactored warning flags handling [!5815]
  • Made assertions use normal functions instead of CPP [!5693]
  • Made the interpreter more independent of the driver [!5627]
  • Replaced ptext . sLit with text [!5625]
  • Removed broken “dynamic-by-default” setting [#16782,!5467]
  • Abstracted some components from the compiler session state (HscEnv):
    • unit-related fields into a new UnitEnvdatatype [!5425]
    • FinderCache and NameCache[!4951]
    • Loader state [!5287]
  • Removed the need for a home unit-id to initialize an external package state (EPS) [!5043]
  • Refactored -dynamic-too handling [#19264,!4905]


  • Made divInt#, modInt# and divModInt# branchless and inlineable [#18067,#19636,!3229]
  • Fixed Integral instances for Word8/16/32 and showWord to use quotRemWordN [!5891,!5846]
  • Improved performance of occurrence analysis [#19989,!5977]
  • Fixed unnecessary pinned allocations in appendFS [!5989]
  • Added a rewrite rules for string literals:
    • Concatenation of string literals [#20174,#16373,!6259]
    • (++) . unpackCString# ⇒ unpackAppendCString# leading to a 15% reduction in compilation time on a specific example. [!6619]
    • Compute SDoc literal size at compilation time [#19266, !4901]
  • Fix for Dwarf strings generated by the NCG that were unnecessarily retained in the FastString table [!6621]
  • Worked on improving inlining heuristics by taking into account applied constructors at call sites [#20516,!6732]. More work is needed though.
  • Fixed #20857 by making the Id cache for primops used more often [!7241]
  • Replaced some avoidable uses of replicateM . length with more efficient code [!7198]. No performance gain this time but the next reader of this code won’t have to wonder if fixing it could improve performance.
  • Made exprIsCheapX inline for modest but easy perf improvements [!7183]
  • Removed an allocation in the code used to write text on a Handle (used by putStrLn, etc.) [!7160]
  • Replaced inefficient list operations with more efficient Monoid ([a],[b]) operations in the driver [!7069], leading to 1.9% reduction in compiler allocations in MultiLayerModules test.
  • Disabled some callstack allocations in non-debug builds [!6252]
  • Made file copy in GHC more efficient [!5801]
  • Miscellaneous pretty-printer enhancements [!5226]
  • Type tidying perf improvements with strictness [#14738,!4892]


  • Fixed issues related to the RTS’s ticker
    • Fixed some races [#18033,#20132,!6201]
    • Made the RTS open the file descriptor for its timer (timerfd) on Linux synchronously to avoid weird interactions with Haskell code manipulating file descriptors [#20618,!6902].
  • Moved GHC’s global variables used to manage Uniques into the RTS to fix plugin issues [#19940,!5900]

Build system / CI

  • Fixed Hadrian output to display warnings and errors after the multi screen long command lines [#20490,!6690]
  • Avoided the installation of a global platformConstants file; made GHC load constants from the RTS unit instead, allowing it to be reinstalled with different constants [!5427]
  • Made deriveConstants output its file atomically [#19684,!5520]
  • Made compression with xz faster on CI [!5066]
  • Don’t build extra object with -no-hs-main [#18938,!4974]
  • Add hi-boot dependencies with ghc -M [#14482,!4876]


  • Stack: fixed interface reading in hi-file-parser to support GHC 8.10 and 9.0 [PR, Stack#5134]
  • Enhanced pretty-printing of coercions in Core dumps [!4856]

· 2 min read



  • Support for external Hackage repositories was improved by #1370. We can now use an extra package repository just by adding a repository block to the cabal.project file. This makes it easy to make use of an extra hackage databases such as hackage.head and hackage-overlay-ghcjs. A sha256 for the repository it can be added as a comment in the repository block or by including it in the sha256map argument.

Version Updates

  • nix-tools was updated to use the Cabal 3.6.2 and hnix 0.16 nix-tools#113
  • Nixpkgs pins were bumped #1371
  • Update booting on aarch64 linux to ghc 8.8.4 1325 and 1374

Bug fixes

  • Allow linking pcre statically with musl #1363
  • Add gpiod to system nixpkgs map #1359
  • Add poppler-cpp to png-config Nixpkgs map #1373
  • Use the same logic that cabal-install uses for determining the path of a packages .tar.gz in a repository nix-tools#114
  • Fix libnuma dependency in rts.conf 1342
  • Fix when "materialized" dir is deep #1376
  • Prefer local building for git-ls-files #1378 and #1381
  • Fix stack cache generator sha256 is a string not a lambda #1383
  • Only pass --index-state to cabal when asked #1384
  • Pass enableDWARF to makeConfigFiles to fix -g3 support in nix-shell #1385

Finally, we’d like to thank all the awesome contributors, who make haskell.nix a thriving open source project! ❤️

· One min read

Hopefully 2022 should be the year GHC will get a JavaScript backend without relying on GHCJS. This month the team has been busy planning the work that needs to be done to get there!


  • GHCJS has been updated to reduce the gap with GHC 8.10.7 codebase to the point that GHC’s build system is used to build GHCJS
  • Internal work planning for the integration of GHCJS into GHC
  • A different approach to load plugins into cross-compilers has been implemented [#20964, !7377]
  • GHCJS has been exercised to showcase compilation of some Plutus applications


  • A few “subsystems” of GHC have been made more modular and reusable by making them independent of the command-line flags (DynFlags) [#17957, !7158, !7199, !7325]. This work resulted in a 10% reduction in call sites to DynFlags and has now removed all references to DynFlags up to the CoreToStg pass, which is almost the entire backend of GHC.


  • Jeffrey wrote a new HF proposal about writing a Haskell Optimization handbook and has started working on it

· 2 min read

January 2022

This month we merged some very significant improvements to the support for compiling for Android and iOS based AArch64 devices.  When the build system is also AArch64 template haskell can often be run locally.  This will make targeting mobile devices from AArch64 builders much easier.

A long running branch containing bug fixes for cross compilation to JavaScript with GHCJS was merged.  One nice feature included is better support for adding bindings to C code compiled with emscripten.  In some cases it can be as easy as adding a single JavaScript file to the package with wrappers for the C functions.


  • Much improved AArch64 support including Template Haskell (#1316)
  • Improved GHCJS and support for calling C code compiled with emscripten (#1311)
  • The environment variables LANG and LOCALE_ARCHIVE are no longer set in shells allowing the users prefered settings to persist (#1341).
  • source-repo-override argument added for cabal projects to allow the location of source-repository-package packages to be replaced (#1354)

Version Updates

  • GHC 9.0.2 was added to the available GHC versions (#1338)
  • The nixpkgs pins for 21.05, 21.11 and unstable were all updated (#1334).
  • Remaining uses of cabal 3.4 were updated to 3.6.2 (#1328)

Bug fixes

  • Dwarf build of ghc 9.2.1 now skipped on hydra to work around 4GB hydra limit (#1333)
  • Removed use of propagatedBuildInputs in ghc derivation (#1318).
  • Caching of the check-hydra CI script was fixed (#1340)