At IOG DevX we have been working on integrating various bits of GHCJS into GHC, with the goal of having a fully working JavaScript backend for the 9.6 release. For some parts, this has mostly consisted of updating the code to use the newer GHC API and dependencies. Other bits, like the Template Haskell runner, need more work.
This post gives an overview of the existing approaches for running Template Haskell in GHC based cross compilers and our plan for the JavaScript backend. Hopefully we can revisit this topic once all the work has been done, and see what exactly we ended up with.
When I first worked on Template Haskell (TH) support for GHCJS, there was no mechanism to combine Template Haskell with cross compilation in GHC.
Normally, Template Haskell is run by loading library code directly into the GHC process and using the bytecode interpreter for the current module. Template Haskell can directly access GHC data structures through the Q monad. Clearly this would not be possible for GHCJS: We only have JavaScript code available for the libraries and the organization of the JavaScript data structures is very different from what GHC uses internally.
So I had to look for an alternative. Running Template Haskell consists of two parts:
loading/executing the TH code
handling compiler queries from the TH code, for example looking up names or types
Running the TH code can be done by first compiling the Haskell to JavaScript and then using the JavaScript eval feature.
Template Haskell code can query the compiler using the Quasi typeclass. I noticed that none of the methods required passing around functions or complicated data structures, so it would be possible to serialize each request and response and send it to another process.
So I went ahead and implemented this approach with a script thrunner.js to load and start the code in a node.js server, a message type with serialization, and a new instance of the Quasi typeclass to handle the communication with the compiler via the messages. This is still what's in use by GHCJS to this day. Every time GHCJS encounters Template Haskell, it starts a thrunner process and the compiler communicates with it over a pipe.
After starting thrunner.js, GHCJS sends the Haskell parts of the Template Haskell runner to the script. This includes the runtime system and the implementation of the Quasi typeclass and communication protocol. After that, the TH session starts. A typical TH session looks as follows:
| Compiler | thrunner |
| --- | --- |
| `RunTH THExp <js code> <source location>` | |
| | `LookupName (Just <name-string>)` |
| `LookupName' (Just <name>)` | |
| | `Reify <name>` |
| `Reify' <name-info>` | |
| | `RunTH' <result>` |
| `RunTH THDec <js code> <source location>` | |
| | `AddTopDecls <declarations>` |
| `AddTopDecls'` | |
| | `RunTH' <result>` |
| `FinishTH True` | |
| | `FinishTH' <memory-consumption>` |
Each message is followed up by a corresponding reply. For example, a LookupName' response follows a LookupName request and a RunTH message will eventually generate a RunTH' result. The first RunTH message contains the compiled JavaScript for the Template Haskell code, along with its dependencies. Each subsequent RunTH only includes dependencies that have not already been sent.
The thrunner process stays alive during the compilation of at least an entire module, allowing for persistent state (putQ/getQ).
If we build a Haskell program with (cost centre) profiling, the layout of our data structures changes to include bookkeeping of cost centre information. This means that we need a special profiling runtime system to run this code.
What can we do if we want to run our profiled build in GHCi or Template Haskell? We cannot load compiled profiling libraries into GHC directly; its runtime system expects non-profiled code. We could use a profiled version of the compiler itself, but this would make all compilation very slow. Or we could somehow separate the profiled code of our own program from the non-profiled code in the compiler.
This was Simon Marlow's motivation for adapting the GHCJS thrunner approach, integrating it into GHC, and extending it to support GHCi and bytecode. This functionality can be activated with the -fexternal-interpreter flag and has been available since GHC version 8.0.1. When the external interpreter is activated, GHC starts a separate process, iserv (customizable with the -pgmi flag), which has a role analogous to that of the thrunner script in GHCJS.
Over time, the iserv code has evolved with GHC and has been extended to include more operations. By now, there are quite a few differences in features:
| Feature | thrunner | iserv |
| --- | --- | --- |
| Template Haskell support | yes | yes |
| GHCi | no | yes |
| Debugger | no | yes |
| Bytecode | no | yes |
| Object code | through pipe | from file |
| Object code linking | compiler | iserv process |
thrunner is not quite as complete as iserv: It lacks GHCi and the debugger, and there is no bytecode support. But these features are not essential for basic Template Haskell.
We have now seen two systems for running Template Haskell code outside the compiler process: The original GHCJS thrunner and the extended GHC iserv.
Clearly it isn't ideal to have multiple "external interpreter" systems in GHC, so we plan to switch from thrunner to iserv for the upcoming JavaScript GHC backend. We don't need the debugger or GHCi support yet, but we do need to adapt to other changes in the infrastructure. So what does this mean in practice?
The biggest change is that we have to rework the linker: thrunner contains no linking logic itself; GHCJS compiles everything to JavaScript and sends compiled code to the thrunner process, ready to be executed. In contrast, iserv has a loader for object and archive files. When dependencies need to be loaded into the interpreter, GHC just gives it the file name.
Another change is using the updated message types. In the thrunner session example above we could see that each message is paired with a response. For example, a RunTH' response always follows a RunTH message, with possibly other messages in between. iserv has an interesting approach for the Message datatype: instead of having pairs of data constructors for each message and its response, iserv has a GADT Message a, where the a type parameter indicates the expected response payload for each data constructor.
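As a rough illustration of the idea, here is a simplified sketch; GHC's real type lives in the ghci library's GHCi.Message module and has many more constructors with different payload types, and the THResultType below is a hypothetical stand-in:

```haskell
{-# LANGUAGE GADTs #-}

import Data.ByteString (ByteString)
import Language.Haskell.TH.Syntax (Loc, Name)

-- Hypothetical stand-in for the kind of splice being run.
data THResultType = THExp | THPat | THType | THDec

-- Each constructor fixes the type of its expected reply via the
-- 'a' parameter, so a request can no longer be paired with the
-- wrong kind of response.
data Message a where
  RunTH      :: THResultType -> ByteString -> Maybe Loc -> Message ByteString
  LookupName :: Bool -> String -> Message (Maybe Name)
  FinishTH   :: Bool -> Message Integer
```

With this shape, a communication function can be given a type like `sendMessage :: Message a -> IO a`, and the compiler statically knows which reply payload belongs to which request.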
During development of the thrunner program, it turned out to be very useful to save and replay Template Haskell sessions for debugging purposes. We'd like to do this again, but now saving the messages in a readable/writable format. Since we're dealing with JavaScript, JSON appears to be the obvious choice.
Our plan is to have an iserv implementation that consists of a JavaScript part that runs in node.js and a proxy process to handle communication with GHC. The proxy process converts the messages between GHC's own (binary based) serialization format and JSON. The proxy process is relatively simple, but it does reveal one downside of the new GADT based message types: A proxy is stateful. We must always know which message we have sent to convert the response back from JSON to binary.
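To see why, continuing the toy Message sketch above (still not GHC's actual proxy code): a reply can only be decoded relative to its request, so the proxy has to carry the outstanding request in its state.

```haskell
{-# LANGUAGE GADTs, ExistentialQuantification #-}

-- Reuses the simplified Message GADT sketched earlier. The reply type
-- is existentially hidden: "whatever request we last forwarded".
data Pending = forall a. Pending (Message a)

-- Picking a decoder for an incoming reply requires the pending request;
-- the wire bytes alone do not say whether they encode a ByteString,
-- a Maybe Name, or an Integer.
decodeReplyTag :: Pending -> String
decodeReplyTag (Pending (RunTH {}))      = "decode as serialized TH result"
decodeReplyTag (Pending (LookupName {})) = "decode as optional name"
decodeReplyTag (Pending (FinishTH {}))   = "decode as memory consumption"
```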
It's not yet known whether we will implement a full bytecode interpreter. We expect it to become clear during implementation whether we can get away without one early on.
We have seen how Template Haskell and GHCi code can be run outside the GHC process for profiling or cross compiling, with both the thrunner approach in GHCJS and the newer iserv in GHC.
We at IOG DevX are working on switching to the iserv infrastructure for the upcoming GHC JavaScript backend, which involves a substantial rewrite, mainly because of differences in linking. This is a work in progress, and we intend to revisit this topic in another blog post once the final design has been implemented.
Haskell is a great language camouflaged by lackluster tooling. This situation has led to well-known problems (who could forget Cabal hell?). A less discussed problem is what I will call the "black-box syndrome": it is hard to know exactly what the memory representation and runtime performance of my Haskell programs are[1]. Now black-box syndrome is not only a problem, it is also one of the nice features of the language, since like all good abstractions it elides things I'd rather not care about, at least most of the time. In other words, I am happy I don't have to do manual memory manipulation!
However, when I have my optimization hat on, I run face first into black-box syndrome. The crux of the problem is a tension between the need for observation during performance engineering and optimization, and the need to ship fast code. During development we want to be able to open up a system, see exactly how it is working, make tweaks, package it back up, and test again. I want to be able to answer questions like "Why is my executable this size?", "Which code is a hot loop?", or "When does my code do direct, known or unknown function calls?".
In order to answer these questions we need the ability to observe every part of the system as the machine experiences it. Without this ability, we have no way to make progress other than to test, change some code, compile, and test again in an ad-hoc manner. And therein lies the problem: most Haskell tooling is insufficient to provide the observability we would like; instead, the tooling often expects and requires us to make source code changes to our program, or even recompile all of our libraries and code for a profiling way. This feeds the idea, and the expectation in the Haskell community, that Haskell programs are hard to optimize, because the barrier to entry for optimization has been artificially raised.
Csaba Hruska has recently been making headway in this area with his work on the GRIN compiler and an external STG interpreter. His STG interpreter (and patched GHC) solve exactly these problems: he has demonstrated dumping the entire call graph of large Haskell projects, filtering it down to hot loops, and finding unknown function calls in these graphs. If you haven't seen his demo, be sure to watch it; it is well worth your time.
This post is the first in a new blog series. In this series we're going to kick the tires on the external STG interpreter, see what it can do, and see what we can uncover in some popular libraries by using it. In particular, I'm interested in running it on projects I've previously optimized, such as ghc itself, containers, and unordered-containers, using the standard methods: ticky-ticky profiling, prof, flamegraphs, heap profiling, ghc-debug, cachegrind, etc. This post, however, will be focused on setting up the patched ghc and interpreter on a NixOS system. My goals are threefold:
Give an overview of the project and project layout to lower the barrier to entry for the system.
Give step-by-step instructions on setting up the interpreter on a nix-based system, and provide a forked GitHub repo for nix users. This should allow nix users to just git clone foo and nix-build (spoiler: it won't be that easy, but it still won't be hard).
Popularize Csaba's project! It is a refreshing take on Haskell optimization and compilation.
Making sense of the project
The external STG interpreter is part of the GRIN compiler project. We are not doing anything with the GRIN compiler (yet!), so we are only interested in the GHC whole-program compiler project. That project has several sub-projects that we'll be building and using directly:
external-stg: This sub-project provides utilities we'll be using, in particular mkfullpak.
external-stg-interpreter: This is the actual STG interpreter. The good news is that it is independent of the rest of the project and can be built just like a regular Haskell executable.
ghc-wpc: This is a fork of ghc-8.10.x (I'm not sure exactly which version it forks, to be honest) which we must build in order to use the external STG interpreter. ghc-wpc serves as a frontend for the external-stg-interpreter.
Building a working external STG interpreter
The external STG interpreter can be built like any regular Haskell executable. But in order to use the interpreter we have to build ghc-wpc. ghc-wpc is necessary because it serves as a frontend for the STG interpreter: it compiles a Haskell program as normal and then dumps an enriched STG IR to file. This file is then run through a utility, gen-exe (an executable built in the external-stg-compiler sub-project), which picks up the compilation pipeline from the STG IR and creates an executable like we would expect from a normal compilation pipeline.
The major difference between this process and the usual compiler pipeline is that ghc-wpc leaves enough compiler information on disk for the rest of the tooling to consume, namely in files with the extensions *.o_stgbin (STG IR generated at compile time) and *.o_stgapp (project linker and dependency information). Thus, once we build this custom ghc version, we can use it to build the source code we wish to analyze and begin our optimization work.
For the rest of this tutorial I'll be referencing my fork of the ghc-whole-program-compiler-project that includes everything you need if you want to follow along, including .nix files for creating a nix-shell which will prepare a suitable environment to run the entire toolchain.
The usual way to build ghc on a nix-based system is with the ghc.nix project. ghc.nix provides a default.nix with a suitable environment to run hadrian and build ghc. For ghc-wpc we'll need some special packages, and we need our boot compiler to be exactly ghc-8.8.3. The custom ghc.nix file is included in my fork; I've taken the liberty of pinning the nixpkgs to the right version for ghc-8.8.3. So let's begin:
You'll find the patched ghc.nix included (ghc.nix.wpc) and a shell.nix for a nix-shell. The shell.nix file simply references ghc.nix.wpc/default.nix with the appropriate options:
```
$ pwd
/home/doyougnu/programming/haskell/ghc-whole-program-compiler-project

$ nix-shell shell.nix   # or just nix-shell
trace: checking if /home/doyougnu/programming/haskell/ghc-whole-program-compiler-project/hadrian/hadrian.cabal is present: no
Recommended ./configure arguments (found in $CONFIGURE_ARGS: or use the configure_ghc command):

  --with-gmp-includes=/nix/store/sznfxigwvrvn6ar3nz3f0652zsld9xqj-gmp-6.2.0-dev/include
  --with-gmp-libraries=/nix/store/447im4mh8gmw85dkrvz3facg1jsbn6c7-gmp-6.2.0/lib
  --with-curses-includes=/nix/store/84g84bg47xxg01ba3nv0h418v5v3969n-ncurses-6.1-20190112-dev/include
  --with-curses-libraries=/nix/store/xhhkr936b9q5sz88jp4l29wljbbcg39k-ncurses-6.1-20190112/lib
  --with-libnuma-includes=/nix/store/bfrcskjspk9a179xqqf1q9xqafq5s8d2-numactl-2.0.13/include
  --with-libnuma-libraries=/nix/store/bfrcskjspk9a179xqqf1q9xqafq5s8d2-numactl-2.0.13/lib
  --with-libdw-includes=/nix/store/sv6f05ngaarba50ybr6fdfc7cciv6nbv-elfutils-0.176/include
  --with-libdw-libraries=/nix/store/sv6f05ngaarba50ybr6fdfc7cciv6nbv-elfutils-0.176/lib
  --enable-dwarf-unwind

[nix-shell:~/programming/haskell/ghc-whole-program-compiler-project]$
```
Now we need to cd into ghc-wpc and tweak the hadrian build.
MAJOR CONSTRAINT: You must build ghc-wpc with hadrian/build-stack. If you build any other way, you'll run into shared object errors; see this ticket for details.
So in order to build ghc-wpc with stack we'll have to tweak the stack.yaml file. You must do this yourself, since the change is not included in the fork:
Quick side note: To make the formatting nicer, I truncate nix-shell:~/foo/bar/baz/ghc-whole-program-compiler-project to just ..., so nix-shell:.../ghc-wpc is equivalent to ~/path/to/ghc-whole-program-compiler-project/ghc-wpc.
The changes are: (1) tell stack we are using nix, and (2) reference the shell.nix file which points to ghc.wpc.nix at the root of the project, i.e., ghc-whole-program-compiler-project/shell.nix.
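Concretely, the tweak looks something like this (a sketch; the relative path to shell.nix depends on which stack.yaml you are editing, so double-check it in your checkout):

```yaml
# enable stack's nix integration and point it at the project's shell.nix
nix:
  enable: true
  shell-file: ../shell.nix   # i.e., ghc-whole-program-compiler-project/shell.nix
```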
Now we should be able to begin our build: return to the root of ghc-wpc and run the following:
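(The exact invocation below is assumed from the build-stack constraint above; flags like -j are optional.)

```
[nix-shell:.../ghc-wpc]$ hadrian/build-stack -j
```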
Now that we have a working ghc-wpc we need to build the rest of the project by pointing stack to the ghc-wpc binary in ghc-wpc/_build/stage1/bin. That is, we must change the ghc-whole-program-compiler-project/stack.yaml file:
The changes are: (1) set compiler: ghc-8.11.0 (the ghc-wpc fork); (2) set skip-ghc-check: true so that stack doesn't complain about the ghc version; (3) set nix.enable: false (confusingly, if you leave this as true then stack will try to use nixpkgs to get a ghc binary, but we want it to use our local binary, so we disable this even though we'll still be in our original nix-shell); (4) set system-ghc: true to tell stack we will be using a ghc we have on our system; and finally (5) set extra-path: <path-to-ghc-wpc-binary>.
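Put together, the relevant part of stack.yaml looks roughly like this (a sketch; the extra-path entry is machine-specific):

```yaml
compiler: ghc-8.11.0     # the ghc-wpc fork
skip-ghc-check: true     # don't complain about the unusual ghc version
nix:
  enable: false          # use our local binary, not one from nixpkgs
system-ghc: true
extra-path:
  - /path/to/ghc-whole-program-compiler-project/ghc-wpc/_build/stage1/bin
```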
Now we can run stack and install the stg tooling:
```
[nix-shell:...]$ stack --stack-root `pwd`/.stack-root install
Trouble loading CompilerPaths cache: UnliftIO.Exception.throwString called with:

Compiler file metadata mismatch, ignoring cache

Called from:
  throwString (src/Stack/Storage/User.hs:277:8 in stack-2.7.5-9Yv1tjrmAU3JiZWCo86ldN:Stack.Storage.User)

WARNING: Ignoring tagged's bounds on template-haskell (>=2.8 && <2.17); using template-haskell-2.17.0.0.
Reason: allow-newer enabled.
WARNING: Ignoring aeson's bounds on template-haskell (>=2.9.0.0 && <2.17); using template-haskell-2.17.0.0.
Reason: allow-newer enabled.
WARNING: Ignoring th-abstraction's bounds on template-haskell (>=2.5 && <2.17); using template-haskell-2.17.0.0.
Reason: allow-newer enabled.
WARNING: Ignoring unliftio-core's bounds on base (>=4.5 && <4.14); using base-4.14.0.0.
Reason: allow-newer enabled.
WARNING: Ignoring souffle-haskell's bounds on megaparsec (>=7.0.5 && <8); using megaparsec-8.0.0.

...
# bunch of output
...

Copied executables to /home/doyougnu/.local/bin:
- dce-fullpak
- ext-stg
- fullpak
- gen-exe
- gen-exe2
- gen-obj
- gen-obj2
- mkfullpak
- show-ghc-stg

Warning: Installation path /home/doyougnu/.local/bin not found on the PATH environment variable.
```
You can add ~/.local/bin to your PATH if you want; I'll just be referencing these binaries directly as we go.
Building the external-stg-interpreter
We are almost done; all that is left is to build the external-stg-interpreter and run a small script that links everything together into a shared object for the interpreter. So:
```
[nix-shell:...]$ cd external-stg-interpreter/

[nix-shell:.../external-stg-interpreter]$ stack install
...
# bunch of output
...

Copied executables to /home/doyougnu/.local/bin:
- ext-stg
- ext-stg-interpreter
- fullpak
- mkfullpak

Warning: Installation path /home/doyougnu/.local/bin not found on the PATH environment variable.
```
Now we have our ext-stg-interpreter built! There are a few caveats I want to point out here. I've modified ghc-whole-program-compiler-project/external-stg-interpreter/stack.yaml to load the right packages and use nix:
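The relevant part is the nix block, roughly like the following (a sketch; the exact package list lives in the fork, and zlib here is an assumption):

```yaml
nix:
  enable: true
  packages: [ zlib ]   # system libraries stack should pull in from nix
```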
Notice the nix: block. We could have just as easily built this using nix directly or using our shell.nix file.
Linking the external-stg-interpreter
The only task left is to link everything into a shared object library called libHSbase-4.14.0.0.cbits.so. To do that we need to use the script called c, in ghc-whole-program-compiler-project/external-stg-interpreter/data. This script is a bit of a hack: it generates the shared object file so that we can link the symbols requested by the C FFI in base, but it populates those functions with our replacements, which do absolutely nothing. For example, we supply a fake garbage collector:
```c
// in .../external-stg-interpreter/data/cbits.so-script/c-src/fake_rts.c
...
void performGC(void) { }
void performMajorGC(void) { }
...
```
This works because we won't be using the runtime system at all; we'll be using the external STG interpreter instead. However, we still need to provide these symbols in order to link. **MAJOR NOTE: this file must be next to any *.fullpak file you'll be running the interpreter on**, or else you'll get an undefined symbol error during linking, for example:
```
[nix-shell:.../external-stg-interpreter/data]$ ls
cbits.so-script  ghc-rts-base.fullpak  minigame-strict.fullpak

### notice no .so file

[nix-shell:.../external-stg-interpreter/data]$ ~/.local/bin/ext-stg-interpreter ghc-rts-base.fullpak
ext-stg-interpreter: user error (dlopen: ./libHSbase-4.14.0.0.cbits.so: cannot open shared object file: No such file or directory)

## we error'd out because it was missing, also
## if you get this error then you have an old cbits.so file and need to rerun the c script

[nix-shell:.../external-stg-interpreter/data]$ ~/.local/bin/ext-stg-interpreter ghc-rts-base.fullpak
ext-stg-interpreter: user error (dlopen: ./libHSbase-4.14.0.0.cbits.so: undefined symbol: getProcessElapsedTime)
```
To link the interpreter we need to run c in the data/cbits.so-script sub-folder:
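A sketch of that session (the exact location the .so lands in may differ; the point is to produce libHSbase-4.14.0.0.cbits.so and then rerun the interpreter next to it):

```
[nix-shell:.../external-stg-interpreter/data]$ cd cbits.so-script && ./c
... # compiles the fake cbits and produces libHSbase-4.14.0.0.cbits.so

[nix-shell:.../external-stg-interpreter/data/cbits.so-script]$ cd ..
[nix-shell:.../external-stg-interpreter/data]$ ~/.local/bin/ext-stg-interpreter ghc-rts-base.fullpak
... # program output and interpreter diagnostics
```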
And it works: we get two new files, <foo>-call-graph-summary and <foo>-call-graph.tsv, which we can analyze to inspect the behavior of our program (more on this later).
The whole setup process on a demo
That was a rather involved example. To make the dependencies and steps required to run this on your own code clear, the rest of this tutorial will run the interpreter on two of Csaba's demos from his Skillshare talk. First, let's grab the code:
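Something like the following (the repository URLs are assumptions; use whichever checkout of the demos and of my fork you prefer):

```
$ git clone https://github.com/grin-compiler/ext-stg-interpreter-presentation-demos
$ git clone https://github.com/doyougnu/ghc-whole-program-compiler-project
```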
Now we'll run the first demo, which is a simple fold over a list:
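For orientation, the program has roughly this shape (a sketch based on the description above, not the demo's exact source; see tsumupto.hs in the demo repo):

```haskell
module Main where

import Data.List (foldl')

-- strict left fold over a list: sum the numbers up to n
tsumupto :: Int -> Int
tsumupto n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = print (tsumupto 10000000)
```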
```
$ nix-shell ghc-whole-program-compiler-project/shell.nix
trace: checking if /home/doyougnu/programming/haskell/hadrian/hadrian.cabal is present: no
Recommended ./configure arguments (found in $CONFIGURE_ARGS: or use the configure_ghc command):

  --with-gmp-includes=/nix/store/sznfxigwvrvn6ar3nz3f0652zsld9xqj-gmp-6.2.0-dev/include
  --with-gmp-libraries=/nix/store/447im4mh8gmw85dkrvz3facg1jsbn6c7-gmp-6.2.0/lib
  --with-curses-includes=/nix/store/84g84bg47xxg01ba3nv0h418v5v3969n-ncurses-6.1-20190112-dev/include
  --with-curses-libraries=/nix/store/xhhkr936b9q5sz88jp4l29wljbbcg39k-ncurses-6.1-20190112/lib
  --with-libnuma-includes=/nix/store/bfrcskjspk9a179xqqf1q9xqafq5s8d2-numactl-2.0.13/include
  --with-libnuma-libraries=/nix/store/bfrcskjspk9a179xqqf1q9xqafq5s8d2-numactl-2.0.13/lib
  --with-libdw-includes=/nix/store/sv6f05ngaarba50ybr6fdfc7cciv6nbv-elfutils-0.176/include
  --with-libdw-libraries=/nix/store/sv6f05ngaarba50ybr6fdfc7cciv6nbv-elfutils-0.176/lib
  --enable-dwarf-unwind

[nix-shell:~/programming/haskell]$ cd ext-stg-interpreter-presentation-demos/demo-01-tsumupto/

[nix-shell:~/programming/haskell/ext-stg-interpreter-presentation-demos/demo-01-tsumupto]$ ../../ghc-whole-program-compiler-project/ghc-wpc/_build/stage1/bin/ghc -O2 tsumupto.hs
[1 of 1] Compiling Main             ( tsumupto.hs, tsumupto.o )
Linking tsumupto ...

$ cd ext-stg-interpreter-presentation-demos/demo-01-tsumupto
$ ls
tsumupto  tsumupto.hi  tsumupto.hs  tsumupto.o  tsumupto.o_ghc_stgapp  tsumupto.o_modpak
```
Note that we have two new files, *.o_ghc_stgapp and *.o_modpak, as a result of building with ghc-wpc. If you try to run this from outside the nix-shell, you'll get an error about a missing mkmodpak:
```
$ ../../ghc-whole-program-compiler-project/ghc-wpc/_build/stage1/bin/ghc -O2 tsumupto.hs
[1 of 1] Compiling Main             ( tsumupto.hs, tsumupto.o )
ghc: could not execute: mkmodpak
```
Now that we have those files we can run the interpreter. First, though, we need to make a *.fullpak file from the *.o_ghc_stgapp file and create a symbolic link to libHSbase-4.14.0.0.cbits.so:
```
## make the fullpak file
$ ~/.local/bin/mkfullpak tsumupto.o_ghc_stgapp
all modules: 259
app modules: 113
app dependencies:
...
# bunch of output
...
main Main
creating tsumupto.fullpak

## create the link to the shared object file
$ ln -s ../../ghc-whole-program-compiler-project/external-stg-interpreter/data/cbits.so-script/libHSbase-4.14.0.0.cbits.so libHSbase-4.14.0.0.cbits.so

## the final directory should look like this
$ ls
libHSbase-4.14.0.0.cbits.so  tsumupto  tsumupto.fullpak  tsumupto.hi  tsumupto.hs  tsumupto.o  tsumupto.o_ghc_stgapp  tsumupto.o_modpak
```
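With that in place we can run the interpreter on the fullpak (session sketched; your output will differ):

```
$ ~/.local/bin/ext-stg-interpreter tsumupto.fullpak
... # first line: the program's own output; then interpreter diagnostics
```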
The first line is the output of the program and the rest are diagnostics that the interpreter outputs. More importantly, we should have a tab-separated values file and a call graph file in our local directory after running the interpreter:
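(Listing assumed from the `<foo>-call-graph` naming scheme described earlier; the files are named after the executable:)

```
$ ls tsumupto-call-graph*
tsumupto-call-graph-summary  tsumupto-call-graph.tsv
```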
These can be loaded into Gephi for closer inspection of the call graph of our program. Be sure to watch the rest of the demo in Csaba's talk for this part! We'll go over using Gephi and these files in the next blog post in this series, stay tuned!
To recap, here are the artifacts this workflow produces and consumes:

- foo.modpak: a zip file which contains the Core, STG, Cmm, source code, and assembly for the module foo.
- foo.fullpak: a zip file which contains the same information as the modpak, but for every module of the program rather than just module foo.
- foo.o_ghc_stgapp: a YAML-like file that contains:
  - the module's dependencies, including package dependencies;
  - a bunch of file paths for shared objects of the libraries;
  - the flags the module was built with.
- libHSbase-4.14.0.0.cbits.so: the shared object file created by the c script in ext-stg-interpreter/data/cbits.so-script. It must be in the directory in which ext-stg-interpreter will be invoked.
Step-by-step guide for running the interpreter on your code
1. Build your project with ghc-wpc/_build/stage1/bin/ghc, either by directly invoking that ghc (as I did in the demo-01 project), by pointing stack to it with system-ghc and extra-path in stack.yaml, or by passing -w <path-to-ghc-wpc-binary> to cabal.
2. Generate the foo.fullpak file with mkfullpak foo.o_ghc_stgapp.
3. Soft-link to libHSbase-4.14.0.0.cbits.so in the directory you will run the interpreter in. This file must be present when you run the interpreter!
4. Now run the interpreter on project.fullpak.
5. Analyze foo-call-graph-summary and foo-call-graph.tsv with whatever tools make sense to you.
[1] This isn't completely true: there is the RuntimeRep type, which controls exactly this, and the levity polymorphism work by Richard Eisenberg. See this video for examples of using these features. We do plan to include a more thorough, real-world example of using levity polymorphism for better performance in the Haskell Optimization Handbook.