Skip to main content

· 5 min read

In this post I discuss the inlining of Integer and Natural operations in Haskell. It’s a promising performance work I’ve been conducting six months ago, which was blocked by an independent issue, but that I will likely resume soon as the issue has been fixed in the meantime.


To follow this post, you must know that Natural numbers are represented as follows in ghc-bignum:

-- | Natural number
--
-- Invariant: numbers <= WORD_MAXBOUND use the `NS` constructor
data Natural
= NS !Word#
| NB !BigNat#

Small naturals are represented with a Word# and large ones with a BigNat# (a ByteArray#).

Now consider the following simple example using Natural:

-- | Add 2 to a Word. Use Natural to avoid Word overflow
foo :: Word -> Natural
foo x = fromIntegral x + 2

There are only small naturals involved: fromIntegral x is small because x is a Word, and 2 is small. We could hope that GHC would use Word# primops to implement this and would allocate a Natural heap object for the result only. However it’s not what happens currently, even in GHC HEAD. In the following STG dump, we can see that a Natural heap object is allocated for x before calling naturalAdd (let bindings in STG reflect heap allocations):

foo1 = NS! [2##];

foo =
\r [x_sXn]
case x_sXn of {
W# x#_sXp ->
let { sat_sXq = NS! [x#_sXp]; } in naturalAdd sat_sXq foo1;
};

Let’s look at naturalAdd:

-- | Add two naturals
naturalAdd :: Natural -> Natural -> Natural
{-# NOINLINE naturalAdd #-}
naturalAdd (NS x) (NB y) = NB (bigNatAddWord# y x)
naturalAdd (NB x) (NS y) = NB (bigNatAddWord# x y)
naturalAdd (NB x) (NB y) = NB (bigNatAdd x y)
naturalAdd (NS x) (NS y) =
case addWordC# x y of
(# l,0# #) -> NS l
(# l,c #) -> NB (bigNatFromWord2# (int2Word# c) l)

We are clearly in the last case where both arguments are small. It seems beneficial to allow this function to be inlined. If we did we would get:

foo =
\r [x_s158]
case x_s158 of {
W# x#_s15a ->
case addWordC# [x#_s15a 2##] of {
(#,#) l_s15c ds_s15d ->
case ds_s15d<TagProper> of ds1_s15e {
__DEFAULT ->
case int2Word# [ds1_s15e] of sat_s15f {
__DEFAULT ->
case bigNatFromWord2# sat_s15f l_s15c of ds2_s15g {
__DEFAULT -> NB [ds2_s15g];
};
};
0# -> NS [l_s15c];
};
};
};

which produces much better assembly code, especially if there is no carry:

    addq $2,%rax       ; add 2 to a machine word
setc %bl ; test the carry.
movzbl %bl,%ebx ; it could be done
testq %rbx,%rbx ; more efficiently
jne _blk_c17c ; with "jc"
_blk_c17i:
movq $NS_con_info,-8(%r12) ; alloc NS datacon value
movq %rax,(%r12) ; with the addition result as payload.
leaq -7(%r12),%rbx ; make it the first argument
addq $8,%rbp ; and then
jmp *(%rbp) ; call continuation
...

So why aren’t we always inlining naturalAdd? We even explicitly disallow it with a NOINLINE pragma. The reason is that naturalAdd and friends are involved in constant-folding rules.

For example, consider:

bar :: Natural -> Natural
bar x = x + 2

baz = bar 0x12345678913245678912345679123456798

Currently we get the following Core:

bar1 = NS 2##

bar = \ x_aHU -> naturalAdd x_aHU bar1

baz = NB 99114423092485377935703335253042771879834

You can see that baz is a constant thanks to constant-folding.

However if we let naturalAdd inline we get:

baz
= case bigNatAddWord# 99114423092485377935703335253042771879832 2##
of ds_d11H
{ __DEFAULT ->
NB ds_d11H
}

baz is no longer a constant.

A solution would be to add constant-folding rules for BigNat# functions, such as bigNatAddWord#. This is exactly what we have started doing in #20361. Our new plan is:

  • Make BigNat# operation NOINLINE and add constant-folding rules for them
  • Make Integer/Natural operations INLINEABLE (expose their unfolding)
  • Hence rely on constant-folding for Word#/Int#/BigNat# to provide constant folding for Integer and Natural

The good consequences of this plan are:

  • Less allocations when bignum operations are inlined and some of the arguments are known to be small/big or fully known (constant).
  • Integer and Natural are less magical: you can implement your own similar types and expect the same performance without having to add new rewrite rules

There were some unforeseen difficulties with this plan though:

  1. Some of the rewrite rules we need involve unboxed values such as BigNat# and Word# and the weren’t supported. Luckily, this has been recently fixed (#19313) by removing the “app invariant” (#20554). Thanks Joachim! That’s the reason why we could resume this work now.
  2. Some unfoldings (RHSs) become bigger due to the inlining of bignum operations. Hence they may not themselves be inlined further due to inlining thresholds even if it would be beneficial. A better inlining heuristic would fix this (see #20516). It will likely be the topic of the next post.

· 3 min read

JS Backend​

In March the team focused on porting more GHCJS code to GHC head.

  • Most of us are new to GHCJS’s codebase so we are taking some time to better understand it and to better document it as code gets integrated into GHC head.
  • Development process: initially we had planned to integrate features one after the others into GHC head. However it was finally decided that features would be merged into a wip/javascript-backend branch first and then later merged into GHC head. After trying this approach we decided to work directly into another branch: wip/js-staging . Opening merge requests that can’t be tested against a branch that isn’t GHC head didn’t bring any benefit and slowed us too much.
  • Documentation: we wrote a document comparing the different approaches to target JavaScript/WebAssembly https://gitlab.haskell.org/ghc/ghc/-/wikis/javascript
  • RTS: some parts of GHCJS’s RTS are generated from Haskell code, similarly to code generated with the genapply program in the C RTS. This code has been ported to GHC head. As JS linking---especially linking with the RTS---will only be performed by GHC in the short term, we plan to make it generate this code dynamically at link time.
  • Linker: most of GHCJS’s linker code has been adapted to GHC head. Because of the lack of modularity of GHC, a lot of GHC code was duplicated into GHCJS and slightly modified. Now that both codes have diverged we need to spend some time making them converge again, probably by making the Linker code in GHC more modular.
  • Adaptation to GHC head: some work is underway to replace GHCJS’s Objectable type-class with GHC’s Binary type-class which serves the same purpose. Similarly a lot of uses of Text have been replaced with GHC’s ShortText or FastString.
  • Template Haskell: GHCJS has its own TH runner which inspired GHC’s external interpreter (“Iserv”) programs. We have been exploring options to port TH runner code as an Iserv implementation. The Iserv protocol uses GADTs to represent its messages which requires more boilerplate code to convert them into JS because we can’t automatically derive aeson instances for them.
  • Plugins: we have an MR adding support for “external static plugins” to GHC !7377. Currently it only supports configuring plugins via environment variables. We have been working on adding support for command-line flags instead.
  • Testsuite: we have fixed GHC’s build system so that it can run GHC’s testsuite when GHC is built as a cross-compiler (!7850). There is still some work to do (tracked in #21292) to somehow support tests that run compiled programs: with cross-compilers, target programs can’t be directly executed by the host architecture.

Misc​

  • Performance book: some time was spent on the infrastructure (CI) and on switching the format of the book to ReStructured Text
  • Modularity: some time was spent discussing GHC’s design and refactoring (c.f. !7442 and #20927).

· 2 min read

Changes​

  • To cross compile Haskell code for windows a wine process must be used to evaluate Template Haskell code at compile time. Some times this code needs DLLs to be present for the Template Haskell code to run. We had been maintaining a list of DLLs manually (#1400 for instance added secp256k1). A more general solution (#1405) was found that uses the pkgsHostTarget environment variable to obtain a list of all the packages dependencies. Then the DLLs from the are made available to the wine process running the Template Haskell code. This should make more libraries build correctly while reducing unnecessary dependencies.
  • The way Haskell.nix cleans source trees has changed with #1403, #1409 and #1418. When using Nix >=2.4 source in the store is now filtered in the same way it is locally. This has a couple of key advantages:
    • It makes it less likely that results on CI systems (where the source is likely to be in the store) will differ from results for local builds (where the source is in a cloned git repository).
    • Potential for reducing load on CI. Although more work may be needed, this kind of filtering combined with the experimental content addressing features of Nix reduce the required rebuilds.
  • In the past rather cryptic error messages were given when an attempt was made to use an old version of GHC on a platform Haskell.nix did not support it. In some cases Haskell.nix would even attempt to build GHC and only fail after some time. Better error messages are now given right away when an attempt is made to use a GHC version that is not supported for a particular platform #1411

Version Updates​

  • GHC 9.2.2 was added #1394

Bug fixes​

  • gitMinimal replaces git to reduce the dependency tree of cabalProject functions #1387
  • Less used of allowSubstitutes=false #1389
  • Fixed aarch64-linux builds by using correct boot compiler #1390
  • icu-i18n package mapping added to make text-icu build #1395
  • Fixes needed for newer nixpkgs versions
    • Use list for configureFlags #1396
    • The spdx json file is in a .json output #1397
    • gdk_pixbuf is now gdk-pixbuf #1398
  • Replaced deprecated NixOS binary cache settings in docs #1410
  • Enable static build of secp256k1 on musl #1413

Finally, we’d like to thank all the awesome contributors, who make haskell.nix a thriving open source project! ❤️

· 2 min read

JS backend​

This month we worked on adapting code from GHCJS to merge into GHC head. We also started discussing the implementation process publicly and especially with our colleagues at Well-Typed.

  • Ticket about adapting GHCJS’ code into a proper JS backend for GHC has been opened [#21078]. Feedback was very positive!
  • There were discussions about the process and an agreement to target GHC 9.6 release [email on ghc-devs, wiki page]
  • deriveConstants is a program used to generate some header file included in the rts package. While it is mainly useful for native targets, we had to make it support Javascript targets [!7585]
  • Javascript is going to be the first official target platform supported by GHC that has its own notion of managed heap objects. Hence we may need a new RuntimeRep to represent these values for Haskell codes interacting with JS codes via FFI. We opened !7577 into which we tried to make this new RuntimeRep non JS specific so that it could be reused for future backends targeting other managed platforms (e.g. CLR, JVM). It triggered a lot of discussions summarized in #21142.
  • GHCJS’s code generator was ported to GHC head [!7573]. In its current state, we can generate Javascript unoptimised code -- the optimiser hasn’t been ported yet -- by compiling a module with -c -fjavascript. It required many changes, not only to adapt to changes between GHC 8.10 and GHC head but also to avoid adding new package dependencies. It was also an opportunity to refactor and to document the code, which is still a work in progress.
  • GHC doesn’t use any lens library, hence to port the code generator we had to replace lenses with usual record accessors. It turned out that case alternatives in STG lacked them because they were represented with a triple. We took the opportunity to introduce a proper record type for them !7652

Plutus-apps JS demo​

  • We improved the proof of concept JavaScript library for generating Plutus transactions with a given set of constraints and lookups, exposing functionality from the plutus-ledger-constraints package. [Report]

Reporting​

· 9 min read

IOG is committed to improving Haskell developer experience, both by sponsoring the Haskell Foundation and by directly founding a team committed to this task: the Haskell DX team.

The team now tries to provide regular (monthly) updates about its work. This post is a bit longer because it covers all of 2021 which has not been covered anywhere else.

Code generation​

  • Added a new backend for AArch64 architectures, especially to support Apple’s M1. Previously AArch64 was only supported via the LLVM based backend which is much slower. [!5884]
  • Added support for Apple’s M1 calling convention. In GHC 9.2.1 it implied making lifted sized types (e.g. Word8, Int16...) use their unlifted counterparts (e.g. Word8#, Int16#...); in GHC 8.10.7 – a minor release –  a less invasive but more fragile solution was implemented [commit].
  • Fixed a very old GHC issue [#1257] by making GHCi support unboxed values [!4412]: ByteCode is now generated from STG instead of directly from Core. It allows more Haskell codes to be supported by HLS and it even allows GHC code to be loaded into GHCi [link].
  • Fixed a bug in the Cmm sinking pass that led to register corruption at runtime with the C backend. Even if we don’t use the C backend, fixing this avoided spurious errors in CI jobs using it [#19237,!5755]
  • Fixed a register clobbering issue for 64-bit comparisons generated with the 32-bit x86 NCG backend [commit].
  • Fixed generation of switches on sized literals in StgToCmm [!6211]
  • Fixed LLVM shifts [#19215,!4822]

Linker​

  • Fixed an off-by-one error in the MachO (Darwin) linker [!6041]. The fix is simple but the debugging session was epic!
  • Fix to avoid linking plugin units unconditionally with target code, which is wrong in general but even more so when GHC is used as a cross-compiler: plugins and target code aren’t for the same platform [#20218,!6496]

Cross-compilation​

  • With John Ericson (Obsidian Systems) we finally made GHC independent of its target [!6791,!6539]. It means that there is no need to rebuild GHC to make it target another platform, so it now becomes possible to add support for a --target=... command-line flag [#11470]. It also means that a cross-compiling GHC could build plugins for its host platform in addition to building code for its target platform.
  • A side-effect of the previous bullet is that primops’ types are now platform independent. Previously some of them would use Word64 on 32-bit architectures and Word on 64-bit architectures: now Word64 is used on every platform. A side-effect of this side-effect is that we had to make Word64 as efficient as Word: it now benefits from the same optimizations (constant folding #19024, etc.). On 32-bit platforms, it reduced allocations by a fair amount in some cases: e.g. -25.8% in T9203 test and -11.5% when running haddock on base library [!6167]. We hope it will benefit other 32-bit architectures such as JavaScript or WebAssembly.
  • GHC built as a cross-compiler doesn’t support compiler plugins [#14335]. We have been working on refactoring GHC to make it support two separate environments in a given compiler session – one for target code and another for the plugin/compiler code. The implementation in [!6748] conflicts quite a lot with the support of multiple home-units that was added at about the same time. GHC needs to be refactored a lot more to correctly support this approach, so instead we implemented a different approach to load plugins which is more low-level and bypasses the issue [#20964, !7377].
  • We made GHC consider the target platform instead of the host platform in guessOutputFile [!6116]
  • Use target platform instead of host platform to detect literal overflows [#17336,!4986]

GHCJS​

  • We updated GHCJS to use GHC 8.10.7 [branch]
  • We worked on making GHCJS’s codebase more suitable for integration into GHC: reducing the number of dependencies, avoiding the use of Template Haskell, reusing GHC’s build system, etc. There is now a GHCJS integrated into a GHC 8.10.7 fork [branch].
  • This experience led us to plan the realization of a JS backend into GHC head based on GHCJS. More information about this topic in our next report.
  • We worked on making GHC’s testsuite pass with GHCJS, triaging tests that legitimately fail on a JS platform from tests revealing real GHCJS issues. [LINK]

Windows​

  • We seemed to be the first to try to build GHC on Windows with the updated GNU autotools 2.70 and this release made a breaking change to the way auxiliary files (config.guess, config.sub) were handled, breaking GHC’s build (#19189). The root cause of the issue couldn’t be easily solved so we modified GHC’s build system to avoid the use of these auxiliary files, bypassing the issue. Most GHC devs won’t ever notice that something was broken to begin with when they will update their GNU toolchain on Windows. [!4768,!4987,!5065]
  • Fixed cross-compilation of GHC from Linux to Windows using Hadrian [#20657,!6945,!6958]

Numeric​

  • Fixed Natural to Float/Double conversions to align with the method used for Integer to Float/Double and added missing rewrite rules [!6004]
  • Made most bignum literals be desugared into their final form in HsToCore stage instead of CoreToStg stage to ensure that Core optimizations were applied correctly to them [#20245,!6376]
  • Some constant folding rules were missing and were added:
  • Allowed some ghc-bignum operations to inline to get better performance, while still managing to keep constant-folding working [#19641,!6677,!6696,!6306]. There is some work left to do (cf #20361) but it is blocked by #19313 which in turn is blocked by #20554 which should be fixed soon (!6865, thanks Joachim!).
  • The ubiquitous fromIntegral function used to have many associated rewrite rules to make it fast (avoiding heap allocation of a passthrough Integer when possible) that were difficult to manage due to the combinatorial number of needed rules (#19907, #20062). We found a way to remove all these rules (!5862).

Technical debt & modularity​

  • Made several component of the compiler independent of DynFlags (parsed command-line flags):
  • Made the handling of “package imports” less fragile [!6586] and refactored some code related to dependencies and recompilation avoidance [!6528,!6346].
  • Abstracted plugin related fields from HscEnv [!7175]
  • Made a home-unit optional in several places [!7013]: the home-unit should only be required when compiling code, not when loading code (e.g. when loading plugins in cross-compilers #14335).
  • Made GHC no longer expose the (wrong) selected ghc-bignum backend with ghc --info. ghc-bignum now exposes a backendName function for this purpose [#20495,!6903]
  • Moved tmpDir from Settings to DynFlags [!6297]
  • Removed use of unsafePerfomIO in getProgName [!6137]
  • Refactored warning flags handling [!5815]
  • Made assertions use normal functions instead of CPP [!5693]
  • Made the interpreter more independent of the driver [!5627]
  • Replaced ptext . sLit with text [!5625]
  • Removed broken “dynamic-by-default” setting [#16782,!5467]
  • Abstracted some components from the compiler session state (HscEnv):
    • unit-related fields into a new UnitEnvdatatype [!5425]
    • FinderCache and NameCache[!4951]
    • Loader state [!5287]
  • Removed the need for a home unit-id to initialize an external package state (EPS) [!5043]
  • Refactored -dynamic-too handling [#19264,!4905]

Performance​

  • Made divInt#, modInt# and divModInt# branchless and inlineable [#18067,#19636,!3229]
  • Fixed Integral instances for Word8/16/32 and showWord to use quotRemWordN [!5891,!5846]
  • Improved performance of occurrence analysis [#19989,!5977]
  • Fixed unnecessary pinned allocations in appendFS [!5989]
  • Added a rewrite rules for string literals:
    • Concatenation of string literals [#20174,#16373,!6259]
    • (++) . unpackCString# ⇒ unpackAppendCString# leading to a 15% reduction in compilation time on a specific example. [!6619]
    • Compute SDoc literal size at compilation time [#19266, !4901]
  • Fix for Dwarf strings generated by the NCG that were unnecessarily retained in the FastString table [!6621]
  • Worked on improving inlining heuristics by taking into account applied constructors at call sites [#20516,!6732]. More work is needed though.
  • Fixed #20857 by making the Id cache for primops used more often [!7241]
  • Replaced some avoidable uses of replicateM . length with more efficient code [!7198]. No performance gain this time but the next reader of this code won’t have to wonder if fixing it could improve performance.
  • Made exprIsCheapX inline for modest but easy perf improvements [!7183]
  • Removed an allocation in the code used to write text on a Handle (used by putStrLn, etc.) [!7160]
  • Replaced inefficient list operations with more efficient Monoid ([a],[b]) operations in the driver [!7069], leading to 1.9% reduction in compiler allocations in MultiLayerModules test.
  • Disabled some callstack allocations in non-debug builds [!6252]
  • Made file copy in GHC more efficient [!5801]
  • Miscellaneous pretty-printer enhancements [!5226]
  • Type tidying perf improvements with strictness [#14738,!4892]

RTS​

  • Fixed issues related to the RTS’s ticker
    • Fixed some races [#18033,#20132,!6201]
    • Made the RTS open the file descriptor for its timer (timerfd) on Linux synchronously to avoid weird interactions with Haskell code manipulating file descriptors [#20618,!6902].
  • Moved GHC’s global variables used to manage Uniques into the RTS to fix plugin issues [#19940,!5900]

Build system / CI​

  • Fixed Hadrian output to display warnings and errors after the multi screen long command lines [#20490,!6690]
  • Avoided the installation of a global platformConstants file; made GHC load constants from the RTS unit instead, allowing it to be reinstalled with different constants [!5427]
  • Made deriveConstants output its file atomically [#19684,!5520]
  • Made compression with xz faster on CI [!5066]
  • Don’t build extra object with -no-hs-main [#18938,!4974]
  • Add hi-boot dependencies with ghc -M [#14482,!4876]

Misc​

  • Stack: fixed interface reading in hi-file-parser to support GHC 8.10 and 9.0 [PR, Stack#5134]
  • Enhanced pretty-printing of coercions in Core dumps [!4856]

· 2 min read

Documentation​

Changes​

  • Support for external Hackage repositories was improved by #1370. We can now use an extra package repository just by adding a repository block to the cabal.project file. This makes it easy to make use of an extra hackage databases such as hackage.head and hackage-overlay-ghcjs. A sha256 for the repository it can be added as a comment in the repository block or by including it in the sha256map argument.

Version Updates​

  • nix-tools was updated to use the Cabal 3.6.2 and hnix 0.16 nix-tools#113
  • Nixpkgs pins were bumped #1371
  • Update booting on aarch64 linux to ghc 8.8.4 1325 and 1374

Bug fixes​

  • Allow linking pcre statically with musl #1363
  • Add gpiod to system nixpkgs map #1359
  • Add poppler-cpp to png-config Nixpkgs map #1373
  • Use the same logic that cabal-install uses for determining the path of a packages .tar.gz in a repository nix-tools#114
  • Fix libnuma dependency in rts.conf 1342
  • Fix when "materialized" dir is deep #1376
  • Prefer local building for git-ls-files #1378 and #1381
  • Fix stack cache generator sha256 is a string not a lambda #1383
  • Only pass --index-state to cabal when asked #1384
  • Pass enableDWARF to makeConfigFiles to fix -g3 support in nix-shell #1385

Finally, we’d like to thank all the awesome contributors, who make haskell.nix a thriving open source project! ❤️

· One min read

Hopefully 2022 should be the year GHC will get a JavaScript backend without relying on GHCJS. This month the team has been busy planning the work that needs to be done to get there!

Cross-compilation​

  • GHCJS has been updated to reduce the gap with GHC 8.10.7 codebase to the point that GHC’s build system is used to build GHCJS
  • Internal work planning for the integration of GHCJS into GHC
  • A different approach to load plugins into cross-compilers has been implemented [#20964, !7377]
  • GHCJS has been exercised to showcase compilation of some Plutus applications

Modularity​

  • A few “subsystems” of GHC have been made more modular and reusable by making them independent of the command-line flags (DynFlags) [#17957, !7158, !7199, !7325]. This work resulted in a 10% reduction in call sites to DynFlags and has now removed all references to DynFlags up to the CoreToStg pass, which is almost the entire backend of GHC.

Performance​

  • Jeffrey wrote a new HF proposal about writing a Haskell Optimization handbook and has started working on it

· 2 min read

January 2022​

This month we merged some very significant improvements to the support for compiling for Android and iOS based AArch64 devices.  When the build system is also AArch64 template haskell can often be run locally.  This will make targeting mobile devices from AArch64 builders much easier.

A long running branch containing bug fixes for cross compilation to JavaScript with GHCJS was merged.  One nice feature included is better support for adding bindings to C code compiled with emscripten.  In some cases it can be as easy as adding a single JavaScript file to the package with wrappers for the C functions.

Changes​

  • Much improved AArch64 support including Template Haskell (#1316)
  • Improved GHCJS and support for calling C code compiled with emscripten (#1311)
  • The environment variables LANG and LOCALE_ARCHIVE are no longer set in shells allowing the users prefered settings to persist (#1341).
  • source-repo-override argument added for cabal projects to allow the location of source-repository-package packages to be replaced (#1354)

Version Updates​

  • GHC 9.0.2 was added to the available GHC versions (#1338)
  • The nixpkgs pins for 21.05, 21.11 and unstable were all updated (#1334).
  • Remaining uses of cabal 3.4 were updated to 3.6.2 (#1328)

Bug fixes​

  • Dwarf build of ghc 9.2.1 now skipped on hydra to work around 4GB hydra limit (#1333)
  • Removed use of propagatedBuildInputs in ghc derivation (#1318).
  • Caching of the check-hydra CI script was fixed (#1340)