TL;DR: This blog post sums up the why and how of cargo-cabal and hs-bindgen. If you’re looking for usage walkthroughs and code examples, check out the project READMEs on GitHub!

N.B. Quoted paragraphs in this article give straightforward motivation for some basic systems programming concepts. Feel free to skip them if you are already comfortable with these topics ;)

Context

At IOG we maintain large Haskell codebases and we would like to interface them with some libraries written in Rust.

Rust is a systems programming language known for its strong static typing guarantees, which make it similar to Haskell (even if a bit less expressive). However, unlike Haskell, Rust does not have a GC (Garbage Collector) and uses a compile-time memory management strategy. This mechanism is encoded in its type system through the concepts of “ownership” and “lifetime” of values, which can complicate the writing of programs but yields a smaller runtime footprint. Rust is becoming increasingly popular in the systems programming and embedded systems domains, and it is also used in areas such as cryptography, where performance and correctness are critical.

One typical use case concerns cryptographic primitives, which must be very performant. The first use case that Iñigo Querejeta Azurmendi (Cardano Lead Cryptography Engineer) brought to me consisted of replacing a cryptographic library used by cardano-base. Namely, replacing cryptonite, a library written in Haskell and C, with sha3, a Rust library (or "crate").

Why FFI (Foreign Function Interface)?

Solving the interoperability problem means:

  1. designing a protocol that allows two pieces of code, written in different languages and using different runtime systems, to communicate;
  2. designing tools and methods to build, bundle, and distribute such polyglot code bases (what developers fear most).

As our main criterion is performance, we want a solution with a minimal overhead. In particular, we want to avoid the use of any solution that relies on syscalls (like I/Os) and on costly data (de)serialization.

This leads us to exclude solutions such as IPC (Inter-Process Communication), e.g., using Google Protobuf over a Unix Domain Socket.

FFI looks like the right choice: there is no syscall, a foreign function call just behaves as a jump in memory, and there is no extra data (de)serialization involved. The price to pay for this performance is that using the FFI requires paying special attention to the low-level calling conventions and memory management of the two systems involved. But we will come back to this topic later!

To go further: you can learn more about how to use FFI in Rust by reading the dedicated section of The Rustonomicon (the Unsafe Rust guide), or the dedicated Rust FFI Omnibus tutorial. The ANSSI (French government security agency) also writes about it in its Secure Rust Guidelines guide, and the Rust Embedded book has an Interoperability with C chapter. On the Haskell side, you may want to take a look at the GHC wiki or read the dedicated Real-World Haskell chapter!

FFI is a feature that's already offered by both rustc (Rust compiler) and ghc (Haskell compiler). It allows calling a Rust function from Haskell code (and vice versa). Both languages let users declare a function symbol that will only be resolved at the linking step: Rust with its extern keyword, Haskell with foreign import declarations. N.B. name mangling of the function should also be disabled; in Rust this requires decorating the function with the #[no_mangle] attribute.
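
To make this concrete, here is a minimal hand-written sketch of what exporting a Rust function over the C FFI can look like. The function name `count_bytes` is made up for illustration and is not part of hs-bindgen.

```rust
use std::ffi::{c_char, CStr};

/// Hand-written export (hypothetical example function).
///
/// `extern "C"` selects the C calling convention, and `#[no_mangle]` keeps
/// the symbol name exactly `count_bytes` so that a foreign linker can find it.
/// (On the Rust 2024 edition this attribute is spelled `#[unsafe(no_mangle)]`.)
///
/// # Safety
/// `name` must be null or point to a valid NUL-terminated C string.
#[no_mangle]
pub unsafe extern "C" fn count_bytes(name: *const c_char) -> usize {
    if name.is_null() {
        return 0;
    }
    // The compiler cannot check what the foreign caller passes in: this is
    // the "special care" that FFI requires.
    let c_str = unsafe { CStr::from_ptr(name) };
    c_str.to_bytes().len()
}
```

Nothing in this code tells the Haskell side what the signature is: the matching foreign import has to be written (and kept in sync) separately.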

So, what's lacking in the Haskell ecosystem? Let's take a look at what kind of integration other languages offer with Rust:

This list isn't exhaustive but gives you a hint: all these projects are about generating bindings (bindgen)!

Why bindgen (bindings code generation)?

Let's sum it up by: "A good FFI is an FFI that you don't write …"

FFIs are like a blind spot in your type system. Writing them manually is both painful and dangerous, as your compiler will not warn you about non-matching interfaces.

Binding generation comes to the rescue by considerably reducing the room for human errors. As a bonus, it also makes maintainers' life easier thanks to a smaller and more readable code base.

Example

Let's start with a minimal example: automatically generating bindings that allow Haskell code to call a given Rust function. This is done simply by annotating the Rust function as follows:

use hs_bindgen::*;

#[hs_bindgen(greetings :: CString -> IO ())]
fn greetings(name: &str) {
    println!("Hello, {name}!");
}

… it will be expanded to (you can try yourself with cargo expand):

use hs_bindgen::*;

fn greetings(name: &str) {
    println!("Hello, {name}!");
}

#[no_mangle] // Mangling makes symbol names more difficult to predict.
             // We disable it to ensure that the resulting symbol is really `__c_greetings`.
extern "C" fn __c_greetings(__0: *const core::ffi::c_char) -> () {
    // `traits` module is `hs-bindgen::hs-bindgen-traits`
    // n.b. do not forget to import it, e.g., with `use hs_bindgen::*`
    traits::ReprC::from(greetings(traits::ReprRust::from(__0),))
}

… and will also generate the following Haskell code:

-- This file was generated by `hs-bindgen` crate and contains C FFI bindings
-- wrappers for every Rust function annotated with `#[hs_bindgen]`

{-# LANGUAGE ForeignFunctionInterface #-}

-- Why not rather using `{-# LANGUAGE CApiFFI #-}` language extension?
--
-- * Because it's GHC specific and not part of the Haskell standard:
-- https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/ffi.html ;
--
-- * Because the capabilities it gave (by rather works on top of symbols of a C
-- header file) can't work in our case. Maybe we want a future with an
-- {-# LANGUAGE RustApiFFI #-} language extension that would enable us to
-- work on top of a `.rs` source file (or a `.rlib`, but this is unlikely as
-- this format has purposely no public specification).

{-# OPTIONS_GHC -Wno-unused-imports #-}

module Greetings (greetings) where

import Data.Int
import Data.Word
import Foreign.C.String
import Foreign.C.Types
import Foreign.Ptr

foreign import ccall unsafe "__c_greetings" greetings :: CString -> IO (())

In Rust, a bare extern is shorthand for extern "C", which stands for “use the C calling convention”. This is in contrast to extern "Rust", the Rust calling convention, which is the default used implicitly when no extern keyword is present.
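
A small sketch of that equivalence, with made-up function names: both `extern` functions coerce to the same `extern "C"` function-pointer type, while a plain `fn` does not.

```rust
// A bare `extern fn` means `extern "C" fn`. Recent compilers may warn about
// the omitted ABI string, hence the `allow`.
#[allow(missing_abi)]
extern fn add_implicit(a: i32, b: i32) -> i32 {
    a + b
}

// Same calling convention, spelled out explicitly.
extern "C" fn add_explicit(a: i32, b: i32) -> i32 {
    a + b
}

// A plain `fn` uses the (unstable) Rust calling convention, i.e. `extern "Rust"`.
fn add_rust(a: i32, b: i32) -> i32 {
    a + b
}

/// Both `extern` functions fit in an array of C-ABI function pointers.
/// Adding `add_rust` to this array would be rejected by the type checker.
fn c_abi_functions() -> [extern "C" fn(i32, i32) -> i32; 2] {
    [add_implicit, add_explicit]
}
```

The calling convention is part of the function type, which is how rustc can refuse to mix the two.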

Why C ABI (Application Binary Interface)?

First, GHC currently doesn't know anything about the Rust calling convention, while it does know C's: the C calling convention is the lingua franca of rustc and ghc.

Additionally, the Rust ABI (calling convention and type memory layout) isn’t stable. It is defined internally by the compiler but could change with any rustc release, so building software on top of it is by definition a “hack”. If we thought it was worth it, we would have to generate our bindings against a given rustc version (which would be really laborious to maintain). So, do not fear the C ABI because, at least, it is stable!

To go further: I invite you to read “Rust does not have a stable ABI” by Federico Mena Quintero, a blog post discussing why the absence of a stable Rust ABI isn't a big deal in the context of GTK development. It highlights that the approach described in “How Swift Achieved Dynamic Linking Where Rust Couldn't” by Aria Beingessner isn't so far from the GObject Introspection strategy!

Implementation

The previous code example highlighted that we rely on two constructs: an attribute procedural macro #[hs_bindgen], and (internally) ReprRust and ReprC traits.

Why use a Rust macro?

Binding code generation could have been achieved using an external tool, e.g., cbindgen parses Rust code (before macro expansion) and deduces C function signatures.

But instead we decided to define a custom macro (like cxx, wasm-bindgen, and PyO3 do), and so we require the user to depend on a custom crate.

The reason is that we want generated bindings to always match the source code used for their generation. By using a macro we enforce binding generation during the build process and bindings can't get out-of-sync.

To go further: cbindgen has a major limitation in that it does not understand Rust's module system or namespacing. As mentioned in its documentation, this means that if cbindgen sees that it needs the definition for MyType and there exist two things in your project with the type name MyType, it won't know what to do. Conversely, the framework implemented here for hs-bindgen could be used to provide a better cbindgen implementation.

The ReprRust and ReprC traits ensure that, respectively, the arguments and the return value of an exposed function respect a given set of safety rules. As a reminder, Rust's traits are similar to Haskell's typeclasses: a trait is a way to define a contract for a type, specifying a set of methods that the type must implement. This allows for generic programming, where a function or data structure can operate on any type that implements the given set of traits.

Wrapping user types with these traits has several benefits:

  • Unsupported types are nicely reported with a “the trait ReprRust<T> is not implemented for U” error (which also suggests to the user other types that implement the trait);

  • The user can always extend support by implementing these traits for arbitrary types;

  • The provided trait implementations for std types take care of memory management;

  • Traits greatly improve ergonomics by implicitly and safely casting a given type to an FFI-safe one.
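
The idea can be sketched with two simplified conversion traits. The names `FromC`, `IntoC`, `shout` and `c_shout` are hypothetical and only illustrate the pattern; they are not the actual definitions found in hs-bindgen-traits.

```rust
use std::ffi::{c_char, CStr, CString};

/// Hypothetical trait: build a Rust value from its FFI-safe representation `C`.
trait FromC<C> {
    fn from_c(raw: C) -> Self;
}

/// Hypothetical trait: turn a Rust value into its FFI-safe representation `C`.
trait IntoC<C> {
    fn into_c(self) -> C;
}

impl FromC<*const c_char> for String {
    fn from_c(raw: *const c_char) -> Self {
        // Assumption: `raw` is a valid NUL-terminated string that stays
        // owned by the caller, so we copy it.
        let c_str = unsafe { CStr::from_ptr(raw) };
        c_str.to_string_lossy().into_owned()
    }
}

impl IntoC<*mut c_char> for String {
    fn into_c(self) -> *mut c_char {
        // `into_raw` hands ownership of the allocation over to the receiver.
        CString::new(self).expect("no interior NUL byte").into_raw()
    }
}

/// The user-facing function only deals with ordinary Rust types.
fn shout(name: String) -> String {
    format!("{}!", name.to_uppercase())
}

/// What a generated wrapper could look like: the conversions are picked by
/// the trait system from the types in the signature.
extern "C" fn c_shout(name: *const c_char) -> *mut c_char {
    shout(String::from_c(name)).into_c()
}
```

With this shape, exposing a function whose argument type has no `FromC` implementation fails at compile time with a trait-bound error, instead of silently producing a mismatched binding.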

What's an FFI-safe type?

rustc will complain if a function declared with the extern keyword takes arguments whose types are not FFI-safe. An FFI-safe type has a specified layout (memory representation) for the given calling convention, e.g., thanks to a #[repr(C)] compiler attribute.
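
As an illustration, here is a small sketch of a struct made FFI-safe with `#[repr(C)]`. The `Point` type and the `translate` function are invented for the example.

```rust
use std::mem::{align_of, size_of};

/// `#[repr(C)]` pins the layout to what a C compiler would produce:
/// fields stay in declaration order and follow C padding rules.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub tag: u8,
    pub x: u32,
    pub y: u32,
}

/// Accepted without complaint because `Point` has a specified layout.
/// Without `#[repr(C)]` on `Point`, rustc would emit its
/// `improper_ctypes_definitions` lint for this signature.
pub extern "C" fn translate(p: Point, dx: u32) -> Point {
    Point { x: p.x + dx, ..p }
}

/// Size and alignment of `Point`, which `#[repr(C)]` makes predictable.
pub fn point_layout() -> (usize, usize) {
    (size_of::<Point>(), align_of::<Point>())
}
```

On mainstream targets where `u32` is 4-byte aligned, `tag` is followed by three padding bytes, so the struct occupies 12 bytes.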

To go further: the memory management strategy is that freeing a value is the role of the receiver (which has “ownership” of it). This means that values returned by Rust functions aren't dropped by Rust but rather should be freed on the Haskell side!
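
Here is a sketch of what this ownership transfer looks like from the Rust side. Both function names are hypothetical, and the companion free function is just one common pattern for letting the receiver release a value; this post does not say that hs-bindgen generates such a function.

```rust
use std::ffi::{c_char, CString};

/// Returns a heap-allocated C string (hypothetical example function).
/// `into_raw` gives up ownership, so Rust will NOT drop the string when this
/// function returns: the receiver is now responsible for it.
#[no_mangle]
pub extern "C" fn make_greeting() -> *mut c_char {
    CString::new("Hello from Rust!")
        .expect("no interior NUL byte")
        .into_raw()
}

/// Hypothetical companion that the receiver could call to hand the string
/// back, so that it is released by the allocator that created it.
///
/// # Safety
/// `ptr` must be null or come from `make_greeting`, and must not be used
/// after this call.
#[no_mangle]
pub unsafe extern "C" fn free_greeting(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuilding the `CString` takes ownership back; dropping it frees it.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

If the receiver never releases the pointer, the string simply leaks: Rust no longer tracks it.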

EDIT: Thanks to community feedback from Merijn Verstraaten, I just released hs-bindgen v0.8.0, which now generates safe Haskell foreign imports by default! You can still generate unsafe bindings simply by prefixing the function name, like #[hs_bindgen(unsafe NAME :: TYPE)], in the Rust attribute macro. I invite you to read “FFI safety and GC” by Fraser Tweedale or GHC's users guide to understand the differences between the Haskell unsafe and safe keywords.

DevX

cargo-cabal is a CLI tool that helps you, in one simple command, turn a Rust crate into a Haskell Cabal library!

I was heavily inspired by the developer experience offered by wasm-pack and maturin. Launched in any Rust project folder, these tools help the user interactively tweak their Cargo.toml package file and generate the build files needed by Node.js npm or Python setuptools.

What cargo-cabal actually does is:

  • Ask the user to add crate-type = ["staticlib"] (or "cdylib"; dynamic libraries require an extra build.rs file that is generated by cargo-cabal) to their Cargo.toml file;

  • Generate a custom X.cabal linking the rustc output as extra-libraries, and either a (naersk and haskell.nix based) flake.nix or a Setup.lhs Cabal build script (to work around this issue).

To go further: stack isn't supported yet, but we could easily imagine a cargo-stack binary that just wraps a cargo-cabal --stack CLI option!

What's next?

cargo-cabal and hs-bindgen combined are less than 1000 LoC; they also support Rust #![no_std] code, and I would be glad to keep them as KISS and modular as possible. But there is still room for improvement, e.g., by adding trait implementations for more Rust std types, or possibly supporting async functions with async-ffi?!

Furthermore, it’s also nice to give a sneak peek at what others do, for comparison: OCaml allows extensions to be written directly in Rust with no C stubs. This work was supported by the OCaml Software Foundation and you can find a basic example project here. It offers safe OCaml/Rust interoperability, meaning utilities to convert ADTs (Algebraic Data Types) and the functions using them.

It would be delightful to get as far as having custom preludes in the Haskell binding code that offer Rust type layouts in Haskell. For example, Rust slices are not an existing concept in C but could easily be represented as an FFI-safe struct.

Finally, it’s worth mentioning that there are also proposals to improve the interface between Haskell programs requiring Rust libraries, including this Cabal RFC. As a reminder, the implementation proposed here does not provide support for Haskell dependencies in Rust projects yet, but there is a previous unmaintained attempt by Michael Gattozzi to bring Haskell runtime support to Rust binaries. We should also keep a close eye on the Rust RFC that proposes introducing a #[repr(interop)] attribute: Experimental feature gate proposal interoperable_abi.
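
To give an idea of the slice case, here is a rough sketch of such a struct. `ByteSlice` and `sum_bytes` are hypothetical names; nothing like this is claimed to exist in hs-bindgen today.

```rust
/// Hypothetical FFI-safe view of a `&[u8]`: a pointer plus a length,
/// which is what a Rust slice boils down to.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ByteSlice {
    pub ptr: *const u8,
    pub len: usize,
}

impl ByteSlice {
    /// Borrow a Rust slice as an FFI-safe struct (no copy involved).
    pub fn from_slice(bytes: &[u8]) -> Self {
        ByteSlice { ptr: bytes.as_ptr(), len: bytes.len() }
    }

    /// # Safety
    /// `ptr` must be non-null and point to `len` readable bytes that stay
    /// valid for the whole lifetime `'a`.
    pub unsafe fn as_slice<'a>(&self) -> &'a [u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

/// A function taking the struct by value over the C ABI.
pub extern "C" fn sum_bytes(s: ByteSlice) -> u64 {
    // Assumption: the caller upholds the contract of `as_slice`.
    let bytes = unsafe { s.as_slice() };
    bytes.iter().map(|&b| u64::from(b)).sum()
}
```

A Haskell prelude could then mirror this struct with a matching Storable instance, which is the kind of shared type layout alluded to above.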

I would like to thank @doyougnu, @hsyl20, @govanify, and @iquerejeta for their reviews and for their helpful suggestions.

Thanks for reading, feel free to experiment with this proof of concept and to provide feedback on GitHub!