Haskell exceptionfree-readfile library released

tl;dr - I finally got around to releasing exceptionfree-readfile, a small Haskell package for reading files without logging exceptions when running a binary built for profiling with +RTS -xc -RTS.

Why write exceptionfree-readfile?

I decided to create exceptionfree-readfile while trying to clean up the output of my programs under +RTS -xc -RTS (I also wrote about it on reddit). For those unfamiliar, when Haskell programs are built with profiling enabled, the runtime system can be passed special options (+RTS starts the RunTime System arguments, and -RTS ends them), and the -xc flag makes it print a stack trace every time an exception is raised. This is all well and good, but the problem is that when you have -xc enabled, base’s readFile logs an exception even on successful file reads. While there were other solutions (like modifying GHC to add some sort of filtering option for -xc), the quickest way to scratch my own itch was to just write a version of base’s readFile that didn’t use exceptions as control flow.

As I noted in the reddit comment, I was interested in writing both a recursive accumulator version and one with an FFI call to Rust (since I know in Rust it’s very easy to read files without throwing any sort of exception). Naturally, to test and benchmark the differences, I implemented three versions:

  • A baseline version (which just calls to readFile from base)
  • A custom version (which uses recursion + accumulator w/ a top level catch)
  • An “oxidized” Rust FFI-powered version

This post will be mostly about the Rust FFI version, since it was a bit more interesting to work out than the Haskell version (and I took the most notes about it).

Rewriting readFile in Haskell

The code for readFile is pretty daunting, and while at first I started by trying to just copy it all out and modify it bit by bit (with the help of the compiler), this approach didn’t work for me. The exceptions-as-control-flow pattern is really deeply embedded in the code, and systematically going through and unwinding everything into a recursive loop pattern quickly went sideways. After a while thinking I was going in the right direction, I started running into a ~ (Either IOException a) type errors and some other issues. A few issues I ran into:

  • A bunch of less-than-optimally named functions that basically check invariants
  • There’s a schism between (Handle__ -> IO a) and (Handle__ -> IO (Handle__, a)); it seems like a small difference, but it was a PITA to keep these straight

After a while I actually just decided to start from scratch, stripping down to the basics: figuring out how to read a single line the same way base does, then doing the same thing repeatedly.
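
For comparison, here is a minimal sketch of that read-a-piece-then-repeat shape. It is written in Rust to match the FFI half of this post and is not the library’s Haskell code; the name read_all_chunked and the 4096-byte chunk size are made up for illustration. What it tries to show is that end-of-input arrives as an ordinary value (Ok(0)), so a successful read never has to raise anything:

```rust
use std::io::{self, Read};

/// Read everything from `reader` by pulling fixed-size chunks in a loop and
/// appending each one to an accumulator. End-of-input is reported by `read`
/// as a plain `Ok(0)`, not as an error.
pub fn read_all_chunked<R: Read>(mut reader: R) -> io::Result<String> {
    let mut acc: Vec<u8> = Vec::new(); // the accumulator
    let mut chunk = [0u8; 4096]; // hypothetical chunk size

    loop {
        match reader.read(&mut chunk) {
            // End of input: stop looping
            Ok(0) => break,
            // Got `n` bytes: append them and go around again
            Ok(n) => acc.extend_from_slice(&chunk[..n]),
            // Interrupted system call: retry the read
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Any other failure is returned to the caller as a value
            Err(e) => return Err(e),
        }
    }

    // Convert the accumulated bytes to text once, at the end
    String::from_utf8(acc).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Any type that implements Read can be passed in, so the same function works for a std::fs::File or an in-memory std::io::Cursor.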

The resulting code is the code in the library, and while it works, it’s not particularly pretty. I do want to note that while I was figuring this out I was despairing quite a bit about how easy this would have been in Rust (given that Rust doesn’t really have “exceptions”), which brings me to…

Writing the “oxidized” readFile (final code)

The Rust code was pretty easy to write: std::fs has a function for reading files called read_to_string. It was a bit more challenging to figure out how to safely work with Rust’s FFI interfaces in a way that Haskell would be able to ingest (the Rust book’s section on FFI was very helpful):

use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr::null;

#[repr(C)]
pub struct ContentsOrError {
    err: *const c_char,
    contents: *const c_char,
}

// TODO: return length? safer reading on haskell side

#[no_mangle]
pub unsafe extern "C" fn read_file(path: *const c_char, debug: bool) -> *mut ContentsOrError {
    // Convert the cstring to a rust string
    if debug {
        println!("[libhaskell_exceptionfree_readfile] Attempting to read file from [{:?}]", CStr::from_ptr(path).to_str());
    }

    let path = match CStr::from_ptr(path).to_str() {
        Ok(v) => v,
        Err(e) => return Box::into_raw(
            Box::new(
                ContentsOrError {
                    err: CString::new(format!("Invalid path: {}", e))
                        .expect("Invalid path (printing further information failed)")
                        .into_raw(),
                    contents: null(),
                }
            )
        )
    };

    if debug {
        println!("[libhaskell_exceptionfree_readfile] Successfully converted path, reading contents...");
    }

    // Read the file into a string
    let contents = match std::fs::read_to_string(path) {
        Ok(v) => v,
        Err(e) => return Box::into_raw(
            Box::new(
                ContentsOrError {
                    err: CString::new(format!("Failed to read file: {}", e))
                        .expect("read_to_string failed (printing further information failed)")
                        .into_raw(),
                    contents: null(),
                }
            )
        )
    };

    if debug {
        println!("[libhaskell_exceptionfree_readfile] Building CString from contents...");
    }

    // Build a CString from the contents
    let cstr_contents = match CString::new(contents) {
        Ok(v) => v,
        Err(e) => return Box::into_raw(
            Box::new(
                ContentsOrError {
                    err: CString::new(format!("Failed to convert string: {}", e))
                        .expect("building comment string failed (printing further information failed)")
                        .into_raw(),
                    contents: null(),
                }
            )
        )
    };

    if debug {
        println!("[libhaskell_exceptionfree_readfile] Returning struct with contents or error...");
        println!("[libhaskell_exceptionfree_readfile] address [{:p}]", cstr_contents.as_ptr())
    }

    return Box::into_raw(
        Box::new(
            ContentsOrError { err: null(), contents: cstr_contents.into_raw() }
        )
    )
}
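
One thing the function above leaves open is who frees the result: the struct and both strings are released with into_raw, so Rust will not drop them on its own. Below is a minimal sketch of a companion function that takes them back. free_contents_or_error and make_success are hypothetical names, not part of the library as shown, and to export the function to Haskell it would need the same #[no_mangle] attribute as read_file:

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr::null;

/// Same layout as the struct above: two nullable C string pointers.
#[repr(C)]
pub struct ContentsOrError {
    err: *const c_char,
    contents: *const c_char,
}

/// Hypothetical companion to `read_file`: takes back ownership of
/// everything that was released with `into_raw` so that Rust frees it.
///
/// # Safety
/// `ptr` must be null or a pointer built the way `read_file` builds its
/// result, and it must not be used again after this call.
pub unsafe extern "C" fn free_contents_or_error(ptr: *mut ContentsOrError) {
    if ptr.is_null() {
        return;
    }
    // Rebuild the Box so the struct itself is dropped at the end of scope
    let boxed = Box::from_raw(ptr);
    // Rebuild each CString so its buffer is dropped as well
    if !boxed.err.is_null() {
        drop(CString::from_raw(boxed.err as *mut c_char));
    }
    if !boxed.contents.is_null() {
        drop(CString::from_raw(boxed.contents as *mut c_char));
    }
}

/// Hypothetical helper that builds a "success" result the same way
/// `read_file` does, for trying the free function without touching disk.
pub fn make_success(text: &str) -> *mut ContentsOrError {
    let contents = CString::new(text).expect("no interior NUL bytes").into_raw();
    Box::into_raw(Box::new(ContentsOrError { err: null(), contents }))
}
```

On the Haskell side this would presumably be imported like read_file and called only after the strings have been copied out with peekCString.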

Figuring out Haskell FFI

It’s been a little while since I wrote any Rust, so getting back into the groove of things was a little jarring. I’d write some code, think to myself “was Rust always this ugly/awkward?”, then look at the Rust book and some old code of mine to get a feel for the language idioms again. More important than getting familiar with Rust again, I needed to get familiar with Haskell’s FFI interface. It wasn’t as easy as I expected to find information on the conversion between C structs and the Haskell FFI (I did not look at the language reference, so this is probably self-inflicted pain), but I found the following resources useful:

There are also some packages that let you derive the Storable instance:

At this point, all I was trying to pass was one small struct (the contents of the file along with a possible error value), so pulling in a package to derive the Storable instance seemed a little heavy. Surely it wasn’t so hard to write my own Storable instance?

-- | Type that comes back from the rust side
data ContentsOrError = ContentsOrError { err      :: CString
                                       , contents :: CString
                                       } deriving (Eq, Show)

-- | Pointer to a ContentsOrError struct
type ContentsOrErrorPtr = Ptr ContentsOrError

-- | Size of a cstring
cstrSize :: Int
cstrSize = sizeOf (undefined :: CString)

instance Storable ContentsOrError where
    -- | ContentsOrError contains two nullable pointers (to character sequences)
    sizeOf _ = 2 * cstrSize
    -- | Aligned like one of the nullable pointers
    alignment _ = alignment (undefined :: CString)
    -- | Peek the nullable pointers one by one (offsets 0 and cstrSize) and re-form
    peek ptr = peekByteOff ptr 0
               >>= \err -> peekByteOff ptr cstrSize
               >>= \contents -> pure (ContentsOrError err contents)
    -- | Poke both nullable pointers into memory at the same offsets peek reads from
    poke ptr (ContentsOrError err contents) = pokeByteOff ptr 0 err
                                              >> pokeByteOff ptr cstrSize contents
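
The instance above hard-codes a layout: err at byte 0, contents one pointer width later, and a total size of two pointers. Here is a small sketch for checking those numbers from the Rust side, assuming the same #[repr(C)] struct as in the previous section (field_offsets and size_and_alignment are hypothetical helper names):

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_char;
use std::ptr::null;

/// Same layout as the struct returned by `read_file`.
#[repr(C)]
pub struct ContentsOrError {
    err: *const c_char,
    contents: *const c_char,
}

/// Byte offsets of (err, contents), measured from the start of the struct.
pub fn field_offsets() -> (usize, usize) {
    let value = ContentsOrError { err: null(), contents: null() };
    // Compare the address of each field with the address of the struct
    let base = &value as *const ContentsOrError as usize;
    let err_off = &value.err as *const *const c_char as usize - base;
    let contents_off = &value.contents as *const *const c_char as usize - base;
    (err_off, contents_off)
}

/// (size, alignment) of the struct as Rust lays it out for C.
pub fn size_and_alignment() -> (usize, usize) {
    (size_of::<ContentsOrError>(), align_of::<ContentsOrError>())
}
```

If the struct ever grows a field of a different size (a length, say), these are the numbers that sizeOf, alignment, and the peekByteOff offsets would have to follow.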

After figuring out how I was going to store the results, it was time to actually pull in the C call that was going to produce those results:

foreign import ccall "read_file" read_file :: CString -> Bool -> IO ContentsOrErrorPtr

And then, time to write the actual function that would call this and return the results:

readFailedErr :: String -> IOError
readFailedErr str = IOError Nothing IllegalOperation "read_file" str Nothing Nothing

unexpectedFailureError :: IOError
unexpectedFailureError = IOError Nothing IllegalOperation "read_file" "unexpected failure" Nothing Nothing

ffiResultParseFailureError :: IOError
ffiResultParseFailureError = IOError Nothing IllegalOperation "read_file" "FFI result parse (`peek`) failure" Nothing Nothing

readFile :: FilePath -> IO (Either IOError String)
readFile = _readFile False

readFileDebug :: FilePath -> IO (Either IOError String)
readFileDebug = _readFile True

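-- | Assumed alias: DebugState is not defined in this excerpt, but it is passed
--   straight to read_file, which takes a Bool
type DebugState = Bool

_readFile :: DebugState -> FilePath -> IO (Either IOError String)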
_readFile debugState path = newCString path
                >>= \pathCStr -> read_file pathCStr debugState
                >>= peek
                >>= \case
                        (ContentsOrError err contents)
                          -- | Unsuccessful read without error - unexpected failure
                          | err == nullPtr && contents == nullPtr -> pure $ Left unexpectedFailureError
                          -- | Unsuccessful read with error - contents missing but err was not null
                          | contents == nullPtr                   -> peekCString err
                                                                     >>= pure . Left . readFailedErr
                          -- | Successful read with no error - err was null and contents weren’t
                          | err == nullPtr                        -> peekCString contents
                                                                     >>= pure . Right
                        -- | Anything else (both pointers set) is treated as a parse failure
                        _ -> pure $ Left ffiResultParseFailureError
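
The guards in _readFile amount to a four-row table over which of the two pointers is null. As a cross-check, here is a sketch of the same table written against the raw pointers in Rust; decode is a hypothetical function and the error strings are shortened stand-ins for the IOError values above:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Mirror of the guards in `_readFile`, applied to the two raw fields.
///
/// # Safety
/// Any non-null argument must point at a valid NUL-terminated string.
pub unsafe fn decode(err: *const c_char, contents: *const c_char) -> Result<String, String> {
    match (err.is_null(), contents.is_null()) {
        // Neither field set: unexpected failure
        (true, true) => Err("unexpected failure".to_string()),
        // No contents but an error message: report the message
        (false, true) => Err(CStr::from_ptr(err).to_string_lossy().into_owned()),
        // No error and some contents: the successful read
        (true, false) => Ok(CStr::from_ptr(contents).to_string_lossy().into_owned()),
        // Both set: the case that falls through to the final branch
        (false, false) => Err("FFI result parse failure".to_string()),
    }
}
```

Writing it as a match on a pair makes it easy to see that the only input reaching the catch-all branch is a struct with both pointers set.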

Linking the rust code

Now that the Rust and Haskell sides of the FFI barrier are complete, the next thing was to figure out how to build my stack-powered Haskell project against the generated Rust library. This is another point where Rust is a bit ahead of Haskell: building the libraries (whether static or dynamic) was pretty easy, but figuring out how to get stack to take a dynamic or static library for use in compilation was not so straightforward.

At the outset the build was immediately failing (as it should) since it couldn’t find the reference to read_file introduced by the Rust shared lib. Figuring out how to properly link the generated Rust shared library took me some digging:

It was really difficult and painful to do all of this, which is to be expected (somewhat) since I haven’t read the GHC User’s Guide from cover to cover, and this isn’t something that’s normally covered in any of the Haskell books I’ve read. I’d say Haskell’s target audience is a bit higher “up the stack” compared to Rust’s, and maybe that’s what’s contributing to the ease of handling these low-level controls with Rust tooling. Eventually I got it working with the following command:

$ stack build --profile --local-bin-path target --copy-bins --ghc-options="-optl -l/home/path/to/haskell-exceptionfree-readfile/rust/target/release/libhaskell_exceptionfree_readfile.so"

So it wasn’t all that hard, just hard for me to figure out: I needed to add the -optl GHC option, which passes an option through to the linker, to tell GHC where the Rust shared library was (.so = shared, .a = static) so it could link against it while building. In true it-never-works-the-first-time fashion, the first run was a segfault (you won’t see this since the code above is the final, working version):

$ make test-example-file METHOD=oxidized
target/exceptionfree-readfile -m oxidized test/fixtures/example.txt
Reading file [test/fixtures/example.txt]
make: *** [Makefile:102: test-example-file] Segmentation fault (core dumped)

Weirdly enough, I was very happy to see this segfault – at the very least the foreign code was attempted, even though it was wrong at the time.

Debugging the linked code

NOTE you won’t need to do any of this, since the versions of the code reproduced above are the working versions.

After seeing the segfault mentioned above, I added some good ol’ println debugging, and the output was illuminating: the Rust code wasn’t the problem like I had originally assumed, it was the reading of the returned structures:

$ make test-example-file METHOD=oxidized
target/exceptionfree-readfile -m oxidized test/fixtures/example.txt
Reading file [test/fixtures/example.txt]
[libhaskell_exceptionfree_readfile] Attempting to read file from [Ok("test/fixtures/example.txt")]
[libhaskell_exceptionfree_readfile] Successfully converted path, reading contents...
[libhaskell_exceptionfree_readfile] Building CString from contents...
[libhaskell_exceptionfree_readfile] Returning struct with contents or error...
make: *** [Makefile:108: test-example-file] Segmentation fault (core dumped)

It’s maybe a bit hard to see here, but my implementation of peek was the issue. Taking the time to think it over, I also realized that the Rust function was returning the actual object, while the Haskell code was expecting a pointer to it. Once the Rust code returned the right thing:

$ make test-example-file METHOD=oxidized
target/exceptionfree-readfile -m oxidized test/fixtures/example.txt
Reading file [test/fixtures/example.txt]
[libhaskell_exceptionfree_readfile] Attempting to read file from [Ok("test/fixtures/example.txt")]
[libhaskell_exceptionfree_readfile] Successfully converted path, reading contents...
[libhaskell_exceptionfree_readfile] Building CString from contents...
[libhaskell_exceptionfree_readfile] Returning struct with contents or error...
exceptionfree-readfile: src/System/IO/ExceptionFree/Internal/Oxidized.hs:(63,21)-(72,87): Non-exhaustive patterns in case

make: *** [Makefile:109: test-example-file] Error 1

Now we’re getting somewhere – the haskell side is erroring, but at the very least it seems to be properly reading the data passed back by rust.

Debugging the Haskell interface code

Let’s do some more debugging on the Haskell side of things:

readFile :: FilePath -> IO (Either IOError String)
readFile path = newCString path
                >>= \pathCStr -> read_file pathCStr True
                >>= peek
                >>= \coe@(ContentsOrError err contents) -> putStrLn (show coe)
                >> pure (Left ffiResultParseFailureError)
                    -- -- | Unsuccessful read without error - unexpected failure
                    -- | err == nullPtr && contents == nullPtr -> pure $ Left unexpectedFailureError
                    -- -- | Unsuccessful read with error - contents missing but err was not null
                    -- | contents == nullPtr                   -> peekCAString err
                    --                                            >>= pure . Left . readFailedErr
                    -- -- | Successful read with no error - was null and contents weren't
                    -- | err == nullPtr                        -> peekCAString contents
                    --                                                    >>= pure . Right
                    --     -- -- | Peek failed to parse
                    --     -- _ -> pure $ Left ffiResultParseFailureError

This outputs:

$ make test-example-file METHOD=oxidized
target/exceptionfree-readfile -m oxidized test/fixtures/example.txt
Reading file [test/fixtures/example.txt]
[libhaskell_exceptionfree_readfile] Attempting to read file from [Ok("test/fixtures/example.txt")]
[libhaskell_exceptionfree_readfile] Successfully converted path, reading contents...
[libhaskell_exceptionfree_readfile] Building CString from contents...
[libhaskell_exceptionfree_readfile] Returning struct with contents or error...
ContentsOrError {err = 0x0000000000fd7800, contents = 0x0000000000fd7800}
Error: read_file: illegal operation (FFI result parse (`peek`) failure)

Well, it looks like I am getting the right object back: two memory addresses… but they’re the same one! Upon further inspection I was misusing peekByteOff; I needed to give it an increasing offset for each field. peek needed to look like this:

peek ptr = peekByteOff ptr 0
   >>= \err -> peekByteOff ptr cstrSize
   >>= \contents -> pure (ContentsOrError err contents)
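
The broken version of peek isn’t reproduced in this post, but two identical addresses are what you would expect if both reads land on the same slot. Below is a sketch of that effect in Rust, assuming the bug was a repeated offset; read_ptr_at and buggy_then_fixed are hypothetical names, with read_ptr_at playing roughly the role of peekByteOff:

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Read one pointer-sized field located `offset` bytes past `base`.
///
/// # Safety
/// `base + offset` must point at a readable, properly aligned pointer value.
pub unsafe fn read_ptr_at(base: *const u8, offset: usize) -> *const c_char {
    (base.add(offset) as *const *const c_char).read()
}

/// Read (err, contents) from a two-pointer struct twice: first with the
/// same offset for both fields, then with increasing offsets.
pub fn buggy_then_fixed(fields: &[*const c_char; 2]) -> [(*const c_char, *const c_char); 2] {
    let base = fields.as_ptr() as *const u8;
    let step = size_of::<*const c_char>();
    // Offsets 0 and `step` both lie inside the two-element array
    unsafe {
        [
            // Same slot twice: both results are the first field
            (read_ptr_at(base, 0), read_ptr_at(base, 0)),
            // One slot each: first field, then second field
            (read_ptr_at(base, 0), read_ptr_at(base, step)),
        ]
    }
}
```

The first pair always holds two copies of the same address, which matches the symptom in the debug output above; the second pair recovers both fields.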

Once I made those changes, here’s what I get:

$ make test-example-file METHOD=oxidized
target/exceptionfree-readfile -m oxidized test/fixtures/example.txt
Reading file [test/fixtures/example.txt]
[libhaskell_exceptionfree_readfile] Attempting to read file from [Ok("test/fixtures/example.txt")]
[libhaskell_exceptionfree_readfile] Successfully converted path, reading contents...
[libhaskell_exceptionfree_readfile] Building CString from contents...
[libhaskell_exceptionfree_readfile] Returning struct with contents or error...
ContentsOrError {err = 0x0000000000000000, contents = 0x00000000016f6800}
Error: read_file: illegal operation (FFI result parse (`peek`) failure)

Well, that’s much better: a NULL pointer for the error, and a raw pointer to some data on the heap for the string contents! Let’s modify the code to do a read…

readFile :: FilePath -> IO (Either IOError String)
readFile path = newCString path
                >>= \pathCStr -> read_file pathCStr True
                >>= peek
                >>= \coe@(ContentsOrError err contents) -> putStrLn (show coe)
                >> peekCAString contents
                >>= \s -> putStrLn ("contents: [" <> s <> "]")
                >> pure (Left ffiResultParseFailureError)

And this outputs:

$ make test-example-file METHOD=oxidized
target/exceptionfree-readfile -m oxidized test/fixtures/example.txt
Reading file [test/fixtures/example.txt]
[libhaskell_exceptionfree_readfile] Attempting to read file from [Ok("test/fixtures/example.txt")]
[libhaskell_exceptionfree_readfile] Successfully converted path, reading contents...
[libhaskell_exceptionfree_readfile] Building CString from contents...
[libhaskell_exceptionfree_readfile] Returning struct with contents or error...
ContentsOrError {err = 0x0000000000000000, contents = 0x00000000015cd800}
contents: []
Error: read_file: illegal operation (FFI result parse (`peek`) failure)

Close, but no dice: contents is empty for some reason (and the data at that address shouldn’t be empty!). After adding a bit more debugging output (printing the address on the Rust side):

.... other output
[libhaskell_exceptionfree_readfile] address [0x17e5800]
ContentsOrError {err = 0x0000000000000000, contents = 0x00000000017e5800}
contents: []
Error: read_file: illegal operation (FFI result parse (`peek`) failure)

With this I know that at the very least the same address is being used on the Rust side and the Haskell side. That’s a good sign, but it doesn’t quite explain why Haskell can’t read the string that should be at that address…

Debug: Taking another look at the Rust code

At this point I figured that maybe the boxing was not necessary, and that what I should be returning is a *mut pointer to the thing I want – this is how Rust knows it’s going to lose control of the memory at that address (essentially giving it to Haskell). Unfortunately, Haskell does not yet support easy struct passing, but lucky for me an angel appeared in the form of a blog post from jakegoulding.com.

Turns out the issue was on the Rust side: I needed to use into_raw (as the final code above does).

I definitely hadn’t read enough documentation, as usual. Once the code properly used into_raw, the oxidized version was done!
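
The pre-fix Rust code isn’t shown here, so the following is only a sketch of the general pitfall, not a reconstruction of the actual bug. A pointer obtained with as_ptr borrows from the CString and dangles once the CString is dropped, whereas into_raw gives up ownership so the buffer outlives the function. export_string and reclaim_string are hypothetical names:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hand a string to foreign code. `into_raw` consumes the `CString`, so
/// its buffer is NOT freed when this function returns.
pub fn export_string(text: &str) -> *mut c_char {
    let owned = CString::new(text).expect("no interior NUL bytes");
    // Returning `owned.as_ptr()` instead would hand out a pointer that
    // dangles as soon as `owned` is dropped at the end of this function.
    owned.into_raw()
}

/// Take the string back so that Rust frees it.
///
/// # Safety
/// `raw` must have come from `export_string` and must not have been
/// reclaimed already.
pub unsafe fn reclaim_string(raw: *mut c_char) -> String {
    CString::from_raw(raw).into_string().expect("valid UTF-8")
}
```

A dangling pointer of that kind would be consistent with the empty contents seen earlier: the address is the one Rust printed, but the memory behind it is no longer guaranteed to hold the string.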

This was a bit frustrating to find, but in order of discovery:

This crumb trail leads you to the fact that Haskell can’t link static libraries for my use case (so even if I build one from the Rust side, I can’t get it into my Haskell library). This means that to actually deploy the oxidized solution in the real library, I’d have to require that everyone install the shared library beforehand, which is silly.

Every day I get closer to just using Rust for everything, but for now I just can’t get enough of Haskell’s sweet sweet type system.

So how fast is it?

Well, the Haskell code turns out to be quite a bit slower than the code in base. Here is a clip from a run of the benchmarks:

benchmarking readFile/Original-KB100
time                 995.0 μs   (967.4 μs .. 1.046 ms)
                     0.976 R²   (0.955 R² .. 0.992 R²)
mean                 1.157 ms   (1.104 ms .. 1.257 ms)
std dev              298.9 μs   (183.4 μs .. 463.8 μs)
variance introduced by outliers: 95% (severely inflated)

benchmarking readFile/ExceptionFree-KB100
time                 9.193 ms   (9.116 ms .. 9.298 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 9.102 ms   (9.030 ms .. 9.190 ms)
std dev              241.3 μs   (167.6 μs .. 324.5 μs)

benchmarking readFile/Oxidized-KB100
time                 6.229 ms   (6.171 ms .. 6.304 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 5.856 ms   (5.759 ms .. 5.967 ms)
std dev              312.5 μs   (249.2 μs .. 355.2 μs)
variance introduced by outliers: 30% (moderately inflated)

As you can see, unfortunately exceptionfree-readfile is about 9x slower than base’s readFile for a 100KB file (9.19 ms vs. 995 μs), and the oxidized version is about 6x slower (6.23 ms). This is pretty bad performance-wise, but since reading files is not in the critical path for my uses (mostly done at startup time), it’s good enough for me ™.

Wrap-up

Hopefully this rambling rant was a decent introduction to the things I went through to write exceptionfree-readfile (which wasn’t even the best way to solve the problem). Hopefully this also serves as a decent example of how to do Haskell <-> Rust FFI for some people as well.

Thanks for reading!