Minimal effort build improvements and a GHC 8.2.2 upgrade

How attempting to speed up my CI builds led to upgrading to GHC 8.2.2 (and eventually speeding up my CI builds)



tl;dr - On a Haskell project I’m working on, I started with ~20+ minute cold-cache builds in the worst case in my Gitlab-powered CI environment, then found some small ways to improve. Very recently I decided I wasn’t satisfied with ~10 / 15 minute builds and did the laziest, least-effort steps I could find to get to <10 minute cold-cache builds (~5min best case). Check out the TLDR section to see the Dockerfiles and steps I took summarized.

As mentioned in the TLDR, I’ve worked on this problem before. While I was satisfied with the reduction in CI runtime at the time, lately I’ve been doing more work on the relevant Haskell project (there are a lot of posts in the pipeline about the additions I made), and I quickly became unsatisfied with having to wait ~16 minutes JUST to have my pipelines run (never mind the actual docker build + publish, or the extra steps that get run when a version gets pushed).

Despite the previous work, CI runs during a new version tag/release include additional steps (docker building, generated SDK publish, etc) and took close to 20 minutes to complete in the worst (all caches cold) cases. I thought it was ridiculous that I had to wait that long. Haskell is a great language, and although it’s not known for its build times, my project is relatively simple and ~20 minutes is excessive. There are obviously differences between my local machine (4C/8T desktop with ample RAM) and the CI execution environment (likely a single-core-ish process somewhere behind the infrastructure), but still, 20 minutes seemed to be a long time to wait.

While I wanted to wade in and truly “solve” (or contribute to help solve) this problem, I also didn’t want to try and cut my teeth on becoming a Stack/GHC contributor just yet (especially in such a complex area), so I resolved to do this the laziest way possible. I should note that’s also just about my default mode – low hanging fruit is my favorite kind of fruit.

Along the way to improving the CI build times (again), I also went about upgrading from Stack’s LTS 8.15 resolver to the LTS 11.6 resolver, which also meant a change from GHC 8.0.2 to GHC 8.2.2, which introduced some complications along the way.

Since one detour often begets another, I also felt I needed to spend some time dealing with generated docker image sizes produced by the then-newly-working setup. I posted about the resulting image sizes issue on r/haskell so if you don’t want to sit through this blog post feel free to check there for some of the great advice from the community. The following will be a rehash of my notes during the time, and though it will be sometimes incomplete (like avenues that I mention but just don’t go down), hopefully there’s something valuable in here.

Step 0: RTFM (in this case more like figuring out the landscape)

Before I could dive in to trying to make the build process faster for Haskell in the CI environment, I took a quick inventory and did some searching on avenues I hadn’t yet fully explored.

With some light reading of these posts and a bunch of general leads to follow, I got to the first task, reducing the build time.

Step 1: Reducing build time with stack config

Note that stack build --fast is the best way to make builds dramatically faster locally, but it’s not necessarily appropriate for the CI build environment. It gets its speed by turning off optimizations (-O0), and in CI I need a real, optimized build, since I run the tests there as well (and sometimes even build the container later). However, since the CI runner is docker-image based, I can test to some extent what it would be doing on my own machine.

One of my first moves here was to start trying the RTS options that one can pass to ghc through stack, by using the ghc-options property in stack.yaml.

After a tiny bit of reading (mostly of the stuff from the RTFM section), I patted myself on the back thinking I was done:

# Faster builds
ghc-options:
  "$targets": -j -rtsopts -with-rtsopts "-A128m" -with-rtsopts "-n2m"

There are two real changes here: the -A128m flag and the -n2m flag. -A128m raises the size of the allocation area the garbage collector works with to 128MB, which should mean fewer collections while ghc is compiling. I’m not actually sure what -n2m does (the GHC docs describe -n as setting the allocation area chunk size). The GHC documentation for -A was kind of helpful but kind of hard to parse/understand. The GHC flag cheatsheet was the real MVP.

I started to do some very inexact testing and here’s what I found:

| Iteration | Build description | real | user | sys | Δ% |
|---|---|---|---|---|---|
| 1 (baseline) | Local build w/ only -A128m | 9m23.196s | 0m0.519s | 0m0.141s | 0% |
| 2 | Local build w/ -A128m & -n2m | 9m14.405s | 0m0.531s | 0m0.146s | -~1.6% |

I would take these “results” with a seriously huge grain of salt because most of the time I ran the build commands only once, but here’s what my cursory testing led me to suspect:

  • Local build baseline was ~9m (this is on a 4C/8T machine with lots of RAM, so that’s kinda long for a simple app)
  • -n2m didn’t really seem to save much time (~10s?), could have been just noise
  • I actually got an error message about -rtsopts and -with-rtsopts having no effect with -shared

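That last point is a bit of a sticking point… -rtsopts and -with-rtsopts configure the runtime of the program being built, not the runtime of ghc itself (tuning the compiler’s own GC would mean handing the flags to ghc directly, as in +RTS -A128m -RTS). So it was very likely that all the “testing” I’d done up until that very point was moot, but it did point me towards a good lead, namely…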

Step 1.1: Not using LLVM (-fllvm) for builds

This is probably a step backwards as far as the future of Haskell/GHC goes, since LLVM is a really promising project/group of projects and looks to be the future, but in my short-sighted quest for faster builds that don’t hurt the end binary, I was willing to allow it.

LLVM is not the default backend (machine code generator) for GHC; the native code generator, selected with -fasm, is. Another thing I noticed and thought I might change is that the command inside the Dockerfile was using --force-dirty, and probably didn’t need to.

| Iteration | Build description | real | user | sys | Δ% (vs. previous) |
|---|---|---|---|---|---|
| 1 (baseline) | Build w/ -fllvm | 9m23.196s | 0m0.519s | 0m0.141s | 0% |
| 2 | Build w/out -fllvm (fasm is the default) | 5m59.114s | 0m0.510s | 0m0.172s | -~36.2% |
| 3 | Build w/out -fllvm & w/out --force-dirty | 5m37.582s | 0m0.462s | 0m0.210s | -~6.0% |

Again, I didn’t really rigorously test these numbers (they’re not averages, only a single sample), but here’s what I hastily gleaned:

  • There’s a HUGE savings from turning off LLVM, far bigger than anything I’d put down to run-to-run noise (even with a single sample)
  • The jump from iteration 2 -> 3 wasn’t huge, but since --force-dirty is almost certainly not necessary I’m not going to think about it too much
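For anyone who wants to re-derive the Δ% column from raw `time` output, here is a minimal sketch. The function names (parseDuration, percentChange, report) are my own for illustration and are not part of any build tooling; it simply compares the "real" wall-clock values quoted in the table.

```haskell
module Main where

import Text.Printf (printf)

-- | Parse a `time`-style duration such as "9m23.196s" into seconds.
-- Returns Nothing if the input is not of the form <minutes>m<seconds>s.
parseDuration :: String -> Maybe Double
parseDuration str =
  case break (== 'm') str of
    (mins, 'm' : rest@(_ : _))
      | last rest == 's'
      , [(m, "")] <- (reads mins :: [(Int, String)])
      , [(s, "")] <- (reads (init rest) :: [(Double, String)]) ->
          Just (fromIntegral m * 60 + s)
    _ -> Nothing

-- | Relative change from an old timing to a new one, in percent.
-- Negative values mean the new build was faster.
percentChange :: Double -> Double -> Double
percentChange old new = (new - old) / old * 100

-- | Print the change between two raw timings under a label.
report :: String -> String -> String -> IO ()
report label old new =
  case percentChange <$> parseDuration old <*> parseDuration new of
    Just pct -> printf "%s: %.1f%%\n" label pct
    Nothing  -> putStrLn (label ++ ": could not parse timings")

main :: IO ()
main = do
  report "llvm -> native codegen" "9m23.196s" "5m59.114s"  -- prints -36.2%
  report "dropping --force-dirty" "5m59.114s" "5m37.582s"  -- prints -6.0%
```

Each row is compared against the row before it, not against the baseline, which is why the last step looks small.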

For better or for worse, I’ve done close to nothing to understand GHC yet achieved a pretty nice speedup so I was ready to move on. A better, more thorough me in another timeline might have taken more time to read the GHC handbook and get more familiar with the options, but for now I was happy with the gains there.

Step 2: Migrating from LTS 8.15 to 11.6

Coming off the high from a ~30% lazily-achieved compile speed gain, I figured that I might as well try to update to a newer version of ghc too, and see if that yielded even more benefits (and faster code). While I didn’t actually test the code improvements, the change to error messages (color, some more structuring AFAIK) was a really nice benefit (spurred on by this GHC ticket, with the overall effort possibly being represented by another ticket, I believe).

One of the main value propositions of stack is that it helps you avoid Cabal hell (which I’ve actually personally been lucky enough to never experience), and makes upgrades way easier by providing resolvers which are like big bundles of packages that are known to work together.

Updating the resolver was all it took (inside stack.yaml):

resolver: lts-11.6

I also went ahead and removed some dependencies that were being downloaded but are included in the newer version of the resolver.

Upgrading did require some code changes, however, due to changes in the underlying packages (upgrading the resolver meant upgrading some of these packages).

Step 2.1: Fixing ToSchema implementations for Data.Swagger

The swagger2 package is an excellent package that I use to generate Swagger 2 documentation for my project. Swagger 3 (now known as the OpenAPI spec AFAIK) is not quite in mass adoption just yet, but Swagger 2 libraries are still perfectly good (and I don’t want to write my own Swagger 3 YAML just yet).

However, swagger2 didn’t automatically handle one of my ADTs, Order, very well. Here’s what the type looked like:

data Order = NoOrdering
           | ASC DT.Text
           | DESC DT.Text deriving (Generic, Eq, Read)

instance ToSchema Order

The problem with automatically deriving the instance was due to the mixing of the NoOrdering value constructor (which carries no data) and the ASC/DESC value constructors (which do), IIRC. swagger2 was actually very helpful here: the error points you at the workaround, namely using genericDeclareNamedSchemaUnrestricted to write the instance by hand. For me the code that got everything to compile looked roughly like this:

import qualified Data.Text as DT
import           GHC.Generics (Generic)
import           Data.Swagger (ToParamSchema, ToSchema(..), genericDeclareNamedSchemaUnrestricted, defaultSchemaOptions)

data Order = NoOrdering
           | ASC DT.Text
           | DESC DT.Text deriving (Generic, Eq, Read)

instance ToSchema Order where
    declareNamedSchema = genericDeclareNamedSchemaUnrestricted defaultSchemaOptions

I actually decided to do without a NoOrdering value constructor, deciding a Nothing :: Maybe Order was good enough to signal a lack of ordering preference. That allowed me to remove the new imports and go back to the automatically derived ToSchema instance.
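To make that design choice concrete, here is a small base-only sketch. It is not the project’s actual code: String stands in for DT.Text so nothing beyond base is needed, the swagger2 instance is left out, and describeOrder is a hypothetical helper that exists only to show how callers treat Nothing as “no preference”.

```haskell
module Main where

-- Stand-in for the post's Order type with the NoOrdering constructor removed.
-- String replaces DT.Text here only so the sketch needs nothing beyond base.
data Order
  = ASC String
  | DESC String
  deriving (Eq, Show, Read)

-- Hypothetical helper: describe an optional ordering preference.
-- Nothing plays the role the NoOrdering constructor used to play.
describeOrder :: Maybe Order -> String
describeOrder Nothing           = "no ordering preference"
describeOrder (Just (ASC col))  = col ++ ", ascending"
describeOrder (Just (DESC col)) = col ++ ", descending"

main :: IO ()
main = mapM_ (putStrLn . describeOrder)
  [ Nothing
  , Just (ASC "createdAt")
  , Just (DESC "title")
  ]
```

With every remaining constructor carrying a field, the type no longer mixes the two constructor shapes, which (by my account above) is what tripped up the derived instance.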

Step 2.2: Fixing some API drift with ginger

Way back when I did a lot of Python (mostly Flask, I’m a huge Armin Ronacher fan), I really enjoyed using Jinja which was a popular choice in the ecosystem. Thanks to a brave contributor named Tobias Dammers, a Jinja2-like templating package named ginger is available to the Haskell community. I wrote briefly about switching to it in an update to a previous post.

The version of ginger I was using actually had some different types involved in making it all work, so I had some changes to make. This was very specific to how I was using Ginger (I wasn’t really using the “easy” API, which didn’t seem to change much) – I had built a bunch of type aliases on top of the base ginger types to make things easier for myself, for example:

type HtmlContext = GingerContext (Writer Html) Html
type TextContext = GingerContext (Writer DT.Text) DT.Text

In the older version of the code, GingerContext was abstracted over 2 type variables (i.e. GingerContext a b), but in the newer version it’s abstracted over 3 (i.e. GingerContext a b c). Unfortunately it took me quite a while to wrap my head around what the type variables meant (they aren’t given descriptive names), but I eventually got it (and a somewhat deeper understanding of Haskell library API design and ginger along the way), and the code turned into something like this:

type HtmlContext = GingerContext SourcePos (Writer Html) Html
type TextContext = GingerContext SourcePos (Writer DT.Text) DT.Text

The change isn’t earth-shattering, but it did take me a long time to figure out what was happening (long enough to actually give up and go to sleep to try again the next day). With a fresh head, the key was figuring out that there used to be a Template a that showed up as types like Template Html or Template DT.Text, but now it’s all Template SourcePos, and that led me to what should go in the a spot.

One thing I wish more people in Haskell did was write comments around the type variables (I recently saw a great post on how to comment properly in Haskell on r/haskell), but I also understand why people often don’t – a lot of the time the type is so obvious to the person who wrote the library, it can be shortened without incurring much cognitive load.

There are conflicting research papers on variable name length’s effect on productivity (I found them referenced in an informative SO post) that argue both for and against longer, more descriptive variable names. In this case it’s probably pretty clear which side of the fence I’m on.

Step 3: Switching to Fedora

If you’re confused by the absolute lack of introduction for this step, you’re having a similar experience to me when I realized that GHC 8.2.2 wasn’t quite supported in Alpine – the stack-provided automatic installation of ghc didn’t (yet) work. I’ve struggled with similar issues before that were Arch Linux specific, and since it’s not really my specialty I wanted to find a way to avoid digging through ldd and stack/ghc/gcc/whatever else output to get to the root of the problem.

The easiest solution I found was just to switch to Fedora, hence the name of this step. The fact that this solution even exists is a testament to containerization and a flexible production environment – normally changing all your builders and deployment “machines” (or VMs) to a different linux distribution just to fix one binary would be an idiotic task. In the containerized future that is now, it’s as easy as changing a FROM alpine:3 to FROM fedora:28. Fedora isn’t as lean as Alpine, but it’s close enough as far as I am concerned, and the caching nature of Docker/OCI compliant images makes the size even less of an issue.

Step 3.1: Debugging issues installing Fedora

Switching to Fedora wasn’t without issues – turns out Fedora has a new installation tool called dnf that replaced yum, so I spent some time reading about that. I also spent a fair bit of time feeling out the lower-level dependencies that were missing on Fedora for my application the hard way – repeatedly building and failing in a container with slightly different output every time. Summarized:

  • Needed to ensure I removed the stack.yaml configuration (comment out #ghc-build: ncurses6) so it can get the right version
  • yum install haskell-platform vs curl -sSL | sh (opted for the latter)
  • Needed to add yum install zlib-devel after seeing an error during the build
  • Needed to add yum install libstdc++-devel
    • This actually wasn’t as easy, for some reason even after the library was present, the relevant libraries didn’t build properly.
    • Maybe fedora installs it in a non-default location, no idea what stack considers to be a default location for that library.
      • I had the same problems making sure the development libs were there in Alpine, so no big deal
      • One good aid while building was just running stack repeatedly – because it does building in parallel and saves successful progress, I’d get a little further every build, despite the same package having an issue (80 remaining -> 60 remaining -> 58 etc)
      • Doing this repeatedly, I got down to double-conversion- being one of 3 remaining dependencies (there’s probably a more proper way to do this, maybe excluding the package or something as a stack option), which was nice.
      • Turns out, this was due to c++ tools themselves not being installed, so it was fixed with dnf install gcc-c++

This is around the point that I realized that docker has actually allowed me to do a weird form of linux distribution arbitrage – picking the right distro to build my project on the fly. This might be pretty cool, or it might be one of the worst things to happen to distros (in the unlikely life-simulation-branch/reality where people build on this arbitrage and stop trying to make reliable distros).

Step 3.2: Assessing the image size changes

I use a helper container to make my production builds, which acts like a hard cache of stack and related project packages (a janky take on Docker multi-stage builds), and it is pretty huge. Since changing to a different base image is obviously a big step, I decided to take a brief look at the sizes.

Here’s what the image sizes look like for the builder now:

$ docker images | grep builder
   latest   8c418ad4ebb1   11 minutes ago   4.4GB
   <none>   cecff94bffa1   2 weeks ago      6.69GB
   <none>   c7771f7b7431   2 months ago     6.2GB
   latest   bde9444ce89a   4 months ago     8.45GB
   <none>   fbd783679470   4 months ago     5.54GB

The builder images were pretty huge before anyway but it looks like fedora was actually an improvement over the older alpine-based builders. Also, image layer caching means I should almost never be transferring 4GB over the wire to do a build (this is somewhat less true in the CI environment with runners I don’t control).

Switching to a Fedora-based builder also made me wonder whether I had to give up relatively-static binaries… And it turns out that I do. Fedora is a glibc-based distribution, which means that portable static binaries aren’t really easily possible. Even when using Alpine (a musl libc distribution), I ran into problems with network-related calls inside Haskell that relied on APIs like gethostbyname (IIRC) and all these little functions that were dependent on the platform, so the binary wasn’t even properly static to begin with :(. I’ve actually asked the Rust community how they do it – while I thought the only solution was to replace libnss, it looks like they do the platform-specific thing (so if you use musl libc you’ll be OK).

Some light research suggested that building musl libc in Fedora is an option, but it still seems to be on the package maintainers whitelist as of now, and I’m not going to go into it.

This is also a big change worth noting – since I deploy with containers loaded with dynamic library dependencies (and the network-related calls stuff never let the Alpine-environment-built binary be fully static anyway), I’ve actually decided to STOP trying to do static builds for the project.

SIDETRACK: What about the numbers?

I started this post with the intention of showing a nice neat table of all the various things I did and the relevant effect on build time in CI, but I’m way too lazy to do it, and for that I apologize. In the end the biggest change was switching from LLVM to fasm, so hopefully you can do your own analysis and weigh what’s right for you.

After changing so much of the build infrastructure (locally and remotely), doing a non-trivial refactor on the codebase after the 8.2.2 update, it feels like I’ve let too many changesets sneak in to make a principled/rigorous review of the difference, without lots of effort.

That said, my builds did go from ~13 minutes to ~9 minutes (I believe after the change to fasm), so the factors I anecdotally believe affected change (in order of efficacy) were:

  1. Not using LLVM (using fasm backend)
  2. Not doing static builds (natural result of switching to Fedora)
  3. The refactor – Most importantly I broke up one huge source file listing Servant routes and handlers into many interlinked ones, which might have enabled builds to be more parallel
  4. Change to Fedora as the base OS (due to GHC 8.2.2 not building automagically with stack in Alpine anymore)

Step 4: Watch everything break at deploy time

All set to deploy, right? Wrong. I got a big fat error in the logs:

$ kubectl logs app-578fdb798d-cks5n -c api -n totejo
/var/app/project/bin/project: error while loading shared libraries: cannot open shared object file: No such file or directory

Looks like for some reason a dynamic library wasn’t available in the minimal deploy container. I figured some shared library packages that Fedora provided needed to be installed in the deploy container but weren’t. To quickly test, I switched to naively using the builder container as the deploy container, and it started up fine.

Of course, the deploy container size ballooned:

   1.7.0   7df6ac1135cb   4 seconds ago   4.79GB
   1.6.8   12ad5b70d098   5 hours ago     289MB
   1.6.7   f51bb2fcbd3c   2 months ago    42.7MB
   1.6.5   e01adcae0f5d   3 months ago    41.1MB
   1.5.7   cc4005660a27   4 months ago    76.4MB
   1.5.1   a549111ae4bb   5 months ago    39.8MB

The size goes from 40MB (Alpine GHC 8.0.2 minimal container w/ static build) to 289MB (Fedora minimal container w/ dynamic build but broken) to 4.79GB (Fedora build container w/ dynamic build).

Obviously, I couldn’t let that ride, so I basically went through and used ldd to find the deps that were missing, and installed them in whatever way was right for Fedora:

[root@0c303764b8f2 /]# ldd /var/app/project/
bin/  data/
[root@0c303764b8f2 /]# ldd /var/app/project/bin/project
         (0x00007ffc837c1000)
         => /lib64/ (0x00007ff8feb6a000)
         => /lib64/ (0x00007ff8fe94b000)
         => /lib64/ (0x00007ff8fe734000)
         => /lib64/ (0x00007ff8fe3a2000)
         => /lib64/ (0x00007ff8fe19a000)
         => /lib64/ (0x00007ff8fdf97000)
         => /lib64/ (0x00007ff8fdd93000)
         => /lib64/ (0x00007ff8fdb17000)
         => /lib64/ (0x00007ff8fd8ff000)
         => /lib64/ (0x00007ff8fd540000)
        /lib64/ (0x00007ff8feefe000)
      ...... more stuff ......
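Eyeballing that output gets old quickly, so here is a small sketch of how the missing entries could be filtered out automatically. It is not something I ran as part of this work; it relies only on the fact that ldd prints unresolved libraries as "<name> => not found", and missingLibs is a name made up for this example.

```haskell
module Main where

import Data.Maybe (mapMaybe)

-- | Return the names of shared libraries that ldd reports as "not found".
-- ldd prints such entries as: "<name> => not found".
missingLibs :: String -> [String]
missingLibs = mapMaybe missing . lines
  where
    missing line =
      case words line of
        [name, "=>", "not", "found"] -> Just name
        _                            -> Nothing

-- Read ldd output on stdin and print one missing library per line.
main :: IO ()
main = interact (unlines . missingLibs)
```

Saved under a hypothetical name like MissingLibs.hs, it could be used inside the deploy container as ldd /var/app/project/bin/project | runghc MissingLibs.hs (assuming a GHC is on hand there, which it would not be in a truly minimal image, so running it from the builder image against the copied binary is probably more practical).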

After getting all that done, the final, working “minimal” deploy image comes down to ~664MB. It’s not the greatest, but also not the worst, and definitely a lot less work than figuring out WTF is currently wrong with the GHC 8.2.2 build on Alpine right now.


Check out the new Dockerfile:


FROM fedora:28
LABEL description="Helper image to build Job Board API binary"

# Note this file is meant to be run from the TOP LEVEL `api` directory using `-f` option to docker
COPY . /root

RUN dnf -y install zlib-devel gcc-c++ libstdc++-devel make sqlite openssh openssh-clients

# Install stack
RUN curl -sSL | sh

# Build app
RUN stack install --local-bin-path /root/target
RUN stack clean --full


FROM as builder

# Copy fresh code, likely updated version of what was used on the base builder
COPY . /root

# Fresh install to re-build files that might have changed since last builder creation
RUN stack install --local-bin-path /root/target

FROM fedora:28

RUN dnf -y install zlib-devel gcc-c++ libstdc++-devel make sqlite openssh openssh-clients

COPY --from=builder /root/target/project /var/app/project/bin/project

CMD ["/var/app/project/bin/project", "RunServer"]

To summarize again, here are the steps I took that helped, in order of efficacy:

  1. Not using LLVM (using fasm backend)
  2. Not doing static builds (natural result of switching to Fedora)
  3. The refactor – Most importantly I broke up one huge source file listing Servant routes and handlers into many interlinked ones, which might have enabled builds to be more parallel
  4. Change to Fedora as the base OS (due to GHC 8.2.2 not building automagically with stack in Alpine anymore)

Wrap up

In the end, my builds went from ~10 / 15 minutes to <10 (with 5mins being the best case, in the case of a primed Gitlab CI runner cache). It took a bit, but I was glad to finally get there.

Haskell isn’t known for build times, but even with the embarrassingly little knowledge of the toolchain that I have, I was able to lean on help from the community and the work of others to get my builds a little faster.

Hope you enjoyed reading about the journey and maybe pulled some usable intelligence from the wandering.
