tl;dr - On a Haskell project I’m working on, I started out with 20+ minute cold-cache builds (worst case) in my Gitlab-powered CI environment, then found some small ways to improve things. Very recently I decided I wasn’t satisfied with ~10-15 minute builds and did the laziest, least-effort steps I could find to get to <10 minute cold-cache builds (~5 min best case). Check out the [TLDR][tldr] section for the Dockerfiles and a summary of the steps I took.
As mentioned in the TLDR, I’ve worked on this problem before, and while I was satisfied with the reduction in CI runtime at the time, lately I’ve been doing more work on the relevant Haskell project (there are a lot of posts in the pipeline about the additions I made), and I quickly became unsatisfied with having to wait ~16 minutes JUST to have my pipelines run (never mind the actual docker build + publish, or the extra steps that get run when a version gets pushed).
Despite the previous work, CI runs for a new version tag/release include additional steps (docker building, generated SDK publishing, etc.) and took close to 20 minutes to complete in the worst (all-caches-cold) case. I thought it was ridiculous that I had to wait that long. Haskell is a great language, and although it’s not known for its build times, my project is relatively simple and ~20 minutes is excessive. There are obviously differences between my local machine (a 4C/8T desktop with ample RAM) and the CI execution environment (likely a single-core-ish existence as a process somewhere behind the gitlab.com infrastructure), but still, 20 minutes seemed like a long time to wait.
While I wanted to wade in and truly “solve” (or help solve) this problem, I also didn’t want to cut my teeth on becoming a Stack/GHC contributor just yet (especially in such a complex area), so I resolved to do this the laziest way possible. I should note that’s just about my default mode anyway – low-hanging fruit is my favorite kind of fruit.
Along the way to improving the CI build times (again), I also upgraded from Stack’s LTS 8.15 resolver to the LTS 11.6 resolver, which meant a change from GHC 8.0.2 to GHC 8.2.2 and introduced some complications of its own.
Since one detour often begets another, I also felt I needed to spend some time dealing with the generated docker image sizes produced by the then-newly-working setup. I posted about the resulting image size issue on r/haskell, so if you don’t want to sit through this blog post, feel free to check there for some of the great advice from the community. The following is a rehash of my notes from the time, and though it’s sometimes incomplete (there are avenues I mention but just don’t go down), hopefully there’s something valuable in here.
Before I could dive into trying to make the build process faster for Haskell in the CI environment, I took a quick inventory and did some searching on avenues I hadn’t yet fully explored:

- Parallel builds (`-j` is already in use)

With some light reading of these posts and a bunch of general leads to follow, I got to the first task: reducing the build time.
Note that `stack build --fast` is the best way to make build times dramatically faster locally, but it’s not necessarily appropriate for the CI build environment – `--fast` skips optimizations (that’s the crux of the speedup, as I understand it), and in CI I want a properly optimized build, plus I need to run the tests there as well (and sometimes even build the deploy container afterwards). However, since I’m using docker, I can test to some extent on my own machine what the docker-image-based CI runner would be doing.
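Roughly, the split ends up looking something like this (a sketch using standard `stack` flags, not my exact CI invocation):

```bash
# Local development: skip optimizations for quick feedback
stack build --fast --test

# CI: full optimized build plus tests, since the same binary ends up in the image
stack build --test
stack install --local-bin-path ./target
```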
One of my first moves here was to start trying the RTS options that one can pass to `ghc` through `stack`, by using the `ghc-options` property in `stack.yaml`.
After a tiny bit of reading (mostly of the stuff from the RTFM section), I patted myself on the back thinking I was done:
```yaml
# Faster builds
ghc-options:
  "$targets": -j -rtsopts -with-rtsopts "-A128m" -with-rtsopts "-n2m"
```
There are two real changes here: the `-A128m` flag and the `-n2m` flag. I’m not actually really sure what `-n2m` does (apparently it divides the allocation area into chunks), but `-A128m` increases the pool of memory used for garbage collection by `ghc` during compilation. The GHC documentation for `-A` was kind of helpful but also kind of hard to parse/understand – the GHC flag cheatsheet was the real MVP.
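One thing worth flagging (which only really clicked for me later – see the `-with-rtsopts` note further down): `-with-rtsopts` bakes default RTS options into the *program being linked*, not into GHC itself. My understanding – unverified in this exact setup – is that handing the GC flags to GHC’s own runtime during compilation looks more like this:

```bash
# Give GHC itself a bigger allocation area while it compiles
# (as opposed to baking RTS defaults into the produced binary)
stack build --ghc-options='+RTS -A128m -n2m -RTS'
```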
I started to do some very inexact testing and here’s what I found:
| Iteration | Build description | real | user | sys | Δ% |
|---|---|---|---|---|---|
| 1 | (baseline) Local build w/ only `-A128m` | 9m23.196s | 0m0.519s | 0m0.141s | 0% |
| 2 | Local build w/ `-A128m` & `-n2m` | 9m14.405s | 0m0.531s | 0m0.146s | ~-1.6% |
I would take these “results” with a seriously huge grain of salt because most of the time I ran the build commands only once, but here’s what my cursory testing led me to suspect:

- `-n2m` didn’t really seem to save much time (~10s?) – it could have been just noise
- `-rtsopts` and `-with-rtsopts` having no effect with `-shared`
That last point is a bit of a sticking point… It was very likely that all the “testing” I’d done up until that point was moot, but it did point me towards a good lead, namely…
## Don’t use the LLVM backend (`-fllvm`) for builds

This is probably a step backwards as far as the future of Haskell/GHC goes, since LLVM is a really promising project (well, group of projects) and looks to be the future, but in my short-sighted quest for faster builds that don’t hurt the end binary, I was willing to allow it.
As it turns out, `llvm` is not the default backend (machine code generator) for GHC – the assembly option, called `fasm`, actually is. Another thing I noticed that I thought I might change: the build command inside the Dockerfile was using `--force-dirty`, and probably didn’t need to.
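For concreteness, the change boiled down to something like the following in the builder’s Dockerfile – I’m reconstructing the “before” line from memory, so treat the exact flags as illustrative:

```dockerfile
# Before (roughly): forced rebuilds plus the LLVM backend
# RUN stack install --force-dirty --ghc-options="-fllvm" --local-bin-path /root/target

# After: default native code generator (fasm), no forced rebuilds
RUN stack install --local-bin-path /root/target
```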
| Iteration | Build description | real | user | sys | Δ% |
|---|---|---|---|---|---|
| 1 | (baseline) Build w/ `-fllvm` | 9m23.196s | 0m0.519s | 0m0.141s | 0% |
| 2 | Build w/out `-fllvm` (`fasm` is the default) | 5m59.114s | 0m0.510s | 0m0.172s | ~-36.2% |
| 3 | Build w/out `-fllvm` & without `--force-dirty` | 5m37.582s | 0m0.462s | 0m0.210s | ~-6.1% |
Again, I didn’t really rigorously test these numbers (they’re not averages, only a single sample), but here’s what I hastily gleaned:
- `--force-dirty` is almost certainly not necessary – I’m not going to think about it too much

For better or for worse, I’ve done close to nothing to understand GHC yet achieved a pretty nice speedup, so I was ready to move on. A better, more thorough me in another timeline might have taken more time to read the GHC handbook and get more familiar with the options, but for now I was happy with the gains.
Coming off the high of a ~30% lazily-achieved compile-speed gain, I figured I might as well also update to a newer version of `ghc` and see if that reaped even more benefits (and faster code). While I didn’t actually test the code improvements, the change to error messages (color, some more structure AFAIK) was a really nice benefit (spurred on by this GHC ticket, with the overall effort possibly being represented by another ticket, I believe).
One of the main value propositions of `stack` is that it helps you avoid Cabal hell (which I’ve actually personally been lucky enough to never experience), and makes upgrades way easier by providing resolvers, which are like big bundles of packages that are known to work together.
Updating the resolver was all it took (inside `stack.yaml`):

```yaml
resolver: lts-11.6
```
I also went ahead and removed some dependencies that were being downloaded but are included in the newer version of the resolver.
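Purely as an illustration of the kind of cleanup (the package name here is made up, not one of my real dependencies):

```yaml
# Before: pinned as an extra-dep under lts-8.15
# extra-deps:
#   - some-helper-package-0.1.2.0   # hypothetical; included in lts-11.6

# After: nothing extra needed
resolver: lts-11.6
extra-deps: []
```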
Upgrading did require some code changes, however, due to changes in the underlying packages (upgrading the resolver meant upgrading some of these packages).
The `swagger2` package is an excellent package that I use to generate Swagger 2 documentation for my project. Swagger 3 (now known as the OpenAPI spec, AFAIK) hasn’t quite reached mass adoption yet, but Swagger 2 libraries are still perfectly good (and I don’t want to write my own Swagger 3 YAML just yet).
The `swagger2` package didn’t automatically handle one of my ADTs, `Order`, very well, however. Here’s what the type looked like:
```haskell
data Order = NoOrdering
           | ASC DT.Text
           | DESC DT.Text
           deriving (Generic, Eq, Read)

instance ToSchema Order
```
The problem with automatically deriving the instance was due to the mixing of the `NoOrdering` value constructor with the `ASC`/`DESC` value constructors, IIRC, and `swagger2` was actually very helpful, pointing out how to work around it – namely, using `genericDeclareNamedSchemaUnrestricted` to generate the instance manually. For me, the code that got everything to compile looked roughly like this:
```haskell
import Data.Swagger (ToParamSchema, ToSchema(..), genericDeclareNamedSchemaUnrestricted, defaultSchemaOptions)

data Order = NoOrdering
           | ASC DT.Text
           | DESC DT.Text
           deriving (Generic, Eq, Read)

instance ToSchema Order where
  declareNamedSchema = genericDeclareNamedSchemaUnrestricted defaultSchemaOptions
```
I actually decided to do without a `NoOrdering` value constructor, deciding a `Nothing :: Maybe Order` was good enough to signal a lack of ordering preference. That allowed me to remove the new imports and go back to the automatically derived `ToSchema` instance.
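Roughly (a sketch from memory – the real module has a bit more around it), the type ended up looking like this:

```haskell
-- Order without the NoOrdering constructor; "no preference" is now a
-- Nothing :: Maybe Order at the call sites
data Order = ASC DT.Text
           | DESC DT.Text
           deriving (Generic, Eq, Read)

-- With the nullary/non-nullary constructor mix gone, the generic
-- default works again
instance ToSchema Order
```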
## ginger

Way back when I did a lot of Python (mostly Flask, I’m a huge Armin Ronacher fan), I really enjoyed using Jinja, which was a popular choice in the ecosystem. Thanks to a brave contributor named Tobias Dammers, a Jinja2-like templating package named `ginger` is available to the Haskell community. I wrote briefly about switching to it in an update to a previous post.
The newer version of `ginger` I was upgrading to had some different types involved in making it all work, so I had some changes to make. This was very specific to how I was using Ginger (I wasn’t really using the “easy” API, which didn’t seem to change much) – I had built a bunch of type aliases on top of the base `ginger` types to make things easier for myself, for example:
```haskell
type HtmlContext = GingerContext (Writer Html) Html
type TextContext = GingerContext (Writer DT.Text) DT.Text
```
In the older version of the code, `GingerContext` was abstracted over 2 type variables (i.e. `GingerContext a b`), but in the newer version it’s abstracted over 3 (i.e. `GingerContext a b c`). Unfortunately it took me quite a while to wrap my head around what the type variables meant (they weren’t named all that descriptively), but I eventually got it (and a somewhat deeper understanding of Haskell library API design and `ginger` along the way), and the code turned into something like this:
```haskell
type HtmlContext = GingerContext SourcePos (Writer Html) Html
type TextContext = GingerContext SourcePos (Writer DT.Text) DT.Text
```
The change isn’t earth-shattering, but it did take me a long time to figure out what was happening (long enough to actually give up and go to sleep to try again the next day). With a fresh head, the key was figuring out that there used to be a `Template a` that produced/took types like `Template Html` or `Template DT.Text`, but now it’s all `Template SourcePos` – that led me to figuring out what should go in the `a` spot.
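To make the shape of the change concrete, here’s a rough before/after of a hypothetical helper of my own (the signatures are approximate – the point is just where `SourcePos` slots in):

```haskell
-- Before the upgrade (approximate): templates were parameterized by the
-- output type, e.g. Template Html or Template DT.Text
-- renderPage :: HtmlContext -> Template Html -> Html

-- After the upgrade: templates carry source-position info instead, and the
-- output type only shows up in the context/runner types
-- renderPage :: HtmlContext -> Template SourcePos -> Html
```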
One thing I wish more people in Haskell did was write comments around their type variables (I recently saw a great post on how to comment properly in Haskell on r/haskell), but I also understand why people often don’t – a lot of the time the type is so obvious to the person who wrote the library that the name can be shortened without incurring much cognitive load.
There are conflicting research papers on variable name length’s effect on productivity (I found them referenced in an informative SO post) that argue both for and against longer, more descriptive variable names. In this case it’s probably pretty clear which side of the fence I’m on.
## Switch to Fedora

If you’re confused by the absolute lack of introduction for this step, you’re having an experience similar to mine when I realized that GHC 8.2.2 wasn’t quite supported on Alpine – the `stack`-provided automatic installation of `ghc` didn’t (yet) work. I’ve struggled with similar issues before that were Arch Linux specific, and since it’s not really my specialty I wanted to find a way to avoid digging through `ldd` and `stack`/`ghc`/`gcc`/whatever-else output to get to the root of the problem.
The easiest solution I found was just to switch to Fedora, hence the name of this step. The fact that this solution even exists is a testament to containerization and a flexible production environment – normally, changing all your builders and deployment “machines” (or VMs) to a different Linux distribution just to fix one binary would be an idiotic task. In the containerized future that is now, it’s as easy as changing a `FROM alpine:3` to a `FROM fedora:28`. Fedora isn’t as lean as Alpine, but it’s close enough as far as I’m concerned, and the caching nature of Docker/OCI-compliant images makes the size even less of an issue.
Switching to Fedora wasn’t without issues – it turns out Fedora has a newer package manager called `dnf` that replaced `yum`, so I spent some time reading about that. I also spent a fair bit of time feeling out the lower-level dependencies that were missing on Fedora for my application the hard way – repeatedly building and failing in a container, with slightly different output every time. Summarized:
- Modified the `stack.yaml` configuration (commented out `ghc-build: ncurses6`) so it can get the right version
- `yum install haskell-platform` vs `curl -sSL https://get.haskellstack.org/ | sh` (opted for the latter)
- `yum install zlib-devel` after seeing an error during the build
- `yum install libstdc++-devel`, with `double-conversion-2.0.2.0` being one of 3 remaining dependencies (there’s probably a more proper way to do this, maybe excluding the package or something as a stack option), which was nice
- `dnf install gcc-c++`
This is around the point that I realized docker has actually allowed me to do a weird form of linux distribution arbitrage – picking the right distro to build my project on, on the fly. This might be pretty cool, or it might be one of the worst things to happen to distros (in the unlikely life-simulation-branch/reality where people build on this arbitrage and stop trying to make reliable distros).
I use a helper container to make my production builds, which acts as a sort of hard cache of `stack` and related project packages – a janky take on Docker Multi-Stage Builds – and it is pretty huge. Since changing to a different base image is obviously a big step, I decided to take a brief look at the sizes.
Here’s what the image sizes look like for the builder now:
```
$ docker images | grep builder
registry.gitlab.com/project/builder-base   latest   8c418ad4ebb1   11 minutes ago   4.4GB
registry.gitlab.com/project/builder-base   <none>   cecff94bffa1   2 weeks ago      6.69GB
registry.gitlab.com/project/builder-base   <none>   c7771f7b7431   2 months ago     6.2GB
registry.gitlab.com/project/builder        latest   bde9444ce89a   4 months ago     8.45GB
registry.gitlab.com/project/builder-base   <none>   fbd783679470   4 months ago     5.54GB
```
The builder images were pretty huge before anyway, but it looks like Fedora was actually an improvement over the older Alpine-based builders. Also, image layer caching means I should almost never be transferring ~4GB over the wire to do a build (this is somewhat less true in the CI environment, with runners I don’t control).
One thing that switching to a Fedora-based builder also made me wonder was whether I had to give up relatively-static binaries… And it turns out that I do. Fedora is a glibc-based distribution, which means that portable static binaries aren’t really easily possible. Even when using Alpine (a musl libc distribution), I ran into problems with network-related calls inside Haskell that relied on APIs like `gethostbyname` (IIRC) and all these little functions that were dependent on the platform, so the binary wasn’t even properly static to begin with :(. I’ve actually asked the Rust community how they do it – while I thought the only solution was to replace `libnss`, it looks like they do the platform-specific thing (so if you use musl libc you’ll be OK).
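(If you want to check your own binaries, the quick way to see how “static” they really are is with `file` and `ldd` – the paths here match the layout from my Dockerfiles below:)

```bash
# Reports "statically linked" vs "dynamically linked"
file /root/target/project
# Lists the shared objects the binary still wants at runtime
ldd /root/target/project
```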
Some light research suggested that building musl libc on Fedora is an option, but it still seems to be on the package maintainers’ wishlist (https://fedoraproject.org/wiki/Package_maintainers_wishlist) as of now, and I’m not going to go into it.
This is also a big change worth noting – since I deploy with containers loaded with dynamic library dependencies (and the network-related calls stuff never let the Alpine-environment-built binary be fully static anyway), I’ve actually decided to STOP trying to do static builds for the project.
I started this post with the intention of showing a nice, neat table of all the various things I did and their effect on build time in CI, but I’m way too lazy to do it, and for that I apologize. In the end, the biggest change was switching off LLVM to `fasm`, so hopefully you can do your own analysis and weigh what’s right for you.
After changing so much of the build infrastructure (locally and remotely) and doing a non-trivial refactor on the codebase after the 8.2.2 update, it feels like I’ve let too many changesets sneak in to make a principled/rigorous review of the difference possible without lots of effort.
That said, my builds did go from ~13 minutes to ~9 minutes (I believe after the change to `fasm`), so the factors I anecdotally believe drove the change (in order of efficacy) were:

- Switching off the LLVM backend (back to the default `fasm` backend)
- Switching to Fedora (no longer fighting the `stack`-managed GHC install in Alpine anymore)

**All set to deploy right? Wrong.** I got a big fat error in the logs:
```
$ kubectl logs app-578fdb798d-cks5n -c api -n totejo
/var/app/project/bin/project: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
```
Looks like for some reason the dynamic library `libstdc++.so.6` wasn’t available in the minimal deploy container. I figured some shared library packages that Fedora provides needed to be installed in the deploy container but weren’t. To quickly test, I switched to naively using the builder container as the deploy container, and it started up fine.
Of course, the deploy container size ballooned:
```
registry.gitlab.com/project/image   1.7.0   7df6ac1135cb   4 seconds ago   4.79GB
registry.gitlab.com/project/image   1.6.8   12ad5b70d098   5 hours ago     289MB
registry.gitlab.com/project/image   1.6.7   f51bb2fcbd3c   2 months ago    42.7MB
registry.gitlab.com/project/image   1.6.5   e01adcae0f5d   3 months ago    41.1MB
registry.gitlab.com/project/image   1.5.7   cc4005660a27   4 months ago    76.4MB
registry.gitlab.com/project/image   1.5.1   a549111ae4bb   5 months ago    39.8MB
```
The size goes from 40MB (the Alpine GHC 8.0.2 minimal container w/ static build) to 289MB (the Fedora minimal container w/ dynamic build, but broken) to 4.79GB (the Fedora build container w/ dynamic build).
Obviously, I couldn’t let that ride, so I basically went through and used `ldd` to find the deps that were missing, and installed them in whatever way was right for Fedora:
```
[root@0c303764b8f2 /]# ldd /var/app/project/
bin/  data/
[root@0c303764b8f2 /]# ldd /var/app/project/bin/project
    linux-vdso.so.1 (0x00007ffc837c1000)
    libm.so.6 => /lib64/libm.so.6 (0x00007ff8feb6a000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff8fe94b000)
    libz.so.1 => /lib64/libz.so.1 (0x00007ff8fe734000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ff8fe3a2000)
    librt.so.1 => /lib64/librt.so.1 (0x00007ff8fe19a000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00007ff8fdf97000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007ff8fdd93000)
    libgmp.so.10 => /lib64/libgmp.so.10 (0x00007ff8fdb17000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff8fd8ff000)
    libc.so.6 => /lib64/libc.so.6 (0x00007ff8fd540000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff8feefe000)
    ...... more stuff ......
```
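If you’re doing the same dance, `dnf provides` is handy for mapping a missing shared object back to the Fedora package that owns it – a sketch of the loop I was effectively running by hand:

```bash
# Which package owns the library the binary is complaining about?
dnf provides '*/libstdc++.so.6'
# ...then install it in the deploy image
dnf install -y libstdc++
```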
After getting all that done, the final, working “minimal” deploy image comes in at ~664MB. It’s not the greatest, but also not the worst, and definitely a lot less work than figuring out what’s currently wrong with the GHC 8.2.2 build on Alpine.
Check out the new Dockerfiles:

`builder-base/Dockerfile`:
```dockerfile
FROM fedora:28
LABEL description="Helper image to build Job Board API binary"

# Note this file is meant to be run from the TOP LEVEL `api` directory using `-f` option to docker
COPY . /root
WORKDIR /root

RUN dnf -y install zlib-devel gcc-c++ libstdc++-devel make sqlite openssh openssh-clients

# Install stack
RUN curl -sSL https://get.haskellstack.org/ | sh

# Build app
RUN stack install --local-bin-path /root/target
RUN stack clean --full
```
`Dockerfile`:
```dockerfile
FROM registry.gitlab.com/project/image-builder-base as builder

# Copy fresh code, likely updated version of what was used on the base builder
COPY . /root
WORKDIR /root

# Fresh install to re-build files that might have changed since last builder creation
RUN stack install --local-bin-path /root/target

FROM fedora:28
RUN dnf -y install zlib-devel gcc-c++ libstdc++-devel make sqlite openssh openssh-clients
COPY --from=builder /root/target/project /var/app/project/bin/project

EXPOSE 5001
CMD ["/var/app/project/bin/project", "RunServer"]
```
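In case it’s useful, here’s a hand-driven sketch of how the two Dockerfiles fit together – the registry paths follow the (redacted) names above, and the CI pipeline does roughly the equivalent of this:

```bash
# Build + push the helper/builder image (run from the top-level `api` directory,
# per the comment in builder-base/Dockerfile)
docker build -f builder-base/Dockerfile -t registry.gitlab.com/project/image-builder-base .
docker push registry.gitlab.com/project/image-builder-base

# Then the deploy image, whose first stage pulls FROM that builder
docker build -t registry.gitlab.com/project/image:1.7.0 .
docker push registry.gitlab.com/project/image:1.7.0
```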
To summarize again, here are the steps I took that helped, in order of efficacy:
- Switching off the LLVM backend (back to the default `fasm` backend)
- Switching to Fedora (no longer fighting the `stack`-managed GHC install in Alpine anymore)

In the end, my builds went from ~10-15 minutes to <10 minutes (with ~5 minutes being the best case, given a primed Gitlab CI runner cache). It took a bit, but I was glad to finally get there.
Haskell isn’t known for its build times, but even with the embarrassingly little knowledge of the toolchain that I have, I was able to get some help from the community, depend on the work of others, and get my builds a little faster.
Hope you enjoyed reading about the journey and maybe pulled some usable intelligence from the wandering.