REST-ish Services in Haskell: Part 1

Part 1 of a series on how I write REST-ish web services in Haskell with a dash of robustness

vados

49 minute read

Haskell logo + Servant logo

tl;dr - A general tour through a bunch of patterns/strategies I use when developing robust-ish REST-ish web services with Haskell. This post boils down to using some approaches to getting creature comforts set up for your binary. If you want to go straight to the code, check out the gitlab repo @ tag part-1

UPDATE (11/01/2018)

Please do not attempt to run the code snippets directly! Reading the feedback on the r/haskell post and from readers that have emailed me personally, they're incomplete (some work but some don't). If you're playing along at home, please use the snippets as outlines of what the code looks like (compiler intervention possibly required). Head to the Gitlab repository (@ tag part-1) for the completed code that will absolutely work!

UPDATE (10/22/2018)

Thanks to the awesome feedback from the community @ r/haskell, some updates:

  • Links to wai-cli and magicbane, two projects written by u/floatboath that scaffold out the CLI and Servant areas of your application more directly: basically, include 1 lib and you have a CLI/scaffolded app structure.
  • Inclusion of type signatures for *all* functions (including those in where clauses), thanks to u/kkweon for pointing it out!
  • Links to summoner, an iterative tool for scaffolding projects in Haskell, thanks u/chshersh!
  • A complex but more succinct, composable, and type-enhanced way of specifying our configuration objects making use of DataKinds and TypeFamilies, based on code used in summoner (and an explanation mostly submitted by u/chshersh).

Multi-part blog post

This is a multi-part blog post with the following sections:

  1. Part 1: Initial project setup (this post)
  2. Part 2: Domain model design and application architecture (Coming soon)
  3. Part 3: Building the API by putting it all together (Coming soon)
  4. Part 4: Observability and other operations concerns (Coming soon)

Introduction

Despite how esoteric Haskell might seem, it’s actually one of the most mainstream entries in the ML family of languages. It’s so mainstream in fact, that you can use it to help write your next web service! Haskell can also be used for other kinds of applications of course, but I find myself writing web services most often these days for little projects and wanted to write something to help lay out some patterns for someone trying to do that. While there was a time when Haskell didn’t have the support and ecosystem to easily support web server development, those times are long gone.

That said, one thing that’s still somewhat lacking is writing around how people of various skill levels are using Haskell – we can’t just rely on the likes of Stephen Diehl (his compendium of what he wished he knew about Haskell earlier is fantastic) forever. There’s lots of content by very experienced Haskellers out there, but we’re lacking in medium to low quality entries, which is what I’m hoping to add to today :) ! You don’t have to be a mathematician/type expert all at once to productively use Haskell – even if you don’t make use of the advanced features for a long time, the fact that you’re in the right place and the aspirations/skill ceiling are higher means a lot. Do mundane things like running a web service, talking to a database, and dip into the power of the type system when you feel comfortable, to make your code more safe/robust/resilient.

A word on the 3-tier applications & microservices

These days, I’m almost completely invested in the client-rendered 3-tier architecture application (+/- the “microservices” approach of network calls instead of function calls between large components), so I don’t reach for tools like Yesod (in other languages Rails or Django) that were intended to build full “web applications”. The approach I take from the get-go is “microservices”-like (which some argue is wrong), and I try my best to have the middle-layer do nothing more than manipulate data and respond to a REST-ish API. Browsers and clients are already very overpowered compared to what has existed in the past, and that trend is sure to continue. Server-side rendering just isn’t worth looking into/building for any more – even if it was, you could implement server-side rendering by simply using client-side rendering on the server side (the SSR approaches used by the likes of VueJS or EmberJS’s Fastboot). A lot of ink has already been spilled on the subject of whether microservices are a good idea, but I’d like to add some more – microservices are a good idea in the same way componentization is a good idea. In a well-architected application, the only difference between microservices and a monolith is the distance between components (i.e. how they’re connected: local function call/IPC/queue/whatever vs over the network).

Despite how hard observability is in distributed systems, microservices actually help people debug large systems by constraining the domain they have to consider – if you know a service isn’t involved in an issue at all (for example if it wasn’t called on the request path), 9/10 times you can ignore it. This same situation occurs in monoliths: what ends up happening is that you either have good separation (and thus the situation is identical) or you have terrible separation and it’s hard to ignore anything (because lots of places could be affecting the output), which is the 8th circle of hell. Anyway, let’s talk less about the latest tech trends and go back to how much sense it makes to get in on these sweet sweet trends with Haskell.

What makes up a “simple” web service?

There’s a lot that goes into writing the middle layer of the 3-tier architecture. There’s even more that goes into writing a robust middle layer – these days just about anyone can get a web server listening on port 8080 serving requests in their language of choice in <10 lines of code (ex. ruby’s sinatra, golang’s std library, python’s flask). Here’s just a tiny list of things a robust web service should be doing:

  • Correctness - table stakes is giving the right answer, in a flexible, robust way that won’t cause heartache for future developers
  • Configuration - easy and scalable static configuration (12 Factor apps anyone?) is almost a must-have – extra points for dynamic configurability
  • Logging - structured, properly leveled logs should be sendable to local (e.g. stdout) and remote (e.g. fluentd) destinations
  • Metrics - important business-logic level metrics as well as application-level/service metrics should be easily retrievable
  • Alerting - when do you find out that something is wrong with your application? When a customer tells you or when the system tells you? (to be fair this is basically a consequence of logging+metrics)
  • AuthN/AuthZ - best practices should be in place
  • Partial failure - distributed systems (you almost necessarily have one, even if your main app is a monolith) are almost always in a partially failed state – can your application handle that?
  • Interop - if you expose your system to other teams or even outside entities, does it behave in an industry standard way?

These are just some of the things that are required to make a half-way robust middle layer. At this point people have come up with ways to reduce the required effort, primarily consisting of:

  • creating and using templates (partially pre-baked programs)
  • creating and requiring use of shared libraries (easy to use smarts for all programs)
  • pushing the functionality into infrastructure (smarter pipes for programs to use)

When it’s possible, I think having smarter pipes is the way to go – it cuts down on overhead for people developing applications. As orchestration tools and infrastructure get more robust (ex. Kubernetes, linkerd, etc) we can solve the problem of these requirements at a lower level (on the OSI stack, but paradoxically a higher level of abstraction) for every application at the same time. Why integrate circuit breaking/retry logic in your applications when you can just build it into the substrate your application (and every other one) runs on with a tool like linkerd? Similarly, you don’t need to use a process wrapper like NodeJS’s forever (or even better pm2) when you can manage container restarts at the orchestrator level, like k8s does with pod auto-restarting.

Anyway, I use a fantastic library called servant to write my web services in haskell, and we’re going to be running through this by building a sample web service that everyone has come to know and love, a TODO list app! Throughout the post I’ll normally start off with the simplest way an approach could be tackled, then I’ll put on my strong typing tophat 💪 🎩 and try to outline approaches to make the code more flexible/robust/safe with use of more advanced Haskell features.

RTFM

If you’re not familiar with Haskell, you should definitely read up on it. Here’s a small somewhat scattered list of things you should read/watch/learn about – don’t let its size fool you, it will take a while to get up to speed or even feel comfortable if it’s your first time seeing these concepts.

I’m going to have to assume for the sake of my own sanity that you know a bunch of underlying knowledge involved in writing web services – HTTP, REST/HATEOAS concepts, relational databases, etc. There’s just too much to cover there to include it in this article series.

Initial project setup

Let’s start from the beginning – we’re going to use stack to initialize our repository (check out the stack quickstart guide for more information):

$ stack new haskell-restish-todo
Downloading template "new-template" to create project "haskell-restish-todo" in haskell-restish-todo/ ...

The following parameters were needed by the template but not provided: author-email, author-name, category, copyright, github-username
You can provide them in /home/<you>/.stack/config.yaml, like this:
templates:
  params:
    author-email: value
    author-name: value
    category: value
    copyright: value
    github-username: value
Or you can pass each one as parameters like this:
stack new haskell-restish-todo new-template -p "author-email:value" -p "author-name:value" -p "category:value" -p "copyright:value" -p "github-username:value"

Looking for .cabal or package.yaml files to use to init the project.
Using cabal packages:
- haskell-restish-todo/

Selecting the best among 14 snapshots...

Downloaded lts-12.12 build plan.
Didn't see Agda-2.5.4.1@sha256:af95ca97485ac35501c993e75d4c683a1afbe833170abd24fe5643a0d68ffcb0,27527 in your package indices.
Updating and trying again.
Selected mirror https://s3.amazonaws.com/hackage.fpcomplete.com/
Downloading timestamp
Downloading snapshot
Updating index
Updated package index downloaded
Update complete
Populated index cache.
* Matches lts-12.12

Selected resolver: lts-12.12
Initialising configuration using resolver: lts-12.12
Total number of user packages considered: 1
Writing configuration to file: haskell-restish-todo/stack.yaml
All done.
$ cd haskell-restish-todo
$ git init .

There’s a servant template supported by stack-templates, but I am more comfortable using no template and going through things in a bit more of a manual fashion. Now that we’ve built the basic project, the directory structure should look something like this:

$ tree .
.
├── app
│   └── Main.hs
├── ChangeLog.md
├── LICENSE
├── package.yaml
├── README.md
├── haskell-restish-todo.cabal
├── Setup.hs
├── src
│   └── Lib.hs
├── stack.yaml
└── test
    └── Spec.hs

3 directories, 10 files

Let’s head into the YAML package configuration generated by stack called package.yaml and take a look at it:

name:                haskell-restish-todo
version:             0.1.0.0
github:              "githubuser/haskell-restish-todo"
license:             BSD3
author:              "Author name here"
maintainer:          "example@example.com"
copyright:           "2018 Author name here"

extra-source-files:
- README.md
- ChangeLog.md

# Metadata used when publishing your package
# synopsis:            Short description of your package
# category:            Web

# To avoid duplicated efforts in documentation and dealing with the
# complications of embedding Haddock markup inside cabal files, it is
# common to point users to the README.md file.
description:         Please see the README on GitHub at <https://github.com/githubuser/haskell-restish-todo#readme>

The contents of this file are pretty standard for project-level configuration schemes. A few things I changed that you might want to look into, expressed as sed commands:

  • s/github/gitlab/g - Gitlab is my source code hosting site of choice (it’s F/OSS, lots of amazing features, check it out)
  • s/gitlabuser/mrman/g - My Gitlab user name (note that if you ran the first step githubuser is now gitlabuser in the file)

Some other files you might want to look at are stack.yaml and package.yaml (the Stack documentation also contains a page on the differences between those files and Cabal). There’s a lot that can be set in these files, but the basic setup should be roughly similar to other package managers. During regular use you will find yourself updating package.yaml for the most part (stack.yaml when you need even more power than normal). Here’s what my stack.yaml looks like:

# This file was automatically generated by 'stack init'
#
# Some commonly used options have been documented as comments in this file.
# For advanced use and comprehensive documentation of the format, please see:
# https://docs.haskellstack.org/en/stable/yaml_configuration/

# Resolver to choose a 'specific' stackage snapshot or a compiler version.
# A snapshot resolver dictates the compiler version and the set of packages
# to be used for project dependencies. For example:
#
# resolver: lts-3.5
# resolver: nightly-2015-09-21
# resolver: ghc-7.10.2
# resolver: ghcjs-0.1.0_ghc-7.10.2
#
# The location of a snapshot can be provided as a file or url. Stack assumes
# a snapshot provided as a file might change, whereas a url resource does not.
#
# resolver: ./custom-snapshot.yaml
# resolver: https://example.com/snapshots/2018-01-01.yaml
resolver: lts-12.12

# User packages to be built.
# Various formats can be used as shown in the example below.
#
# packages:
# - some-directory
# - https://example.com/foo/bar/baz-0.0.2.tar.gz
# - location:
#    git: https://github.com/commercialhaskell/stack.git
#    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
# - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
#  subdirs:
#  - auto-update
#  - wai
packages:
- .
# Dependency packages to be pulled from upstream that are not in the resolver
# using the same syntax as the packages field.
# (e.g., acme-missiles-0.3)
# extra-deps: []

# Override default flag values for local packages and extra-deps
# flags: {}

# Extra package databases containing global packages
# extra-package-dbs: []

# Control whether we use the GHC we find on the path
# system-ghc: true
#
# Require a specific version of stack, using version ranges
# require-stack-version: -any # Default
# require-stack-version: ">=1.7"
#
# Override the architecture used by stack, especially useful on Windows
# arch: i386
# arch: x86_64
#
# Extra directories used by stack for building
# extra-include-dirs: [/path/to/dir]
# extra-lib-dirs: [/path/to/dir]
#
# Allow a newer minor version of GHC than the snapshot specifies
# compiler-check: newer-minor

OK, before we go too much further, let’s just make sure this fresh project builds:

$ stack build
Preparing to install GHC (tinfo6) to an isolated location.
This will not interfere with any system-level installation.
Downloaded ghc-tinfo6-8.4.3.
Installed GHC.
[1 of 2] Compiling Main             ( /home/mrman/.stack/setup-exe-src/setup-mPHDZzAJ.hs, /home/mrman/.stack/setup-exe-src/setup-mPHDZzAJ.o)
[2 of 2] Compiling StackSetupShim   ( /home/mrman/.stack/setup-exe-src/setup-shim-mPHDZzAJ.hs, /home/mrman/.stack/setup-exe-src/setup-shim-mPHDZzAJ.o)
Linking /home/mrman/.stack/setup-exe-cache/x86_64-linux-tinfo6/tmp-Cabal-simple_mPHDZzAJ_2.2.0.1_ghc-8.4.3 ...
Building all executables for `haskell-restish-todo' once. After a successful build of all of them, only specified executables will be rebuilt.
haskell-restish-todo-0.1.0.0: configure (lib + exe)
Configuring haskell-restish-todo-0.1.0.0...
haskell-restish-todo-0.1.0.0: build (lib + exe)
Preprocessing library for haskell-restish-todo-0.1.0.0..
Building library for haskell-restish-todo-0.1.0.0..
[1 of 2] Compiling Lib              ( src/Lib.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/Lib.o)
[2 of 2] Compiling Paths_haskell_restish_todo ( .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/autogen/Paths_haskell_restish_todo.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/Paths_haskell_restish_todo.o)
ignoring (possibly broken) abi-depends field for packages
Preprocessing executable 'haskell-restish-todo-exe' for haskell-restish-todo-0.1.0.0..
Building executable 'haskell-restish-todo-exe' for haskell-restish-todo-0.1.0.0..
[1 of 2] Compiling Main             ( app/Main.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/haskell-restish-todo-exe/haskell-restish-todo-exe-tmp/Main.o)
[2 of 2] Compiling Paths_haskell_restish_todo ( .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/haskell-restish-todo-exe/autogen/Paths_haskell_restish_todo.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/haskell-restish-todo-exe/haskell-restish-todo-exe-tmp/Paths_haskell_restish_todo.o)
Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/haskell-restish-todo-exe/haskell-restish-todo-exe ...
haskell-restish-todo-0.1.0.0: copy/register
Installing library in /home/mrman/Projects/foss/haskell-restish-todo/.stack-work/install/x86_64-linux-tinfo6/lts-12.12/8.4.3/lib/x86_64-linux-ghc-8.4.3/haskell-restish-todo-0.1.0.0-GvfaqEXgEBu8eOFxKEbbO5
Installing executable haskell-restish-todo-exe in /home/mrman/Projects/foss/haskell-restish-todo/.stack-work/install/x86_64-linux-tinfo6/lts-12.12/8.4.3/bin
Registering library for haskell-restish-todo-0.1.0.0..

It took a little while to get through this (maybe 5 minutes) as stack downloaded a bunch of requirements (BTW if you’re on Arch Linux, don’t put up with the haskell-* series of packages, just set up stack locally and let it manage the packages). Before we try and run the app itself, let’s take a look at src/Lib.hs and app/Main.hs.

src/Lib.hs:

module Lib
    ( someFunc
    ) where

someFunc :: IO ()
someFunc = putStrLn "someFunc"

app/Main.hs:

module Main where

import Lib

main :: IO ()
main = someFunc

Pretty straightforward stuff! If everything’s been set up right, stack exec <project>-exe (corresponding to the executable target in the cabal file) should run the program as it sits now, as described by the Stack quickstart guide: write “someFunc” to STDOUT and exit immediately. Let’s make sure:

$ stack exec haskell-restish-todo-exe
someFunc

OK, great, we’ve got ourselves completely set up. Let’s get to adding the libraries we need (in particular, servant). This would be a good place for a git commit of the progress so far – I won’t be prescribing when to do so after this but obviously commit and back up your work at your own pace.

Adding dependencies

Thanks to the help of ghc, cabal and stack we’ve got the basic project all set up, and we have a working application that will at least do something. Let’s start to mix in servant – we’re going to add the popular server library to our project as a dependency. Thanks to stack’s special features (the build system, in particular the resolver) we can do this two different ways: either editing the .cabal file directly or editing package.yaml inside the project directory. stack will do the hard work of resolving the package’s version to one that meshes with everything else we’ve specified. We’ll want to use servant (more specifically servant-server) from our app folder, so let’s make it a dependency of the executable we’re generating – src will hold only our domain-specific stuff.

If you went the package.yaml route (which I recommend), here’s an excerpt of what the file would look like:

executables:
  haskell-restish-todo-exe:
    main:                Main.hs
    source-dirs:         app
    ghc-options:
    - -threaded
    - -rtsopts
    - -with-rtsopts=-N
    dependencies:
    - haskell-restish-todo
+    - servant-server

And if you modified the .cabal directly (which I don’t recommend):

executable haskell-restish-todo-exe
  main-is: Main.hs
  other-modules:
      Paths_haskell_restish_todo
  hs-source-dirs:
      app
  ghc-options: -threaded -rtsopts -with-rtsopts=-N
  build-depends:
                base >=4.7 && <5
              , haskell-restish-todo
+              , servant-server
  default-language: Haskell2010

That’s it! Now that we’ve added the dependency, let’s run stack build to make sure stack has no problem building everything with what we’ve configured:

$ stack build
Cabal-2.2.0.1: configure
Cabal-2.2.0.1: build
Cabal-2.2.0.1: copy/register
cabal-doctest-1.0.6: configure
cabal-doctest-1.0.6: build
cabal-doctest-1.0.6: copy/register
http-api-data-0.3.8.1: download
http-api-data-0.3.8.1: configure
http-api-data-0.3.8.1: build
http-api-data-0.3.8.1: copy/register
servant-0.14.1: download
servant-0.14.1: configure
servant-0.14.1: build
servant-0.14.1: copy/register
servant-server-0.14.1: download
servant-server-0.14.1: configure
servant-server-0.14.1: build
servant-server-0.14.1: copy/register
haskell-restish-todo-0.1.0.0: configure (lib + exe)
Configuring haskell-restish-todo-0.1.0.0...
haskell-restish-todo-0.1.0.0: build (lib + exe)
Preprocessing library for haskell-restish-todo-0.1.0.0..
Building library for haskell-restish-todo-0.1.0.0..
[1 of 2] Compiling Lib              ( src/Lib.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/Lib.o)
[2 of 2] Compiling Paths_haskell_restish_todo ( .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/autogen/Paths_haskell_restish_todo.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/Paths_haskell_restish_todo.o)
ignoring (possibly broken) abi-depends field for packages
Preprocessing executable 'haskell-restish-todo-exe' for haskell-restish-todo-0.1.0.0..
Building executable 'haskell-restish-todo-exe' for haskell-restish-todo-0.1.0.0..
[1 of 2] Compiling Main             ( app/Main.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/haskell-restish-todo-exe/haskell-restish-todo-exe-tmp/Main.o) [Lib changed]
[2 of 2] Compiling Paths_haskell_restish_todo ( .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/haskell-restish-todo-exe/autogen/Paths_haskell_restish_todo.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/haskell-restish-todo-exe/haskell-restish-todo-exe-tmp/Paths_haskell_restish_todo.o)
Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.2.0.1/build/haskell-restish-todo-exe/haskell-restish-todo-exe ...
haskell-restish-todo-0.1.0.0: copy/register
Installing library in /home/mrman/Projects/foss/haskell-restish-todo/.stack-work/install/x86_64-linux-tinfo6/lts-12.12/8.4.3/lib/x86_64-linux-ghc-8.4.3/haskell-restish-todo-0.1.0.0-6mAFcPLEQQ5E8l7oaz5TYn
Installing executable haskell-restish-todo-exe in /home/mrman/Projects/foss/haskell-restish-todo/.stack-work/install/x86_64-linux-tinfo6/lts-12.12/8.4.3/bin
Registering library for haskell-restish-todo-0.1.0.0..
Completed 6 action(s).

For me it took a little while for everything to build, but now we’ve got the servant-server dependency set up. Despite this little taste of the work it will take to get the web server all set up, we’re actually going to take an immediate detour (for the rest of this post) into concepts not directly related to serving web traffic, but that you’ll almost undoubtedly face when writing a robust web service.

Server executable creature comforts

Let’s take some time to make our main function and the code in app/ more robust – there are a few things that almost every executable for a server should do so let’s set them up before we get down to the nitty gritty of actually building the service. There are lots of great command line tools these days that make use of the subcommand pattern (e.g. <program> <cmd> or <program> <cmd> --help for targeted help) and I’d love to put some of that robustness in right away.

NOTE Another excellent guide to writing this kind of CLI-related code can be found in the literate haskell documentation for teleport. It’s arguably way better than what you’re about to read in the next section, especially its coverage of optparse-applicative.

Supporting running specific commands directly (AKA CLI bling)

I often want to run specific functionality or specific commands (let’s say initializing a root admin user) with the same CLI that launches the actual server – something like app init, and changing the command that starts the server to something like app server. There are a bunch of libraries in Hackage categorized as “CLI”; let’s use optparse-applicative to build it. Let’s add optparse-applicative as a dependency for the exe (just like we did for servant-server) then add some code directly inside app/Main.hs for now:

module Main where

import Lib
import Control.Monad (join)
import Options.Applicative
    ( CommandFields, Mod, Parser, ParserInfo
    , command, execParser, idm, info, subparser
    )

-- Todo: improve with newtypes? use existing types?
type Host = String
type Port = Integer

newtype Options = Options {cmd :: Command}

data Command = Serve Host Port

-- | Start up the server and serve requests
server :: IO ()
server = putStrLn "<SERVER START>"

-- | CLI options parser
opts :: Parser (IO ())
opts = subparser commands
    where
      -- | Parser that consumes no arguments and yields the IO action to run.
      --   This is necessary due to using optparse-applicative like I am: `info` expects a `Parser`,
      --   so the IO action is wrapped with `pure` before being handed to `info` in serverCmd
      serverAction :: Parser (IO ())
      serverAction = pure server

      serverCmd :: ParserInfo (IO ())
      serverCmd = info serverAction idm

      commands :: Mod CommandFields (IO ())
      commands = command "server" serverCmd

-- | execParser's signature is `execParser :: ParserInfo a -> IO a`,
--   so if you substitute `IO ()` for `a`, you'll get `ParserInfo (IO ()) -> IO (IO ())`, which wouldn't work for main.
--   That's where `join` comes in, it "smooshes" two levels of a monad (so `IO (IO ())` becomes `IO ()`)
main :: IO ()
main = join $ execParser parser
    where
      parser :: ParserInfo (IO ())
      parser = info opts idm
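
To see the `join` trick from `main` in isolation, here is a minimal sketch that uses only base. `chooseAction` is a hypothetical stand-in for `execParser` (an outer action that decides which inner action to run), and the `IORef` log exists only so the order of effects is visible; none of these names come from the project itself.

```haskell
import Control.Monad (join)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- | Hypothetical stand-in for `execParser`: the outer IO action "parses"
--   a command name and returns the inner IO action to run for it.
chooseAction :: IORef [String] -> String -> IO (IO ())
chooseAction logRef name = do
    modifyIORef logRef (++ ["parsed " ++ name])
    pure (modifyIORef logRef (++ ["ran " ++ name]))

-- | `join` collapses `IO (IO ())` into `IO ()`: run the outer action,
--   then immediately run the action it produced.
runCommand :: IORef [String] -> String -> IO ()
runCommand logRef name = join (chooseAction logRef name)

-- | Runs one command and returns the recorded steps in order.
demo :: IO [String]
demo = do
    logRef <- newIORef []
    runCommand logRef "server"
    readIORef logRef
```

Running `demo` in GHCi yields `["parsed server","ran server"]`. Without `join`, only the outer action would run and the inner one would be returned unused.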

If we build the project and attempt to run the generated executable we’ll see the results of our light optparse-applicative setup:

$ stack build
< output clipped >
$ stack exec haskell-restish-todo-exe
Missing: COMMAND

Usage: haskell-restish-todo-exe COMMAND
$ stack exec haskell-restish-todo-exe server
<SERVER START>

As you can see, we’ve got a basic command-driven CLI setup – now if we decide to do something like manage DB migrations through our application or add other diagnostic commands, we’ll be ready. You might have noticed that we specify a host and port with some pretty weak type aliases, but don’t actually parse for them with optparse-applicative. This is because we’re going to slightly change the approach with the introduction of configuration hunting and gathering. Rather than fully write out the parser at this step, let’s write it with multiple configuration sources in mind.
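
As a rough sketch of what "being ready" for more commands could look like, the snippet below registers a second subcommand next to `server`. The `Init` command and the description strings are hypothetical, and the parser returns a plain `Command` value (rather than an `IO ()` as in the code above) purely so the result is easy to inspect. It assumes optparse-applicative is available, and uses `execParserPure` so parsing can be exercised without touching the real command line.

```haskell
import Options.Applicative
    ( Parser, ParserInfo, command, defaultPrefs, execParserPure
    , getParseResult, idm, info, progDesc, subparser
    )

-- | Commands the CLI understands; `Init` is a hypothetical addition.
data Command = Server | Init
    deriving (Eq, Show)

-- | One `command` entry per subcommand, combined with `<>`.
commandParser :: Parser Command
commandParser = subparser
    (  command "server" (info (pure Server) (progDesc "Start the web server"))
    <> command "init"   (info (pure Init)   (progDesc "Run one-off setup tasks"))
    )

cliInfo :: ParserInfo Command
cliInfo = info commandParser idm

-- | Pure entry point: Nothing when the arguments don't name a known command.
parseArgs :: [String] -> Maybe Command
parseArgs = getParseResult . execParserPure defaultPrefs cliInfo
```

With this shape, `parseArgs ["init"]` gives `Just Init`, while an unknown or missing command gives `Nothing`. A real `main` would more likely call `execParser cliInfo` and let the library print the usage text.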

Configuration hunting & gathering

In the previous section we sketched out a few options, namely the host and port configuration for running the web server. While it’s not strictly necessary at this stage, let’s go ahead and define an application-level configuration type that will hold all the relevant information we care about for the application:

src/Config.hs:

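module Config where

-- Thin wrappers so a host can't be mixed up with any other String (likewise for ports)
newtype Host = Host String
newtype Port = Port Integer

data AppConfig = AppConfig
    { acHost :: Host
    , acPort :: Port
    }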

This is all well and good but let’s pull out our strong typing tophat 💪 🎩 and nip some problems in the bud. Here’s that same type but a little fancier, using some parametric polymorphism:

data FancyAppConfig f = FancyAppConfig
    { facHost  :: f Host
    , facPort  :: f Port
    }

Why the increase in complexity? What this affords us is the ability to represent partially and completely specified configurations – for example a FancyAppConfig Maybe type that would contain an only partially specified configuration (replace the f with Maybe in every place it’s mentioned). While it’s a bit abstract (and maybe premature) at this stage, this allows us to reason about the state of the entries in the FancyAppConfig object we’re dealing with – we can write functions that accept FancyAppConfig Identity if we want to ensure a fully specified structure, or FancyAppConfig Maybe if we want to allow a partially defined structure. We can make this intention clearer with a couple of type aliases:

-- ... other code ...

import Data.Functor.Identity

type CompleteAppConfig = FancyAppConfig Identity
type PartialAppConfig = FancyAppConfig Maybe
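
To make the payoff of the `f` parameter a bit more concrete, here is a small sketch (base only) of turning a partial configuration into a complete one by filling its gaps from a complete fallback. `fillFrom` and `pick` are hypothetical helper names made up for this illustration, and `Host`/`Port` are declared as simple newtypes just to keep the snippet self-contained.

```haskell
import Data.Functor.Identity (Identity (..))

newtype Host = Host String deriving (Eq, Show)
newtype Port = Port Integer deriving (Eq, Show)

data FancyAppConfig f = FancyAppConfig
    { facHost :: f Host
    , facPort :: f Port
    }

-- | Hypothetical helper: keep the provided value if there is one,
--   otherwise fall back to the (always present) default.
pick :: Identity a -> Maybe a -> Identity a
pick fallback = maybe fallback Identity

-- | Fill every hole in a partial config from a complete one.
--   The types guarantee the result has no missing fields.
fillFrom :: FancyAppConfig Identity -> FancyAppConfig Maybe -> FancyAppConfig Identity
fillFrom defaults partial = FancyAppConfig
    { facHost = pick (facHost defaults) (facHost partial)
    , facPort = pick (facPort defaults) (facPort partial)
    }
```

For example, filling `FancyAppConfig Nothing (Just (Port 8080))` from defaults of `localhost`/`5000` keeps the default host and the overridden port. The compiler will also complain if a new field is added to the record and `fillFrom` isn't updated.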

Before we go any further, let’s set up some machinery/safety around things that have defaults (and ensure FancyAppConfig has one). We’ll need Haskell’s FlexibleInstances extension for this to work right, so it might make sense to read up on that (it’s covered by Stephen Diehl and in another writeup from the GHC extension guide thanks to Jannis Limperg):

{-# LANGUAGE FlexibleInstances #-}

-- ... other code ...

class HasDefaultValue a where
    defaultValue :: a

defaultHost :: Host
defaultHost = Host "localhost"

defaultPort :: Port
defaultPort = Port 5000

instance HasDefaultValue (FancyAppConfig Identity) where
    defaultValue = FancyAppConfig (Identity defaultHost) (Identity defaultPort)

instance HasDefaultValue (FancyAppConfig Maybe) where
    defaultValue = FancyAppConfig (Just defaultHost) (Just defaultPort)

Here’s a breakdown of what’s happening:

  • FlexibleInstances was added due to the need to specify the f in FancyAppConfig (which we’ll be calling AppConfig from now on, or using the type aliases CompleteAppConfig/PartialAppConfig).
  • HasDefaultValue is a typeclass for types that… have a default value. I’ve included a quick basic implementation.
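
As a rough illustration of how a typeclass like this behaves at use sites, the sketch below (base only) lets the expected type decide which `defaultValue` is meant. The per-field instances for `Host` and `Port` and the `portOf` helper are assumptions of this sketch; the post itself uses the `defaultHost`/`defaultPort` constants instead.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.Functor.Identity (Identity (..))

newtype Host = Host String deriving (Eq, Show)
newtype Port = Port Integer deriving (Eq, Show)

data FancyAppConfig f = FancyAppConfig
    { facHost :: f Host
    , facPort :: f Port
    }

class HasDefaultValue a where
    defaultValue :: a

-- Per-field instances are an assumption of this sketch.
instance HasDefaultValue Host where
    defaultValue = Host "localhost"

instance HasDefaultValue Port where
    defaultValue = Port 5000

-- Each `defaultValue` on the right resolves to the field's own instance.
instance HasDefaultValue (FancyAppConfig Identity) where
    defaultValue = FancyAppConfig (Identity defaultValue) (Identity defaultValue)

-- | Hypothetical accessor; its argument type pins down which instance
--   `defaultValue` refers to in an expression like `portOf defaultValue`.
portOf :: FancyAppConfig Identity -> Port
portOf = runIdentity . facPort
```

Here `portOf defaultValue` evaluates to `Port 5000` with no type annotation needed, while a bare `defaultValue` in GHCi would need one (e.g. `defaultValue :: Host`).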

There’s not a lot in there right now but that’s OK, because we’re going to use this structure to enable loading configuration from a different source – TOML files. We’ll use the htoml package to parse the prospective files and refactor our main code to actually build that structure from the CLI rather than just taking the host and port separately.

Easily constructing AppConfig values from disparate sources

Now that we have an AppConfig type, it would be nice to easily construct values of it from a config file. There are lots of choices for configuration, but generally apps these days do best to take them from files (in easy to understand formats such as JSON, TOML, YAML, etc) and/or the environment (If you haven’t heard of 12 Factor apps, check it out and see if any of the ideas are interesting to you).
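
For the environment side of that, a minimal sketch (base only) could build a partial configuration from environment variables. The variable names `TODO_HOST` and `TODO_PORT` are made up for this example, and the lookup is split into a pure function over key/value pairs so it can be tried without touching the real environment.

```haskell
import System.Environment (getEnvironment)
import Text.Read (readMaybe)

type Host = String
type Port = Integer

data AppConfig f = AppConfig
    { host :: f Host
    , port :: f Port
    }

-- | Build a partial config from environment-style key/value pairs.
--   A missing variable, or a port that isn't a number, becomes Nothing.
fromEnvPairs :: [(String, String)] -> AppConfig Maybe
fromEnvPairs env = AppConfig
    { host = lookup "TODO_HOST" env             -- hypothetical variable name
    , port = lookup "TODO_PORT" env >>= readMaybe -- hypothetical variable name
    }

-- | Same thing, reading the real process environment.
fromEnv :: IO (AppConfig Maybe)
fromEnv = fromEnvPairs <$> getEnvironment
```

Because the result is an `AppConfig Maybe`, an environment that only sets the port is still a perfectly valid (partial) source. Whether an unparseable port should be silently ignored or reported as an error is a design choice this sketch glosses over.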

Constructing AppConfig values from JSON via config file (FromJSONFile)

Since we don’t want to force people to put in every single configuration option when there are reasonable imaginable defaults, reading from a JSON configuration file should produce an AppConfig Maybe. There are lots of ways to parse JSON in Haskell but we’re going to use the well known aeson – along with an excellent guide written by Artyom Kazak. Here are the lines that were added/changed to support this functionality:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverloadedStrings #-} -- lets the "host"/"port" literals be used as JSON keys

--- ... other code ...

import Data.Aeson (FromJSON(..), Object, eitherDecode, withObject, (.:?))
import Data.Aeson.Types (Parser)
import Data.Bifunctor (first)
import qualified Data.ByteString.Lazy as DBL
import Data.Functor.Identity

-- ... other code ...

type Host = String -- changed to simple type alias (from newtype)
type Port = Integer -- changed to simple type alias (from newtype)

-- ... other code ...

-- The definition below hasn't changed but is here for reference
data AppConfig f  = AppConfig
    { host :: f Host
    , port :: f Port
    }

type CompleteAppConfig = AppConfig Identity
type PartialAppConfig = AppConfig Maybe

-- | Manual FromJSON instance for use by/with Aeson.
--   (.:?) yields Nothing for a missing key, which is what a partial config needs;
--   (.:) would reject any file that leaves out "host" or "port".
instance FromJSON PartialAppConfig where
    parseJSON = withObject "cfg" parseObj
        where
          parseObj :: Object -> Parser PartialAppConfig
          parseObj obj = obj .:? "host"
                         >>= \host -> obj .:? "port"
                         >>= \port -> pure $ AppConfig { host=host
                                                       , port=port
                                                       }

-- ConfigurationError lives in the elided code; the instance below only assumes
-- it has a constructor `ConfigParseError :: String -> ConfigurationError`.
class (FromJSON cfg) => FromJSONFile cfg where
    fromJSONFile :: FilePath -> IO (Either ConfigurationError cfg)

instance FromJSONFile PartialAppConfig where
    fromJSONFile path = decodeAndTransformError <$> DBL.readFile path
        where
          decodeAndTransformError :: DBL.ByteString -> Either ConfigurationError PartialAppConfig
          decodeAndTransformError = first ConfigParseError . eitherDecode

Let’s put on our strong typing tophats 💪 🎩 and achieve the same functionality with a little more power:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE StandaloneDeriving #-}

--- ... other code ...

import Data.Aeson (FromJSON, eitherDecode)
import Data.Bifunctor (first)
import Data.ByteString.Lazy  as DBL
import Data.Functor.Identity
import GHC.Generics

-- ... other code ...

type Host = String -- changed to simple type alias (from newtype)
type Port = Integer -- changed to simple type alias (from newtype)

-- ... other code ...

-- The definition below hasn't changed but is here for reference
data AppConfig f  = AppConfig
    { host :: f Host
    , port :: f Port
    } deriving (Generic)

-- Generic is already derived on the data declaration above (for any f),
-- so only the remaining classes need standalone deriving per concrete type
type CompleteAppConfig = AppConfig Identity
deriving instance Eq CompleteAppConfig
deriving instance Show CompleteAppConfig
deriving instance FromJSON CompleteAppConfig

type PartialAppConfig = AppConfig Maybe
deriving instance Eq PartialAppConfig
deriving instance Show PartialAppConfig
deriving instance FromJSON PartialAppConfig

class (FromJSON cfg) => FromJSONFile cfg where
    fromJSONFile :: FilePath -> IO (Either ConfigurationError cfg)

instance FromJSONFile PartialAppConfig where
    fromJSONFile path = decodeAndTransformError <$> DBL.readFile path
        where
          decodeAndTransformError :: ByteString -> Either ConfigurationError PartialAppConfig
          decodeAndTransformError = first ConfigParseError . eitherDecode

If you feel like we just went from zero-complexity, understandable Haskell to rocket science, don’t freak out, because we kind of did. There is a lot of additional stuff to understand in this code, despite how concise it is. Let’s go through it and I’ll explain the what and why of the code above:

  • Host and Port needed to be changed to simple type aliases because aeson would otherwise try to decode each of them as an object instead of a simple value (so it would look for {getHost :: String})
  • GeneralizedNewtypeDeriving was no longer needed after changing Host and Port to plain type aliases
  • GHC.Generics was added to facilitate FromJSON instance generation – aeson can make FromJSON instances for objects that have a Generic instance automatically (you can read more about this in the aeson guide).
  • DeriveGeneric was used to avoid using a manual FromJSON instance for PartialAppConfig
  • DeriveAnyClass was added so we could derive FromJSON at all, since it’s not one of the classes that is normally allowed to be derived.
  • StandaloneDeriving was added to allow specification of the f in AppConfig f – the normal deriving syntax doesn’t allow you to specify which instance (e.g. AppConfig Identity or AppConfig Maybe) should have the derivation.
  • Data.ByteString.Lazy was imported to help load config files, but the laziness might not be necessary here as the config files should be pretty small.
  • instance FromJSON PartialAppConfig is used along with GHC.Generics so that aeson can actually automatically figure out the correct FromJSON instance.

Of course you’ll also need the following new dependencies (for reasons that are hopefully obvious now):

- aeson
- bytestring

Keep in mind that it’s acceptable to use the manual FromJSON instance if that’s what you feel comfortable with. As you use Haskell more and more over the years, you will naturally become more comfortable with extensions; there’s no need to rush it (and it might actually be harmful to rush it). We can try this out in a local GHCI window to make sure it works (make a fake config file in /tmp to try):

$ stack ghci
haskell-restish-todo-0.1.0.0: initial-build-steps (lib + exe)
The following GHC options are incompatible with GHCi and have not been passed to it: -threaded
Configuring GHCi with the following packages: haskell-restish-todo
Using main module: 1. Package `haskell-restish-todo' component exe:haskell-restish-todo-exe with main-is file: /home/mrman/Projects/foss/haskell-restish-todo/app/Main.hs
GHCi, version 8.4.3: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/mrman/.ghci
[1 of 3] Compiling Config           ( /home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs, interpreted)
[2 of 3] Compiling Lib              ( /home/mrman/Projects/foss/haskell-restish-todo/src/Lib.hs, interpreted)
[3 of 3] Compiling Main             ( /home/mrman/Projects/foss/haskell-restish-todo/app/Main.hs, interpreted)
Ok, three modules loaded.
Loaded GHCi configuration from /tmp/haskell-stack-ghci/5c8cd753/ghci-script
*Main Config Lib
λ :l Config
[1 of 1] Compiling Config           ( /home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs, interpreted)
Ok, one module loaded.
*Config
λ cfg = ( eitherDecode <$> DBL.readFile ("/tmp/test-cfg.json" :: FilePath) :: IO (Either String (AppConfig Identity)))
*Config
λ cfg
Right (AppConfig {host = Identity "somewhere-else", port = Identity 3000})
*Config

So far so good – not only do things compile, the FromJSONFile functionality I wrote seems to be working given a real path to a temp file on disk. At this point your spidey sense should be tingling to write some tests to ensure we never regress from this level of functionality. As scary as it seems, just hold off on the urge to start writing some tests for ~2 more quick sections and we’ll get to testing – for now, let’s be content that it works in GHCI.

Constructing AppConfig values from TOML via config file (FromTOMLFile)

TOML has to produce possibly incomplete/partially specified AppConfigs, for the same reason as JSON. Let’s give it a shot, using htoml. For the most part, we can actually convert the TOML structure to a JSON one and read from there, which makes things much simpler for TOML by reducing it to the already-solved JSON problem.

import Data.Aeson (FromJSON(parseJSON), toJSON, eitherDecode)
import Data.Aeson.Types (parseEither)
import Data.Bifunctor (first, second)
import Data.Text.IO as DTI
import Text.Toml (parseTomlDoc)

-- ... other code ...

class (FromJSONFile cfg) => FromTOMLFile cfg where
    fromTOMLFile :: FilePath -> IO (Either ConfigurationError cfg)

instance FromTOMLFile PartialAppConfig where
    fromTOMLFile path = DTI.readFile path
                        >>= pure . first TOMLParserError . parseTomlDoc ""
                        >>= pure . second (parseEither parseJSON . toJSON)
                        >>= \v -> pure $ case v of
                                           Right (Right cfg) -> Right cfg
                                           Right (Left err) -> Left (ConfigParseError err)
                                           Left err -> Left err -- likely a TOMLParserError

The good news is there are no new crazy language extensions or advanced concepts to learn for the 💪 🎩 approach, but the bad news is that this was a bit more complicated to write than I’d expected (for me at least). The interface to htoml is a little different, so while the overall structure is the same, the steps in between (in particular converting some types and doing the conversion to JSON and back) got a bit tedious. There’s not much more to understanding this code than to read it and try to work through it in your head (and re-write/edit the code yourself), but how it works is simple (in theory at least, and if you hand-wave in between the >>=s):

  • Read in a file (Data.Text.IO.readFile)
  • Parse the file with htoml (parseTomlDoc) and if that fails, transform the error into a TOMLParserError
  • Use second to modify the Right-hand side, and convert it to a JSON (aeson) Value, then parse the Value into the config object
  • Since parseEither produces an Either, we need to reach in and unwrap the value on the right side (if things were successful up until now, we have an Either ConfigurationError (Either String cfg) at this point, but we need an Either ConfigurationError cfg)

It’s a bit complicated to work through but hopefully it’s reasonable to read. Note that I also use the explicit >>= (bind) style rather than do blocks, because I think it forces me to write more reusable functionality and makes the steps the code is taking clearer. It also makes noise really easy to see when it appears (the pure .s are kind of noisy, aren’t they?), and discourages letting the number of steps grow too long.
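The final unwrapping step (turning an Either ConfigurationError (Either String cfg) into an Either ConfigurationError cfg) can also be written without the explicit case expression. Here is a minimal sketch using only base; ConfigError and its constructors are hypothetical stand-ins for the post's ConfigurationError, and flattenResult is a name I made up for illustration:

```haskell
import Data.Bifunctor (first)

-- Hypothetical stand-in for the post's ConfigurationError type.
data ConfigError
    = TomlError String   -- the file was not valid TOML
    | DecodeError String -- the TOML was valid but did not describe a config
    deriving (Eq, Show)

-- | Collapse "parse result containing a decode result" into a single Either.
-- A Left from the outer layer passes through untouched; a Left String from
-- the inner layer is wrapped in DecodeError.
flattenResult :: Either ConfigError (Either String a) -> Either ConfigError a
flattenResult outer = outer >>= first DecodeError
```

This works because Either e is itself a monad: binding over the outer Either short-circuits on the TOML error, and first only touches the Left side of the inner one.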

Constructing AppConfig values from ENV via a typeclass (FromENV)

ENV is a bit different from the other two as it should be very easy to write in a dumb way:

-- ... other code ...

import Control.Monad (join)
import Text.Read (readMaybe)

-- ... other code ...

class FromENV cfg where
    fromENV :: ProcessEnvironment -> IO (Either ConfigurationError cfg)

instance FromENV PartialAppConfig where
    fromENV pEnv = pure $ Right $ AppConfig { host=prop "TODO_HOST"
                                            , port=join $ fmap readMaybe $ prop "TODO_PORT"
                                            }
        where
          -- | The ENV for a process looks like this (a basic association list)
          env :: [(String, String)]
          env = getProcessEnv pEnv

          -- | Since the list is baked into the function itself (as `env`),
          --   this function literally boils down to a mapping from keys to string values from the env.
          prop :: String -> Maybe String
          prop = flip lookup env

Easy peasy! With this we should be able to easily create a PartialAppConfig (which is a type alias for AppConfig Maybe) from environment variables as well.
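To make the lookup-then-parse idea concrete outside of the typeclass, here is a small self-contained sketch that works directly on the association list returned by System.Environment.getEnvironment. The helper names envValue and portFromEnv are hypothetical, and I'm assuming the TODO_PORT variable name used elsewhere in this post:

```haskell
import Text.Read (readMaybe)

-- | Look up a key in an association list (the shape returned by
-- System.Environment.getEnvironment) and try to parse its value.
-- Missing keys and unparseable values both come back as Nothing.
envValue :: Read a => String -> [(String, String)] -> Maybe a
envValue key env = lookup key env >>= readMaybe

-- | Hypothetical helper: read the port from a TODO_PORT variable.
portFromEnv :: [(String, String)] -> Maybe Integer
portFromEnv = envValue "TODO_PORT"
```

One trade-off to be aware of: a malformed value (say TODO_PORT=abc) is silently treated the same as a missing one. That is fine for a "dumb" first version, but a stricter variant could return an Either so the user finds out about the typo.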

Enabling multi-tiered configuration overrides by merging AppConfigs

Now that we can construct AppConfig values from different sources, the next obvious step is to prepare for when these sources interact. Sure, we could have just made them override one another completely, but I find that in most production applications, systems that are flexible in how they handle configuration (but make it very clear how conflicts are handled) are fantastic to use. Here’s the configuration override order I’d go in (least important first):

  1. JSON (an AppConfig Maybe, which may be full of Just values)
  2. TOML / YAML (an AppConfig Maybe, which may be full of Just values)
  3. ENV (an AppConfig Maybe, which may be full of Just values)

This list is in order of human readability – It’s more likely that humans hand-edit/override config in TOML than JSON, and even more likely for direct ENV changes. I also personally like TOML more than YAML and find that it is easier to read/reason about when lines get long, and a bunch of other reasons which I won’t go into here. JSON is more often for machines these days than it is for humans, though it is definitely readable/modifiable in a pinch. Regardless of how you structure the order or which tech you pick, the approach should be roughly the same:

class MergeOverridable a where
    mergeOverride :: a -> a -> a

The basic idea is simple – we can identify merge-overridable objects by whether they implement this typeclass which provides us with a mergeOverride function. Here’s a simplistic implementation:

instance MergeOverridable PartialAppConfig where
    mergeOverride a b = AppConfig { host=resolveMaybes host
                                  , port=resolveMaybes port
                                  }
        where
          -- TODO: definitely write a test here to ensure this order is right
          resolveMaybes :: (PartialAppConfig -> Maybe a) -> Maybe a
          resolveMaybes fn = (fn b) <|> (fn a)

At first I was quite confused: I’d never tried to “or” two Maybes before, and at first I wrote some code using fromMaybe (something like fromMaybe (fromMaybe Nothing secondThing) firstThing). Then an SO post made it pretty clear that what I was looking for was the Alternative instance of Maybe (<|> from Control.Applicative), which led to the more concise syntax you see there.
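If the behaviour of <|> on Maybe is new to you, this tiny sketch isolates it. The function name preferOverride is hypothetical; the only thing it relies on is that <|> returns its left argument when that is a Just, and otherwise falls back to the right one:

```haskell
import Control.Applicative ((<|>))

-- | Prefer the override when it is present, otherwise keep the base value.
-- (<|>) on Maybe returns the first Just it finds, reading left to right.
preferOverride :: Maybe a -> Maybe a -> Maybe a
preferOverride base override = override <|> base
```

For example, preferOverride (Just 5000) (Just 3000) keeps the 3000, while preferOverride (Just 5000) Nothing falls back to the 5000. Putting the override on the left of <|> is exactly what decides who wins, which is why the order in resolveMaybes deserves a test.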

Putting it all together

OK, let’s put all of this together into one function that prepares an AppConfig for our application to use. We’ll need some help from Filesystem.Path, so add the dependency (system-filepath) in your package.yaml then try:

-- ... other code ...

import Control.Exception (Exception, try, throw)
import Filesystem.Path as FP
import Filesystem.Path.CurrentOS  as FPCOS
import Control.Monad (join, when)
import Data.Maybe (fromMaybe)

-- ... other code ...

mergeInPartial :: CompleteAppConfig -> PartialAppConfig -> CompleteAppConfig
mergeInPartial c p = AppConfig { host = fromMaybe (host c) (Identity <$> host p)
                               , port = fromMaybe (port c) (Identity <$> port p)
                               }

-- | Ensure that an Either resolves to its Right value, throwing the Left value as an exception otherwise
rightOrThrow :: (Exception a) => Either a b -> IO b
rightOrThrow e = case e of
                   (Left err) -> throw err
                   (Right v) -> return v

buildConfigWithDefault :: CompleteAppConfig -> [PartialAppConfig] -> CompleteAppConfig
buildConfigWithDefault orig partials = orig `mergeInPartial` combinedPartials
    where
      combinedPartials :: PartialAppConfig
      combinedPartials = Prelude.foldl mergeOverride (defaultValue :: PartialAppConfig) partials

-- | Build an App configuration from a given file, using the system environment as well
makeAppConfig :: ProcessEnvironment -> FP.FilePath -> IO (Either ConfigurationError CompleteAppConfig)
makeAppConfig env path = try generateConfig
    where
      extension :: Maybe Text
      extension = FP.extension path

      isJSONExtension = (=="json")
      isTOMLExtension = (=="toml")
      isJSONFile = maybe False id $ isJSONExtension <$> extension
      isTOMLFile = maybe False id $ isTOMLExtension <$> extension

      pathExtensionIsInvalid :: Bool
      pathExtensionIsInvalid = not $ isJSONFile || isTOMLFile

      pathInvalidExtensionErr :: ConfigurationError
      pathInvalidExtensionErr = InvalidPath path "Path is invalid (must be either a .json or .toml path)"

      getFileConfig :: IO (Either ConfigurationError PartialAppConfig)
      getFileConfig = case isJSONFile of
                        True -> fromJSONFile path
                        False -> fromTOMLFile path

      -- fromENV runs in IO and returns an Either (see the FromENV class above),
      -- so the ENV config is unwrapped the same way as the file config
      generateConfig :: IO CompleteAppConfig
      generateConfig = when pathExtensionIsInvalid (throw pathInvalidExtensionErr)
                       >> getFileConfig
                       >>= rightOrThrow
                       >>= \fileCfg -> fromENV env
                       >>= rightOrThrow
                       >>= \envCfg -> pure (buildConfigWithDefault (defaultValue :: CompleteAppConfig) [fileCfg, envCfg])

There is a fair bit going on in the functions above, but it should be pretty easy to follow, hopefully… Let’s test it out from GHCI:

$ TODO_HOST=changed-host-from-env stack ghci
haskell-restish-todo-0.1.0.0: configure (lib + exe)
Configuring haskell-restish-todo-0.1.0.0...
haskell-restish-todo-0.1.0.0: initial-build-steps (lib + exe)
The following GHC options are incompatible with GHCi and have not been passed to it: -threaded
Configuring GHCi with the following packages: haskell-restish-todo
Using main module: 1. Package `haskell-restish-todo' component exe:haskell-restish-todo-exe with main-is file: /home/mrman/Projects/foss/haskell-restish-todo/app/Main.hs
GHCi, version 8.4.3: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/mrman/.ghci
[1 of 3] Compiling Config           ( /home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs, interpreted )
[2 of 3] Compiling Lib              ( /home/mrman/Projects/foss/haskell-restish-todo/src/Lib.hs, interpreted )
[3 of 3] Compiling Main             ( /home/mrman/Projects/foss/haskell-restish-todo/app/Main.hs, interpreted )
Ok, three modules loaded.
Loaded GHCi configuration from /tmp/haskell-stack-ghci/5c8cd753/ghci-script
*Main Config Lib
λ :l Config
[1 of 1] Compiling Config           ( /home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs, interpreted )
Ok, one module loaded.
*Config FP FPCOS
λ env <- System.Environment.getEnvironment
*Config FP FPCOS
λ path = FPCOS.fromText (Data.Text.pack "/tmp/test-cfg.json")
*Config FP FPCOS
λ makeAppConfig (ProcessEnvironment env) path
Right (AppConfig {host = Identity "changed-host-from-env", port = Identity 3000})

Looks great! We’re picking up both the config from the environment and the config from the JSON file and they’re getting overridden properly (default values, then JSON for the 3000, then ENV for the changed-host-from-env). I tried one time with TOML just to be sure but I won’t post that here. One thing I wasn’t particularly sure about was how to make AppConfig Identity possibly MergeOverridable with AppConfig Maybe – I suspected it might be impossible to express so I just implemented mergeInPartial.

HasDefaultValue and MergeOverridable sure look familiar… (💪 🎩)

It’s strong typing tophat 💪 🎩 time! HasDefaultValue and MergeOverridable that we’ve just defined are simple to understand but terribly unnecessary typeclasses! Not only are these two typeclasses redundant with useful Haskell concepts that already exist, they can even be wrong when applied to other problems (the general case). What we’ve actually stumbled upon here is an instance of the Monoid typeclass from Data.Monoid! Here’s the definition of Monoid:

class Semigroup a => Monoid a where
        -- | Identity of 'mappend'
        mempty  :: a

        -- | An associative operation
        --
        -- __NOTE__: This method is redundant and has the default
        -- implementation @'mappend' = '(<>)'@ since /base-4.11.0.0/.
        mappend :: a -> a -> a
        mappend = (<>)
        {-# INLINE mappend #-}

        -- | Fold a list using the monoid.
        --
        -- For most types, the default definition for 'mconcat' will be
        -- used, but the function is included in the class definition so
        -- that an optimized version can be provided for specific types.
        mconcat :: [a] -> a
        mconcat = foldr mappend mempty

A member of the Haskell community named George Wilson (of the Haskell.org committee) produces great work from the land down under, and he actually addresses this exact issue in a recent talk. Let’s remove HasDefaultValue and MergeOverridable in favor of Monoid instead:

instance Semigroup CompleteAppConfig where
    a <> b = b

instance Monoid CompleteAppConfig where
    mempty = AppConfig (Identity defaultHost) (Identity defaultPort)

instance Semigroup PartialAppConfig where
    a <> b = AppConfig { host=resolveMaybes host
                       , port=resolveMaybes port
                       }
        where
          resolveMaybes getter = maybe (getter a) Just (getter b)

instance Monoid PartialAppConfig where
    mempty = AppConfig Nothing Nothing

Yes, it’s a tiny bit of a stretch to say that the “default” configuration is semantically equal to mempty but for our AppConfig I’m willing to live with this as a truth. Also, as far as good engineering goes, I’d be remiss if I didn’t mention that there may or may not be other libraries for doing more of this whole process for you (if you know one please email and let me know so I can update this post!) – I find it enjoyable to work through building this stuff which is why I’ve put it in here, but you might want to spend more time in the “discovery” phase than I did, trying to find a library that has all this taken care of for you already, in a reasonable way.
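To see the Monoid idea end to end, here is a simplified, self-contained sketch. It is not the post's AppConfig f: Partial is a hypothetical record with plain Maybe fields, finalize is a made-up helper, and the defaults are the localhost/5000 values used in this post. Note one deliberate difference: mempty here is the empty config rather than the defaults, which keeps both identity laws (mempty <> x == x and x <> mempty == x) intact:

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Hypothetical, simplified config with plain Maybe fields (no `f` parameter).
data Partial = Partial
    { pHost :: Maybe String
    , pPort :: Maybe Integer
    } deriving (Eq, Show)

-- Right-hand values win, matching "later sources override earlier ones".
instance Semigroup Partial where
    a <> b = Partial (pHost b <|> pHost a) (pPort b <|> pPort a)

-- The empty partial config sets nothing, so it is a true identity for <>.
instance Monoid Partial where
    mempty = Partial Nothing Nothing

-- | Fill in anything still missing with the defaults.
finalize :: Partial -> (String, Integer)
finalize p = (fromMaybe "localhost" (pHost p), fromMaybe 5000 (pPort p))
```

With this in place, merging any number of sources is just finalize (mconcat [fileCfg, envCfg]), listed from least to most important. Applying the defaults once at the very end is one way to sidestep the question of whether "default" really means mempty.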

There are probably more ways to get some more type safety in here too, so be on the lookout! The only way to know when you’ve stepped into well-trodden territory is by taking some time to look around! Make sure to go and take a look at the repository @ the part-1 tag for the full code listing!

UPDATE(10/22/2018) As mentioned at the top of the post, one good way to sidestep all of this work is to take a look at using one of the following community projects:

  • wai-cli (you import a symbol, you get a CLI)
  • magicbane (you import a symbol, you get a scaffolded servant app)
  • summoner (run a command and iteratively walk through lots of different kinds of advanced setup, anywhere from picking dependencies for stack to Travis CI setup)

These are some great community projects to help you sidestep this whole process and you should take a look at them, if only to see how others organize/solve this problem.

Linting

hlint (it’s on Hackage as well) is a pretty awesome tool and is good to start using early. I had a small hiccup because it was reporting the LANGUAGE pragma DeriveAnyClass as unused when it was used (the file doesn’t compile without it), but other than that, it has alerted me to a bunch of improvements that can be made to my code. For example, the use of =<<:

/home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs:112:37: Suggestion: Use =<<
Found:
  join $ fmap readMaybe $ prop "TODO_PORT"
Why not:
  readMaybe =<< prop "TODO_PORT"

What I was trying to do here was basically collapse one level of Maybes, and I had no idea that =<< was the tool to help me do it. Linting can be controversial, but if you do decide to do it, the earlier the better (and maybe even use it as a git pre-commit or pre-push hook). Here’s what the .hlint.yaml file in my project root looked like:

- ignore: {name: "Unused LANGUAGE pragma"}

This is less than ideal (I could have made it better by specifying this for only a specific module/file) – but it’ll work.
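Going back to the =<< suggestion for a moment, here are the two spellings side by side so you can convince yourself they do the same thing. The function names are hypothetical and I've used Int for the port just to keep the sketch short:

```haskell
import Control.Monad (join)
import Text.Read (readMaybe)

-- | The original shape: map over the Maybe, then flatten the
-- resulting Maybe (Maybe Int) with join.
parsePortVerbose :: Maybe String -> Maybe Int
parsePortVerbose raw = join $ fmap readMaybe raw

-- | hlint's suggestion: bind does the map-and-flatten in one step.
parsePortConcise :: Maybe String -> Maybe Int
parsePortConcise raw = readMaybe =<< raw
```

Both return Nothing when the input is Nothing or when the string doesn't parse, and a Just only when there is a string and it reads as a number.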

Testing

OK, we’ve just written quite a bit of configuration parsing machinery – let’s write some tests to ensure that we don’t regress. Haskell code is most in danger when it’s dealing with the outside world – i.e. the moment when we take something unsafe (like a malformed local file) and bring it into our properly typed Haskell world. We’ve written a bunch of code that reads configuration files, and this certainly qualifies as a place where we need to double check that what Haskell is doing meets the specifications of what is expected, and that regressions are easily prevented.

I personally like to split my tests into unit, integration and e2e (“acceptance”) tests, where generally the “unit” is a function, the integration level is components (we’ll talk more about this in future blog posts as we build out the app), and e2e is testing the whole thing (as in, running the generated executable). Before we get into writing some tests though, we need to sort out which testing framework we’re going to use. I personally really enjoy using hspec (you can read through the User’s Manual for hspec on Github Pages), so let’s add hspec as a test dependency in package.yaml, along with some targets for easy testing:

package.yaml:

tests:
  haskell-restish-todo-test:
    main:                Spec.hs
    source-dirs:         test
    ghc-options:
    - -threaded
    - -rtsopts
    - -with-rtsopts=-N
    dependencies:
    - haskell-restish-todo

  unit: # <---- this one is new
    main:                Spec.hs
    source-dirs:         test/Unit
    ghc-options:
    - -threaded
    - -rtsopts
    - -with-rtsopts=-N
    dependencies:
    - haskell-restish-todo
    - hspec

Along with the auto-discovery files required by hspec, I also like to add these tests in their own folders and name them accordingly, so let’s add some folder structure (I personally expect the unit tests for src/Config.hs to live in test/Unit/ConfigSpec.hs).

$ tree test
test
├── Spec.hs
└── Unit
    ├── ConfigSpec.hs
    └── Spec.hs

2 directories, 3 files

test/Spec.hs:

{-# OPTIONS_GHC -F -pgmF hspec-discover #-}

test/Unit/Spec.hs:

{-# OPTIONS_GHC -F -pgmF hspec-discover #-}

test/Unit/ConfigSpec.hs:

module App.ConfigSpec (spec) where

import Test.Hspec

main :: IO ()
main = hspec $ spec

spec :: Spec
spec = do
  describe "a fake test" $ do
         it "passes" $ do
           True `shouldBe` True

         it "fails" $ do
           False `shouldBe` True

So as you can see I have two Spec.hs files to enable auto-discovery at two levels (one @ test and another at test/Unit). We can run the unit test suite by calling stack test :unit from the command line, to make sure our dummy tests work:

$ stack test :unit
<a bunch of stack-related compilation output elided>
haskell-restish-todo-0.1.0.0: test (suite: unit)


App.Config
  a fake test
      passes
          fails FAILED [1]

Failures:

  test/Unit/ConfigSpec.hs:15:12:
    1) App.Config, a fake test, fails
           expected: True
                   but got: False

  To rerun use: --match "/App.Config/a fake test/fails/"

Randomized with seed 1900722340

Finished in 0.0006 seconds
2 examples, 1 failure

haskell-restish-todo-0.1.0.0: Test suite unit failed
Test suite failure for package haskell-restish-todo-0.1.0.0
    unit:  exited with: ExitFailure 1
    Logs printed to console

Great, looks like unit tests are working, let’s make a few that actually do some useful testing in ConfigSpec.hs (and require the Config module to enable it):

test/Unit/ConfigSpec.hs:

module ConfigSpec (spec) where

import Test.Hspec
import Config as C
import Data.Functor.Identity

main :: IO ()
main = hspec spec

completeAppDefault :: CompleteAppConfig
completeAppDefault = C.defaultValue

partialAppDefault :: PartialAppConfig
partialAppDefault = C.defaultValue

spec :: Spec
spec = do
  describe "defaults" $ do
         it "has localhost as the default host" $
            C.defaultHost `shouldBe` "localhost"

         it "has 5000 as the default port" $
            C.defaultPort `shouldBe` 5000

  describe "default values" $ do
         it "CompleteAppConfig has default host" $
           host completeAppDefault `shouldBe` Identity C.defaultHost
         it "CompleteAppConfig has default port" $
            port completeAppDefault `shouldBe` Identity C.defaultPort

-- ... more tests ...

These tests might not look important, but they force the developer to think twice before changing values like defaultHost. If you’re changing a value like that, it’s very likely that external systems will be affected, which the types can’t really help you with. This is certainly not enough tests, but it’s enough for a taste as far as this article goes.

Incorporating the configuration parsing code into the application

So right now our main function doesn’t do much – it prints a static message (<SERVER START> if you’ve forgotten). Let’s have it do slightly more than nothing – we’ll generate our application configuration, then print out the message. While we’re here, let’s also add a command called show-config that prints out the configuration the binary sees and would use, as a sanity-checking utility:

{-# LANGUAGE RecordWildCards #-}

module Main where

import Config (AppConfig, Host, Port, ProcessEnvironment(..), makeAppConfig)
import Control.Monad (join)
import Data.Semigroup ((<>))
import Lib
import Options.Applicative
import System.Environment (getEnvironment)
import Text.Pretty.Simple (pPrint)

data Options = Options
    { cfgPath :: Maybe FilePath
    , cmd :: Command
    }

data Command = Serve
             | ShowConfig deriving (Eq)

-- | Parser for commands
parseCommands :: Parser Command
parseCommands = subparser commands
    where
      serverCmd :: ParserInfo Command
      serverCmd = info (pure Serve) (progDesc "Start the server")

      showConfigCmd :: ParserInfo Command
      showConfigCmd = info (pure ShowConfig) (progDesc "Show configuration")

      commands :: Mod CommandFields Command
      commands = command "server" serverCmd
                 <> command "show-config" showConfigCmd

-- | Parser for top level options
parseOptions :: Parser (Maybe FilePath)
parseOptions = optional
               $ strOption ( long "config"
                           <> short 'c'
                           <> metavar "FILENAME"
                           <> help "Configuration file (.json/.toml)" )

-- | Top level optparse-applicative parser for the entire CLI
parseCmdLine :: Parser Options
parseCmdLine = Options <$> parseOptions <*> parseCommands

-- | Helper function to access the environment and marshall it into our newtype
pullEnvironment :: IO ProcessEnvironment
pullEnvironment = ProcessEnvironment <$> getEnvironment

-- | IO action that shows the current loaded configuration
showConfig :: Options -> IO ()
showConfig Options{cfgPath=path} = pullEnvironment
                                   >>= makeAppConfig path
                                   >>= pPrint

-- | IO action that runs the server
runServer :: Options -> IO ()
runServer Options{cfgPath=path} = pullEnvironment
                                  >>= makeAppConfig path
                                  >> server

-- | Start up the server and serve requests
server :: IO ()
server = putStrLn "<SERVER START>"

main :: IO ()
main = parseOptions
       >>= process
    where
      cmdParser :: ParserInfo Options
      cmdParser = info parseCmdLine idm

      parseOptions :: IO Options
      parseOptions = execParser cmdParser

      process :: Options -> IO ()
      process opts = case cmd opts of
                       Serve -> runServer opts
                       ShowConfig -> showConfig opts

There’s a bunch more in the code now: gratuitous use of Applicative utilities and, of course, optparse-applicative. While it may take time to clearly understand exactly how you might piece this code together, it should at the very least be readable, and fall out from following optparse-applicative’s getting started documentation (along with some research on how to use Applicative). If worst comes to worst, just skip optparse-applicative and reach for a simpler command line option parsing tool, or copy and modify the example above. Here’s the help message that gets printed out if we try to just run the binary:

$ stack exec haskell-restish-todo-exe
Missing: COMMAND

Usage: haskell-restish-todo-exe [-c|--config FILENAME] COMMAND

There’s a lot to be improved upon here (basically passing more information to optparse-applicative so it can be more helpful), but for now it’s at least correctly specified. Let’s try giving it the server subcommand without specifying a configuration file:

$ stack exec haskell-restish-todo-exe -- server
<SERVER START>

Great, that still works as we expect – <SERVER START> still gets printed, and while we can’t see the configuration with this command, it must be getting loaded without crashing since we’ve changed the code to do that. Let’s run the newly-added show-config command:

$ stack exec haskell-restish-todo-exe -- show-config
Right
    ( AppConfig
        { host = Identity "localhost"
        , port = Identity 5000
        }
    )

Great, the defaults we’ve set for the AppConfig Identity (AKA CompleteAppConfig) with the mempty implementation of the Monoid typeclass are showing up! Let’s try it again, but this time give a JSON file (saved in /tmp just like before):

$ stack exec haskell-restish-todo-exe -- -c /tmp/test-cfg.json show-config
Right
    ( AppConfig
        { host = Identity "somewhere-else"
        , port = Identity 3000
        }
    )

Awesome! Now let’s try it with TOML:

/tmp/test-cfg.toml:

host = "somewhere-else-toml"
port = 3001

$ stack exec haskell-restish-todo-exe -- -c /tmp/test-cfg.toml show-config
Right
    ( AppConfig
        { host = Identity "somewhere-else-toml"
        , port = Identity 3001
        }
    )

Just to make sure that ENV overrides are working as we’d expect, let’s add an ENV override in there:

$ TODO_PORT=3002 stack exec haskell-restish-todo-exe -- -c /tmp/test-cfg.toml show-config
Right
    ( AppConfig
        { host = Identity "somewhere-else-toml"
        , port = Identity 3002
        }
    )

We’ve done it! Relatively robust configuration parsing with good support for sections (thanks to TOML – I’d avoid writing your config in JSON) and overridability from both a config file and ENV.
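To make the precedence in those runs concrete (defaults, then file, then ENV), here’s a stripped-down sketch of the layering idea using the Last monoid from base. It is deliberately simpler than the AppConfig code in this post: PartialConfig, Config, fromEnv and resolve are hypothetical names, and TODO_HOST is an assumed variable (only TODO_PORT is shown above).

```haskell
import Data.Maybe (fromMaybe)
import Data.Monoid (Last (..))
import Text.Read (readMaybe)

-- | One layer of configuration; every field is optional.
data PartialConfig = PartialConfig
  { pHost :: Last String
  , pPort :: Last Int
  }

-- | Field-wise merge: for each field, the right-most value that is set wins.
instance Semigroup PartialConfig where
  a <> b = PartialConfig (pHost a <> pHost b) (pPort a <> pPort b)

instance Monoid PartialConfig where
  mempty = PartialConfig mempty mempty

-- | Fully resolved configuration.
data Config = Config
  { host :: String
  , port :: Int
  } deriving (Eq, Show)

-- | Build a layer from environment-style key/value pairs.
-- An unparseable TODO_PORT is treated as "not set".
fromEnv :: [(String, String)] -> PartialConfig
fromEnv env = PartialConfig
  { pHost = Last (lookup "TODO_HOST" env)
  , pPort = Last (lookup "TODO_PORT" env >>= readMaybe)
  }

-- | Merge the file layer and the ENV layer (ENV wins), then fall back
-- to the defaults for anything neither layer set.
resolve :: PartialConfig -> PartialConfig -> Config
resolve fileLayer envLayer = Config
  { host = fromMaybe "localhost" (getLast (pHost merged))
  , port = fromMaybe 5000 (getLast (pPort merged))
  }
  where
    merged = fileLayer <> envLayer
```

In a real program the ENV layer could be built with fromEnv <$> System.Environment.getEnvironment. Keeping fromEnv pure means the precedence rules can be checked without touching the actual environment.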

OPTIONAL: Extending and abstracting the build/test machinery with Make

I’m a big fan of GNU Make, probably because I’ve never seen it used in terrible ways before. I like make mostly because of how ubiquitous it is and how easy it is to grasp the basics – it’s perfect for use across projects I work on, enabling me to provide a relatively uniform interface to the development side of things for others (including future me). Here’s a snippet from the Makefile I’m using for this project:

.PHONY: all build lint test setup \
        print-release-version check-tool-stack

all: build
test: test-unit test-int test-e2e
build: api

VERSION=$(shell awk '/version\:\s+([0-9\.]+)/{print $$2}' haskell-restish-todo.cabal)
MAKEFILE_DIR:=$(strip $(shell dirname $(realpath $(lastword $(MAKEFILE_LIST)))))
# Assumed definition (not shown in the original snippet): path to stack, empty if it is not installed
STACK:=$(shell command -v stack 2> /dev/null)

check-tool-stack:
ifeq (, $(STACK))
    $(error "`stack` doesn't seem to be installed (https://haskellstack.org)")
endif

print-release-version:
    @echo "${VERSION}"

setup:
    cp ./.dev-util/git/hooks/pre-push.sh .git/hooks/pre-push
    chmod +x .git/hooks/pre-push

lint:
    hlint src/ app/ test/

build: check-tool-stack
    stack build

test-unit: check-tool-stack
    stack test :unit

test-int: check-tool-stack
    stack test :int

test-e2e: check-tool-stack
    stack test :e2e

As you can see, there are a couple of tricks in here that I learned once and have copy-pasted religiously ever since (the exact ins and outs of Make and bash scripting syntax are hard to remember).

OPTIONAL: Adding git hooks and CI integration to your project

As for git hooks, I usually add a pre-push hook (I find pre-commit hooks too cumbersome). I normally do this by making a folder called .dev or .dev-util and putting hooks in there under a folder called git. Then I make a Makefile target like setup that does nothing more than copy the files into the right places, so for git hooks a basic shell command like cp .dev/git/hooks/pre-push .git/hooks/pre-push && chmod +x .git/hooks/pre-push is all I need. Here’s what that file might look like:

.dev/git/hooks/pre-push:

#!/usr/bin/env bash

# Attempt to lint Haskell code
LINT_CMD="make lint"
eval $LINT_CMD
RESULT=$?
if [ $RESULT -ne 0 ]; then
    echo -e "LINT Failed!\n CMD: $LINT_CMD"
    exit 1
fi

# Run unit, integration, and e2e tests
TEST_CMD="make test-unit test-int test-e2e"
eval $TEST_CMD
RESULT=$?
if [ $RESULT -ne 0 ]; then
    echo -e "Tests failed!\n CMD: $TEST_CMD"
    exit 1
fi

As you can see, this makes heavy use of make, which I like because I can centralize all my logic in the Makefile – it does mean that whatever you deploy to will need to have make installed and available. As for remote CI, I use Gitlab and the CI services they provide for free (!!) on the free public repo (they also offer free private repos). Getting tests to run is very simple, but getting them to run efficiently is a bit finicky. I’ve written about it in the past, so feel free to look there for more concrete information.

UPDATE: 💪 🎩 intensifies - enter DataKinds and TypeFamilies

Thanks to some feedback in the r/haskell thread for this post, there’s a great example submitted by u/chshersh of even more advanced type wizardry we can do here, using the DataKinds and TypeFamilies extensions and following the example set in Config.hs of the summoner codebase. Here’s an explanation of the code from u/chshersh:

Feel free to ask any questions! But I will try to explain general ideas here. Instead of parametrising over type variable f :: Type -> Type our Config is parametrised over p :: Phase where Phase is a simple enum:

data Phase = Partial | Final

It’s like a tag or just a closed set of types. The DataKinds extension allows lifting constructors to the type level, so Final is a value that has type Phase, but 'Final is a type that has kind Phase. Later we use the :- type family to map those tags to types. So :- is a type family (a function from types to types) in the form of an operator (because operators look cooler). This function takes a type of kind Phase and an arbitrary type, and pattern matches on the Phase to decide what to return:

infixl 3 :-
type family phase :- field where
    'Partial :- field = Last field
    'Final   :- field = field

So in the case of a partial config we wrap the selected fields in the Last monoid (to be able to mappend configs easily), and return just the field for the Final configuration. I’m an advocate of explicit type signatures, so this code can be slightly clearer if written like this:

type family (phase :: Phase) :- (field :: Type) :: Type where

Somehow the type signature for type family is missing… But this should be done and this can be a useful addition to the existing code!

Basically, this setup lets you be very specific about how the phases of configuration are layered on top of each other. The :- type family acts as a function at the type level: a field in the Partial phase turns into a Last field, whereas a field in the Final phase turns into the field itself. The spirit of the code is similar to what’s written here, but it’s quite a bit more advanced. Take a look! You should also know that this particular brand of magic (DataKinds and TypeFamilies) is what powers the declarative routing approach of servant!
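Here’s a small, self-contained sketch that puts the quoted pieces together. The Phase type and the :- family come straight from the explanation above; the AppConfig record with host and port fields mirrors the config used in this post, while the instances and the finalise helper are my own illustrative assumptions rather than code from summoner or the repository.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import Data.Monoid (Last (..))

-- | The two phases a configuration can be in.
data Phase = Partial | Final

-- | Type-level function: Partial fields are wrapped in 'Last', Final fields are bare.
infixl 3 :-
type family (phase :: Phase) :- (field :: Type) :: Type where
  'Partial :- field = Last field
  'Final   :- field = field

-- | One record definition covers both phases.
data AppConfig (p :: Phase) = AppConfig
  { host :: p :- String
  , port :: p :- Int
  }

-- | Only partial configs can be merged; for each field the right-most set value wins.
instance Semigroup (AppConfig 'Partial) where
  a <> b = AppConfig (host a <> host b) (port a <> port b)

instance Monoid (AppConfig 'Partial) where
  mempty = AppConfig mempty mempty

-- | Hypothetical helper: a partial config becomes final only if every field was set.
finalise :: AppConfig 'Partial -> Maybe (AppConfig 'Final)
finalise cfg = AppConfig <$> getLast (host cfg) <*> getLast (port cfg)
```

The nice part is that a function taking an AppConfig 'Final can read host and port as a plain String and Int, with no Maybe or Last left to unwrap, and the compiler rejects any attempt to pass it a config that has not gone through finalise.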

As always, don’t feel pressured to use features/extensions/methods that you’re not comfortable with, but remember that one of the best parts of using a language like Haskell is how much you can get the type system to do the heavy lifting. Being comfortable with the advanced features of Haskell allows you to put more work in the hands of the type system, which means less work for you! No one becomes an expert overnight, but remember that if you never look into more advanced ways to use the type system to do your bidding, you might as well be writing in another language (well, Haskell has other awesome benefits like its lightweight threading model and software transactional memory, but hopefully you get my gist).

Wrapup

OK, that’s it for this initial app setup! We haven’t really done much in terms of functionality/business logic, but I think it’s important to do this kind of engineering at least once. On this project I’m choosing to do it up front, rather than using some boilerplate project or other libraries, mostly for instructive reasons (and because it’s fun to think about this stuff outside of a real project). As long as the time tradeoff isn’t too terrible I think it’s worth doing up front.

On a personal note, I thought this article would be super quick to bang out, but just this one part took about a week to write, never mind the fact that the overall idea had to be split into 4 parts. Hope someone out there who’s relatively new to Haskell is finding it useful! Tune in next time for Part 2, where I share notes on thinking through the data model and other actually-functionality-related topics of the fake todo web service.

As always, I’m standing on the shoulders of giants for all the tools used in this post – Haskell itself, servant, optparse-applicative, make, etc. Make sure to support open source software when you can – whether with commits or currency.

Did you find this read beneficial? Send me questions/comments/clarifications.
Want my expertise on your team/project? Send me interesting opportunities!