Years later, REST-ish Services in Haskell Part 4

Categories
Haskell logo + Servant logo

tl;dr - In this post I go through a bunch of improvments and extras to the code from part 3, 2 years after this initial series was started. The improvements range from development aids to post-deployment “observability” (metrics, logging, tracing) and how I’ve handled them in the past. It’s pretty ridiculous to finish a blog series 2 years later, so I also released the code for a Job Board API I wrote (also the frontend that manages it) for those who want to see how a CRUD-y production Haskell app looks.

Multi-part blog post alert

This is a multi part blog post with the following sections:

  1. Part 1: Initial project setup
  2. Part 2: Domain model design and application architecture
  3. Part 3: Building the API by putting it all together
  4. Part 4: Extras and other concerns (Observability, etc) (this post, 2+ years later)

Setting the stage

This is the (much delayed) final post in a 4-part series where we worked our way through various layers of DDD; first the domain models themselves, then components that manipulate them, using the mtl (Monad Transformer Library) approach and ReaderT pattern (as opposed to Free monads or simple monad stacking), culminating in a working API. One particularly interesting bit of this post will likely be the discussion on monad transformers, mtl, the ReaderT pattern, and the approach I use which I’ve dubbed “classy mtl readers”, with a look at the horizon, which to me is Free/FreeR monads.

Without much further ado let’s jump right into how we can improve the code and conventions we’ve used in parts 0-3.

Development Aid: Continuous typechecking/recompilation

REPLs are excellent for improving development speed, and while stack ghci works great for me in terms of simple recompiling (repeatedly typing :r<enter>), a great way to improve this experience is to employ a continuous typechecker/recompiler. Yes, you could wire this up yourself with entr (which is an excellent project btw), but luckily Haskell has some native projects to do it for you as well. I recently watched a fantastic talk on haskell Terminal UIs by the folks over at FPComplete on Youtube which brought this up and explained it very well.

Simply using stack’s --file-watch flag is enough to get some basic functionality working:

$ stack build --pedantic --fast --file-watch

Here’s an example straight from the stack documentation:

stack build --pedantic --haddock --test --exec "echo Yay, it succeeded" --file-watch

Another option which is slightly different is ghcid, thanks to /u/Alexbrainbox on reddit for pointing this out to me in the comments surrounding Part 3 of this blog series. It’s got some great documentation as far as I can see, with a really clear description of what it is on the front:

Either “GHCi as a daemon” or “GHC + a bit of an IDE”. To a first approximation, it opens ghci and runs :reload whenever your source code changes, formatting the output to fit a fixed height console. Unlike other Haskell development tools, ghcid is intended to be incredibly simple. In particular, it doesn’t integrate with any editors, doesn’t depend on GHC the library and doesn’t start web servers.

Development Aid: hfmt for consistent styling

Linting is a developer aid I’ve come to value more as time has progressed, and hfmt seems to be the go-to tool for that in Haskell land. To use hfmt in my project I needed to add the following extra dependency in stack.yaml:

# ... other YAML ...

extra-deps:
- hindent-5.2.7

# ... other YAML ...

After that everything went just as the documentation states:

$ stack install hfmt
# ... lots of build output ...
$ hfmt
# .. big patch outlining all the changes to various files ..
$ hfmt -w

The last command, hfmt -w, runs hfmt and (over)writes the relevant files. I’ll leave it up to you to integrate this with your own environment (whether setting up hfmt to run on every file edit or every file save, or whenever else).

In the vein of hfmt there are other tools that help at various levels, in particular:

And of course it goes without saying, hlint is an excellent tool that you should be using, as I’ve mentioned in part 1 and part 2. I personally derive much more value from hlint than hfmt (mostly because I don’t like some of the choices that hfmt makes).

NOTE - I foudn myself going back and re-formatting a bunch of things that hfmt did… The idea is good (standardizing formatting), and I’m sure hfmt has lots of ways to customize but I think for now I’m going to just keep styling somewhat manually.

Performance: Strictness

The strictness annotation, !, turns on strict evaluation in Haskell. One common practice that I haven’t done on the codebase so far is to make sure that all data declarations have as many fields as is reasonable set to strict evaluation. An SO post on this has some pretty good advice, stating:

A common rule of thumb is to make data structures strict when:

  • You expect to strictly traverse the whole thing and retain it, so there’s no point in the overhead of laziness; or
  • The fields are “small”—less than or equal to the size of a pointer—and you’d like the compiler to unbox them to avoid unnecessary indirections. (-funbox-small-strict-fields is on by default since GHC 7.7.)

Strictness also recently came up in a thread on reddit, and as /u/ElvishJerricco noted in a comment there:

Frankly it requires knowledge of runtime semantics for it to be clear when strictness is a performance advantage. Laziness and strictness are, after all, mostly just optimizations in terminating computations. I might even avoid the concept of laziness until the students are at a point that runtime semantics are relevant to them.

While I don’t claim to understand enough about the Haskell runtimes, thunks, and evaluation to know every instance that you should use strict – most of the time I’m working with some data object, I can get behind having those computations by default be strictly evaluated (as in, the change to the object is made immediately).

Client Ergonomics: Better JSON output

We can make a bunch of improvements on our current code by straightening out our ToJSON and FromJSON instances (from aeson). This was inspired by a recent talk by Joe Kachmar called “Production Haskell Demystified. There are tons of great tips in that talk but the one I want to highlight here is slightly easier-to-use generated JSON for our models. For example with Task:

data Task = Task { tName  :: TaskName
                 , tDesc  :: TaskDesc
                 , tState :: TaskStateValue
                 } deriving (Eq, Show, Read, Generic)

instance ToJSON Task

The default ToJSON instance here will produce objects like this:

{
  "tName": "<name>",
  "tDesc": "<description>",
  "tState": "Finished | InProgress | NotStarted"
}

This is OK, but the prefixing to avoid Haskell’s issue with name clashes for record fields is a bit of an unfortunate side effect for any frontends that might be interacting with the API. Let’s fix this by slightly modifying the default ToJSON instance:

import Data.Aeson(FromJSON (..), ToJSON (..),
                  Value (..), object, (.=), genericToJSON, defaultOptions, Options(..))

-- ... other code ...

instance ToJSON Task where
    toJSON = genericToJSON Data.Aeson.defaultOptions { fieldLabelModifier = (drop 1) }

There are lots of great tips in the talk (for example testing the JSON instances using Arbitrary auto-generated examples), so I would highly recommend watching it.

Client Ergonomics: Generating Swagger/OpenAPI documentation

One of the best things to come out of the modern web development sphere has been an attempt to standardize on how we describe REST-ful APIs. While every API is REST-ful to different degrees (leaving aside the question of whether REST is the right paradigm at all), closely adhering to at least some of the tenets of REST across the board has enabled people to explore creating machinery to document and automate creating APIs – enter Swagger/OpenAPI, RAML and API Blueprint. Thanks to the work of some Haskellers, Haskell hasn’t been left behind – the servant-swagger offers Swagger 2.0 (AKA OpenAPI v1) support (one step back from the most recent which is is OpenAPI v3).

servant-swagger uses a annotation-through-typeclass-definition approach, along with exposing some functionality to make use of the annotations to generate/produce documentation as necessary. How I’ve found myself using it is writing the appropriate instances, and then writing some additional command line subcommand to handle generating swagger and writing it to a file. Here’s what that looks like for the codebase we have so far:

Types.hs:

-- These imports were required for the ToSchema instance for (WithUUID a)
import           Data.Swagger (ToSchema(..), schemaName, NamedSchema(..), Schema(..), toSchema, Referenced(..))
import           Data.Proxy (Proxy(..))
import qualified Data.HashMap.Strict.InsOrd as DHMSIO

-- Easy ToSchema instances for the simpler data types
instance ToSchema Task
instance ToSchema (Complete TaskF)
instance ToSchema (Partial TaskF)

-- Manual ToSchema instance for the (WithUUID a) compound type
-- This needed to be manual since the "uuid" property needs to be injected
-- Why does "uuid" property need to be injected? Since that's how the custom ToJSON instance for (WithUUID a) works
-- The objects that ToJSON (WithUUID a) instance generates have the shape {"uuid": "...", "fielda": "...", ...}
instance ToSchema a => ToSchema (WithUUID a) where
    declareNamedSchema _ = updated
        where
          inner = declareNamedSchema (Proxy :: Proxy a)
          -- Inject the "uuid" property into the named schema for the existential a
          uuidKey = "uuid" -- see ToJSON instance for (WithUUID a)
          uuidReferencedSchema = Inline $ toSchema (Proxy :: Proxy UUID)
          injectUuidProperty = DHMSIO.insert uuidKey uuidReferencedSchema

          -- Modify and rebuild a NamedSchema to inject a "uuid" property
          modifyProps ns@NamedSchema{..} =
              ns { _namedSchemaSchema =
                       _namedSchemaSchema {
                              _schemaProperties = injectUuidProperty $ _schemaProperties _namedSchemaSchema
                            }

                 }
          updated = modifyProps <$> inner

The ToSchema instance is pretty hairy, but would probably look a lot less hairy if I was using lens. Since I’m not it looks bad, but it should still be fairly legible. As far as the actual changes to add the subcommand that generates the Swagger JSON, changes were required in Main.hs as shown below:

-- swaggerOutputPath was added to the command line options object
data Options = Options
  { cfgPath           :: Maybe FilePath
  , cmd               :: Command
  , swaggerOutputPath :: Maybe FilePath -- <--- new
  }

data Command
  = Serve
  | ShowConfig
  | SwaggerGenerate -- <--- new
  deriving (Show, Eq, Read)

-- ... other code ...

-- | Parser for commands
parseCommands :: Parser Command
parseCommands = subparser commands
  where
    serverCmd :: ParserInfo Command
    serverCmd = info (pure Serve) (progDesc "Start the server")

    showConfigCmd :: ParserInfo Command
    showConfigCmd = info (pure ShowConfig) (progDesc "Show configuration")

    swaggerGenerateCmd :: ParserInfo Command
    swaggerGenerateCmd = info (pure SwaggerGenerate) (progDesc "Generate swagger JSON")

    commands :: Mod CommandFields Command
    commands = command "server" serverCmd <>
               command "show-config" showConfigCmd <>
               command "swagger-generate" swaggerGenerateCmd

-- ... other code ...

-- | Top level optparse-applicative parser for the entire CLI
parseCmdLine :: Parser Options
parseCmdLine = Options <$> parseOptions <*> parseCommands <*> parseOptions

-- ... other code ...

swaggerGenerate ::  Options -> IO ()
swaggerGenerate Options{..} = when (isNothing swaggerOutputPath) printPathWarning
                              >> pure (toSwagger (Proxy :: Proxy TodoAPI))
                              >>= pure . encode
                              >>= BSL.writeFile path
                              >> printSuccessMessage
    where
      printPathWarning = putStrLn "[WARN] no swagger output path specified (--swaggerOutputPath)"
      printSuccessMessage = putStrLn $ "[INFO] Generated Swagger JSON @ [" <> path <> "])"
      path = maybe "swagger.json" id swaggerOutputPath

-- ... other code ...

main :: IO ()
main = parseOptions >>= process
  where
    cmdParser :: ParserInfo Options
    cmdParser = info parseCmdLine idm

    parseOptions :: IO Options
    parseOptions = execParser cmdParser

    process :: Options -> IO ()
    process opts =
      case cmd opts of
        Serve           -> runServer opts
        ShowConfig      -> showConfig opts
        SwaggerGenerate -> swaggerGenerate opts

And to run the new subcommand:

$ stack exec haskell-restish-todo-exe swagger-generate

Unfortunately, making these changes forced me to undo the earlier changes to the ToJSON instance for Task (where we stripped the t prefix from tName) – the generated swagger.json seemed to be ignoring that output:

"definitions": {
  "Task": {
    "required": [
      "tName",
      "tDesc",
      "tState"
    ],
    "properties": {
      "tName": {
        "type": "string"
      },
      "tDesc": {
        "type": "string"
      },
      "tState": {
        "type": "string"
      }
    },
    "type": "object"
  },

For that reason I reverted the changes – no point in having nice looking output if the Swagger is wrong – and I didn’t want to spend too much time looking into whether servant-swagger (and swagger2 that it depends on) had the appropriate customization points. Of course there are other ways to solve this problem (there are other solutions to the record naming issue), but looks like the Haskell records issue strikes again!

ASIDE: Lens

While I don’t actually retrofit all the code to use lens I did find a fantastic introductory talk on Lens by Ben Kolera, that I thought would be worth sharing.

Better Code: Custom Preludes

It’s a bit late to be considering this now, but another way we could have improved the code we wrote was by using a custom prelude – a group of functions to use and overwrite some of the partial functions that exist as defaults. The great @sdiehl has some good writing on the subject if you’d like to read more.

An easy (and oft-used) example of how this might make our code safer is the simple function head – if you try and run head [], that value will be undefined (AKA “bottom”), and will crash your program when used, but a safer prelude would specify a total function for head, with a type signature like head :: [a] -> Maybe a.

I’m going to just swap the default prelude out in favor of protolude, and as the documentation indicates, all it takes is adding the NoImplicitPrelude default extension (in package.yaml, since I’m using stack with hpack) and importing Protolude. There isn’t a ton of benefit here, since I wasn’t relying too heavily on unsafe functions in the prelude, but one thing using Protolude gave out of the box was a warning for use of undefined in the code, which is nice.

There were tons of changes (way too many to list here), but I did manage to fit all the changes into one commit so check it out below:

Better Code: Abstracting the monad in our EntityStore to avoid liftIO (💪 🎩)

You might have noticed all the liftIO calls and type casting littering the code, for example:

getTodoByUUID :: UUID -> AppHandler (WithUUID Task)
getTodoByUUID uuid =
  ask >>= \(AppState _ estore) ->
    liftIO
      (getByUUID estore uuid :: IO (Either EntityStoreError (WithUUID Task))) >>=
    rightOrServantErr genericServerError

Since EntityStore is written for the IO monad, we are forced to use liftIO to “lift” the IO computation into our own monadic context AppHandler (WithUUID Task). While this isn’t the end of the world, we can get rid of these pesky liftIOs with (you guessed it) more abstraction! We’re going to write EntityStore so that it’s less picky about the monad that it’s being run in. Being less specific about which monad we run in can enable lots of different usecases (we could run in some MockLogContext that just logs operations for example), but we’re going to be using it just to get rid of the liftIOs littering our handler code. We can get this advanced functionality by abstracting the monad we’ve been using in our operations to a polymorphic m! Let’s put on our strong typing tophats (💪 🎩) and figure it out – basically replace everywhere you see IO with m:

-- | Generalized typeclass for entity storage.
class Monad m => SQLEntityStore store m
    -- | Create an entity
  where
  create ::
       forall entity.
       ( SQLInsertable entity
       , SQLInsertable (WithUUID entity)
       , FromRow (WithUUID entity)
       )
    => store
    -> Validated entity
    -> m (Either EntityStoreError (WithUUID entity))
    -- | Get an entity by ID
  getByUUID ::
       forall entity. (SQLInsertable entity, FromRow (WithUUID entity))
    => store
    -> UUID
    -> m (Either EntityStoreError (WithUUID entity))
    -- | Update an existing entity by ID
  updateByUUID ::
       forall (entity :: FBounded).
       ( SQLInsertable (Complete entity)
       , SQLUpdatable (Partial entity)
       , FromRow (Complete entity)
       )
    => store
    -> UUID
    -> Validated (Partial entity)
    -> m (Either EntityStoreError (WithUUID (Complete entity)))
    -- | Delete an entity by ID
  deleteByUUID ::
       forall entity.
       (SQLInsertable entity, SQLDeletable entity, FromRow entity)
    => store
    -> UUID
    -> m (Either EntityStoreError entity)
    -- | Get a listing of all entities
  list ::
       forall entity. (SQLInsertable entity, FromRow entity)
    => store
    -> m (Either EntityStoreError [entity])

We do have to get specific at some point about which type we’re using, and one place to do that is our AppState:

data AppState = forall estore m.
    ( Monad m
    , SQLEntityStore estore IO
    ) => AppState { appConfig   :: !(Complete AppConfig)
                  , entityStore :: estore
                  }

So what have we gained here? It’s kind of hard to see at this stage, we’ve gotten a bit closer to separating what the code is doing from what it needs to do it’s job. By abstracting to m, we’ve opened up the world to any monad (even if it’s likely a monad stack with IO at the bottom) to run this code, as long as it can do the things we want it to do. This pattern isn’t suuuper immediately useful here, but it’s going to be well worth the effort in the future.

Better Code: Constraining our monad polymorphism with typeclasses (mtl + typeclass pattern) (💪 🎩)

Abstracting the m is good, but it also raises another problem – we want to be OK with any m, but we need to know that the m provides access to a SQLiteEntityStore and other things! All our operations that work with the store require it. There’s a bunch of great talks about this approach that contrast the typeclass approach with the usual monad unstacking and unwrapping:

The second talk comes before the first time-wise but George’s talk cuts more to the meat of what we’re doing here. Long story short, the approach we can see from these two videos is to actually constrain the abstracted monads with good ‘ol Constraints! We can ensure that the monadic context m that we run our handler in has what we need by just creating a typeclass that ensures it, and applying that Constraint to m. Let’s do this for our handlers & custom context:

-- | List all companies
allCompanies :: ( HasDBBackend m db
                , HasCacheBackend m c
                , MonadError ServerError m
                , CompanyStore db m
                ) => SessionInfo -> Maybe Limit -> Maybe Offset -> m (EnvelopedResponse (PaginatedList (ModelWithID Company)))
allCompanies sessionInfo limit offset = requireRole Administrator sessionInfo
                                        >> getDBBackend
                                        >>= \db -> getCacheBackend
                                        >>= \cache -> lookupOrComputeAndSaveEither
                                                      (getCompanyListing cacheKey cache)    -- lookup
                                                      (getAllCompanies db limit offset)     -- compute
                                                      (insertCompanyListing cacheKey cache) -- save
                                        >>= ifLeftEnvelopeAndThrow Err.failedToRetrieveResource
                                        >>= pure . EnvelopedResponse "success" "Successfully retrieved companies"
    where
      cacheKey = CompanyListing limit offset

We’ve gained a bit of flexibility here – if we change our AppHandler monadic context later to some BetterAppHandler, all we have to do is make sure it has the right instances and our handler code will work untouched! At the meta level, we’ve achieved a goal we didn’t even know we had we can start expressing the requirements of handlers in terms of the functionality they need. This expressiveness also becomes a limiting factor (in a good way) – an m only constrained by HasDB has no idea about other functionality that might be present in the monad, and thus can’t use it – it’s our old friend information hiding – if m can send emails, your handler can’t know about it unless you constrain m in the appropriate way, which makes your intent very very clear.

Another good example I’ve come across is a github gist by user ocharles.

To sum all of this up, the approach we’re using here I’ll call the “classy mtl readers” approach, it’s a combination of:

  • typeclasses on abstracted monads for requiring functionality/limiting scope of the abstracted monad
  • mtl with classes as illustrated by ocharles in the gist for combining functionality
  • a flattening of the monad transformer stack to a single ReaderT in the fp complete article

While the results of this approach are accomplishable by lesser paradigms (like just using IO everywhere), the safety/information-hiding it offers is very nice. It’s beautifully abstracted and type-reinforced (💪 🎩) way to write programs.

Future Code: Free Monads and Freer Monads

While I’m pretty happy with the classy mtl readers approach, I’d be remiss if I didn’t tell you about what might be next – people have actually long since been working on this, but there’s something on the horizon that might offer even better semantics, abstraction and type protection: Free/FreeR monads and applicatives.

The link above ( to the gist by @ocharles) is actually a response to some articles about Free Monads:

Free monads interest me more and more these days as I ponder the benefits of expressing what a program (or service) can do as an operational algebra – so a EntityStore can CreateEntity, ReadEntity, etc. After a while I came upon another great blog post, an opinionated guide to haskell in 2018, which laid out the Free/mtl/FreeR dichotomy nicely, and introduced FreeR in a really approachable way to me.

Unfortunately, there are some issues with Free/FreeR namely:

  • You seem to need to choose between Free Monads (for sequential computations) and Free Applicative (for easy parallelism)
  • FreeR is harder than mtl for the compiler to optimize, Free is even worse (supposedly FreeR is fast enough for this inefficiency to not matter)
  • Free applicatives can have optimizable interpreters and Free Monads can’t really.

I think FreeR Monads/Applicatives are incrementally better if not for the abstraction and forced-design they provide, and I want to start moving to them, but we’ll leave that as a topic for another time – maybe I’ll write a little app in both approaches and benchmark it locally in another post – it should be possible to write the actual functions that do the work (monadic functions in the mtl case, executors in the FreeR case) in a completely separate module to make it a fair comparison.

Of course, if you read DeGoes’s second article, he actually covers this by introducing the “Free Transformers” – this moves the computation into the actual data type, but seems to solve basically all the problems with Free. As for the Applicative/Monad choice requirement, it looks like Free Applicative can just be used instead at all points – he also gave a great talk on the subject.

I posted about this on Reddit, and while I basically answered my questions myself after more reading/watching it was good to ask others about this. Thanks to the help of the folks in r/haskell (Edward Kmett even responded!), I understand the limitations a little better now – free applicatives are where I want to be.

Better free monads: polysemy, fused-effects & friends

Back when this post series was first being written, there was a massive penalty for using the Free/R Monads – that drawback has mostly disappeared these days, with the maturation of some awesome community libraries:

How do we know the performance penalty is gone? Some brave souls have done the benchmarking! – you can peruse the criterion directly as well. The best performer on the graph is fused-effects (as of the writing of this post), almost mirroring MTL speed. The tradeoff with using fused-effects is the increased boilerplate, but at this point it’s looking like it’s worth it!

General Robustness

Here are some general things I think are nice to add for robustness.

Response Envelopes

Let’s improve our error catching and error returning semantics, define some envelopes for the data we’re sending back.

data EnvelopedResponse a = EnvelopedResponse
    { status :: String
    , message :: String
    , respData :: a
    } deriving (Generic, Eq, Show)

NOTE Using String is not great here, but this is how the code sits so I’m leaving it in – Text and strict Bytestrings may be better choices.

Pagination

Good APIs have good pagination. While I’m not going to do any fancy cursor-based pagination or something like that, here’s how I handle the usual offset+limit style pagination.

-- For the request side
data PaginationOptions = PaginationOptions { limit   :: Maybe Limit
                                           , offset  :: Maybe Offset
                                           , afterId :: Maybe ID
                                           }

-- For the data modeling side
data PaginatedList a = PaginatedList
    { items :: [a]
    , total :: Int
    } deriving (Generic, Eq, Show)

makePaginatedList :: [a] -> PaginatedList a
makePaginatedList as = PaginatedList as (length as)

instance ToSchema a => ToSchema (PaginatedList a)

instance Functor PaginatedList where
    fmap f list@(PaginatedList oldItems _) = list { items=fmap f oldItems }

instance Foldable PaginatedList where
    foldMap f (PaginatedList oldItems _) = foldMap f oldItems

There’s a smattering of classes here, but hopefully they’re all pretty straight forward, there are a few

Error handling/reporting

Server-side errors should never make it to users – all around the code I actually use rightOrThrow which just throws an error rather than surfacing a proper error that could be shown to users.

throwServantErrorIfLeft :: (MonadError ServerError m, Exception e) => Either e a -> m a
throwServantErrorIfLeft (Left err) = throwError $ Err.makeGenericError (show err)
throwServantErrorIfLeft (Right v)  = return v

makeGenericError :: String -> Servant.ServerError
makeGenericError s = Servant.err500 { Servant.errBody=errStr  }
    where
      errStr = BSU.fromString $ "Server error occurred: " ++ s

ensureNotNothing :: (MonadError e m, Monad m) => Maybe a -> e -> m a
ensureNotNothing Nothing e = throwError e
ensureNotNothing (Just x) _ = return x

ifNothingThrowIOError :: (MonadError e m, Monad m) => e -> Maybe a -> m a
ifNothingThrowIOError = flip ensureNotNothing

ifNothingThrowError :: (Exception e, Monad m) => e -> Maybe a -> m a
ifNothingThrowError e Nothing = throw e
ifNothingThrowError e (Just v) = pure v

whenNothing :: (Monad m) => m a -> Maybe a -> m a
whenNothing action Nothing  = action
whenNothing action (Just x) = return x

whenLeft :: (Monad m, Exception e) => (e -> m a) -> Either e a -> m a
whenLeft action (Left err)  = action err
whenLeft action (Right v) = return v

throwServantErrorIf :: (MonadError e m) => Bool -> e -> m ()
throwServantErrorIf True err = throwError err
throwServantErrorIf False  _ = return ()

ifLeftConvertAndThrow :: (MonadError ServerError m) => Either SomeException r -> m r
ifLeftConvertAndThrow (Left err) = throwError . Err.enveloped . Err.convertToServantError $ err
ifLeftConvertAndThrow (Right v) = pure v

servantConvertErr :: Exception e => e -> ServerError
servantConvertErr = Err.enveloped . Err.convertToServantError

ifNothingEnvelopeAndThrow :: (MonadError ServerError m) => ServerError -> Maybe r -> m r
ifNothingEnvelopeAndThrow e Nothing = throwError (Err.enveloped e)
ifNothingEnvelopeAndThrow e (Just v) = pure v

ifLeftEnvelopeAndThrow :: (MonadError ServerError m, Exception e) => ServerError -> Either e r -> m r
ifLeftEnvelopeAndThrow e (Left err) = throwError (Err.enveloped e)
ifLeftEnvelopeAndThrow e (Right v)  = pure v

throwIf :: (Exception e, Monad m) => Bool -> e -> v -> m v
throwIf True   err _ = throw err
throwIf False  _   v = pure v

throwErrorIf :: Monad m => Bool -> String -> m ()
throwErrorIf True err = error err
throwErrorIf False  _ = return ()

throwIfLeft :: (Exception l, Monad m) => Either l r -> m r
throwIfLeft (Left err) = throw err
throwIfLeft (Right v) = pure v

ifLeftThrow :: (Exception e, Monad m) => e -> Either l r -> m r
ifLeftThrow e (Left _) = throw e
ifLeftThrow _ (Right v) = pure v

That is a lot of utility functions! Let’s get a bit more conrete by showing the use in a real use case:

{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE TypeOperators     #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleContexts  #-}

module API.V1.Addresses ( Routes
                        , server
                        ) where

import           Control.Monad.Except (MonadError)
import           Control.Monad.State (liftIO)
import           Servant
import           Types
import           Util (ifNothingThrowIOError)
import qualified Errors as Err

-- ^ User information retrieval endpoints
type Routes = "addresses" :> Capture "addressId" AddressID :> Get '[JSON] (EnvelopedResponse (ModelWithID Address))

server :: (HasDBBackend m db, MonadError ServerError m) => ServerT Routes m
server = findAddressByID

findAddressByID :: (HasDBBackend m db, MonadError ServerError m) => AddressID -> m (EnvelopedResponse (ModelWithID Address))
findAddressByID cid = getDBBackend
                      >>= liftIO . flip getAddressByID cid
                      >>= ifNothingThrowIOError Err.failedToRetrieveResource
                      >>= pure . EnvelopedResponse "success" "Successfully retrieved address"

IDEA: Should Config actually be a component?

Well here’s an interesting idea that occurred to me as I wrote this – should configuration actually be a component in and of itself? All the really slick systems these days (and the well-built system of yore) can deal with changing configuration on the go, and it seems very easily within reach with the way we’ve written everything so far. Haskell has extremely ergonomic concurrency support, and state-of-the-art coordination primitives (whether message passing is your thing or shared memory), maybe I could whip up such a system really quick by moving configuration from a static built-once-at-startup thing to the purview of a component?

I was originally going to leave this out, because it seems too far off the beaten path, and then I saw Control.Concurrent.Chan just sitting there, and I couldn’t help myself. I’m not going to do anything crazy and implement an app-wide event bus or anything, just 1-to-N component communication, in particular, a way for distributing configurations! Wouldn’t a typeclass for components that could be persuaded to update their own configurations and re-initialize? What about the chaining mechanism? How does a theoretical ConfigComponent let other components know that it has a new configuration available for them to see, and for those components to reinitialize?

Security: AuthN

NOTE The techniques in this section are 2+ years old. Please check servant and other libraries for better solutions, or easier to use abstractions than what I’ve laid out below

OK, before we mark our dinky little service as done, let’s handle one of the most important cross-cutting concerns – Authentication (AuthN) and Authorization (AuthZ). This bit can be a little difficult to tease out, but let’s do it. Here’s a scheme that I developed for working with Auth in servant:


{-# LANGUAGE DataKinds             #-}
{-# LANGUAGE TypeOperators         #-}
{-# LANGUAGE OverloadedStrings     #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables   #-}

module AuthN
    ( genAuthServerContext
    , getRoleFromSessionInfo
    , getUserIDFromSessionInfo
    , authCookieName
    , sessionUserID
    , sessionUserRole
    , FullRequest
    , AuthHandler
    , CookieAuth
    ) where

import           Data.Maybe (isNothing, fromJust, fromMaybe)
import           Control.Monad.State (when, liftIO)
import           Data.Aeson (decode)
import           Data.ByteString (ByteString)
import           Data.Typeable (Typeable)
import           Network.HTTP.Types.Header (hCookie)
import           Network.Wai (Request, vault, requestHeaders)
import           Network.Wai.Session (Session)
import           Servant
import           Servant.Server.Experimental.Auth (AuthHandler, mkAuthHandler, AuthServerData)
import           Servant.Server.Internal.Delayed (passToServer)
import           System.Log.Logger (warningM)
import           Types
import           Util (ifNothingThrowIOError, throwErrorIf)
import           Web.ClientSession (Key, decrypt)
import           Web.Cookie (parseCookies)
import qualified Data.ByteString.Lazy.Char8 as BSL8
import qualified Data.Vault.Lazy as V
import qualified Errors as Err
import qualified Data.DateTime as DT

 -- For Servant generalized auth
type instance AuthServerData (AuthProtect "cookie-auth") = SessionInfo

type CookieAuth = AuthProtect "cookie-auth"

sessionUserID :: SessionInfo -> UserID
sessionUserID = userID . sessionUserInfo

sessionUserRole :: SessionInfo -> Role
sessionUserRole = role . sessionUserInfo

authCookieName :: ByteString
authCookieName = "your-app-here"

-- Generate a cookie auth handler, given a session store cookie
genCookieAuthHandler :: Key -> AuthHandler Request SessionInfo
genCookieAuthHandler serverKey = mkAuthHandler handler
    where
      handler req = liftIO DT.getCurrentTime
                    >>= \now -> getAuthCookieValue req
                    >>= ifNothingThrowIOError Err.notLoggedIn
                    >>= decryptCookie serverKey
                    >>= ifNothingThrowIOError Err.invalidAuthHeaders
                    -- Throw error if is cookie has expired
                    >>= \c -> if now > expires c then throwError Err.invalidSession else pure c

getAuthCookieValue :: Request -> Handler (Maybe ByteString)
getAuthCookieValue = return . maybe Nothing (lookup authCookieName) . (parseCookies<$>) . lookup hCookie .  requestHeaders

decryptCookie :: Key -> ByteString -> Handler (Maybe SessionInfo)
decryptCookie key str = return . fromMaybe Nothing $ decode . BSL8.fromStrict <$> decrypt key str

-- ^ Generate an context that contains the necessary auth handler, given a key to use for client side cookie encryption
genAuthServerContext :: Key -> Context '[AuthHandler Request SessionInfo]
genAuthServerContext = (:. EmptyContext) . genCookieAuthHandler

getRoleFromSessionInfo :: SessionInfo -> Role
getRoleFromSessionInfo = role . sessionUserInfo

getUserIDFromSessionInfo :: SessionInfo -> UserID
getUserIDFromSessionInfo = userID . sessionUserInfo

data FullRequest deriving Typeable

instance HasServer api context => HasServer (FullRequest :> api) context where
  type ServerT (FullRequest :> api) m = Request -> ServerT api m

  hoistServerWithContext _ pc nt s = hoistServerWithContext (Proxy :: Proxy api) pc nt . s

  route Proxy context subserver =
    route (Proxy :: Proxy api) context (passToServer subserver getRequest)

    where getRequest req = req

While I use this scheme, it’s important to note that servant-auth now exists (thanks to the hard work of many contributors to Servant, I’ve personally been helped many times by @jkarni and @phadej) – this is what you’re going to want to use for your auth, instead of cobbling it together like I did. Please do not take this code to production!

Security: AuthZ

And on the side of AuthoriZation (“AuthZ”), I’ve got code that looks like this:

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}

module AuthZ ( ensureUserIsAdminOrCompanyRep
             , representsCompanyWithID
             , requireRole
             ) where

import           Control.Monad.Except (MonadError, when)
import           Control.Monad.State (liftIO)
import           Control.Monad.Catch (handle)
import           Data.Maybe (fromJust)
import           Servant (throwError)
import           Servant.Server (Handler, ServerError)

import           AuthN (sessionUserID, sessionUserRole)
import           Util (throwServantErrorIfLeft, throwServantErrorIf, servantConvertErr)
import           Types ( UserID
                       , CompanyID
                       , SessionInfo
                       , Role(..)
                       , ModelWithID(..)
                       , HasDBBackend(..)
                       , PaginatedList(..)
                       , DatabaseBackend(..)
                       , CompanyStore(..)
                       , EntityStore(..)
                       )
import qualified Errors as Err

ensureUserIsAdminOrCompanyRep :: ( HasDBBackend m db, MonadError ServerError m, CompanyStore db m )
                                 => CompanyID -> SessionInfo -> m ()
ensureUserIsAdminOrCompanyRep cid sessionInfo = userId `representsCompanyWithID` cid
                                                >>= \isRep -> throwServantErrorIf (isNotAdmin && not isRep) Err.unauthorized
    where
      userId = sessionUserID sessionInfo
      isNotAdmin = (/= Administrator) $ sessionUserRole sessionInfo

representsCompanyWithID :: ( HasDBBackend m db, MonadError ServerError m, CompanyStore db m )
                           => UserID -> CompanyID -> m Bool
representsCompanyWithID uid cid = getDBBackend
                                  >>= flip getCompaniesRepresentedByUserWithID uid
                                  >>= throwServantErrorIfLeft
                                  >>= pure . (cid `elem`) . (mId <$>) . items

requireRole :: MonadError ServerError m => Role -> SessionInfo -> m ()
requireRole r sessionInfo = throwServantErrorIf (sessionUserRole sessionInfo /= r) Err.unauthorized

An aside on web security – as far as I know the most secure, generalizable security configuration seems to be JWTs-as-tokens in cookies, allowing mobile clients to maybe take the API token out of the cookie and use it as a header or whatever else. As far as what to actually store, an opaque identifier is perfectly fine, but JWT confers some advantages/downside:

  • advantage: if instances share JWT keys, they can read each other’s keys and skip a DB check
  • advantage: JWTs are more of a generic concept, understandable across services, future standards will probably be built on it
  • disadvantage/even point: You’ll need to keep a blacklist of JWTs, which is roughly equivalent to the complexity of server-side session tokens, plus the complexity of the hashing and everything else that comes with JWTs.

The real heavy lifting here is being done by HTTPOnly and Secure settings on cookies, so as long as you don’t do anything dumb like pass things that should be secret in a way that is accessible to the browser (in the case of an XSS attack) then you should be roughly OK. Anyway, read the OWASP top 10.

Better Code: Testing

I am by no means a testing guru – I only find myself writing really good tests when I invest a lot of time up front in making testing easy for myself, or finding a perfect fit tooling wise. I find the most value for me comes from E2E tests, and that’s where I spend the most time – this is even more true in Haskell as the old adage (exaggeration really) goes: “if it compiles, it works”.

Outside of E2E testing, I also prioritize regression type testing – problems solved in production should almost always cause commits of tests that test for and ensure the problem never happens again. Engineering perfection means never solving the same problem twice. While many have really touted the effectivenss of property based testing with QuickCheck, I find it hard to value finding what can often be very obtuse/pathological failure cases – I’m often much more interested in ensuring that a certain happy path through the application is protected, and that’s where I spend my time.

I also very much dislike mocking and am much more comfortable with spinning up temporary resources locally before tests and/or suites of tests, going so far as to spin up a local observable instance of a dependency rather than trying to “stub out” calls or replace components with mocks. In my mind, mocking is a near-pointless endeavor – it requires knowledge of shifting internals and takes you further and further away from production (or even testing/staging environments) with every piece of clever code you write.

When it comes to testing, here’s my order of importance:

  • E2E testing (Does the server binary run with the expected options? Can someone send the HTTP requests to register a user account?)
  • Integration testing (If I spin up a temp EntityStore, can I read out entities I’ve written to it inside code?)
  • Unit testing (Does x function work under adverse conditions/bad input)

I don’t find myself writing a lot of unit tests (I’m not sure exactly how ashamed I should be), given the power of the type system (assuming I’m using any of the power) – but of course that would be the right tier for generative testing.

E2E Testing

One of my favorite things about working in the node space is how easy it is to use tools like supertest (built on top of superagent) and simple testing frameworks like test along with the power of nodejs to write tests against a “real” API. Turns out it’s not that hard to do E2E testing in Haskell either, though I’m a bit verbose with my usage:

promotionTriggerTest :: SpecWith TestAppInstanceInfo
promotionTriggerTest = it "should be triggered after a JPR is promoted" runTest
    where
      runTest (port, _, _) = buildClientEnv port
                             -- | Do login and get the cookie for the admin user
                             >>= \clientEnv -> doUserLogin clientEnv adminUserAndPassword
                             >>= \authenticatedReq -> pure ()

                             -- | Create a company to be the employer
                             >> runClientM (createCompany authenticatedReq testCompany) clientEnv
                             >>= shouldBeRight
                             >>= \createCompanyResp -> when (status createCompanyResp /= "success") (error "Failed to create company")

                             -- | Create the job posting
                             >> let jpr = testJPR { rCompanyId=Just $ mId $ respData createCompanyResp }
                                               in runClientM (createJobPostingRequest jpr) clientEnv
                             >>= shouldBeRight
                             >>= \createJPRResp -> when (status createJPRResp /= "success") (error "JPR creation failed")

                             -- | Get the available jobs (job search with no query/filters/etc)
                             >> runClientM (jobSearch Nothing [] [] Nothing Nothing []) clientEnv
                             >>= shouldBeRight
                             >>= \originalJobsResp -> when (status originalJobsResp /= "success") (error "Job retrieval failed")

                             -- | Approve the JPR, using a fake SessionInfo for the admin user
                             >> let jprId     = mId $ respData createJPRResp
                                in runClientM (approveJPR authenticatedReq jprId) clientEnv
                             >>= shouldBeRight
                             >>= \approvedJPRResp -> when (status approvedJPRResp /= "success") (error "JPR approval failed")

                             -- | Retrieve the available jobs again
                             >> runClientM (jobSearch Nothing [] [] Nothing Nothing []) clientEnv
                             >>= shouldBeRight
                             >>= \updatedJobsResp -> when (status updatedJobsResp /= "success") (error "Updated retrieval failed")

                             -- | Ensure that totals changed, the approved job should have shown up (total should now be 1)
                             --   which means the cache should have been invalidated
                             >> let totalAfterUpdate = total $ respData updatedJobsResp
                                    originalTotal    = total $ respData originalJobsResp
                                in originalTotal `shouldNotBe` totalAfterUpdate

The easy Haskell integration of HTTP in combination with Wai.Application and ClientApi means it’s pretty easy to generate a client to connect to your servant API.

Property based testing

There’s an excellent practical talk on the subject.

The particular project I based this post doesn’t have any property based testing but it should. I have to be honest, I’ve just not found that unit (and by extension property-based) testing that necessary in developing apps. I know it’s something I should definitely have but 99% of the time I’ve just forgotten a parameter and most peopl aren’t trying to put negative numbers where positive numbers go (which is something I’d just solve with Natural). This isn’t a good reason to not do proper testing, but it’s my own rationalization.

Logging

hslogger has worked fantastically for me as far as a basic logging solution for Haskell. One thing that I haven’t done with it, that I need to, is structured logging (basically using JSON/some format versus plain strings in your log messages). GELF is a reasonably nice standard for this and I’m thinking of adopting it (I first came upon it when using Graylog). Haskell also happens to have support for GELF with the graylog package, so that’s nice.

Tracing

I’ve written in the past about getting tracing working within my haskell applications, so I’d suggest you look there to see how I’ve gotten on with this. A lot of the libraries that I was trying to evaluate then have grown, so another survey of the landscape is almost surely necessary, but I won’t try and accomplish that in this post. Anyway here’s the code I used to write to Jaeger on a recent project:

data TracingBackend = TracingBackend { tbCfg    :: TracingConfig
                                     , tbLogger :: Maybe Logger
                                     , tbSpans  :: IORef [Span]
                                     , tbMgr    :: Maybe Manager
                                     }

makeConnectedTracingBackend :: TracingConfig -> Maybe Logger -> IO (Either SomeException TracingBackend)
makeConnectedTracingBackend cfg maybeL = try makeBackend
    where
      makeManager = case tracingBackendType cfg of
                      LocalTracingBackend -> pure Nothing
                      Zipkin -> Just <$> newManager defaultManagerSettings
      makeBackend = newIORef []
                    >>= \spans -> TracingBackend cfg maybeL spans <$> makeManager

Again, this code is 2 years old – I’m pretty sure the tracing libraries have gotten much better for Haskell in 2021. OpenTracing joined OpenCensus to become OpenTelemetry – lots of things have happened in this space.

Metrics

I’ve also written about getting basic metrics working with prometheus in the past (basically exposing some basic warp supported metrics to prometheus), but I haven’t done a whole lot of deep thinking on this subject, and don’t feel confident enough with knowing the ecosystem to suggest things here, outside of just using prometheus-related tooling that’s available.

As far as metrics go, I would really recommend reading up on the USE method and RED method for guiding which metrics you really care about and want to expose to ensure you have enough signal coming out of the noise.

Whoa, time flies

So this post series took 2 years to reach it’s conclusion (only because I put off finishing up this post for 2 years). Since it’s pretty ridiculous to not finish a post for 2 years I thought I should do something for the readers who bothered to read this. What I’ve decided to do is completely open source the code or a job board I used to actively work on (TechJobs.Tokyo). I don’t know when I’ll work on it again (I was in the middle of lots of improvements too, adding OAuth, etc), but in the meantime even if the code is old I think it might be helpful to absolute beginners to have it out there in the world.

I don’t know how much Haskell has moved past the code as it sits (I expect libraries like servant and sqlite-simple don’t look too much different), but my basic assumption is having more production-aimed Haskell code out there that people can peruse (or maybe even fork and use? If you’re an absolute madperson) is probably good?

Anyway, check out the code that powers the job board on GitLab:

Wrapup

Well this has to be the longest running series that I’ve ever written – only because the last installment is 2 years late! I haven’t written any Haskell in a while (I’ve been making do with Rust and Typescript) but hopefully I can find time to shave some yaks in th enear future with what I consider to be the best programming language I know.

Like what you're reading? Get it in your inbox