Awesome FOSS Logo
Discover awesome open source software
Launched 🚀🧑‍🚀

REST-ish Services in Haskell: Part 2

Categories
Haskell logo + Servant logo

tl;dr - Work our way through some more type tomfoolery, domain modeling (w/ a light discussion of Domain Driven Design). Next a Component which can operate on the Task domain model, the TaskStore is introduced. Then the types “hit the road” and we build a partial implementation of a SQLiteTaskStore (with assistance from sqlite-simple). The code is available in the haskell-restish-todo repo, @ tag part-2.

UPDATE (11/19/2018)

After some feedback from the r/haskell post I added an extra section on some more interesting approaches to handling the over-typed first section, based on suggestions from helpful people on r/haskell.

Multi-part blog post alert

This is a multi part blog post with the following sections:

  1. Part 1: Initial project setup
  2. Part 2: Domain model design and application architecture (this post)
  3. Part 3: Building the API by putting it all together
  4. Part 4: Extras and other concerns

Setting the stage

So if you’ve been following from the previous blog post, you know that we’ve just set up a bunch of machinery for loading configuration and running the server’s CLI aspect. Unfortunately, as in the first post, we’re not actually going to quite make it to the actual “HTTP” part of this endeavor quite yet – we’re going to spend a little time taking care of arguably more abstract API-adjacent tasks. Today we’re going to build castles in the sand – building up the domain of the solution we’re solving with as much help we can get from the formalisms in Haskell.

As usual, I will try to start off with the simplest approach to solving a problem, and be explicit whenever we find a chance to put on our strong typing tophat 💪 🎩 to solve problems in a similar, slightly more complex, but ultimately “better” way (“better” in this instance means with more verification/work done with the type system/compiler).

Today we’ll be going all the way up to getting a “working” main method which starts up a bunch of components, which is the system. Notice I have not mentioned anything about how we’ll express operations to the outside world (eventually HTTP), and that’s becasue those concerns are orthogonal (on an external layer of the DDD onion model). I’ll be covering that in Part 3 (which is where Servant will actually come in).

In this part, I’ve changed things a little bit from the first part – you can find a link to the guaranteed working at the end of every major section. The code snippets in this article should not be copy-pasted in, but are close to what you need for working code, often pointing out new imports or lanuage extensions used.

OK, enough preamble let’s jump in.

Building our domain models

There’s a bit of a chicken and egg problem with determining our domain model and important objects, and trying to tease out the structure of our application at a high level – you can come at the problem top down (e.g. starting from the HTTP routes an application must provide and determining the types needed) or bottom up (like we will do now). None of this is “real work” yet, but just getting how we’re thinking down on paper – this is also a good time to see where we can leverage the power of Haskell’s type system. It may seem like it’s not important, but this is the step that differentiates architects from junior engineers (which is obvious, this is the architecting bit).

Let’s figure out what our data model looks like. Here are a few motivating questions (assuming we decide that Task is a good place to start):

  • What makes a Task item?
  • What parts of a Task are optional?
  • What information do we need to actually do work to a Task? Is there standard metadata that a Task should always have?
  • Who owns a Task? Is that information part of a Task?

Let’s start with a basic definition of tasks (in a file called Types.hs):

Types.hs:

data TaskState = NotStarted
               | InProgress
               | Completed deriving (Enum, Read, Show)

data Task = Task { tName        :: Text
                 , tDescription :: Text
                 , tState       :: TaskState
                 }

We’ve already sort of stepped in/around a landmine here in the way that Strings work in haskell – there are actually a bunch of types you could choose for Strings – String (equivalent to [Char]), Text, ByteString, and the strict/lazy derivatives of the last two. sdiehl has written a good guide on it, and there are other good posts and SO threads to consider. Here, we’re going to keep things simple-ish and use Text – this means we’ll need to bone up on conversions to/from text later and there will be some complications, but it’s much better than using the very inefficient basic String type. Bytestring might have been another good option, but we’ll forgo any super deep thinking for now.

Even with this pretty simple domain model, there is a big opportunity to make things more robust here – we could actually use some more advanced polymorphism to make our types a little more malleable:

data Task f = Task { tName        :: f Text
                   , tDescription :: f Text
                   , tState       :: f TaskState
                   }

type Complete f = f Identity
type Partial f = f Maybe

We did the same trick in in part-1, but this time I’ve added the generalization of Partial f to represent any similarly-constructed partial type. Just like before, what we’ve gained by adding the f is the ability to differentiate between fully formed Complete Tasks (i.e. Task Identity) and Partial Tasks (Task Maybe) which might be used for merging or partial updates or whatever. In case it’s unclear, a Partial Task (AKA Task Maybe) is a Task in which every property it has, is a Maybe! You can see what this might be like if you just do the replacement:

data Task Maybe = Task Maybe { tName        :: Maybe Text
                             , tDescription :: Maybe Text
                             , tState       :: Maybe TaskState
                             }

This is just like the ability to do mapped types like Partial<T> in typescript. This kind of parametric polymorphism comes in really handy in the likely case that we support a a JSON merge patch standard-compliant PATCH endpoint. In general, we can now avoid writing types that are basically just partial versions of other types, like a TaskUpdate type or TitleOnlyTask, etc. If you’re not comforatble with this brand of polymorphism yet, that’s totally fine! When the time comes you can make seperate types (with names like PartialTask or TaskCreationRequest) in an ad-hoc fashion. We can always change the types later with good assurance that the compiler will find all the bugs for us – don’t get in over your head if you don’t have to just yet.

Whether you make the class simply or in a more complex manner, there’s always a case to be made for making distinct types that expose state information – it enables you to make it impossible to attempt invalid operations with an API you provide. For example, if completing a task has the type signature completeTask :: IncompleteTask -> IO CompletedTask, it becomes impossible to misuse the function to complete a task twice, for example. With that example in mind, let’s put on our strong typing tophats 💪 🎩 and get even more fancy with our type specification – I want to be able to distinguish incomplete and complete tasks at the type level:

data Completed
data InProgress
data NotStarted
data TaskState = Completed | InProgress | NotStarted

data Task f state = Task { tName        :: f Text
                         , tDescription :: f Text
                         , tState       :: f state
                         }

type Complete f = f Identity
type Partial f = f Maybe

type CompleteTask f = f Identity Completed

type IncompletePartialTask f = f Maybe InProgress
type IncompleteTask f = f Identity InProgress

type NotStartedPartialTask f = f Maybe NotStarted
type NotStartedTask f = f Identity NotStarted

Is this too much? Maybe, we’ll find out later – we can always relax the definitions a little bit later and rework things. More importantly now we can actually be even stricter in our function signatures. Let’s get even more persnickety and introduce some newtypes for tName and tDescription:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- ... other code ...

-- Newtypes preventing careless
newtype TaskName = TaskName { getTName :: Text } deriving (Eq, Show)
newtype TaskDesc = TaskDesc { getTDesc :: Text } deriving (Eq, Show)

data Task f state = Task { tName        :: f TaskName
                         , tDescription :: f TaskDesc
                         , tState       :: f state
                         }

Now someone can’t even give us a Text value (from Data.Text) without going through a single checkpoint we control. We could use this to ensure that invalid TaskNames are inconstructable to begin with. But why stop there? Why do a little when we can do entirely too much? Let’s put another tophat on top of our tophat & drown ourselves in types:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}

-- Individual separate types for tasks to enable specifying them as part of (Task f state)
data Finished = FinishedTask deriving (Eq, Read, Show)
data InProgress = InProgressTask deriving (Eq, Read, Show)
data NotStarted = NotStartedTask deriving (Eq, Read, Show)

-- Task state for abstracting over TaskState
data TaskState = Finished
               | InProgress
               | NotStarted deriving (Eq, Enum, Read, Show)

-- Newtypes preventing careless
newtype TaskName = TaskName { getTName :: Text } deriving (Eq, Show)
newtype TaskDesc = TaskDesc { getTDesc :: Text } deriving (Eq, Show)

-- The beefy task class
data Task f state = Task { tName        :: f TaskName
                         , tDescription :: f TaskDesc
                         , tState       :: f state
                         }

-- Completed tasks
type CompletedTask = Task Identity Finished
deriving instance Eq CompletedTask
deriving instance Show CompletedTask

-- Incomplete, partially specified tasks
type IncompletePartialTask = Task Maybe InProgress
deriving instance Eq IncompletePartialTask
deriving instance Show IncompletePartialTask

-- Incomplete, fully specified tasks
type IncompleteTask = Task Identity InProgress
deriving instance Eq IncompleteTask
deriving instance Show IncompleteTask

-- Not started, partially specified tasks
type NotStartedPartialTask = Task Maybe NotStarted
deriving instance Eq NotStartedPartialTask
deriving instance Show NotStartedPartialTask

-- Not started, completely specified tasks
type NotStartedTask = Task Identity NotStarted
deriving instance Eq NotStartedTask
deriving instance Show NotStartedTask

This might be a little insane for our little project, but also is kind of awesome, and the code should be relatively easy to follow. I might be punching above my weight with this level of specification, but this kind of fun is a big part of the joy of Haskell in my mind. I can’t think of another non-ML language (which the majority of the world is using) where syntax is this clear and you can “say what you mean” this rigidly. I am definitely pushing Haskell (look at all the new language extensions along the way!) to get to such a specified model, pushing Haskell to produce even stricter checks for me by tinkering. It’s all in good fun until you encounter a compiler error that you can’t understand. Anyway, let’s get into the language extensions I enabled to make the code snippet(s) above possible:

  • OverloadedStrings - Easier to work with Strings and autoconversions where possible to Text
  • StandaloneDeriving - Forced by the f type in Task, since Haskell doesn’t know what the f type is, and the normal deriving clause has no way to tell it
  • TypeSynonymInstances - Due to the fact that I used StandaloneDeriving on a type-based alias (normally it only takes actual type constructors)… Eventually I could actually remove this for some reason.
  • FlexibleInstances - Standalone instances like Eq IncompleteTask break the requirement of all type variables appearing at most once in instance head (I’m least sure about this one)
  • GeneralizedNewtypeDeriving - To ensure that the newtype instances can be treated the same as the type they’re made of (i.e. TaskName is more or less equivalent to a regular Text)

I want to again note that I’m very very likely going to regret this level of type specification but it’s nice to be able to get it to compile for now at least. It’s absolutely fine to stop at any of the previous steps before I got to this bit of madness, but I did want to see how far I could go – hopefully any problems from here on out will be learning experiences as well.

SIDETRACK/FURTHER READING: Domain Driven Design (DDD)

This is a small rant on why DDD is important - feel free to skip it if you’re full up on hastily stated opinons for the day.

I’ve mentioned it a few times in passing now but Domain Driven Design (DDD) is one of the important schools of thought/patterns/paradigms in software design. The idea of separating and encapsulating your concerns at different layers, the onion model is a thoroughly practical concept and the only good way I’ve seen of truly separating domain concerns and other adjacent code (like JSON marshalling or HTTP requests). It’s very hard to manage complexity well without thinking in a way similar to what DDD suggests – whether you arrive at it yourself or learn about DDD from others.

Handling validation: the Validatable typeclass

OK, so now we have a Task class, but here’s another important distinction we should try to make, we should be able to determine when a Task is not only well-typed/sound but also “valid”, for our defnition of valid. I mentioned it in passing in the previous section, but rather than taking the correct-by-construction route (constraining how Task/TaskName/TaskDesc values can be made), I’m going to solve this problem by introducing a typeclass to deal with the types that can be validated in some way. The concept is simple:

newtype FieldName = FieldName { getFieldName :: Text } deriving (Eq, Show, Read)

data ValidationError = InvalidField FieldName
                     | MissingField FieldName deriving (Eq, Show, Read)

data Validated t = Validated t

class Validatable t where
    validate :: t -> Either ValidationError (Validated t)

Some of the tricks from before are used here but line-by-line the code is straight forward:

  • newtype FieldName is there to ensure that whoever makes a ValidationError is forced to be specific abou the Text they’re talking about
  • ValidationError is an enumeration of the types of errors that could happen during validation
  • Validated t is a product type that basically means nothing outside of the fact that something was validated (the type will be exported, but without the constructor, forcing people to go through Validatable’s validate method)
  • Validatable t typeclass is there to give implementers of Validatable types a chance to set their validation logic

With this in place, let’s define what validation looks like for a the types of Tasks we’ve defined. In general I like to think of a Valid Task (see what I did there?) as a task with a non-empty name and a non-empty description (state can’t possibly be invalid at this point, as long as it’s present). Here’s the implementation for CompletedTask:

instance Validatable CompletedTask where
    validate t = if isEmpty name then
                     Left (InvalidField (FieldName "name"))
                 else
                     if isEmpty description then
                         Left (InvalidField (FieldName "description"))
                     else
                         Right (Validated t)
        where
          name :: DT.Text
          name = getTName $ runIdentity $ tName $ t

          description :: DT.Text
          description = getTDesc $ runIdentity $ tDescription t

          isEmpty :: DT.Text -> Bool
          isEmpty = (==0) . DT.length

The intense types we’ve made have already caused some complexity, look at the unwrapping we need to do to get to the actual name of task in the where clause. I was able to quite easily follow the compiler through my mistakes so I’m not completely spooked yet but I just wanted to note that this is often a result of the additional cleverness from before, if you ever feel you need to turn back and make things simpler (at least at first), don’t hesitate to do so.

There’s another decision made here, which is where I chose to add the Valid a class. The natural choice for me was to put it on the outside – Valid CompletedTask is really Valid (Task Identity Finished), but I could have done something like adding a type parameter (just like we did with Task f state) to include the validation as a part of the task itself. I’ve chosen to consider the task’s status as part of the task, but not the tasks’s validity – this is a design choice on my part.

Anyway, back to the implementation – it looks right, but the nested if statements kind of stick in my craw. Let’s make it a little more generic/clean… How can we do this? I know there’s a set of checks I want to run, and they all must pass for an object to be valid… Thinking about this in code:

type ValidationCheck t = t -> Maybe ValidationError

class Validatable t where
    validate :: t -> Either ValidationError (Validated t)
    validationChecks :: [ValidationCheck t]

Some functional concepts can help us out here – in particular the basics of map/reduce/filtering (where people normally get their introduction to functional programming). Restated, what I want to do is have a list of validation functions run on the given object, and ensure that none of them produce ValidationErrors. I could also solve this by reducing/folding the object over functions, but I think (f)mapping and filtering is the most straight forward way to obtain all the ValidationErrors if they want to.

With this realization, it might make sense to change the Validatable typeclass a bit more… We should be able to deduce a general definition of validate in terms of validationChecks, and maybe add an isValid that just produces a bool, since sometimes that’s all a consumer might care about. We can also now cut down on the actual mechanical code to a list of individual checks. Let’s get out our strong typing tophats 💪 🎩 and see what this might look like:

import qualified Data.Text as DT
import           Data.Either (isRight)
import           Data.Maybe (isJust, fromJust)
import           Data.Functor.Identity (Identity(..))

-- ... other code ...


type FullySpecifiedTask = Task Identity
type PartialTask = Task Maybe

taskNameField :: FieldName
taskNameField = FieldName "name"

taskDescField :: FieldName
taskDescField = FieldName "description"

-- | Helper function to access task name for fully specified task
--   this works for both `FullySpecifiedTask state` (where state can vary, e.g. s`CompletedTask`(~`Task Identity Finished`) or `IncompleteTask`s (~`Task Identity InProgress`)
fsTaskName :: FullySpecifiedTask state -> DT.Text
fsTaskName = DT.strip . getTName . runIdentity . tName

fsTaskDesc :: FullySpecifiedTask state -> DT.Text
fsTaskDesc = DT.strip . getTDesc . runIdentity . tDescription

instance Validatable (FullySpecifiedTask state) where
    validationChecks = [checkName, checkDescription]
        where
          checkName :: (FullySpecifiedTask state) -> Maybe ValidationError
          checkName t = if DT.null (fsTaskName t) then Just (InvalidField taskNameField) else Nothing

          checkDescription :: (FullySpecifiedTask state) -> Maybe ValidationError
          checkDescription t = if DT.null (fsTaskDesc t) then Just (InvalidField taskDescField) else Nothing

pTaskName :: PartialTask state -> Maybe DT.Text
pTaskName = (DT.strip . getTName <$>) . tName

pTaskDesc :: PartialTask state -> Maybe DT.Text
pTaskDesc = (DT.strip . getTDesc <$>) . tDescription

instance Validatable (PartialTask state) where
    validationChecks = [checkName, checkDescription]
        where
          checkName :: (PartialTask state) -> Maybe ValidationError
          checkName = maybe (Just (MissingField taskNameField)) notEmptyIfPresent . pTaskName
              where
                notEmptyIfPresent :: DT.Text -> Maybe ValidationError
                notEmptyIfPresent v = if DT.null v then Just (InvalidField taskNameField) else Nothing

          checkDescription :: (PartialTask state) -> Maybe ValidationError
          checkDescription = maybe (Just (MissingField taskDescField)) notEmptyIfPresent . pTaskDesc
              where
                notEmptyIfPresent :: DT.Text -> Maybe ValidationError
                notEmptyIfPresent v = if DT.null v then Just (InvalidField taskDescField) else Nothing

The code got longer, but I’m pretty satisfied – by defining Validatable on PartialTask state, we’ve defined it for both CompletedTasks and IncompleteTasks, since state is allowed to be unknown, kind of. I can just focus on the list of functions specifying how a value could be invalid. I don’t necessarily live by the DRY principle, but this is definitely DRYer than what we had before, and should pay dividends for every instance we write (which again, is unfortunately a bunch). A few notes about the code:

  • I worked on this iteratively, this code didn’t just come out perfect, obviously – the compiler is your friend
  • I renamed IncompleteTask to InProgressTask to better match the state
  • We’ve solved this in a really mechanical way, not making use of typeclasses like Traversable (validate/validationChecks), BiFunctor (isValid).
  • qualified import of Data.Text is important to prevent namespace clashes with the Prelude for functions like length and null.
  • I totally forgot that null exists for checking empty lists (SO Post). There seems to be a very real penalty for not using it (O(N) vs O(1)) so I added it in and removed all instanced of my little isEmpty helper function.
  • I actually forgot about generator/list comprehensions in Haskell, I’ve been using <$>(fmap) and it’s friends for so long I totally forgot about this built in langauge feature.
  • The lens library would almost certainly make a lot of the code that resulted much much nicer, but I’m not down with it just yet so can’t use it here

Specifying all these instances for all the types synonyms (which are basically pre-set combinations of product tyeps) we specified above is a bit tedious, but now we’ve got a Validatable fully and partially specified Tasks!

SIDETRACK/FURTHER READING: Type quantification (💪 🎩)

The code got longer, but I’m pretty satisfied – by defining Validatable on PartialTask state, we’ve defined it for both CompletedTasks and IncompleteTasks, since state is allowed to be unknown.

If this quote from the previous section threw you a little for a loop, you can blame Haskell. Turns out every time you write a definition like:

someFunc :: a -> Int

You’re actually doing something called type quantification with roots in another thing called System F (which is basically the source of all parametric polymorphism as we know it today). You’re really writing:

someFunc :: forall a. a -> Int

If you just take the english for what it stands for without thinking deeply, you’re defining someFunc for all as – you’re creating a function that no matter what a is, can give back an Int. So the forall is implied in the first example, and explicit in the second. The example might not seem very useful, but forall can be used in a bunch of places, for example:

data Backend = Backend { forall b. TaskStore b => store :: b }

Normally, we’d have to bubble that b up to the definition of Backend (making it Backend b), but here we can actually express that “for any b, as long as it is TaskStore, it can be in this spot”. We’ve managed to actually hide the b from users of Backend! Now they don’t have to care about what the store is used by the Backend. Whether you want this behavior or not is debatable – I like compound type names like Backend SQLite – but at least it’s your choice.

Building Components that act on our domain models

Returning to our regularly schedule programming, we’ve just finished building some beefy bordering on excessive abstractions on top of our basic Task type. Let’s start thinking about things that can create and manipulate these Tasks (our domain model): Components.

While OOP might have fallen out of favor in many places, I’m a staunch believer in componentizing the internal workings of large scale applications I write. Object Oriented Programming is not right/wrong/good/bad by default, it’s just a paradigm – context and implementation are important. Somewhat ironically I think haskell’s heavily functional class system and the way in which typeclasses work is a really good fit for OOP-style thinking – unlike focusing building up object hierarchies as we might do in Python (have you seen their cool new type hinting features?) or Java, we can cleanly separate what data is crucial for a type to know about itself, and what functions should be possible if you’re holding an instance of the class (and even attach data that types should know about themselves at the typeclass level, which we’ve done with validationChecks).

At a higher level, components are interesting because they arrive almost naturally as a tool for separating large arenas of functionality in any large system, if not only for the better ergonomics. Very soon after starting to think up domain models, I find that it’s good to think of the components/functionality that you would need to serve the domain – doing this leads to less and less surprise when you start to build the external side of the system. Again, you could go with just a bag of functions in a bunch of individual module files, but I prefer the creation of Components which are first-class objects in the system because they can help encapsulate necessary runtime properties (like, holding a configuration parameter of all the functions you would call).

What should components actually be able to do?

Here are some things to think about:

  • What is the minimal set of configuration that your component needs to function?
  • Which domain objects does this component touch? In what ways are you going to manipulate domain objects with this component?
  • Thinking a bit ahead what kinds of operations will your API support? In general one or more components will be activated/given work when an API endpoint is hit, writing ergonomic operations is a good idea, because you’ll be using them and thinking in terms of the operations.
  • Does your component need to share state with others? Do they need to stay in sync?
  • Can multiple instances run at a time inside the program? Will they?

These questions are pretty lofty and general for the humble app we’ll be making, but they’re nice food for thought anyway. Going back to the abstract, here’s a wishlist of things that independent Components should generally be able to do:

  • Start up, given an initial configuration, and any other requirements (“init” also happens at this stage)
  • Take in and perform useful work via some interface (typeclass method calls is what we’ll use here)
  • Shutdown to allow for graceful resource release

Here are some goals I might call “stretch” goals – they’re awesome but not absolutely crucial, and can be tacked on in generic ways:

  • Reporting current configuration
  • Dynamically updating configuration in response to external calls
  • Report events related to internal state/actions for anything that might care (i.e. exposing a event stream)

One thing we’re kind of skipping over here is the distinction between a typeclass method call, and an explicit Command-parsing and executing strategy (“command” as in Command Query Response Seperation). In the end I kind of think of Command parsing/taking as more of a transport layer (weirdly enough, in the same way that HTTP is), so I’m going to focus on the most direct methodology here – direct (typeclass) method invocations.

Defining an (overly general) typeclass for Components

Here’s a real simple typeclass that sort of captures the functionality we thought was crucial for some type c:

-- | A simple Component typeclass parametrized over the type of configuration it takes
class Component c where
    start :: c -> IO ()
    stop  :: c -> IO ()

Simple, but not very useful – let’s make this more interesting by composing this typeclass with some more specific ones that will constrain the space of eligible types.

Specialization through typeclass constraints & composition: TaskStore

Let’s take some time to think about what an TaskStore should be/be able to do, in a very specific sense:

  • Be a Component so we can invoke it’s lifecycle (and think about it in that way, even though we’re not doing much with lifecycle now)
  • Create a new Todo instance we’ve created that will be readable at a later time
  • Retrieve a Todo instance (how do we know which one? well we’ll need some kind of identifier)
  • Update an existing Todo instance
  • Delete a Todo instance

As you might have noticed, this is the CRUD pattern (read the first letter of each point) – we’re basically building a database-like abstraction. Despite what might seem like simple duplication, this is actually abstraction – not because we’ll ever use a different backing store (we probably won’t) but more because we can think without worrying about the difficulties of maintaining indices or creating tables or maintaining consistency. We can deal abstract over database specifics (even if it’s just the one database) with this Component, and live in our white castle for a little longer (we’ll deal with making it “real” later).

Let’s write a sketch of the typeclass that encapsulates what we’d expect of a TaskStore:

class Component c where
    start :: c -> IO ()
    stop  :: c -> IO ()

data TaskStoreError = NoSuchTask TaskID
                    | UnexpectedError DT.Text
                      deriving (Eq, Show, Read)

newtype TaskID = TaskID { getTaskID :: Int } deriving (Eq, Show, Read)

class Component c => TaskStore c where
    persistTask  :: c -> Validated (FullySpecifiedTask state) -> Either TaskStoreError (FullySpecifiedTask state)
    completeTask :: c -> TaskID -> Either TaskStoreError CompletedTask
    getTask      :: c -> TaskID -> Either TaskStoreError (FullySpecifiedTask state)
    updateTask   :: c -> TaskID -> PartialTask state -> Either TaskStoreError (FullySpecifiedTask state)
    deleteTask   :: c -> TaskID -> Either TaskStoreError (FullySpecifiedTask state)

Alright, here’s where the theoretical benefit of all the type wrangling we did comes along. By being very specific about our types means that when I write Validated (FullySpecifiedTask state), I can read that as “a validated, fully specified task in some unknown state”. Similarly, when I can be more specific in the case of completeTask, expecting that the method will return to me a CompletedTask, which is the same as a FullySpecifiedTask Finished, in other words “a fully specified task in the completed state”. This specificity, being enforced by the compiler hopefully menas we can prevent ourselves from misusing the API this component presents at the code level – as in you can’t write code that will even try to persist a task that isn’t completely specified.

I’ll get off the strong typing high horse (tm) here and say that while this level of type-level specificity is amazing in my own opinion (and unheard of or table stakes for others, everyone’s different), it’s maybe not the most necessary thing to spend hours on – if you have a deadline, and need to get a feature out the door, you might be better off going with a simple Task type that isn’t parametrized at all, and judging whether you need this extra safety (I think you almost always should do it if you can afford to, it’s why you’re using Haskell and not language X, right?).

Another thing you woud have likely noticed is that we’ve actually required that not only does the type that can function as a TaskStore must already be a Component. Hopefully your head doesn’t feel too heavy with the strong typing tophat 💪 🎩 you’ve been wearing this whole time. Haskell’s typeclass system and excellent support for parametric polymorphism is one of it’s best features, if you can’t tell already. Constraints on typeclasses serve as a way to implement OOP-style inheritance and give us the composability everyone’s been wanting from OOP-focused languages with weaker type systems. Languages other than haskell also have similar systems – Golang’s duck-typing at the interface level approach is kind of like this but a bit looser, Rust has traits which are the closest thing I’ve seen, and of course some JVM based languages are in the mix (off the top of my head Scala’s traits , Clojure’s protocols seem like the right analogues).

Also note that there are two new entities I didn’t discuss much:TaskID and TaskStoreError. The TaskStoreError type will contain (at least for now) my guesses at the ways things could go wrong with this component. Haskell uses values as exceptions and this is a good example of it (even though I haven’t defined it yet). More importantly, this abstraction over “things that can go wrong” (💪 🎩) allows me to unify how different implementations surface their almost certainly implementation-specific errors later, under some type you might call an AppError. This is something that we take for granted in Haskell-land, but it’s absolutely not the case in other languages – people have to actually worry about error classification and knowing what’s what at runtime and stuff in other languages.

TaskID is a bit of a sidetrack, so let’s diverge and talk about it.

SIDETRACK: How do we identify a Task? Enter the WithID product type

We touched on this a tiny bit before, but this really extends to where you think the line is drawn between an object itself and it’s metadata. Is the ID of an object in a database system a property of the object or of some nebulous metadata that sort of follows it around? Analogies/anecdata is bad but this is like thinking that your ID (government-issues, company-issues, even your very name) is part of you in the physical sense. Maybe it doesn’t make sense to think of objects as physical things, but if we go with that assumption, a lot of OOP is going to look very silly. Anyway, I’m making the choice that identification metadata is NOT part of the object. Another decent argument is that you really only seem to need the identity when someone or something needs to find/track you – you can make a case for this piece of data being non-essential. Yet another decent argument is whether the concept of a Task can exist without an ID – the answer is yes (at creation time, since it doesn’t have a server-assigned one, though this might be different if you allowed user-assigned task IDs).

In any case, the choice to treat ID as an external piece of information makes things more difficult due to having to wrap and unwrap this data everytime I think about/use some Task stuff, but it helps me sleep better at night (in my bed inside the ivory tower of theory) so there’s that.

OK, so let’s introduce the WithID type. It’s pretty simple:

WithId a = { id  :: Int
           , obj :: a
           }

This is just about the simplest way we could make this. But exposing sequential numeric ids to external API users is so 1999 (not going to lie I still do it sometimes when I use SQLite), what about v1/v4/v5 UUIDs?

WithId a = { id  :: UUID
           , obj :: a
           }

Thanks to the uuid package we’ve got some nice access to reasonable UUID class that we can actually use. Well now we have a tiny bit of a conundrum, what if we decide to have objects that have both? Indeed, in may database systems there’s actually a hidden column – SQLite has rowid and Postgres has oids which are disabled since 8.0 – and what about the simple case that we want to use numeric IDs for internal API access and UUIDs for external or something (whether you should or not is another question). How should we handle this divergence? At the very least there are three ways:

data WithId a = { id   :: Maybe Int
                , uuid :: Maybe UUID
                , obj  :: a
                }

The idea here is that we just include both, which can be simplified if you just want to commit to always having both be available.

newtype NumericID = { getNumericId :: Int } deriving (Eq, Show, Read)

data TaskID = NumericID
              | UUID deriving (Eq, Show, Read)

data WithId a = { id  :: TaskID
                , obj :: a
                }

The idea here is to create a sum type (really a disjoint union) and store that in the space. This means code we write will deal in TaskIDsThe other choice is to parametrize the choice:

data WithId idt a = { id  :: idt
                    , obj :: a
                    }

type WithNumericId = WithID Int a;
type WithUUID = WithID UUID a;
type WithAllIds = WithID (Int, UUID) a;

This is the most complex of the options, but it provides some interesting tradeoffs. For the cost of additional type complexity, we’ve actually gained the ability to gracefully handle and be very specific about all 3 individual cases, and more specifically reflect the state of things at the type level, and give people who use this type a little more information. I’m itching to choose this, just because it seems fancier, but what are the actual tradeoffs here?

The basic idea here is that to handle this discrepancy (the fact that we might want to use the Int ID sometimes and UUID other times, since this is what we’re allowing for), we’re going to have to write some case code (2 sets of code that look very similar) somewhere. Or do we? One way we abstract (💪 🎩) over these differing implementations is by creating a typeclass that encapsulated what we want to know about these two. In particular, we’re most likely going to be using them in the at least the following arenas:

  • JSON encoding/decoding (to send over the wire)
  • Database lookups (to use as parameters to queries, remember, always use query parameters, not string interpolation)

JSON encoding/decoding is going to be more-or-less taken care of automatically by [aeson’s magic][hackage-aeson] (including use of GHC.Generics), but that second bit is something we have to deal with, most likely – since we’re writing the abstraction that covers the database. The first thing I can think of that is going to be important regardless of what kind of ID is being used is the ability to turn it into a value that SQL can read – If we were doing string interpolation (again don’t), then this would mean Show (toString, essentially) instances. Since we’re going to be using a DB library that can understand Ints as well as Strings, let’s keep the types rich, and assume that all we really want is some functionality like:

class HasSQLID where
    getSQLIDValue a -> ???
-- ??? represents some class that is OK for whatever library we're using to put in a query as a param

As long as we can ensure the implementations backing approach #2 (sum type) and approach #3 (parametrization) can both conform to this typeclass, we’re OK. The difference between doing this implementation seems to be identical. Making the typeclass instances (as in instance HasSQLID WithID vs instance HasSQLID WithNumericID) seem to differ only in where the case happens, whether it’s in getSQLIDValue (in the instance HasSQLID WithID case, since it’s a sum type), or whether it’s done at the type sytem level (you need ~3 different instances of HasSQLID).

Thinking about it a little more though, it turns out we can actually define ProducesSQLID in ~1 instance for WithID idt a though, by using a constraint on the type of idt. As long as we can ensure that idt satisfies ProducesSQLID, we can actually write a generic instance, by just defering to idt’s instance. On the other hand, if we think about the WithID instance a little bit more, we can actually do the same thing, but one step further – we can actually just derive HasSQLID, as long as it’s defined for all of the types it could be (i.e. NumericID and UUID).

So #2 seems like the least-code, least-complex way to do things (despite wanting to go with #3 because it’s got more/cooler type machinery in it), but all this thinking got me thinking about another thing: GADTs! As a solution, it’s kind of a mix between #2 and #3. This is almost the perfect use case for them:

data WithID a where
    UUIDID :: UUID -> a -> WithID a
    IntID  :: Int -> a -> WithID a

I’ve been looking for a chance to use (and finally understand) GADTs for a while (there’s a great lecture/talk on it by Richard Eisenberg on YT), so I’m going to go with this approach, despite the relatively little amount of time I’ve spent talking about it. It’s a terrible way to make the decision but this blog post is long enough and long story short I’ve wanted to see if I coiuld use GADTs naturally somewhere so I gotta move on – if it helps it’s kind of like I picked both #2 and #3 simultaneously.

One more tiny thing – what if I have more metadata coming from the database? Like created_at, or version or updated_at? I’m not going to go into another long spiel but here’s the beginnings of what that might look like:

data WithMetadata a where
    UUIDID   :: UUID -> a -> TaskID UUID
    IntID    :: Int -> a -> TaskID Int
    FullMeta :: FullMeta -> a -> TaskID FullMeta

You can imagine that the FullMeta type is one that consists of another GADT (or just a type full of Maybes to keep it simple), with more metadata specified. Anyway, I’m done thinking about this, my head is starting to hurt (I mean not really, it’s actually fun, but this post is getting terribly long and that’s probably a bad sign for the signal/noise ratio). Let’s move on.

Before I move on, the full version of this code (which excludes FullMeta for simplicity) requires two additional language extensions that should be discussed:

  • GADTs - Needed to use GADTs at all
  • MultiParamTypeClasses - Needed to create the super general typeclass I created for HasSQLID (see the code/commit link below for the full code, it’s pretty different from what’s here)

Note that I also had to go back and update the interfaces we were dealing with to reflect where the WithID class should show up. Here’s an example of what this looked like for persistTask:

persistTask  :: c -> Validated (FullySpecifiedTask state) -> Either TaskStoreError (WithID (FullySpecifiedTask state))

BTW, if I was down with Dependent types in haskell (or at least the current level of support) (💪 🎩), we could get even more specific about the ID that came back from an operation like getTask and ensure it matched the ID that went in (of course this would require dependent typing all the way down to the DB query level). This is one of the places that they would come in handy.

Leaving the ivory tower: making a real TaskStore

Up until now all we’ve really done is write fanciful magic types, typeclasses, and abstractions without implementations. Here’s where the rubber hits the road – we’re going to write a type that is a working TaskStoreComponent (remember, this means one that is a Component and furthermore a TaskStore). Of course we’re going to need an actual way to talk to a database at this point, and while my original plan was to walk through in-memory arrays, hashes, and trees as backing stores for our TaskComponent, I’m going to skip straight to SQLite instead.

Haskell has many libraries for high-level abstraction over SQL-compliant databases, and also for dealing with objects at a level higher than SQL (Object Relational Mapping, AKA “ORMs”), which is what we’re about to do. Unfortunately I’m about to ignore ALL the libraries out there that do a great job of solving this problem, and do it the worst yet most straight-forward way I know – raw parametrized SQL queries without even so much as a single query builder to help.

But wait, we need configuration!

Before we get going, one thing I haven’t covered much but would jump at you while developing this is that we’ve just surfaced another requirement on our TaskComponent’s configuration (and resultingly whatever type it is represented by) – it probably needs to know it’s own state storage location (the FilePath we’re using in this case). This would be a good place to stop and update the configuration-related types we’ve set up in Config.hs, let’s add a TaskStoreConfig that is part of the AppConfig we defined in part 1:

-- ... other code ...

data AppConfig f  = AppConfig
    { host            :: f Host
    , port            :: f Port
    , taskStoreConfig :: f (TaskStoreConfig f)
    }

data TaskStoreConfig f = TaskStoreConfig { tscDBFilePath :: f FilePath }

-- ... other code ...

In addition to just defining TaskStoreConfig and including it in AppConfig, we needed to ensure it was set up to properly be used by the config-reading machinery we’re already using. This is as simple as adding the necessary instances and spending some quality time with the compiler as it lovingly points out everything you’ve done wrong.

-- ... other code ...

type CompleteTaskStoreConfig = TaskStoreConfig Identity
deriving instance Generic CompleteTaskStoreConfig
deriving instance Eq CompleteTaskStoreConfig
deriving instance Show CompleteTaskStoreConfig
deriving instance FromJSON CompleteTaskStoreConfig

type PartialTaskStoreConfig = TaskStoreConfig Maybe
deriving instance Generic PartialTaskStoreConfig
deriving instance Eq PartialTaskStoreConfig
deriving instance Show PartialTaskStoreConfig
deriving instance FromJSON PartialTaskStoreConfig

instance Semigroup CompleteTaskStoreConfig where
    a <> b = b

instance Monoid CompleteTaskStoreConfig where
    mempty = TaskStoreConfig (Identity defaultTaskStoreFilePath)

instance Semigroup PartialTaskStoreConfig where
    a <> b = TaskStoreConfig { tscDBFilePath=resolveMaybes tscDBFilePath }
        where
          resolveMaybes :: (PartialTaskStoreConfig -> Maybe a) -> Maybe a
          resolveMaybes getter = maybe (getter a) Just (getter b)

instance Monoid PartialTaskStoreConfig where
    mempty = TaskStoreConfig Nothing

JK, it wasn’t simple – these changes took >10 minutes to sort out – initially it was simple, but the compiler had a bit to say. The first wrinkle was how to distribute the f in AppConfig f – In this case f (TaskStoreConfig f) is the easy and correct choice, because if a task store config is not completely specified, an App store config must not be completely specified. This mean working through a bunch of other instances (mostly copy pasting with slight modification), and some quality time with the compiler. Another thing was I ended up doing refactoring mergePartial into a typeclass called AbsorbPartial partial, due to the need to recursively absorb partials for nested config objects (TaskStoreConfig is inside AppConfig). Check out the commit, it’s gnarly, considering how simple I thought the change would be.

On to actually using sqlite-simple

We’re going to be using sqlite-simple to talk to our SQLite database file so after adding it to package.yaml as a dependency for lib and start defining the actual database type in Types.hs:


data SQLiteTaskStore = SQLiteTaskStore
    { stsCfg  :: CompleteTaskStoreConfig
    , stsConn :: Maybe Connection
    }

All we really need is the configuration (which will tell us how to make the connection and any other settings we need), and the actual connection. Of course, when this soon-to-be component is made, the connection won’t be present yet, so I’ve used a Maybe to represent those two cases. Now that we’ve got the type, we can start stubbing out the instances for Component and TaskStore:

instance Component SQLiteTaskStore where
    start = undefined
    stop = undefined

instance TaskStore SQLiteTaskStore where
    persistTask  = undefined
    completeTask = undefined
    getTask      = undefined
    updateTask   = undefined
    deleteTask   = undefined

Obviously, there are a few issues with some assumptions and code we’ve written so far:

  • The signature for start actually doesn’t really work out – while in other languages we might have been able to mutate state, in Haskell we can’t, so if we want to create a new SQLiteTaskStore that’s connected (has a Just Connection inside), we need to make a whole new object.

  • Up until now we haven’t incorporated IO into the definition for a TaskStore at all – basically every time we return something it’s going to be returned from inside an IO context (due to the need to access the database).

As for the issue with the Component c type class definition, let’s introduce another typeclass that captures the quality we want – being able to construct a Component:

class Component c => Constructable c cfg err where
    construct :: cfg -> IO (Either err c)

Now, when we try and create a TaskStore at one point, we can connect to the database, and produce a connected one or produce an error detailing why we couldn’t produce one. Another way to solve this problem might have been to modify the Component typeclass directly, I think this separation is good and a little more composable/flexible and explicit in it’s requirement – maybe not every component will have a cfg or an error class.

As for the second issue, we’ll just need to edit the definition of TaskSTore and add more IOs:

class Component c => TaskStore c where
    persistTask  :: c -> Validated (FullySpecifiedTask state) -> IO (Either TaskStoreError (WithID (FullySpecifiedTask state)))
    completeTask :: c -> TaskID -> IO (Either TaskStoreError (WithID CompletedTask))
    getTask      :: c -> TaskID -> IO (Either TaskStoreError (WithID (FullySpecifiedTask state)))
    updateTask   :: c -> TaskID -> PartialTask state -> IO (Either TaskStoreError (WithID (FullySpecifiedTask state)))
    deleteTask   :: c -> TaskID -> IO (Either TaskStoreError (WithID (FullySpecifiedTask state)))

After writing this out, any attempt to start writing persistTask should immediately set off warning bells – even assuming we have a TaskStore with a Just Connection in it we can use, how in the world do we turn a FullySpecifiedTask state into a bunch of params we could use in a INSERT INTO x VALUES (...) DB query? Well the answer requires spending some time with sqlite-simple to figure out how to use it the directions are pretty easy to follow, but let’s skip ahead a bit (feel free to read up on sqlite-simple if you’d like):

import           Data.Int (Int64)
import           Database.SQLite.Simple (Connection, ToRow(..), FromRow(..), SQLData(..), field)
import           Database.SQLite.Simple.FromRow (RowParser)
import           Database.SQLite.Simple.ToField (ToField(..))
import           Database.SQLite.Simple.FromField (fieldData, ResultError(ConversionFailed), FieldParser, Field, FromField(..), returnError)

--- ... other code ...

newtype TaskName = TaskName { getTName :: DT.Text } deriving (Eq, Show)
deriving instance FromField TaskName

newtype TaskDesc = TaskDesc { getTDesc :: DT.Text } deriving (Eq, Show)
deriving instance FromField TaskDesc

-- ... other code ...

-- | A validated object's ToRow is just the same as it's contained object's ToRow
--   this can probably be auto-derived but let's write it manually for now.
instance ToRow a => ToRow (Validated a) where
    toRow = toRow . getValidatedObj

-- | UUIDs need to be converted to text before they can turn into fields
instance ToField UUID where
    toField = SQLText . toText

instance ToField TaskState where
    toField = SQLText . DT.pack . show

-- | ToRow (WithID a) can be generically performed if we just always put the ID first
--   this introduces the requirement that ids should always come first.
instance ToRow a => ToRow (WithID a) where
    toRow (UUIDID id_ obj) = [toField id_] <> toRow (obj)
    toRow (Int64ID id_ obj) = [toField id_] <> toRow (obj)

instance ToRow (FullySpecifiedTask TaskState) where
    toRow t = toRow (fsTaskName t, fsTaskDesc t, fsTaskState t)

instance FromRow a => FromRow (WithID a) where
    -- While normally FromRow instances are written like: `ValueConstructor <$> field <*> field ...`
    -- I can't figure out how to cleanly construct and build the partial result using applicatives
    -- since I need to pull out the ID, set it aside, then work on the rest, *then* use the right GADT constructor for WithId a
    fromRow = field -- pull out first field (we assume it's the ID)
              >>= \idSQLData -> fromRow -- parse the rest of the fields into an `a` value
              >>= chooseCtor idSQLData -- Given the SQLData, use the right GADT constructor on the id + the `a` value
        where
          chooseCtor sqldata = case sqldata of
                                 (SQLText txt) -> \obj -> case fromText txt of
                                                            Nothing -> throw (ConversionFailed (show sqldata) "Text" "Invalid UUID failed fromText conversion")
                                                            Just uuid -> pure $ UUIDID uuid obj
                                 (SQLInteger int) -> \obj -> pure $ Int64ID (fromIntegral int) obj
                                 _ -> throw (ConversionFailed (show sqldata) "???" "Unrecognized contents in ID field (no valid WithID GADT constructor)")

instance FromRow a => FromRow (Identity a) where
    fromRow = fromRow

instance FromField a => FromField (Identity a) where
    fromField = fromField

instance FromField TaskState where
    fromField f = case fieldData f of
                    SQLText txt -> pure $ read $ DT.unpack txt
                    fd -> returnError ConversionFailed f "Unexpected TaskState field type"

instance (FromField state) => FromRow (FullySpecifiedTask state) where
    fromRow = Task <$> field <*> field <*> field

instance (FromField state) => FromRow (PartialTask state) where
    fromRow = Task <$> field <*> field <*> field

Some notes about the code above:

  • The FromRow instance for WithID a was particularly difficult since I couldn’t use the normal applicative method that sqlite-simple lets you use – they did the hard work so I could write something like Task <$> field <*> field <*> ..., and I did the hard work to make sure I couldn’t do that (due to using a compound data type like WithId a and making it a GADT). I tried to write the code as clearly as possible otherwise to offset the complexity.
  • I also had to define some trivial instances for Identity a along the way (sqlite-simple defines one for Maybe a as well).
  • I went back and added auto-derived FromField instances for TaskName and TaskDesc
  • The FromRow instances for FullySpecifiedTask and PartialTask are very easy given that FromField is defined for everything
  • I avoided using DeriveAnyClass since I wasn’t sure if there were default implementations for everything

Yes, reasoning about and writing all these instances took a while (and I’ve used sqlite-simple before) – don’t be scared away by the amount of code that was necessary. I did made things quite a bit harder for myself by using a GADT (WithID a) and by choosing slightly more complicated data structures (inclusion of f & state in Task f state), but all in all I felt like I understood things as I as going through it step-by-step, but it definitely took at least a few minutes at every compiler to realize where I was mistaken about what I thought was happening versus what the compiler was seeing.

Trying to think objectively about the amount of mental effort expended it might be approach self-inflicted masochism… Good thing I’m not thinking about it too much. A part of me can justify this with the likely fact that if I wasn’t doing the same amount of work in another language it would be because the code I was writing was buggy code that didn’t consider all the cases – or I would have spent this time trying to understand some other library that did the work.

By now it’s pretty obvious but sqlite-simple works by defining and using a whole bunch of instances of things like FromRow and ToRow (similar to aeson), with FromField and ToField as ancillary typeclasses. From what I can tell sqlite-simple doesn’t support generic deriving of the FromRow and ToRow types (like aeson does using GHC.Generics), so I’ve written instances manually. We should have enough to write out to work out a definition for persistTask:

disconnectionError :: IO (Either TaskStoreError a)
disconnectionError = pure $ Left $ Disconnected "Store is disconnected"

makeGenericInsertError :: SomeException -> IO (Either TaskStoreError a)
makeGenericInsertError = pure . Left . UnexpectedError . ("INSERT command failed: " <>) . DT.pack . show

saveAndReturnTask :: ToField state => Connection -> WithID (FullySpecifiedTask state) -> IO (Either TaskStoreError (WithID (FullySpecifiedTask state)))
saveAndReturnTask c t = catch doInsert makeGenericInsertError
    where
      doInsert = execute c "INSERT INTO tasks (uuid, name, desc, state) VALUES (?,?,?,?)" t
                 >> pure (Right t)

instance TaskStore SQLiteTaskStore where
    persistTask  :: SQLiteTaskStore -> Validated (FullySpecifiedTask TaskState) -> IO (Either TaskStoreError (WithID (FullySpecifiedTask TaskState)))
    persistTask store (Validated newTask) = maybe disconnectionError _handler $ stsConn store
        where
          -- | _handler does all the real work of persisting a task
          _handler conn = (flip UUIDID newTask <$> nextRandom) -- Use a random UUIDV4 to make a new `WithID (FullySpecifiedTask state)`
                          -- Insert the task
                          >>= saveAndReturnTask conn

    -- ... other typeclass methods that are *still undefined*

It’s pretty clear that I should start moving this implementation (along with all the ToX/FromX instances) out of Types.hs, so it’s been moved to Components/TaskStore/SQLite.hs, and I turned on InstanceSigs so I could keep the signatures close to the implementation code. A few notes on the implementation:

  • I had to get specific about FullySpecifiedTask state, using TaskState (which is the what we want anyway)
  • Note that most of the functions that actually do real work are outside the implementation code and are just free floating functions
  • Having a Maybe Connection means I need to use the maybe helper method to handle when the connection is missing
  • The dangerous SQL-y parts are wrapped in a catch from Control.Exception so we should be safe

NOTE on the gratiuitous use of IO: Other monads and monad stacks exist, and even though IO might not be the right one in the end, there’s no need to worry. We’ll get more abstract in the next post (I’ll give you a hint, this whole component itself is abstractable/generalizable, not just the generous helping of IO) – for now, let’s keep it breezy and make peace with a specification that allows for anything to happen (which is what I imagine when I see IO at least), as long as the results we expect (Either TaskStoreError a) come back. Also keep in mind this popular haskell meme:

haskell monadic io brain meme

It’s popular meme because there’s a hint of truth – IO is the console.log/println/putStrln of monadic state management; old reliable by your side since day 1. Sometimes it’s easier and more comfortable to just reach for your old friends. We’ll be exploring more of this area in the next post.

At this point things compile but I’m pretty worried about the raw SQL string I’m using there. Other than the fact that it’s a general code smell to just have a string sitting there like that (at the very least we could factor it out so it could be reused), while the query is right there’s a bit of an issue… The query looks right but how do we know the backing SQLite database even has a tasks table?

SIDETRACK: Migrations

As with any other language, we need some concept of migrations (or some kind of check at least) to ensure that our program-external database management system is in a consistent state when we start using it! We wrote a query that assumes a table called tasks exists, but it may not exist in the on-disk database’s schema. The easy way to do this is to make sure at SQLiteTaskStore construction time (where we first connect to the database and make our Just Connection), the database is migrated to the version we expect and throws an error otherwise. We can express this requirement on our component with a typeclass:

-- | Holds a database version (expected to be a monotonically increasing number)
newtype SQLMigrationVersion = SQLMigrationVersion { getMigrationVersion :: Int } deriving (Eq, Show, Read)

-- | Holds a SQL Query
newtype SQLMigrationQuery = SQLMigrationQuery { getMigrationQuery :: DT.Text } deriving (Eq, Show, Read)

-- | Specifies a `SQLMigrationVersion` that is the source of a migration path
type FromSQLMigrationVersion = SQLMigrationVersion

-- | Specifies a `SQLMigrationVersion` that is the target of a migration path
type ToSQLMigrationVersion = SQLMigrationVersion

data SQLMigration = SQLMigration
    { smFrom  :: FromSQLMigrationVersion
    -- ^ The starting migration version
    , smTo    :: ToSQLMigrationVersion
    -- ^ The ending migration version
    , smQuery :: SQLMigrationQuery
    -- ^ Query to execute to perform the migration (also responsible)
    }

data MigrationError = NoMigrationPath -- ^ A path between the migrations could not be found
                    | MigrationQueryFailed FromSQLMigrationVersion ToSQLMigrationVersion -- ^ An individual migration query failed
                      deriving (Eq, Show)

class TaskStore store => HasMigratableDB store where
    -- | Retreive the desired version, this is normally just statically set @ compile time
    desiredVersion :: store -> IO SQLMigrationVersion

    -- | A list of available migrations that will be used by `migrateTo` to create a path from current (via `getCurrentVersion`) to `desiredVersion`
    availableMigrations :: store -> IO [SQLMigration]

    -- | Retrieve the current version of the database
    getCurrentVersion :: store -> IO SQLMigrationVersion

    -- | Perform migrations to get to the current version
    migrate :: SQLiteTaskStore -> IO (Either MigrationError ())
    migrate store = desiredVersion store >>= migrateTo store

    -- | Finds and executes a path to the requested ToSQLMigration from
    --   Currently when looking through `availableMigrations`, a monotonically increasing version number is assumed,
    --   This means paths are made from version to version generally in one version increments (1 --[migrateTo]--> 2 --[migrateTo]-> 3)
    migrateTo :: store -> ToSQLMigrationVersion -> IO (Either MigrationError ())

You might notice that I just suddenly realized that I absolutely under-utilizing Haskell’s wonderful doc-commenting system. Fintan Halpenny wrote an amazing summary post on some doc coment features that is worth a read, it was a huge eye-opener for me and I’ve kept it bookmarked for moments such as these. At some point I’ll probably go through the whole project and document lots more stuff (I know everyone says it but maybe it will actually happen this time).

We didn’t necessarily have to write this as a typeclass but I personally like the organization effect, if only for ergonomics. When someone defines a HasMigratableSQLDB, they will be defining it for some specific type (in our case SQLiteTaskStore), and can think in those terms, rather than managing a grouping of functions and constants in a file. Either way, here’s a simplistic implementation:

Components/TaskStore/SQLite.hs:


instance FromRow SQLMigrationVersion where
    fromRow = fromRow

-- | Helper function for making migration failed errors
makeMigrationFailedError :: SQLMigration -> SomeException -> IO (Either MigrationError a)
makeMigrationFailedError m = pure . Left . MigrationQueryFailed from to . DT.pack . show
    where
      from = smFrom m
      to = smTo m

executeMigration :: Connection -> SQLMigration -> IO (Either MigrationError ())
executeMigration conn m = catch runQuery (makeMigrationFailedError m)
    where
      query = Query $ getMigrationQuery $ smQuery m
      runQuery = withTransaction conn (execute_ conn query)
                 >> pure (Right ())

-- | Helper function for making `VersionFetchFailed` `MigrationError`s
makeVersionFetchFailedError :: SomeException -> IO (Either MigrationError a)
makeVersionFetchFailedError = pure . Left . VersionFetchFailed . ("Unexpected version fetch failure: " <>) . DT.pack . show

getDBMigrationVersion :: Connection -> IO (Either MigrationError SQLMigrationVersion)
getDBMigrationVersion c = catch runQuery makeVersionFetchFailedError
    where
      getVersionQuery = Query "PRAGMA user_version;" -- Happens to return 0 if never set before in SQLite
      runQuery = query_ c getVersionQuery
                 >>= \results -> pure $ case results of
                                          [v, _] -> Right v
                                          [] -> Left (VersionFetchFailed "Version retrieval query returned no results")

instance HasMigratableDB SQLiteTaskStore where
    desiredVersion :: SQLiteTaskStore -> IO SQLMigrationVersion
    desiredVersion _ = pure (SQLMigrationVersion 1)

    availableMigrations :: SQLiteTaskStore -> IO [SQLMigration]
    availableMigrations _ = pure $ sort migrations

    getCurrentVersion :: SQLiteTaskStore -> IO (Either MigrationError SQLMigrationVersion)
    getCurrentVersion = maybe _error _handler . stsConn
        where
          _error = pure $ Left $ VersionFetchFailed "Fetching current version failed"
          _handler = getDBMigrationVersion

    migrateTo :: SQLiteTaskStore -> ToSQLMigrationVersion -> IO (Either MigrationError ())
    migrateTo s expected = maybe unexpectedMigrationErr tryHandler $ stsConn s
        where
          unexpectedMigrationErr :: IO (Either MigrationError ())
          unexpectedMigrationErr = pure $ Left $ UnexpectedMigrationError "Failed to retrieve DB connection"

          convertToUnexpectedError :: SomeException -> IO (Either MigrationError ())
          convertToUnexpectedError = const $ pure $ Left $ UnexpectedMigrationError ""

          tryHandler :: Connection -> IO (Either MigrationError ())
          tryHandler conn = catch (handler conn) convertToUnexpectedError

          -- | Recursively (!) runs all migrations by
          --   There's quite a bit of wasted effort in here, but it's probably good enough (assuming it finishes :)).
          handler :: Connection -> IO (Either MigrationError ())
          handler conn = availableMigrations s
                         -- ^ Get the list of current migrations
                         >>= \usableMigrations -> getCurrentVersion s
                         -- ^  Get the current version
                         >>= rightOrThrow
                         -- ^  Get the current version
                         >>= \current -> pure (findNextMigration usableMigrations current)
                         -- ^ Determine the next migration
                         >>= \case
                             -- | We're either done or something went wrong
                             Nothing -> pure $ if current == expected then Right () else Left NoMigrationPath
                             -- | Perform a single migration then recur
                             Just m -> executeMigration conn m
                                     >>= rightOrThrow
                                     >> handler conn

          -- | We are assuming monotonically increasing version numbers here, and that there exists at least
          --   *one* migration between every version (i.e. v1->v2, v2->v3, etc). This is a bad assumption to make generally,
          --   but I'm OK with it since this is generally how most people make migrations in my mind, implementation can change later if need be
          findNextMigration ms current = find ((current+1==) . smFrom) ms

Components/TaskStore/Migrations/SQlite:

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE OverloadedStrings #-}

module Components.TaskStore.Migrations.SQLite where

import NeatInterpolation (text)
import Types (SQLMigration(..), SQLMigrationQuery(..))

migrations :: [SQLMigration]
migrations =
    [SQLMigration
     { smFrom=0
     , smTo=1
     , smQuery=
         SQLMigrationQuery
         [text|
               CREATE TABLE tasks(
                 uuid TEXT PRIMARY KEY NOT NULL,
                 name TEXT NOT NULL,
                 description TEXT NOT NULL,
                 status TEXT NOT NULL
               );
               |]
     }
    ]

Some notes on the code above:

  • More instance shenanigans for the newtype’d SQLMigrationVersion class (it’s just an Int)
  • Again, most of the hard work is done outside of the typeclass instance and just called to
  • hlint put me on to the LambdCase language extension which is pretty handy
  • Haskell doesn’t have good support for HEREDOCs, but Thanks to Nikita Volkov for making an easy to use neat-interpolation library that does this

Again, it’s important to note here that there are many other libraries out there to do migrations on hackage, and some teams are actually more comfortable doing migrations outside the API (middle layer) code itself with libraries like Flyway. The above code is a relatively hastily written, simple implementation showing how you could write migration code yourself, you almost definitely don’t want to use this in production, though in my opinion it’s “good enough”. Database schema migrations can get very complicated very quickly, but usually not due to the machinery above – it’s usually due to the hard-to-see effects of migration code, interdependencies and difficulty in corralling disparate data and semantics that make them difficult, usually migration machinery is set-it-and-forget-it.

Let’s make sure some of this stuff actually works

OK, we’ve written two pretty big pieces of functionality, and up until now we’ve gone by the the basic “if it compiles it works” philosphy. While that statement is generally true (one of the reasons I love Haskell), Haskell programmers are at their most vulnerable when dealing with the outside world – in this case SQLite – and we should write some tests to make sure we’re not doing things terribly wrong, which will serve to prevent regressions going forward.

Since a test of the TaskStore component is not quite at whole-app level (neither CLI nor HTTP API), I’d categorize the test we’re about to write as an integration test. SQLite makes it pretty easy to do this test since we can use :memory: as the location of our file, which is fantastic for not persisting data across tests – we can even run these tests in parallel, all we have to do is make sure to start completely different object instances (or use one and clear it every time). We’ve only implemented a small bit of the actual API (persistTask), and we’ve just barely gotten a theoretically working migration implementation, so we should be good to at least test that creating and saving one task doesn’t crash and burn.

One problem though – pretty early on into the writing of the tests I realized that I hadn’t actually made the Constructable typeclass instance for SQLiteTaskStore, which means I couldn’t construct one… Here’s the instance I needed:

instance Constructable SQLiteTaskStore CompleteTaskStoreConfig TaskStoreError where
    construct :: CompleteTaskStoreConfig -> IO (Either TaskStoreError SQLiteTaskStore)
    construct cfg = catch makeStore connectionFailure
        where
          dbPath :: String
          dbPath = show $ runIdentity $ tscDBFilePath cfg

          makeStore :: IO (Either TaskStoreError SQLiteTaskStore)
          makeStore = open dbPath
                      >>= \conn -> pure (Right (SQLiteTaskStore cfg (Just conn)))

          connectionFailure :: SomeException -> IO (Either TaskStoreError SQLiteTaskStore)
          connectionFailure = pure . Left . ConnectionFailure . ("Failed to connect to DB: "<>) . DT.pack . show

Now that we can actually construct a SQLiteTaskStore, let’s get on to testing:

module Components.TaskStore.SQLiteSpec (spec) where

import Components.TaskStore.SQLite (SQLiteTaskStore)
import Types (Constructable(..), HasMigratableDB(..), TaskStore(..), TaskStoreError, SQLMigrationVersion(..))
import Config (defaultCompleteTaskStoreConfig)
import Data.Either (isRight)
import Control.Monad.IO.Class (liftIO)
import Util (rightOrThrow)

import Test.Hspec

makeDefaultStore :: IO (Either TaskStoreError SQLiteTaskStore)
makeDefaultStore = construct defaultCompleteTaskStoreConfig

main :: IO ()
main = hspec spec

spec :: Spec
spec = do
  describe "task store creation" $ do
         it "works with the default config" $ \_ -> liftIO makeDefaultStore
                                                    >>= (`shouldBe` True) . isRight

  describe "task store migration" $ do
         it "migrates with the default config (0 -> 1)" $ \_ -> liftIO makeDefaultStore
                                                                -- Default stores
                                                                >>= rightOrThrow
                                                                -- migrate migrates to `desiredVersion`
                                                                >>= migrate
                                                                >>= (`shouldBe` True) . isRight

Trying to run the first test worked out just fine, but my worst fears came true with the second, an error occurred!

DEBUG: GHCI hanging when running getCurrentVersion

My first assumption was that I must have gotten something wrong with the recursive code in migrateTo, but upon closer inspection, sqlite-simple was throwing an unexpected AsyncCancelled that I had never seen before (^C is where I stopped the ghci because it was hung):

*Main Components.TaskStore.SQLiteSpec
λ main

Components.TaskStore.SQLite
  task store creation
    works with the default config
  task store migration
^C
Current version: Left (VersionFetchFailed "Unexpected version fetch failure: AsyncCancelled")
usable migrations: [SQLMigration {smFrom = SQLMigrationVersion {getMigrationVersion = 0}, smTo = SQLMigrationVersion {getMigrationVersion = 1}, smQuery = SQLMigrationQuery {getMigrationQuery = "CREATE TABLE tasks(\n  uuid TEXT PRIMARY KEY NOT NULL,\n  name TEXT NOT NULL,\n  description TEXT NOT NULL,\n  status TEXT NOT NULL\n);\n"}}]
Interrupted.

I’d never seen this error before I double checked some other code I’ve written with sqlite-simple and it was was exactly the same, so I started to figure I might have come unto an actual GHC bug. I checked the version of GHC that Stack was using:

$ stack ghc -- --version

And I saw that it was 8.4.3 – which instantly triggered some vague memories of issues with the 8.4.3 GHC release and figured this might be related. What I did was actually to move forward (!!) in LTS Stack resolvers (and resultingly in the GHC release the comes bundled), after taking a look at list of stackage releases on the Stackage home page – I went with resolver: lts-12.18 (in stack.yaml) and re-ran stack build for all the targets I cared about:

$ stack build
$ stack build :int
$ stack build :unit

After a while stack was done re-installing everything, so I went into the GHCI with the :int tag and ran the tests again, and got the same error. At this point, I looked back at the code I had and it’s running on resolver 11.6 so I tried with that instead, and was greeted with a different error:

[4 of 7] Compiling Config           ( src/Config.hs, .stack-work/dist/x86_64-linux-ncurses6/Cabal-2.0.1.0/build/Config.o)

/home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs:72:10: error:
Not in scope: type constructor or class ‘Semigroup’
|
72 | instance Semigroup CompleteAppConfig where
|          ^^^^^^^^^

/home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs:78:10: error:
Not in scope: type constructor or class ‘Semigroup’
|
78 | instance Semigroup PartialAppConfig where
|          ^^^^^^^^^

/home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs:104:10: error:
Not in scope: type constructor or class ‘Semigroup’
|
104 | instance Semigroup CompleteTaskStoreConfig where
|          ^^^^^^^^^

/home/mrman/Projects/foss/haskell-restish-todo/src/Config.hs:110:10: error:
Not in scope: type constructor or class ‘Semigroup’
|
110 | instance Semigroup PartialTaskStoreConfig where
|          ^^^^^^^^^

Completed 3 action (s).

So the move back presented a few issues:

  • In this earlier snapshot of the resolver 11.6, Semigroup is not accessible so I had to import it from Data.Semigroup explicitly
  • Explicit Data.Monoid imports had to be added as well
  • An instance of FromField for SQLData from a newer version of sqlite-simple had to be copied into my code– I guess the version the revolver was using didn’t have it.

After doing all this, the code finally started to build again and I got the same error, but different, and this time it didn’t crash GHCI:

$ stack build :int
haskell-restish-todo-0.1.0.0: test (suite: int)


Components.TaskStore.SQLite
  task store creation
    works with the default config
  task store migration
Current version: Left (VersionFetchFailed "Unexpected version fetch failure: <<loop>>")
... other output ...
migrates with the default config (0 -> 1) FAILED [1]

So good news and bad news here – the ghci isn’t hanging anymore, but I am stil having a problem with querying PRAGMA user_version from the :memory: SQLite DB. Thanks to a Stack Overflow thread on the subject it looks like <<loop>> means that the RTS system detected an endless loop at runtime. Going back and taking a hard look at the code again, this error happens even with the recursive bits commented out:

    migrateTo :: SQLiteTaskStore -> ToSQLMigrationVersion -> IO (Either MigrationError ())
    migrateTo s expected = maybe unexpectedMigrationErr tryHandler $ stsConn s
        where
          unexpectedMigrationErr :: IO (Either MigrationError ())
          unexpectedMigrationErr = pure $ Left $ UnexpectedMigrationError "Failed to retrieve DB connection"

          convertToUnexpectedError :: SomeException -> IO (Either MigrationError ())
          convertToUnexpectedError = const $ pure $ Left $ UnexpectedMigrationError ""

          tryHandler :: Connection -> IO (Either MigrationError ())
          tryHandler conn = catch (handler conn) convertToUnexpectedError

          -- | Recursively (!) runs all migrations by
          --   There's quite a bit of wasted effort in here, but it's probably good enough (assuming it finishes :)).
          handler :: Connection -> IO (Either MigrationError ())
          handler conn = availableMigrations s
                         -- ^ Get the list of current migrations
                         >>= \usableMigrations -> getCurrentVersion s
                         >>= \currentVersion -> putStrLn ("Current version: " ++ show currentVersion)
                         >> putStrLn ("usable migrations:" ++ show usableMigrations)
                         >> pure (Left (UnexpectedMigrationError "NOPE"))
                         -- -- ^  Get the current version
                         -- >>= rightOrThrow
                         -- -- ^  Get the current version
                         -- >>= \current -> pure (findNextMigration usableMigrations current)
                         -- -- ^ Determine the next migration
                         -- >>= \case
                         --     -- | We're either done or something went wrong
                         --     Nothing -> pure $ if current == expected then Right () else Left NoMigrationPath
                         --     -- | Perform a single migration then recur
                         --     Just m -> executeMigration conn m
                         --               >>= rightOrThrow
                         --               >> handler conn

          -- | We are assuming monotonically increasing version numbers here, and that there exists at least
          --   *one* migration between every version (i.e. v1->v2, v2->v3, etc). This is a bad assumption to make generally,
          --   but I'm OK with it since this is generally how most people make migrations in my mind, implementation can change later if need be
          findNextMigration ms current = find ((current+1==) . smFrom) ms

To refresh your memory (since you’re on this bug hunt with me now), this is what getCurrentVersion and the relevant helper methods look like:

    getCurrentVersion :: SQLiteTaskStore -> IO (Either MigrationError SQLMigrationVersion)
    getCurrentVersion = maybe _error _handler . stsConn
        where
          _error = pure $ Left $ VersionFetchFailed "Fetching current version failed"
          _handler = getDBMigrationVersion

getDBMigrationVersion :: Connection -> IO (Either MigrationError SQLMigrationVersion)
getDBMigrationVersion c = catch runQuery makeVersionFetchFailedError
    where
      getVersionQuery = Query "PRAGMA user_version;" -- Happens to return 0 if never set before in SQLite
      runQuery = (query_ c getVersionQuery :: IO [SQLMigrationVersion])
                 >>= \results -> pure $ case results of
                                          [v, _] -> Right v
                                          [] -> Left (VersionFetchFailed "Version retrieval query returned no results")

Looks like the loop has to be in here, but weirdly enough, if I remove query_ (and replace it with something like pure (MigrationVersion 0) it works (and doesn’t infinite loop in the recursive bit because that’s commented out)… All of a sudden it hit me – here’s the definition of FromRow for SQLMigrationField:

instance FromRow SQLMigrationVersion where
    fromRow = fromRow

Yep, I’m an idiot – this is obviously not an OK definition of FromRowthis is where the infinite loop ghc is running into is! What I needed was:

instance FromRow SQLMigrationVersion where
    fromRow = SQLMigrationVersion <$> field

What a dumb thing to do, some part of me thought that somehow ghc was equating the fromRow definition there to the fromRow for Int because SQLMigrationVersion is a newtype. Once I re-ran the weird error was gone:

Progress 1/2: haskell-restish-todo-0.1.0.0
Components.TaskStore.SQLite
  task store creation
      works with the default config
        task store migration
        Current version: Right (SQLMigrationVersion {getMigrationVersion = 0})

But wait, since SQLMigrationVersion is a newtype maybe I can just use a standalone deriving instance to get the right answer? Now that I have some working code, I tried deleting the FromRow instance for SQLMigrationVersion and replacing it with a standalone deriving instance:

{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- instance FromRow SQLMigrationVersion where
--     fromRow = SQLMigrationVersion <$> field

deriving instance FromRow SQLMigrationVersion

Weirdly enough GHC spits out this error:

/home/mrman/Projects/foss/haskell-restish-todo/src/Components/TaskStore/SQLite.hs:156:1: error:
     Couldn't match type Int with SQLMigrationVersion
        arising from a use of ghc-prim-0.5.1.1:GHC.Prim.coerce

It can’t figure out that SQLMigrationVersion is actually Int, which is weird, because in my mind GeneralizedNewtypeDeriving was giving us exactly that. Since FromRow is DB implementation specific, it’s not really right to move FromRow to Types.hs and try to do it there… Now that I’ve actually solved the loop issue, I think that might be what was causing the AsyncCancelled error I was seeing – assuming the sqlite-simple version in the newer resolvers does it’s work in an async context, and the async context was crashing when it encountered the bad FromRow instance I wrote. Unfortunately the error is still present after upgrading my resolver version so I’m going to go ahead and leave the instance in there as-is (no deriving for me!).

With this issue taken care of (and all the dumb infinitely loop instances I wrote removed), we can get back to actually trying to test the migration functionality.

Running the migration test, Round 2

OK, with everything back to normal, let’s see what the migration test returns:

Components.TaskStore.SQLite
  task store creation
    works with the default config
  task store migration
    migrates with the default config (0 -> 1) FAILED [1]

Failures:

  /home/mrman/Projects/foss/haskell-restish-todo/test/Integration/Components/TaskStore/SQLiteSpec.hs:30:70:
  1) Components.TaskStore.SQLite, task store migration, migrates with the default config (0 -> 1)
       expected: Right ()
        but got: Left (UnexpectedMigrationError "src/Components/TaskStore/SQLite.hs:(174,41)-(176,119): Non-exhaustive patterns in case\n")

Good news/bad news:

  • I didn’t have a recursion error at all
  • There’s still a bug, but this time with a non-exhaustive case

DEBUG: Fixing the non-exhaustive case

Non exhaustive case errors like this are never supposed to happen in Haskell. The offending case is:

                 >>= \results -> pure $ case results of
                                          [v, _] -> Right v
                                          [] -> Left (VersionFetchFailed "Version retrieval query returned no results")

Welp, it’s a bit obvious that the only case I didn’t cover here was the case I actually care about – the [v] case! I mixed up the syntax as well, here’s the fixed version:

                 >>= \results -> pure $ case results of
                                          (v:_) -> Right v
                                          _     -> Left (VersionFetchFailed "Version retrieval query returned no results")

With this out of the way, let’s run it again!

Failures:

  /home/mrman/Projects/foss/haskell-restish-todo/test/Integration/Components/TaskStore/SQLiteSpec.hs:30:70:
  1) Components.TaskStore.SQLite, task store migration, migrates with the default config (0 -> 1)
       expected: Right ()
        but got: Left NoMigrationPath

Oh no, looks like my shitty algorithm can’t find a migration path from 0 to 1, with only one migration in there…

DEBUG: Fixing my quick and dirty migration logic

Looksl ike the logic was quick, dirty, and wrong – weirdly enough the issue was actually in the simplest function – determining whether there is a next step. Here’s teh definition:

          findNextMigration ms current = find ((current+1==) . smFrom) ms

Can you see what’s wrong? Well it’s pretty obvious now, but smFrom should at least be an smTo – we want to know that the next migration goes to current + 1, not that it goes from current + 1. I beefed up the assumption a little bit and fixed it:

          isNextStep current migration = smFrom migration == current && smTo migration == current + 1
          findNextMigration migrations current = find (isNextStep current) migrations

When I run it again, I get:

Failures:

  /home/mrman/Projects/foss/haskell-restish-todo/test/Integration/Components/TaskStore/SQLiteSpec.hs:30:70:
    1) Components.TaskStore.SQLite, task store migration, migrates with the default config (0 -> 1)
           expected: Right ()
                   but got: Left (UnexpectedMigrationError "MigrationQueryFailed (SQLMigrationVersion {getMigrationVersion = 0}) (SQLMigrationVersion {getMigrationVersion = 1}) \"SQLite3 returned ErrorError while attempting to perform prepare \\\"CREATE TABLE tasks(\\\\n  uuid TEXT PRIMARY KEY NOT NULL,\\\\n  name TEXT NOT NULL,\\\\n  description TEXT NOT NULL,\\\\n  status TEXT NOT NULL\\\\n);\\\\n\\\": table tasks already exists\"")

Which is much better since the migration is being run at all, but stil obviously an issue since the query is failing.

DEBUG: In-memory DB already contains tasks table

As you can see from the previous error output, table tasks already exists – which is ridiculous, because I’m working on an in-memory database (:memory:), that should be empty. Realistically the only way this would happen is if I migrated twice, or if my migration logic had an infinite loop. We can rule out the infinite loop case since the MigrationQueryFailed error gives us the versions it was trying to migrate from and to (MigrationQueryFailed (SQLMigrationVersion {getMigrationVersion = 0}) (SQLMigrationVersion {getMigrationVersion = 1}) means that we were going from 0 -> 1).

After a few seconds of headscratching, I realized the problem – I never added code to update PRAGMA user_version! Another boneheaded move! In particular, I should have been calling PRAGMA user_version = <update version> after every successful migration! Here’s the updated code:

executeMigration :: Connection -> SQLMigration -> IO (Either MigrationError ())
executeMigration conn m = catch runQuery (makeMigrationFailedError m)
    where
      migrationQuery = Query $ getMigrationQuery $ smQuery m
      versionUpdateQuery = Query $ ("PRAGMA user_version = " <>) . DT.pack . show $ getMigrationVersion $ smTo m

      migrateAndUpdateVersion = execute_ conn migrationQuery
                                >> execute_ conn versionUpdateQuery
      runQuery = withTransaction conn migrateAndUpdateVersion
                 >> pure (Right ())

After that fix, all is well:

*Main Components.TaskStore.SQLiteSpec
λ main

Components.TaskStore.SQLite
  task store creation
    works with the default config
  task store migration
    migrates with the default config (0 -> 1)

Finished in 0.0021 seconds
2 examples, 0 failures

Now I can add one more test, to make sure that we can persist a task without failing:

-- generateTask :: IO (Validated NotStartedTask)
generateTask :: IO (Validated (Task Identity TaskState))
generateTask = rightOrThrow $ validate $ Task { tName        = Identity $ TaskName "example"
                                              , tDescription = Identity $ TaskDesc "this is a example task"
                                              , tState       = Identity NotStarted
                                              }

  describe "task store persistTask" $ do
         it "works with default config" $ \_ -> liftIO makeDefaultStore
                                                >>= rightOrThrow
                                                >>= \store -> migrate store
                                                >> generateTask
                                                >>= \expected -> persistTask store expected
                                                >>= rightOrThrow
                                               -- | Ensure that the ID is non-empty when printed, and the object we got back is right
                                                >>= \actualWithID -> pure (showID actualWithID /= "" && withoutID actualWithID == getValidatedObj expected)
                                                >>= shouldBe True

You might be wondering why generateTask has the signature IO (Validated (Task Identity TaskState) and not IO (Validated NotStartedTask) – well that signature would be more ergonomic but there’s a problem – GHC can’t tell that Validated NotStartedTask (expanded, Validated (Task Identity NotStarted)) is the *same thing as a Task Identity TaskState, since TaskState can be multiple values. To restate, once persistTask returns us an object, we don’t know what state it’s going to be in when it comes back from the database! Remember, we actually abstract over TaskState all over persistTask (which allows us to accept pre-completed or in-progress tasks as they are):

    persistTask  :: c -> Validated (FullySpecifiedTask TaskState) -> IO (Either TaskStoreError (WithID (FullySpecifiedTask TaskState)))

I had to add some instances but all in all, the test is what we want. As you might have expected at this point, when we run it, it fails:

Failures:

  /home/mrman/Projects/foss/haskell-restish-todo/test/Integration/Components/TaskStore/SQLiteSpec.hs:58:10:
    1) Components.TaskStore.SQLite, task store persistTask, works with default config
           uncaught exception: TaskStoreError
                  UnexpectedError "INSERT command failed: SQLite3 returned ErrorError while attempting to perform prepare \"INSERT INTO tasks (uuid, name, desc, state) VALUES (?,?,?,?)\": table tasks has no column named desc"

Looks like I’ve missnamed a column (description to be precise). Easily enough I can change the query and make sure it matches with the migration:

saveAndReturnTask :: ToField state => Connection -> WithID (FullySpecifiedTask state) -> IO (Either TaskStoreError (WithID (FullySpecifiedTask state)))
saveAndReturnTask c t = catch doInsert makeGenericInsertError
    where
      doInsert = execute c "INSERT INTO tasks (uuid, name, desc, state) VALUES (?,?,?,?)" t
                 >> pure (Right t)

(yes, I know this sucks and is very abstractable/fixable – the next post will cover how to make this better)

While we’re there, I also misnamed the status column in the SQLite migration so I’ve fixed that as well. After fixing those issues and rerunning it, we get… a successful test run! Looks like persist task is indeed doing to the database and back and bringing us back what we put in, with an ID!

I really want to keep the ability to specify a tasks completion directly (by using types like CompletedTask, but it looks like have to re-work how to represent this, or figure out a way to tell the compiler that these two types are identical.

This section got way longer than I expected, but hopefully you got a good loook at my amateur debugging up close and weren’t too turned off by the issues I ran to. The bugs in my code were pretty silly – writing nonsensical instances, a non-exhaustive case (which I’m still a little surprised Haskell allowed me to do in the first place), and forgetting to update the version in the database itself.

Now we have a working test for the two big pieces so far we implemented – migrations and persistTask! To run the tests we can run:

$ stack build && stack test :unit :int
.. other output ...

haskell-restish-todo-0.1.0.0: test (suite: int)

Components.TaskStore.SQLite
  task store creation
    works with the default config
  task store migration
    migrates with the default config (0 -> 1)
  task store persistTask
    works with default config

Finished in 0.0011 seconds
3 examples, 0 failures

haskell-restish-todo-0.1.0.0: Test suite int passed
haskell-restish-todo-0.1.0.0: test (suite: unit)

Config
  defaults
    has localhost as the default host
    has 5000 as the default port
  default values
    CompleteAppConfig has default host
    CompleteAppConfig has default port
    PartialAppConfig has no default host
    PartialAppconfig has no default port

Finished in 0.0008 seconds
6 examples, 0 failures

haskell-restish-todo-0.1.0.0: Test suite unit passed

What about the other TaskStore methods?

I’m actually going ot skip the rest of the implementation (and you should too), becasue we’re going to actually generalize TaskStore next time, into an EntityStore. With the contents of this post, you should be able to imagine just how to write the implementations for completeTask, getTask, and updateTask, but it would suck to actually do, because we’d be writing very very Task-specific queries, when we’re going to have some other models in our domain. The next post will go into how we can generalize our TaskStore into an EntitySTore and write these functions (except for completeTask` of course, exactly once for domain models moving forward.

TLDR; The code

The code is posted up on Gitlab w/ the part-2 tag for your perusing pleasure. Note that it’s incomplete as it’s missing implementations for methods like completeTask, but don’t bother with them – we’re going to further generalize and abstract an EntityStore out of our TaskStore in the next post. persistTask should show the general flow, if you get the itch to write some code yourself.

Wrapup

Hopefully this post has been a decent practical guide to writing/modeling domains in Haskell and made it less daunting! While a lot of the methods used (like the migration strategy) are a bit questionable to use in production, they at the very least work reliably and are pretty well specified. We went a bit crazy with types this post and caused a bit of complications (defining instances, figuring out implementations) but again I want to remind everyone that you can ignore vast swaths of the advanced typing approaches until you’re ready, I tried to show the progression and decision making process I go through when I try to use the power Haskell offers. You can Get Stuff Done (tm) with Haskell without any of the advanced type stuff, and still enjoy the benefits of haskell on a simpler level.

Just to reassure everyone, we haven’t veered too far off course – the steps put forth in this post continue to represent the first steps of getting a decently architected REST-ish webservice up and running (domain models are important), but there is lots of tools out there that will get you much further than this much faster (ex. Postgrest, yesod, and in other languages, frameworks like Rails or Django), but if you are interested in building the internals at a lower level, hoping to have less batteries included and (hopefully) a better understanding of what’s happening, hopefully you’ve picked up a trick or two.

The next entry in this post is going to get hot and heavy in a hurry – believe it or not there’s a lot of abstraction at various levels and streamlining of the logic that has been left out in this one, along with the actual use of servant to serve HTTP or hooking up our domain TaskStore component to anything. The next post will focus on those things, generalizing, and we’ll get to making the actual API, finally!

UPDATE: Even more 💪 🎩 , thanks to r/haskell commenters

In the earliest sections, we created a Task type that was parametrized over two values – f for completeness (using Identity vs Maybe as f), and state for task completion status (the TaskStatus GADT and associated types). The point of doing this for me was to be able to differentiate a case like a partial record of a completed task (Task Maybe Finished) versus a fully specified incomplete task (Task Identity NotStarted). While the amount of benefit of adding the second parametrization, state, is questionable, I thought at least the partial/completely specified record distinction was interesting.

After sharing this post on Reddit, a few suggestions were made on how to clean up this code. While I’m not sure all of them are the way to go, they do deserve sharing, so here they are.

Rank2Classes / Conkin

u/blamario suggested the use of Rank2classes or Conkin. This offers a TemplateHaskell approach to automating the generation of the usual typeclasses surrounding a record type (Functor, Applicative, Foldable, etc) without writing them myself, to make it easier to interoperably use the types. Put another way, from what I understand rank2classes will make it much easier to write code that deals f and state (in Task f state) abstractly. the documentation for rank2classes is excellent and contains a better more in-depth description.

Conkin provides a similar solution and has similarly great and detailed documentation, check it out.

u/blamario provided a pretty in-depth description of how these types would be useful so please feel free to check that out as well.

DataKinds

u/Tarmen brought suggesting using the DataKinds extension (along with a detailed explanation, including one using type families thanks u/Tarmen!), I agreed that it might just be the concept to help simplify the Task f state definitions. Kinds as a concept (in haskell) and use of the DataKinds extension in my own code are somewhat new to me but I also found an excellent primer on the subject to re-enforce what I thought I already knew.

I personally liked this suggestion the most because it was minimally invasive syntax and project (dependency) wise. I’m in the process of reading up and trying to better understand.

NOTE - Definitely check out the primer I linked to earlier – outside of being an excellent post, the section on datatype promotion is extremely relevant and well explained.

KindSignatures

u/sjakobi suggested using the KindSignatures language extension to be more explicit about the Kinds that were being used. This suggestion was made separately but works well with the suggestion to use the DataKinds extension. The example provided by (u/sjakobi) was pretty explanatory:

{-# LANGUAGE KindSignatures #-}

data Task (f :: * -> *) = Task ...

type Partial (f :: (* -> *) -> *) = f Maybe

As you can see, the KindSigntaures extension allows you to be more explicit about the kind level action going on. I do not have the deep understanding to try and explain Haskell’s kind system (and any attempt would have made this post even longer) but the primer I linked to earlier in this section is a pretty great read.