Awesome FOSS Logo
Discover awesome open source software
Launched 🚀🧑‍🚀

Adding a very naive in-memory cache to my Haskell web app

Categories

tl;dr - I added some caching to an app I’m writing in Haskell using Servant. The implementation is pretty naive, but I’ve seen some speedups (as one would expect from skipping a database access), and am glad I was able to build such a simple solution in a language as expressive as haskell. Skip to the end TLDR section to see all the code laid out!

FAIR WARNING - this will is NOT an interesting article about caching algorithms or a quirk in GHC or optimization strategies. It’s just some notes on how I did some pre-mature optimization implementing a very naive and simple caching mechanism for some endpoints on a Servant-powered web application I’m writing. It also served as a good chance for me to start using Haskell’s STM and concurrency features.

Thanks to working in Haskell on this program for so long, I’ve almost forgotten that the large majority of my code is completely pure and stateless. It was a good feeling to be somewhat confused and how I would maintain a piece of often considered global state like a cache in haskell, and how I would deal with actually changing a value that had already been set. Thanks to immutability-by-default, I’ve been writing trivially parallelizable code this whole time.

Before starting, I thought that this was something I could actually get through fairly quickly, possibly within a day if I could get into a good “flow”. One of the best indicators of good “flow” when I’m writing Haskell is when the compiler doesn’t have to yell at me much (if at all). Rather than being the task master, the compiler becomes your buddy that pipes up every once in a while, and you never forget which parts of the system get affected by changes and generally move the whole system forward at one time, instead of just moving one bit and dealing with the fallout. Enough about that, let’s get into what I did.

Step 0: Some thinking about the problem

There’s a famous saying in Computer Science (attributed to Phil Karlton) that the only two hard things in Computer Science – cache invalidation and naming things. Now that I’m tackling one of them (albeit very naively), My initial assessment that this was going to be pretty simple worries me quite a bit. The concept is simple:

  • Do expensive database computation
  • Save the result
  • Use it later when the same exact query comes in
  • If the data affected by the query has changed, flush all or part of the cache

Astute readers will note that the if the data affected by the query has changed is the hard bit.

Since the project is a job board, I have an endpoint (Servant handler) jobsFTS (job Full Text Search) which takes a JobQuery type that completely describes the query being done on the database side. Since the state I care about for the hard bit is the database, and it can change at any time, it makes sense to consider the database state as a kind of explicit input to the jobsFTS function. So I have one explicit input (the JobQuery), and one implicit input (the database state). The function returns a PaginatedList JobWithCompany (yay nice and readable types), so that’s what my cache is going to have to store.

So what about the hard part? The cache invalidation part? I think I can avoid over thinking it: every time a job posting is added/removed/activated/inactivated, I just invalidate ALL the JobQuery caches. This is a VERY inefficient cache, but I’m fine with it, due to the current frequency of jobs actually being added/removed/activated/inactivated. All I need to do now is store the results (PaginatedList JobWithCompany values) in a concurrent/thread-safe way.

This is the good ol’ RTFM part. I spend a good bit of time reading up on Haskell’s STM wiki page along with the actual docs for the stm package. For some reason there were actually a bunch of dead links on the wiki page. At this point I’m very used to reading haskell generated documentation so I was very comfortable reading the docs of the stm package, but I could see how this would be very offputting for someone new to the language.

Step 0.1: Have the application structure in place already

Before we get into what I did, I want to point out that there’s a very real amount of tech-debt/cruft/code (however you view it) already existing – I’m starting from a app that I architected according to how I like to do things, so you’ll have to work a little harder than I do to understand the structure.

I think the structure of my application is really simple though, I’ve found and continue to find that the component based approached really strikes the right balance for me. Very simply put, if you organize your application in terms of big concerns like “sending emails” or “interacting with the database” or “caching”, you have a very organic separation and organization of code. Well organized code makes a huge difference in a large codebase. These large concerns often have some shared functionality, things like setting themselves up, or tearing themselves down, or getting status, and a language that gives you reasonably ergonomic approach to interfaces (I really like Golang’s approach for this, and of course Haskell’s typeclasses are stellar). Up until a while ago I’d associated this approach with bloated software, but it was important for me to realize personally that just because you wanted to add <Something>Components all over your code didn’t mean that your code would become the horror that is Java’s AbstractBeanComponentFactory-style frameworks like Spring. Of course, I didn’t invent this pattern, one of the best places I’ve seen it implemented is Stuart Sierra’s Component for clojure.

In my app I already have the structure in place, so today I’m only going to be adding a CacheBackend component (the name is a little unfortunate but I’m too lazy to change it, looks like I just hit the second hard thing in computer science). I was able to copy-pasta some code from my SearchBackend component ([I’ve written about it here][search-backend-post]), and do a quick s/Search/Cache/g and get on the way much quicker. If you don’t have this kind of structure in your own app, I’d highly recommend looking into whether it makes sense for you.

Step 1: Start building the CacheBackend

Just a few minutes of coding and I’m feeling crazy productive and like I’m doing things right, here’s what the early code (that doesn’t compile) looked like:

{-# LANGUAGE OverloadedStrings #-}

module Cache.CacheBackend ( makeConnectedCacheBackend
                          , CacheBackend(..)
                          , CacheKey(..)
                          ) where

import           Config (LoggerConfig(..), CacheConfig(..))
import           Data.Maybe (Maybe)

data LocalMemory = LocalMemory { searchCfg       :: CacheConfig
                               , searchLogger    :: Maybe Logger
                               }

data CacheKey = ActiveJobFTS
              \| JobFTS deriving Eq                                                                           |

data CacheBackendError = UnexpectedFailure deriving (Eq)

instance Show CacheBackendError where
show UnexpectedFailure = "An unexpected failure occurred"

makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend)
makeConnectedCacheBackend c maybeL = connectCache SQLiteFTS { cacheCfg=c
, cacheLogger=maybeL
}

logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend
logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c

class CacheBackend c where
getCacheLogger :: c -> Maybe Logger

-- ^ Connect to the cache backend
connect :: c -> IO (Either SomeException s)

-- ^ Look up value from the cache
lookup :: CacheKey -> c -> r

-- ^ Check whether a key has a value
hasValue :: CacheKey -> c -> Bool

-- ^ Invalidate a value already in the cache
invalidate :: CacheKey -> c -> r

instance HasLogger CacheBackend where
  getComponentLogger (CacheBackend _ l _ _) = l

Of course, I forgot one thing – the type for the container that was going to be stored inside LocalMemoryCache it should be something like TMVar (HashMap CacheKey (PaginatedList JobWithCompany)), in the simplest case. make sure to read up on TMVars if you haven’t already. Unfortunately, since I’m going to be storing various results (not just PaginatedList JobWithCompany values) in there, I can’t use a regular HashMap CacheKey ??? (the ??? values would be heterogenous, I could wrap them to smoothe them out but for this first implementation I’ll hold off). It turns out there’s an option to use Data-HMap for a heterogeneous map, but I don’t want the extra complexity – I’ll do it the dumb simple way for now:

data LocalMemoryCache = LocalMemoryCache { searchCfg          :: CacheConfig
                                         , searchLogger       :: Maybe Logger
                                         , scJobFTS           :: TMVar (HashMap CacheKey (PaginatedList JobWithCompany))
                                         , scActiveJobOnlyFTS :: TMVar (HashMap CacheKey (PaginatedList JobWithCompany))
                                         }

Step 2: Actually start writing compilable code

Now that the types look like they could/should work, it’s time to try and sneak one past the compiler – I use the usual undefined trick to mock out implelentations of methods and get to work:

instance HasLogger CacheBackend where
  getComponentLogger = cacheLoger -- realized that my approach to this was pretty bad, could just use the getter, didn't have to do the pattern matching

instance CacheBackend LocalMemoryCache where
  getCacheLogger = cacheLogger

  connect = undefined

  lookup = undefined

  hasValue = undefined

  invalidate = undefined

With that stuff mocked out, I tried a compile, and… I forgot a few things:

  • The CacheConfig type was unspecified (I needed to make a bunch of changes in Config.hs
  • Tons of little errors, typos, previous code from the copy pasta, etc
  • Forgot to properly name and implement the constructor for a CacheBackend
  • Import the appropriate STM package and types (modifying .cabal and restarting ghci)

Here’s what the code that compiled looks like:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DataKinds         #-}

module Cache.CacheBackend ( makeConnectedCacheBackend
                          , CacheBackend(..)
                          , CacheKey(..)
                          ) where

import           Data.Monoid ((<>))
import           Config (LoggerConfig(..), CacheConfig(..))
import           Control.Concurrent.STM (STM)
import           Control.Concurrent.STM.TMVar (TMVar, newEmptyTMVar)
import           Control.Monad.State (liftIO)
import           Control.Exception (SomeException)
import           Data.HashMap.Strict as HMS (HashMap)
import           Data.Maybe (Maybe)
import           System.Log.Logger (Logger, Priority(..))
import           Types (HasLogger(..), JobWithCompany, PaginatedList)

data CacheBackend = LocalMemoryCache { cacheCfg            :: CacheConfig
                                      , cacheLogger        :: Maybe Logger
                                      , scJobFTS           :: STM (TMVar (HashMap CacheKey (PaginatedList JobWithCompany)))
                                      , scActiveJobOnlyFTS :: STM (TMVar (HashMap CacheKey (PaginatedList JobWithCompany)))
                                      }
                     -- \| NewImplementation might go here

data CacheKey = JobFTS
         \| ActiveJobOnlyFTS deriving Eq

data CacheBackendError = UnexpectedFailure deriving (Eq)

instance Show CacheBackendError where
    show UnexpectedFailure = "An unexpected failure occurred"

makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend)
makeConnectedCacheBackend c maybeL = connect (LocalMemoryCache c maybeL newEmptyTMVar newEmptyTMVar)

logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend
logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c

class Cache c where
    getCacheLogger :: c -> Maybe Logger

    -- ^ Connect to the cache backend
    connect :: c -> IO (Either SomeException c)

    -- ^ Look up value from the cache
    lookup :: CacheKey -> c -> r

    -- ^ Check whether a key has a value
    hasValue :: CacheKey -> c -> Bool

    -- ^ Invalidate a value already in the cache
    invalidate :: CacheKey -> c -> r

instance HasLogger CacheBackend where
    getComponentLogger = cacheLogger

instance Cache CacheBackend where
    getCacheLogger = cacheLogger

    connect = undefined

    lookup = undefined

    hasValue = undefined

    invalidate = undefined

Step 3: Add implementations

I’m a little fuzzy on how I should be using the STMmonad, and whether I wrote things correctly but since just about everything else is what I THINK I want (at least the types are what they should be), I went ahead with writing the implementations. At this point one more worry I had was the fact that I would now have to juggle a few monads – IO, my Servant custom handler monad (WithApplicationGlobals Handler <something>), and now STM as well. Mixing monads often gets me confused (especially when >>= produces a value in a monad I didn’t expect, but I think I’ll be able to liftIO or use the -IO versions of any functions to avoid issues.

Turns out afer a little bit more looking I found out I was using the wrong STM primitive, TVar is what I wanted. As I started writing, things started getting wild (more and more language extensions being required, which worries me because I often don’t understand them fully). In particular, DeriveGeneric, FlexibleContexts and UndecidableInstances kept seeming like solutions to problems I was facing, all to make CacheKeya hashable value. Here’s the code I was puzzled with:

data CacheBackend = LocalMemoryCache { cacheCfg            :: CacheConfig
                                      , cacheLogger        :: Maybe Logger
                                      , scJobFTS           :: STM (TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany)))
                                      , scActiveJobOnlyFTS :: STM (TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany)))
                                      }
                     -- \| NewImplementation might go here

data CacheKey = JobFTS JobQuery
             \| ActiveJobOnlyFTS JobQuery deriving (Generic, Eq)

getCacheEntry :: CacheKey ->  STM (TVar (HMS.HashMap CacheKey a)) -> IO (Either CacheBackendError a)
getCacheEntry k stm = atomically stm
               >>= atomically . readTVar
               >>= pure . maybe (Left UnexpectedFailure) Right . HMS.lookup k

-- ... a ways down ...

instance Cache CacheBackend where
    getCacheLogger = cacheLogger

    connect = return . Right -- This will have to chance once I have a non-local-memory type of CacheBackend

    lookup key = getCacheEntry key . scJobFTS

    hasValue = undefined

    invalidate = undefined

Note the change to TVar and the start of using STM and the implementations I’ve just started writing. After letting those issues slide for now, I got down to one error that took me quite a while to fix:

/<redacted>/src/Cache/CacheBackend.hs:77:18: error:
     Couldn't match type r with PaginatedList JobWithCompany
      r is a rigid type variable bound by
        the type signature for:
          lookup :: forall r.
                    CacheKey -> CacheBackend -> IO (Either CacheBackendError r)
        at /<redacted>/src/Cache/CacheBackend.hs:77:5
      Expected type: CacheBackend -> IO (Either CacheBackendError r)
        Actual type: CacheBackend
                     -> IO (Either CacheBackendError (PaginatedList JobWithCompany))
     In the expression: getCacheEntry key . scJobFTS
      In an equation for lookup:
          lookup key = getCacheEntry key . scJobFTS
      In the instance declaration for Cache CacheBackend
     Relevant bindings include
        lookup :: CacheKey
                  -> CacheBackend -> IO (Either CacheBackendError r)
          (bound at /<redacted>/src/Cache/CacheBackend.hs:77:5)

I was trying to use a class to just return any old thing (in checkCacheEntry/lookup :: CacheKey -> c -> IO (Either CacheBackendError r)), but the problem is that they need to be the same kind of thing all the time for every cache entry if I want to do that – they can’t be heterogenous. I was pretty stumped at this point at why the compiler was giving me those errors, so I did something I do pretty often – ignored the problem completely and went to work on something else!.

Here’s the code for removing entries:

removeCacheEntry :: CacheKey ->  STM (TVar (HMS.HashMap CacheKey a)) -> IO ()
removeCacheEntry k stm = atomically stm
                         >>= \v -> atomically (readTVar v)
                         >>= pure . HMS.delete k
                         >>= atomically . writeTVar v

class Cache c where
    -- ... other declarations ...

    -- ^ Invalidate a value already in the cache
    invalidate :: CacheKey -> c -> IO ()

instance Cache CacheBackend where
    -- ... other declarations ...

    invalidate k@(JobFTS _) = removeCacheEntry k . scJobFTS
    invalidate k@(ActiveJobOnlyFTS _) = removeCacheEntry k . scActiveJobOnlyFTS

Writing this code reminded me that I needed to do some unwrapping, scJobFTS (which is the getter for the job full text search mapping) couldn’t handle every type of caching I wanted to do. I couldn’t ignore the problem I ran into for very long, so I started looking into what a Rigid Type Variable was, and basically just found the answer that it’s a “user-specified type”. What was causing r, the “user-specified type”, to make the compiler unhappy? I was still stumped at this point, but I did know that regular HashMaps do this as well – the dumbest but working way I could do this would be just to devolve to using very explicit lookup methods:

class Cache c where
    -- ... other declarations ...

    -- ^ Look up value from the cache
    lookupJobFTS :: CacheKey -> c -> IO (Either CacheBackendError (PaginatedList JobWithCompany))
    lookupActiveJobOnlyFTS :: CacheKey -> c -> IO (Either CacheBackendError (PaginatedList JobWithCompany))

instance Cache CacheBackend where
    -- ... other declarations ...

    lookupJobFTS k@(JobFTS _) = getCacheEntry k . scJobFTS
    lookupActiveJobOnlyFTS k@(ActiveJobOnlyFTS _) = getCacheEntry k . scActiveJobOnlyFTS

    -- ... other declarations ...

Of course update functionality is also pretty important:

insertCacheEntry :: CacheKey -> a -> STM (TVar (HMS.HashMap CacheKey a)) -> IO ()
insertCacheEntry k v stm = atomically stm
                           >>= \tv -> atomically (readTVar tv)
                           >>= pure . HMS.insert k v
                           >>= atomically . writeTVar tv
class Cache c where
    -- ... other declarations ...

    -- ^ Insert value(s) into the cache (will replace existing entries)
    insertJobFTS :: CacheKey -> PaginatedList JobWithCompany -> c -> IO ()
    insertActiveJobOnlyFTS :: CacheKey -> PaginatedList JobWithCompany -> c -> IO ()
    -- ... other declarations ...

instance Cache CacheBackend where
    -- ... other declarations ...

    insertJobFTS k@(JobFTS _) v  = insertCacheEntry k v . scJobFTS
    insertActiveJobOnlyFTS k@(ActiveJobOnlyFTS _) v = insertCacheEntry k v . scActiveJobOnlyFTS

    -- ... other declarations ...

With retrospect, the answer to actually solving the issue I was having turns out to be wrapping the values in some sort of union type that would cover all their possibilities – just like the CackeKey does for all the different kinds of keys. I didn’t actually end up doing this (I ran with the just-write-a-bunch-of-lookup-functions), but just wanted to note that here.

Step 4: Integrating the cache backend into the app

Now that the CacheBackend theoretically works (it compiles at least, and the types do what they’re supposed to), now it’s time to make sure my CacheBackend gets started with the rest of the things the app needs:

-- ... other imports ...
import           Cache.CacheBackend (CacheBackend(..), makeConnectedCacheBackend)
-- ... lots of code  ...

startApp :: AppConfig -> IO ()
startApp c = do
  -- ... lots of other code...

  -- Set up the cache backend
  cacheLogger <- buildLogger "App.Cache" (cacheLogLevel cacheCfg)
  cacheBackendOrError <- makeConnectedCacheBackend cacheCfg (Just cacheLogger)
  throwErrorIf (isLeft cacheBackendOrError) ("Failed to initialize Cache backend: \n" ++ showLeft cacheBackendOrError)
  let (Right cacheBackend) = cacheBackendOrError

-- ... lots of other code ...
data ApplicationGlobals = ApplicationGlobals { globalConfig           :: AppConfig
                                             , globalBackend          :: SqliteBackend
                                             , globalMailer           :: MailerBackend
                                             , globalUserContentStore :: UserContentStoreBackend
                                             , globalSearchBackend    :: SQLiteFTS
                                             , globalCacheBackend     :: CacheBackend -- NEW!
                                             , globalLogger           :: Logger
                                             , globalCookieKey        :: WCS.Key
                                             }

EZ PZ compile, though the code for handling the potential failure is very messy. Now theoretcially my cache backend is working, so it’s time to actually use it in some endpoints. Before I do that though, some additions to make it more ergonomic to use and help debugging:

  • Add logging to the cache backend to log when it’s used @ DEBUG level
  • Add get-or-set type utility function that either uses a cached value or if there isn’t one runs the computation and saves it in cache
  • Need to use the utility function on some endpoint, let’s say jobFTS

It took a long time to get these three relatively simple seeming tasks done, but here’s the code for a handler and the lookupOrCompute function (the helper I mentioned):

-- ^ Job search (only checks active, becuase that's all that's indexed), using the available search backend
jobFTS :: Maybe String
       -> [JobIndustry]
       -> [CompanyID]
       -> Maybe Limit
       -> Maybe Offset
       -> [TagName]
       -> WithApplicationGlobals Handler (EnvelopedResponse (PaginatedList JobWithCompany))
jobFTS term is cs limit offset tags = getBackendWithSearchAndCache
                                      >>= \(db, cache, search) -> liftIO (try (lookupOrCompute (getJobListing cacheKey cache) (doSearch search db)))
                                      >>= ifLeftThrowServantError -- If the lookup/compute failed then throw the error as is
                                      >>= ifNothingThrowError (Err.enveloped Err.jobSearchFailed) -- if the lookup compute worked, but returned nothing
                                      >>= pure . EnvelopedResponse "success" "Successfully completed search"
    where
      massagedJQ = massage $ JobQuery (trimSearchTerm term) is cs limit offset tags
      cacheKey = JobFTS massagedJQ
      -- Do the search with the backend
      doSearch searchBackend dbBackend = searchJobs massagedJQ searchBackend
                                         >>= either (error "Job FTS failed") pure
                                         >>= hydrateSearchResultJobIDs dbBackend

--- ... a ways down ...

-- ^ Lookup or compute a value from cache
lookupOrCompute :: IO (Maybe a) -> IO (Maybe a) -> IO (Maybe a)
lookupOrCompute lookup compute = lookup >>= maybe compute (pure . Just)

Not too shabby! Here’s a pretty bad bug I noticed though – I actually wasn’t using the MassagedJQ, when I should have been (“massaging” is just what I called a light round of user input validation and enhancement of the search terms). After getting that small bug out of the way, I let ’er rip!

Turns out there is a huge problem, both actions are hapening (computes are happening after the results should have been saved) which means the values aren’t being saved, and mostly because I’ve been using TVar all wrong – what I needed to store wasn’t the STM TVar (an action in the STM monad that produces a TVar when it’s run), I needed to store the actual TVar itself!

Here’s how I figured this out (this is straighto ut of my notes):

  • Saving just wasn’t working despite doing exactly what it should be doing (right methods getting called, right log messages getting printed, except every cache check was a miss)
  • Something is super broken, saving is not working properly
  • Look back at how I’m supposed to be using TVars
  • Wonder if I can use TVars outside the STM monad – mabye I should add an STM monad transformer to the servant app…? do they have to share context?
  • Nah that seems kinda sucky, because STM contexts get generated when you read though…
  • Here’s where I hit the realization that maybe I was doing it wrong, I should just use the TVar by itself

Now that it’s working, I get a SWEET speedup. Man it feels good – and I’m not even doing a robust job in any sense of the word, but the speedup is amazing (16ms -> 3ms)

Difference between a miss and a hit

Step 5: Fixes, adding tests

Here’s the bit where I go around and fix up stuff I’ve been ignoring and adding tests to make sure this stuff works in perpetuity. First order of business was going around and changing the SearchBackend to use the Massaged JobQuery rather than just the job query in various places since I stumbled upon that bug. Next was adding some tests – I’ll save you the story on how I started getting around to writing the tests, here’s some code:

-- List of JWCs
jwcList :: PaginatedList JobWithCompany
jwcList = PaginatedList jwcs 2
    where
      jwcs = [ JobWithCompany (ModelWithID 1 (makeJob True 1 1)) (ModelWithID 1 testCompany)
             , JobWithCompany (ModelWithID 2 (makeJob True 1 2)) (ModelWithID 1 testCompany)
             ]

-- Query that is used when someone visits the main page
basicQuery :: Massaged JobQuery
basicQuery = Massaged $ JobQuery "" [] [] (Just 10) (Just 0) []

activeOnlyJobCacheKey :: CacheKey
activeOnlyJobCacheKey = ActiveJobOnlyFTS basicQuery

allJobCacheKey :: CacheKey
allJobCacheKey = JobFTS basicQuery

main :: IO ()
main = hspec spec

spec :: Spec
spec = around withCacheBackend $ do
  describe "setup" $ it "works without a logger" $ \c -> isNothing (cacheLogger c) `shouldBe` True

  describe "value insertion" $ do
    it "inserts job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList `shouldReturn` ()
    it "inserts active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList `shouldReturn` ()

  describe "value retrieval" $ do
    it "retrieves inserted job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList
                                                    >> getJobListing allJobCacheKey c
                                                    >>= shouldBeSomething
                                                    >>= (`shouldBe`jwcList)
    it "retrieves inserted active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList
                                                                >> getJobListing activeOnlyJobCacheKey c
                                                                >>= shouldBeSomething
                                                                >>= (`shouldBe`jwcList)

  describe "hasValue" $ do
    it "hasValue doesn't find not-inserted job FTS results" $ \c -> hasValue allJobCacheKey c `shouldReturn` False
    it "hasValue doesn't find not-inserted active job only FTS results" $ \c -> hasValue activeOnlyJobCacheKey c `shouldReturn` False

    it "hasValue finds inserted job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList
                                                         >> hasValue allJobCacheKey c
                                                         >>= (`shouldBe`True)
    it "hasValue finds inserted active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList
                                                                     >> hasValue activeOnlyJobCacheKey c
                                                                     >>= (`shouldBe`True)

  describe "invalidation" $ do
    it "works for job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList
                                           >> getJobListing allJobCacheKey c
                                           >>= shouldBeSomething
                                           >> invalidate allJobCacheKey c
                                           >> getJobListing allJobCacheKey c
                                           >>= (`shouldBe`Nothing)

    it "works for active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList
                                                       >> getJobListing activeOnlyJobCacheKey c
                                                       >>= shouldBeSomething
                                                       >> invalidate allJobCacheKey c
                                                       >> getJobListing allJobCacheKey c
                                                       >>= (`shouldBe`Nothing)

BONUS: Hastily implemented timed cache invalidation

Yay for super easy async life in Haskell! Some things are better cached by time and not explicit cache invalidation so for a super simple solution I just spin off a green thread (a very helpful SO post if you’re new to the subject) that will handle the deletion in the future.

timedCacheInvalidation :: (HasLogger c, Cache c) => CacheKey -> c -> Int -> IO ()
timedCacheInvalidation k c ms = void $ forkIO (delayedInvalidation k c ms)
    where
      delayedInvalidation k c ms  = threadDelay ms
                                    >> try (invalidate k c)
                                    >>= logTimedCacheInvalidation k c

logTimedCacheInvalidation :: HasLogger c => CacheKey -> c -> (Either SomeException ()) -> IO ()
logTimedCacheInvalidation k c (Left _) = logMsg c DEBUG ("Timed cache invalidation FAILED for key: " <> show k)
logTimedCacheInvalidation k c (Right _) = logMsg c DEBUG ("Timed cache invalidation SUCCESS for key: "<> show k)

insertCompanyStats k@(Stats _) c v  = insertCacheEntry k v (scCompanyStats c)
                                      >>= \res -> timedCacheInvalidation k c invalidationTimeMs
                                      >> logCacheUpdate c res
insertCompanyStats _ _ _ = throw InvalidCacheKey
    where
      cfg = getCacheConfig c
      invalidationTimeMs = defaultCacheInvalidation cfg

Pretty simple code there, with a single call to forkIO. fork being in the name might scare those of us with more low level backgrounds, as it can be tricky to use, but rest assured, the docs say that it just sparks off “one lightweight, unbound thread”. Almost suspicious of how easy it was, I wrote a test to ensure it worked

describe "timed invalidation" $
  it "works for company stats" $ \c -> insertCompanyStats companyStatsCacheKey c testCompanyStats
                                       >> getCompanyStats companyStatsCacheKey c
                                       >>= shouldBeSomething
                                       >> threadDelay (oneSecondMs `div` 2) -- defaultThread delay is one second, wait 2 just in case
                                       >> getCompanyStats companyStatsCacheKey c
                                       >>= \beforeDeletion -> threadDelay oneSecondMs -- defaultThread delay is one second, wait 2 just in case
                                       >> getCompanyStats companyStatsCacheKey c
                                       >>= \after -> isJust beforeDeletion && isNothing after `shouldBe` True

Where could I improve/why is this so naive?

There are tons of reasons, but here are the ones I can see at least

  • More cache keys – there are other things to cache! (going to suck a little bit because of the way this is built, going to have to add accessors and concretely typed lookup/update methods) – I did this after Step 5
  • Would be nice to have more dynamic lookup and accessor functions , it’s silly to have so many lookup functions in the typeclass
  • No memory/size bounding checks currently
  • Relatively dumb/simple cache invalidation semantics, I could always do better/be smarter about how to invalidate
  • Just about zero thought to performance, no stress testing, just back of the napkin one-off measurements

I’m going to leave all these worries for another day :).

TLDR

Here’s all the code completed!

The jobFTS handler in my API code:

-- ^ Job search (only checks active, becuase that's all that's indexed), using the available search backend
jobFTS :: Maybe String
       -> [JobIndustry]
       -> [CompanyID]
       -> Maybe Limit
       -> Maybe Offset
       -> [TagName]
       -> WithApplicationGlobals Handler (EnvelopedResponse (PaginatedList JobWithCompany))
jobFTS term is cs limit offset tags = getBackendWithCacheAndSearch -- get various backends
                                      >>= \(db, cache, search) -> liftIO (try (lookupOrComputeAndSave (getJobListing cacheKey cache) (doSearch search db) (insertJobListing cacheKey cache)))
                                      >>= ifLeftThrowServantError -- If the lookup/compute failed then throw the error as is
                                      >>= ifNothingThrowIOError (Err.enveloped Err.jobSearchFailed) -- if the lookup compute worked, but returned nothing
                                      >>= pure . EnvelopedResponse "success" "Successfully completed search"
    where
      massagedJQ = massage $ JobQuery (trimSearchTerm term) is cs limit offset tags
      cacheKey = ActiveJobOnlyFTS massagedJQ
      -- Do the search with the backend
      doSearch searchBackend dbBackend = searchJobs massagedJQ searchBackend
                                         >>= either (error "Job FTS failed") pure
                                         >>= hydrateSearchResultJobIDs dbBackend


--- ... near the bottom with the other helper code ...

-- ^ Lookup or compute a value from cache
lookupOrComputeAndSave :: IO (Maybe a) -> IO (Maybe a) -> (a -> IO ()) -> IO (Maybe a)
lookupOrComputeAndSave lookup compute save = lookup >>= maybe computeAndSave (pure . Just)
    where
      computeAndSave = compute
                       >>= maybe (pure Nothing) (\res -> save res >> pure (Just res))

Most of the CacheBackend code:

{-# LANGUAGE OverloadedStrings    #-}
{-# LANGUAGE DeriveGeneric        #-}
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE FlexibleContexts     #-}
{-# LANGUAGE UndecidableInstances #-}

module Cache.CacheBackend ( makeConnectedCacheBackend
                          , CacheBackend(..)
                          , Cache(..)
                          , CacheKey(..)
                          ) where

import           Config (LoggerConfig(..), CacheConfig(..))
import           Control.Concurrent (forkIO, threadDelay)
import           Control.Concurrent.STM (STM, atomically)
import           Control.Concurrent.STM.TVar (TVar, newTVarIO, readTVar, writeTVar)
import           Control.Exception (Exception, SomeException, throw, try)
import           Control.Monad.State (liftIO, void)
import           Data.Hashable (Hashable)
import           Data.Maybe (Maybe)
import           Data.Monoid ((<>))
import           GHC.Generics (Generic)
import           System.Log.Logger (Logger, Priority(..))
import           Types (HasLogger(..), JobQuery, JobWithCompany, Massaged(..), PaginatedList, UserID, Company, CompanyStats, CompanyID, Limit, Offset, ModelWithID, Tag, TagID)
import qualified Data.HashMap.Strict as HMS

data CacheBackend = LocalMemoryCache { cacheCfg            :: CacheConfig
                                     , cacheLogger        :: Maybe Logger
                                     , scJobFTS           :: TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany))
                                     , scActiveJobOnlyFTS :: TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany))
                                     , scCompanyStats     :: TVar (HMS.HashMap CacheKey CompanyStats)
                                     -- ... more stores in the cache ...
                                     }
                    -- \| NewImplementation might go here

data CacheKey = JobFTS (Massaged JobQuery)
              | ActiveJobOnlyFTS (Massaged JobQuery)
              | Stats CompanyID deriving (Generic, Show, Eq)
              -- ... some cache keys have been omitted ...

instance Hashable CacheKey => Hashable CacheKey

data CacheBackendError = InvalidCacheKey
                       | UnexpectedFailure deriving (Eq)

instance Exception CacheBackendError

instance Show CacheBackendError where
    show UnexpectedFailure = "An unexpected failure occurred" -- these could be more descriptive for sure
    show InvalidCacheKey = "Invalid cache key"

makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend)
makeConnectedCacheBackend cfg maybeL = sequence [newTVarIO HMS.empty, newTVarIO HMS.empty]
                                       >>= \[jobFTS, activeJobOnlyFTS] -> newTVarIO HMS.empty
                                       >>= \companyStats -> connectCacheBackend LocalMemoryCache { cacheCfg=cfg
                                                                                                 , cacheLogger=maybeL
                                                                                                 , scJobFTS=jobFTS
                                                                                                 , scActiveJobOnlyFTS=activeJobOnlyFTS
                                                                                                 , scCompanyStats=companyStats
                                                                                                 -- ... more setters ...
                                                                                                 }

logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend
logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c

getCacheEntry :: CacheKey ->  TVar (HMS.HashMap CacheKey a) -> IO (Maybe a)
getCacheEntry k tv = atomically (readTVar tv)
                    >>= pure . HMS.lookup k

-- ^ Insert an entry into a given shared concurrent hash map, returning the updated map
insertCacheEntry :: CacheKey -> a -> TVar (HMS.HashMap CacheKey a) -> IO (HMS.HashMap CacheKey a)
insertCacheEntry k newValue tv = atomically (readTVar tv)
                                 >>= pure . HMS.insert k newValue
                                 >>= \updatedMap -> atomically (writeTVar tv updatedMap)
                                 >> pure updatedMap

checkCacheEntry :: CacheKey ->  TVar (HMS.HashMap CacheKey a) -> IO Bool
checkCacheEntry k tv = atomically (readTVar tv)
                       >>= pure . HMS.member k

removeCacheEntry :: CacheKey ->  TVar (HMS.HashMap CacheKey a) -> IO ()
removeCacheEntry k tv = atomically (readTVar tv)
                        >>= pure . HMS.delete k
                        >>= atomically . writeTVar tv

emptyCache :: TVar (HMS.HashMap CacheKey a) -> IO ()
emptyCache tv = atomically (writeTVar tv HMS.empty)

timedCacheInvalidation :: (HasLogger c, Cache c) => CacheKey -> c -> Int -> IO ()
timedCacheInvalidation k c ms = void $ forkIO (delayedInvalidation k c ms)
    where
      delayedInvalidation k c ms  = threadDelay ms
                                    >> try (invalidate k c)
                                    >>= logTimedCacheInvalidation k c

logCacheHitOrMiss :: (HasLogger c, Cache c) => c -> CacheKey -> Maybe a -> IO (Maybe a)
logCacheHitOrMiss c k Nothing = logMsg c DEBUG ("Cache miss for key "<>show k) >> return Nothing
logCacheHitOrMiss c k res@(Just a) = logMsg c DEBUG ("Cache hit for key "<> show k) >> return res

logCacheUpdate :: (HasLogger c, Show a) => c -> a -> IO ()
logCacheUpdate c = logMsg c DEBUG . ("Cache Updated: "<>) . show

logTimedCacheInvalidation :: HasLogger c => CacheKey -> c -> Either SomeException () -> IO ()
logTimedCacheInvalidation k c (Left _) = logMsg c DEBUG ("Timed cache invalidation FAILED for key: " <> show k)
logTimedCacheInvalidation k c (Right _) = logMsg c DEBUG ("Timed cache invalidation SUCCESS for key: "<> show k)

class Cache c where
    getCacheLogger :: c -> Maybe Logger
    getCacheConfig :: c -> CacheConfig

    -- ^ Connect to the cache backend
    connectCacheBackend :: c -> IO (Either SomeException c)

    -- ^ Look up value(s) from the cache
    getJobListing :: CacheKey -> c -> IO (Maybe (PaginatedList JobWithCompany))
    getCompanyStats :: CacheKey -> c -> IO (Maybe CompanyStats)
    -- ... more getters :( ...

    -- ^ Insert value(s) into the cache (will replace existing entries)
    insertJobListing :: CacheKey -> c -> PaginatedList JobWithCompany -> IO ()
    insertCompanyStats :: CacheKey -> c -> CompanyStats -> IO ()
    -- ... more inserters :( ...

    -- ^ Check whether a key has a value
    hasValue :: CacheKey -> c -> IO Bool

    -- ^ Invalidate a value already in the cache
    invalidate :: CacheKey -> c -> IO ()

    -- ^ Invalidate caches
    invalidateAllJobListings :: c -> IO ()
    invalidateCompanyListing :: c -> IO ()
    -- ... slightly less but still more invalidators :( ...


instance HasLogger CacheBackend where
    getComponentLogger = cacheLogger

instance Cache CacheBackend where
    getCacheLogger = cacheLogger

    getCacheConfig = cacheCfg

    connectCacheBackend = return . Right -- This will have to chance once I have a non-local-memory type of CacheBackend

    getJobListing k@(JobFTS _) c = getCacheEntry k (scJobFTS c)
                                   >>= logCacheHitOrMiss c k
    getJobListing k@(ActiveJobOnlyFTS _) c = getCacheEntry k (scActiveJobOnlyFTS c)
                                             >>= logCacheHitOrMiss c k

    getCompanyStats k@(Stats _) c = getCacheEntry k (scCompanyStats c)
                                    >>= logCacheHitOrMiss c k
    getCompanyStats _ _ = throw InvalidCacheKey

    -- ... more pairs of getters :( ...

    insertJobListing k@(JobFTS _) c v  = insertCacheEntry k v (scJobFTS c) >>= logCacheUpdate c
    insertJobListing k@(ActiveJobOnlyFTS _) c v = insertCacheEntry k v (scActiveJobOnlyFTS c) >>= logCacheUpdate c
    insertJobListing _ _ _ = throw InvalidCacheKey

    insertCompanyStats k@(Stats _) c v  = insertCacheEntry k v (scCompanyStats c)
                                          >>= \res -> timedCacheInvalidation k c invalidationTimeMs
                                          >> logCacheUpdate c res
        where
          invalidationTimeMs = (cacheDefaultTimedInvalidationMs . getCacheConfig) c

    insertCompanyStats _ _ _ = throw InvalidCacheKey

    -- ... more pairs of inserters :( ...

    hasValue k@(JobFTS _) = checkCacheEntry k . scJobFTS
    hasValue k@(ActiveJobOnlyFTS _) = checkCacheEntry k . scActiveJobOnlyFTS
    hasValue k@(Stats _) = checkCacheEntry k . scCompanyStats
    -- ... more hasValue pattern completions ...

    invalidate k@(JobFTS _) = removeCacheEntry k . scJobFTS
    invalidate k@(ActiveJobOnlyFTS _) = removeCacheEntry k . scActiveJobOnlyFTS
    invalidate k@(Stats _) =  removeCacheEntry k . scCompanyStats
    -- ... more invalidate pattern completions ...

    invalidateAllJobListings c = emptyCache (scJobFTS c)
                                 >> emptyCache (scActiveJobOnlyFTS c)

    invalidateCompanyListing c = emptyCache (scCompanyListing c)

Wrapping up

So it was pretty fun to implement all this, and I hope you learned from some of my mistakes at least and got a taste for what novice (hopefully amateur?) Haskell looks like! Feel free to drop me a line if you see a just absolutely horrid mistake/bad approach, I’d love to learn more and get to know what I’m doing wrong/right.