tl;dr - I added some caching to an app I’m writing in Haskell using Servant. The implementation is pretty naive, but I’ve seen some speedups (as one would expect from skipping a database access), and am glad I was able to build such a simple solution in a language as expressive as haskell. Skip to the end TLDR section to see all the code laid out!
FAIR WARNING - this will is NOT an interesting article about caching algorithms or a quirk in GHC or optimization strategies. It’s just some notes on how I did some pre-mature optimization implementing a very naive and simple caching mechanism for some endpoints on a Servant-powered web application I’m writing. It also served as a good chance for me to start using Haskell’s STM and concurrency features.
Thanks to working in Haskell on this program for so long, I’ve almost forgotten that the large majority of my code is completely pure and stateless. It was a good feeling to be somewhat confused and how I would maintain a piece of often considered global state like a cache in haskell, and how I would deal with actually changing a value that had already been set. Thanks to immutability-by-default, I’ve been writing trivially parallelizable code this whole time.
Before starting, I thought that this was something I could actually get through fairly quickly, possibly within a day if I could get into a good “flow”. One of the best indicators of good “flow” when I’m writing Haskell is when the compiler doesn’t have to yell at me much (if at all). Rather than being the task master, the compiler becomes your buddy that pipes up every once in a while, and you never forget which parts of the system get affected by changes and generally move the whole system forward at one time, instead of just moving one bit and dealing with the fallout. Enough about that, let’s get into what I did.
There’s a famous saying in Computer Science (attributed to Phil Karlton) that the only two hard things in Computer Science – cache invalidation and naming things. Now that I’m tackling one of them (albeit very naively), My initial assessment that this was going to be pretty simple worries me quite a bit. The concept is simple:
Astute readers will note that the if the data affected by the query has changed is the hard bit.
Since the project is a job board, I have an endpoint (Servant handler) jobsFTS
(job Full Text Search) which takes a JobQuery
type that completely describes the query being done on the database side. Since the state I care about for the hard bit is the database, and it can change at any time, it makes sense to consider the database state as a kind of explicit input to the jobsFTS
function. So I have one explicit input (the JobQuery
), and one implicit input (the database state). The function returns a PaginatedList JobWithCompany
(yay nice and readable types), so that’s what my cache is going to have to store.
So what about the hard part? The cache invalidation part? I think I can avoid over thinking it: every time a job posting is added/removed/activated/inactivated, I just invalidate ALL the JobQuery caches. This is a VERY inefficient cache, but I’m fine with it, due to the current frequency of jobs actually being added/removed/activated/inactivated. All I need to do now is store the results (PaginatedList JobWithCompany
values) in a concurrent/thread-safe way.
This is the good ol’ RTFM part. I spend a good bit of time reading up on Haskell’s STM wiki page along with the actual docs for the stm
package. For some reason there were actually a bunch of dead links on the wiki page. At this point I’m very used to reading haskell generated documentation so I was very comfortable reading the docs of the stm
package, but I could see how this would be very offputting for someone new to the language.
Before we get into what I did, I want to point out that there’s a very real amount of tech-debt/cruft/code (however you view it) already existing – I’m starting from a app that I architected according to how I like to do things, so you’ll have to work a little harder than I do to understand the structure.
I think the structure of my application is really simple though, I’ve found and continue to find that the component based approached really strikes the right balance for me. Very simply put, if you organize your application in terms of big concerns like “sending emails” or “interacting with the database” or “caching”, you have a very organic separation and organization of code. Well organized code makes a huge difference in a large codebase. These large concerns often have some shared functionality, things like setting themselves up, or tearing themselves down, or getting status, and a language that gives you reasonably ergonomic approach to interfaces (I really like Golang’s approach for this, and of course Haskell’s typeclasses are stellar). Up until a while ago I’d associated this approach with bloated software, but it was important for me to realize personally that just because you wanted to add <Something>Component
s all over your code didn’t mean that your code would become the horror that is Java’s AbstractBeanComponentFactory
-style frameworks like Spring. Of course, I didn’t invent this pattern, one of the best places I’ve seen it implemented is Stuart Sierra’s Component for clojure.
In my app I already have the structure in place, so today I’m only going to be adding a CacheBackend
component (the name is a little unfortunate but I’m too lazy to change it, looks like I just hit the second hard thing in computer science). I was able to copy-pasta some code from my SearchBackend
component ([I’ve written about it here][search-backend-post]), and do a quick s/Search/Cache/g
and get on the way much quicker. If you don’t have this kind of structure in your own app, I’d highly recommend looking into whether it makes sense for you.
CacheBackend
Just a few minutes of coding and I’m feeling crazy productive and like I’m doing things right, here’s what the early code (that doesn’t compile) looked like:
{-# LANGUAGE OverloadedStrings #-}
module Cache.CacheBackend ( makeConnectedCacheBackend
, CacheBackend(..)
, CacheKey(..)
) where
import Config (LoggerConfig(..), CacheConfig(..))
import Data.Maybe (Maybe)
data LocalMemory = LocalMemory { searchCfg :: CacheConfig
, searchLogger :: Maybe Logger
}
data CacheKey = ActiveJobFTS
\| JobFTS deriving Eq |
data CacheBackendError = UnexpectedFailure deriving (Eq)
instance Show CacheBackendError where
show UnexpectedFailure = "An unexpected failure occurred"
makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend)
makeConnectedCacheBackend c maybeL = connectCache SQLiteFTS { cacheCfg=c
, cacheLogger=maybeL
}
logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend
logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c
class CacheBackend c where
getCacheLogger :: c -> Maybe Logger
-- ^ Connect to the cache backend
connect :: c -> IO (Either SomeException s)
-- ^ Look up value from the cache
lookup :: CacheKey -> c -> r
-- ^ Check whether a key has a value
hasValue :: CacheKey -> c -> Bool
-- ^ Invalidate a value already in the cache
invalidate :: CacheKey -> c -> r
instance HasLogger CacheBackend where
getComponentLogger (CacheBackend _ l _ _) = l
Of course, I forgot one thing – the type for the container that was going to be stored inside LocalMemoryCache
it should be something like TMVar (HashMap CacheKey (PaginatedList JobWithCompany))
, in the simplest case. make sure to read up on TMVar
s if you haven’t already. Unfortunately, since I’m going to be storing various results (not just PaginatedList JobWithCompany
values) in there, I can’t use a regular HashMap CacheKey ???
(the ???
values would be heterogenous, I could wrap them to smoothe them out but for this first implementation I’ll hold off). It turns out there’s an option to use Data-HMap
for a heterogeneous map, but I don’t want the extra complexity – I’ll do it the dumb simple way for now:
data LocalMemoryCache = LocalMemoryCache { searchCfg :: CacheConfig
, searchLogger :: Maybe Logger
, scJobFTS :: TMVar (HashMap CacheKey (PaginatedList JobWithCompany))
, scActiveJobOnlyFTS :: TMVar (HashMap CacheKey (PaginatedList JobWithCompany))
}
Now that the types look like they could/should work, it’s time to try and sneak one past the compiler – I use the usual undefined
trick to mock out implelentations of methods and get to work:
instance HasLogger CacheBackend where
getComponentLogger = cacheLoger -- realized that my approach to this was pretty bad, could just use the getter, didn't have to do the pattern matching
instance CacheBackend LocalMemoryCache where
getCacheLogger = cacheLogger
connect = undefined
lookup = undefined
hasValue = undefined
invalidate = undefined
With that stuff mocked out, I tried a compile, and… I forgot a few things:
CacheConfig
type was unspecified (I needed to make a bunch of changes in Config.hs
CacheBackend
STM
package and types (modifying .cabal
and restarting ghci
)Here’s what the code that compiled looks like:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DataKinds #-}
module Cache.CacheBackend ( makeConnectedCacheBackend
, CacheBackend(..)
, CacheKey(..)
) where
import Data.Monoid ((<>))
import Config (LoggerConfig(..), CacheConfig(..))
import Control.Concurrent.STM (STM)
import Control.Concurrent.STM.TMVar (TMVar, newEmptyTMVar)
import Control.Monad.State (liftIO)
import Control.Exception (SomeException)
import Data.HashMap.Strict as HMS (HashMap)
import Data.Maybe (Maybe)
import System.Log.Logger (Logger, Priority(..))
import Types (HasLogger(..), JobWithCompany, PaginatedList)
data CacheBackend = LocalMemoryCache { cacheCfg :: CacheConfig
, cacheLogger :: Maybe Logger
, scJobFTS :: STM (TMVar (HashMap CacheKey (PaginatedList JobWithCompany)))
, scActiveJobOnlyFTS :: STM (TMVar (HashMap CacheKey (PaginatedList JobWithCompany)))
}
-- \| NewImplementation might go here
data CacheKey = JobFTS
\| ActiveJobOnlyFTS deriving Eq
data CacheBackendError = UnexpectedFailure deriving (Eq)
instance Show CacheBackendError where
show UnexpectedFailure = "An unexpected failure occurred"
makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend)
makeConnectedCacheBackend c maybeL = connect (LocalMemoryCache c maybeL newEmptyTMVar newEmptyTMVar)
logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend
logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c
class Cache c where
getCacheLogger :: c -> Maybe Logger
-- ^ Connect to the cache backend
connect :: c -> IO (Either SomeException c)
-- ^ Look up value from the cache
lookup :: CacheKey -> c -> r
-- ^ Check whether a key has a value
hasValue :: CacheKey -> c -> Bool
-- ^ Invalidate a value already in the cache
invalidate :: CacheKey -> c -> r
instance HasLogger CacheBackend where
getComponentLogger = cacheLogger
instance Cache CacheBackend where
getCacheLogger = cacheLogger
connect = undefined
lookup = undefined
hasValue = undefined
invalidate = undefined
I’m a little fuzzy on how I should be using the STM
monad, and whether I wrote things correctly but since just about everything else is what I THINK I want (at least the types are what they should be), I went ahead with writing the implementations. At this point one more worry I had was the fact that I would now have to juggle a few monads – IO
, my Servant custom handler monad (WithApplicationGlobals Handler <something>
), and now STM
as well. Mixing monads often gets me confused (especially when >>=
produces a value in a monad I didn’t expect, but I think I’ll be able to liftIO
or use the -IO
versions of any functions to avoid issues.
Turns out afer a little bit more looking I found out I was using the wrong STM primitive, TVar is what I wanted. As I started writing, things started getting wild (more and more language extensions being required, which worries me because I often don’t understand them fully). In particular, DeriveGeneric
, FlexibleContexts
and UndecidableInstances
kept seeming like solutions to problems I was facing, all to make CacheKey
a hashable value. Here’s the code I was puzzled with:
data CacheBackend = LocalMemoryCache { cacheCfg :: CacheConfig
, cacheLogger :: Maybe Logger
, scJobFTS :: STM (TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany)))
, scActiveJobOnlyFTS :: STM (TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany)))
}
-- \| NewImplementation might go here
data CacheKey = JobFTS JobQuery
\| ActiveJobOnlyFTS JobQuery deriving (Generic, Eq)
getCacheEntry :: CacheKey -> STM (TVar (HMS.HashMap CacheKey a)) -> IO (Either CacheBackendError a)
getCacheEntry k stm = atomically stm
>>= atomically . readTVar
>>= pure . maybe (Left UnexpectedFailure) Right . HMS.lookup k
-- ... a ways down ...
instance Cache CacheBackend where
getCacheLogger = cacheLogger
connect = return . Right -- This will have to chance once I have a non-local-memory type of CacheBackend
lookup key = getCacheEntry key . scJobFTS
hasValue = undefined
invalidate = undefined
Note the change to TVar
and the start of using STM
and the implementations I’ve just started writing. After letting those issues slide for now, I got down to one error that took me quite a while to fix:
/<redacted>/src/Cache/CacheBackend.hs:77:18: error:
• Couldn't match type ‘r’ with ‘PaginatedList JobWithCompany’
‘r’ is a rigid type variable bound by
the type signature for:
lookup :: forall r.
CacheKey -> CacheBackend -> IO (Either CacheBackendError r)
at /<redacted>/src/Cache/CacheBackend.hs:77:5
Expected type: CacheBackend -> IO (Either CacheBackendError r)
Actual type: CacheBackend
-> IO (Either CacheBackendError (PaginatedList JobWithCompany))
• In the expression: getCacheEntry key . scJobFTS
In an equation for ‘lookup’:
lookup key = getCacheEntry key . scJobFTS
In the instance declaration for ‘Cache CacheBackend’
• Relevant bindings include
lookup :: CacheKey
-> CacheBackend -> IO (Either CacheBackendError r)
(bound at /<redacted>/src/Cache/CacheBackend.hs:77:5)
I was trying to use a class to just return any old thing (in checkCacheEntry
/lookup :: CacheKey -> c -> IO (Either CacheBackendError r)
), but the problem is that they need to be the same kind of thing all the time for every cache entry if I want to do that – they can’t be heterogenous. I was pretty stumped at this point at why the compiler was giving me those errors, so I did something I do pretty often – ignored the problem completely and went to work on something else!.
Here’s the code for removing entries:
removeCacheEntry :: CacheKey -> STM (TVar (HMS.HashMap CacheKey a)) -> IO ()
removeCacheEntry k stm = atomically stm
>>= \v -> atomically (readTVar v)
>>= pure . HMS.delete k
>>= atomically . writeTVar v
class Cache c where
-- ... other declarations ...
-- ^ Invalidate a value already in the cache
invalidate :: CacheKey -> c -> IO ()
instance Cache CacheBackend where
-- ... other declarations ...
invalidate k@(JobFTS _) = removeCacheEntry k . scJobFTS
invalidate k@(ActiveJobOnlyFTS _) = removeCacheEntry k . scActiveJobOnlyFTS
Writing this code reminded me that I needed to do some unwrapping, scJobFTS
(which is the getter for the job full text search mapping) couldn’t handle every type of caching I wanted to do. I couldn’t ignore the problem I ran into for very long, so I started looking into what a Rigid Type Variable was, and basically just found the answer that it’s a “user-specified type”. What was causing r
, the “user-specified type”, to make the compiler unhappy? I was still stumped at this point, but I did know that regular HashMap
s do this as well – the dumbest but working way I could do this would be just to devolve to using very explicit lookup methods:
class Cache c where
-- ... other declarations ...
-- ^ Look up value from the cache
lookupJobFTS :: CacheKey -> c -> IO (Either CacheBackendError (PaginatedList JobWithCompany))
lookupActiveJobOnlyFTS :: CacheKey -> c -> IO (Either CacheBackendError (PaginatedList JobWithCompany))
instance Cache CacheBackend where
-- ... other declarations ...
lookupJobFTS k@(JobFTS _) = getCacheEntry k . scJobFTS
lookupActiveJobOnlyFTS k@(ActiveJobOnlyFTS _) = getCacheEntry k . scActiveJobOnlyFTS
-- ... other declarations ...
Of course update functionality is also pretty important:
insertCacheEntry :: CacheKey -> a -> STM (TVar (HMS.HashMap CacheKey a)) -> IO ()
insertCacheEntry k v stm = atomically stm
>>= \tv -> atomically (readTVar tv)
>>= pure . HMS.insert k v
>>= atomically . writeTVar tv
class Cache c where
-- ... other declarations ...
-- ^ Insert value(s) into the cache (will replace existing entries)
insertJobFTS :: CacheKey -> PaginatedList JobWithCompany -> c -> IO ()
insertActiveJobOnlyFTS :: CacheKey -> PaginatedList JobWithCompany -> c -> IO ()
-- ... other declarations ...
instance Cache CacheBackend where
-- ... other declarations ...
insertJobFTS k@(JobFTS _) v = insertCacheEntry k v . scJobFTS
insertActiveJobOnlyFTS k@(ActiveJobOnlyFTS _) v = insertCacheEntry k v . scActiveJobOnlyFTS
-- ... other declarations ...
With retrospect, the answer to actually solving the issue I was having turns out to be wrapping the values in some sort of union type that would cover all their possibilities – just like the CackeKey
does for all the different kinds of keys. I didn’t actually end up doing this (I ran with the just-write-a-bunch-of-lookup-functions), but just wanted to note that here.
Now that the CacheBackend
theoretically works (it compiles at least, and the types do what they’re supposed to), now it’s time to make sure my CacheBackend
gets started with the rest of the things the app needs:
-- ... other imports ...
import Cache.CacheBackend (CacheBackend(..), makeConnectedCacheBackend)
-- ... lots of code ...
startApp :: AppConfig -> IO ()
startApp c = do
-- ... lots of other code...
-- Set up the cache backend
cacheLogger <- buildLogger "App.Cache" (cacheLogLevel cacheCfg)
cacheBackendOrError <- makeConnectedCacheBackend cacheCfg (Just cacheLogger)
throwErrorIf (isLeft cacheBackendOrError) ("Failed to initialize Cache backend: \n" ++ showLeft cacheBackendOrError)
let (Right cacheBackend) = cacheBackendOrError
-- ... lots of other code ...
data ApplicationGlobals = ApplicationGlobals { globalConfig :: AppConfig
, globalBackend :: SqliteBackend
, globalMailer :: MailerBackend
, globalUserContentStore :: UserContentStoreBackend
, globalSearchBackend :: SQLiteFTS
, globalCacheBackend :: CacheBackend -- NEW!
, globalLogger :: Logger
, globalCookieKey :: WCS.Key
}
EZ PZ compile, though the code for handling the potential failure is very messy. Now theoretcially my cache backend is working, so it’s time to actually use it in some endpoints. Before I do that though, some additions to make it more ergonomic to use and help debugging:
jobFTS
It took a long time to get these three relatively simple seeming tasks done, but here’s the code for a handler and the lookupOrCompute
function (the helper I mentioned):
-- ^ Job search (only checks active, becuase that's all that's indexed), using the available search backend
jobFTS :: Maybe String
-> [JobIndustry]
-> [CompanyID]
-> Maybe Limit
-> Maybe Offset
-> [TagName]
-> WithApplicationGlobals Handler (EnvelopedResponse (PaginatedList JobWithCompany))
jobFTS term is cs limit offset tags = getBackendWithSearchAndCache
>>= \(db, cache, search) -> liftIO (try (lookupOrCompute (getJobListing cacheKey cache) (doSearch search db)))
>>= ifLeftThrowServantError -- If the lookup/compute failed then throw the error as is
>>= ifNothingThrowError (Err.enveloped Err.jobSearchFailed) -- if the lookup compute worked, but returned nothing
>>= pure . EnvelopedResponse "success" "Successfully completed search"
where
massagedJQ = massage $ JobQuery (trimSearchTerm term) is cs limit offset tags
cacheKey = JobFTS massagedJQ
-- Do the search with the backend
doSearch searchBackend dbBackend = searchJobs massagedJQ searchBackend
>>= either (error "Job FTS failed") pure
>>= hydrateSearchResultJobIDs dbBackend
--- ... a ways down ...
-- ^ Lookup or compute a value from cache
lookupOrCompute :: IO (Maybe a) -> IO (Maybe a) -> IO (Maybe a)
lookupOrCompute lookup compute = lookup >>= maybe compute (pure . Just)
Not too shabby! Here’s a pretty bad bug I noticed though – I actually wasn’t using the MassagedJQ
, when I should have been (“massaging” is just what I called a light round of user input validation and enhancement of the search terms). After getting that small bug out of the way, I let ’er rip!
Turns out there is a huge problem, both actions are hapening (computes are happening after the results should have been saved) which means the values aren’t being saved, and mostly because I’ve been using TVar
all wrong – what I needed to store wasn’t the STM TVar
(an action in the STM
monad that produces a TVar
when it’s run), I needed to store the actual TVar
itself!
Here’s how I figured this out (this is straighto ut of my notes):
STM
monad – mabye I should add an STM monad transformer to the servant app…? do they have to share context?Now that it’s working, I get a SWEET speedup. Man it feels good – and I’m not even doing a robust job in any sense of the word, but the speedup is amazing (16ms -> 3ms)
Here’s the bit where I go around and fix up stuff I’ve been ignoring and adding tests to make sure this stuff works in perpetuity. First order of business was going around and changing the SearchBackend
to use the Massaged JobQuery
rather than just the job query in various places since I stumbled upon that bug. Next was adding some tests – I’ll save you the story on how I started getting around to writing the tests, here’s some code:
-- List of JWCs
jwcList :: PaginatedList JobWithCompany
jwcList = PaginatedList jwcs 2
where
jwcs = [ JobWithCompany (ModelWithID 1 (makeJob True 1 1)) (ModelWithID 1 testCompany)
, JobWithCompany (ModelWithID 2 (makeJob True 1 2)) (ModelWithID 1 testCompany)
]
-- Query that is used when someone visits the main page
basicQuery :: Massaged JobQuery
basicQuery = Massaged $ JobQuery "" [] [] (Just 10) (Just 0) []
activeOnlyJobCacheKey :: CacheKey
activeOnlyJobCacheKey = ActiveJobOnlyFTS basicQuery
allJobCacheKey :: CacheKey
allJobCacheKey = JobFTS basicQuery
main :: IO ()
main = hspec spec
spec :: Spec
spec = around withCacheBackend $ do
describe "setup" $ it "works without a logger" $ \c -> isNothing (cacheLogger c) `shouldBe` True
describe "value insertion" $ do
it "inserts job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList `shouldReturn` ()
it "inserts active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList `shouldReturn` ()
describe "value retrieval" $ do
it "retrieves inserted job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList
>> getJobListing allJobCacheKey c
>>= shouldBeSomething
>>= (`shouldBe`jwcList)
it "retrieves inserted active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList
>> getJobListing activeOnlyJobCacheKey c
>>= shouldBeSomething
>>= (`shouldBe`jwcList)
describe "hasValue" $ do
it "hasValue doesn't find not-inserted job FTS results" $ \c -> hasValue allJobCacheKey c `shouldReturn` False
it "hasValue doesn't find not-inserted active job only FTS results" $ \c -> hasValue activeOnlyJobCacheKey c `shouldReturn` False
it "hasValue finds inserted job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList
>> hasValue allJobCacheKey c
>>= (`shouldBe`True)
it "hasValue finds inserted active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList
>> hasValue activeOnlyJobCacheKey c
>>= (`shouldBe`True)
describe "invalidation" $ do
it "works for job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList
>> getJobListing allJobCacheKey c
>>= shouldBeSomething
>> invalidate allJobCacheKey c
>> getJobListing allJobCacheKey c
>>= (`shouldBe`Nothing)
it "works for active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList
>> getJobListing activeOnlyJobCacheKey c
>>= shouldBeSomething
>> invalidate allJobCacheKey c
>> getJobListing allJobCacheKey c
>>= (`shouldBe`Nothing)
Yay for super easy async life in Haskell! Some things are better cached by time and not explicit cache invalidation so for a super simple solution I just spin off a green thread (a very helpful SO post if you’re new to the subject) that will handle the deletion in the future.
timedCacheInvalidation :: (HasLogger c, Cache c) => CacheKey -> c -> Int -> IO ()
timedCacheInvalidation k c ms = void $ forkIO (delayedInvalidation k c ms)
where
delayedInvalidation k c ms = threadDelay ms
>> try (invalidate k c)
>>= logTimedCacheInvalidation k c
logTimedCacheInvalidation :: HasLogger c => CacheKey -> c -> (Either SomeException ()) -> IO ()
logTimedCacheInvalidation k c (Left _) = logMsg c DEBUG ("Timed cache invalidation FAILED for key: " <> show k)
logTimedCacheInvalidation k c (Right _) = logMsg c DEBUG ("Timed cache invalidation SUCCESS for key: "<> show k)
insertCompanyStats k@(Stats _) c v = insertCacheEntry k v (scCompanyStats c)
>>= \res -> timedCacheInvalidation k c invalidationTimeMs
>> logCacheUpdate c res
insertCompanyStats _ _ _ = throw InvalidCacheKey
where
cfg = getCacheConfig c
invalidationTimeMs = defaultCacheInvalidation cfg
Pretty simple code there, with a single call to forkIO
. fork
being in the name might scare those of us with more low level backgrounds, as it can be tricky to use, but rest assured, the docs say that it just sparks off “one lightweight, unbound thread”. Almost suspicious of how easy it was, I wrote a test to ensure it worked
describe "timed invalidation" $
it "works for company stats" $ \c -> insertCompanyStats companyStatsCacheKey c testCompanyStats
>> getCompanyStats companyStatsCacheKey c
>>= shouldBeSomething
>> threadDelay (oneSecondMs `div` 2) -- defaultThread delay is one second, wait 2 just in case
>> getCompanyStats companyStatsCacheKey c
>>= \beforeDeletion -> threadDelay oneSecondMs -- defaultThread delay is one second, wait 2 just in case
>> getCompanyStats companyStatsCacheKey c
>>= \after -> isJust beforeDeletion && isNothing after `shouldBe` True
There are tons of reasons, but here are the ones I can see at least
lookup
/update
methods) – I did this after Step 5lookup
and accessor functions , it’s silly to have so many lookup functions in the typeclassI’m going to leave all these worries for another day :).
Here’s all the code completed!
The jobFTS
handler in my API code:
-- ^ Job search (only checks active, becuase that's all that's indexed), using the available search backend
jobFTS :: Maybe String
-> [JobIndustry]
-> [CompanyID]
-> Maybe Limit
-> Maybe Offset
-> [TagName]
-> WithApplicationGlobals Handler (EnvelopedResponse (PaginatedList JobWithCompany))
jobFTS term is cs limit offset tags = getBackendWithCacheAndSearch -- get various backends
>>= \(db, cache, search) -> liftIO (try (lookupOrComputeAndSave (getJobListing cacheKey cache) (doSearch search db) (insertJobListing cacheKey cache)))
>>= ifLeftThrowServantError -- If the lookup/compute failed then throw the error as is
>>= ifNothingThrowIOError (Err.enveloped Err.jobSearchFailed) -- if the lookup compute worked, but returned nothing
>>= pure . EnvelopedResponse "success" "Successfully completed search"
where
massagedJQ = massage $ JobQuery (trimSearchTerm term) is cs limit offset tags
cacheKey = ActiveJobOnlyFTS massagedJQ
-- Do the search with the backend
doSearch searchBackend dbBackend = searchJobs massagedJQ searchBackend
>>= either (error "Job FTS failed") pure
>>= hydrateSearchResultJobIDs dbBackend
--- ... near the bottom with the other helper code ...
-- ^ Lookup or compute a value from cache
lookupOrComputeAndSave :: IO (Maybe a) -> IO (Maybe a) -> (a -> IO ()) -> IO (Maybe a)
lookupOrComputeAndSave lookup compute save = lookup >>= maybe computeAndSave (pure . Just)
where
computeAndSave = compute
>>= maybe (pure Nothing) (\res -> save res >> pure (Just res))
Most of the CacheBackend
code:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}
module Cache.CacheBackend ( makeConnectedCacheBackend
, CacheBackend(..)
, Cache(..)
, CacheKey(..)
) where
import Config (LoggerConfig(..), CacheConfig(..))
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM (STM, atomically)
import Control.Concurrent.STM.TVar (TVar, newTVarIO, readTVar, writeTVar)
import Control.Exception (Exception, SomeException, throw, try)
import Control.Monad.State (liftIO, void)
import Data.Hashable (Hashable)
import Data.Maybe (Maybe)
import Data.Monoid ((<>))
import GHC.Generics (Generic)
import System.Log.Logger (Logger, Priority(..))
import Types (HasLogger(..), JobQuery, JobWithCompany, Massaged(..), PaginatedList, UserID, Company, CompanyStats, CompanyID, Limit, Offset, ModelWithID, Tag, TagID)
import qualified Data.HashMap.Strict as HMS
data CacheBackend = LocalMemoryCache { cacheCfg :: CacheConfig
, cacheLogger :: Maybe Logger
, scJobFTS :: TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany))
, scActiveJobOnlyFTS :: TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany))
, scCompanyStats :: TVar (HMS.HashMap CacheKey CompanyStats)
-- ... more stores in the cache ...
}
-- \| NewImplementation might go here
data CacheKey = JobFTS (Massaged JobQuery)
| ActiveJobOnlyFTS (Massaged JobQuery)
| Stats CompanyID deriving (Generic, Show, Eq)
-- ... some cache keys have been omitted ...
instance Hashable CacheKey => Hashable CacheKey
data CacheBackendError = InvalidCacheKey
| UnexpectedFailure deriving (Eq)
instance Exception CacheBackendError
instance Show CacheBackendError where
show UnexpectedFailure = "An unexpected failure occurred" -- these could be more descriptive for sure
show InvalidCacheKey = "Invalid cache key"
makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend)
makeConnectedCacheBackend cfg maybeL = sequence [newTVarIO HMS.empty, newTVarIO HMS.empty]
>>= \[jobFTS, activeJobOnlyFTS] -> newTVarIO HMS.empty
>>= \companyStats -> connectCacheBackend LocalMemoryCache { cacheCfg=cfg
, cacheLogger=maybeL
, scJobFTS=jobFTS
, scActiveJobOnlyFTS=activeJobOnlyFTS
, scCompanyStats=companyStats
-- ... more setters ...
}
logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend
logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c
getCacheEntry :: CacheKey -> TVar (HMS.HashMap CacheKey a) -> IO (Maybe a)
getCacheEntry k tv = atomically (readTVar tv)
>>= pure . HMS.lookup k
-- ^ Insert an entry into a given shared concurrent hash map, returning the updated map
insertCacheEntry :: CacheKey -> a -> TVar (HMS.HashMap CacheKey a) -> IO (HMS.HashMap CacheKey a)
insertCacheEntry k newValue tv = atomically (readTVar tv)
>>= pure . HMS.insert k newValue
>>= \updatedMap -> atomically (writeTVar tv updatedMap)
>> pure updatedMap
checkCacheEntry :: CacheKey -> TVar (HMS.HashMap CacheKey a) -> IO Bool
checkCacheEntry k tv = atomically (readTVar tv)
>>= pure . HMS.member k
removeCacheEntry :: CacheKey -> TVar (HMS.HashMap CacheKey a) -> IO ()
removeCacheEntry k tv = atomically (readTVar tv)
>>= pure . HMS.delete k
>>= atomically . writeTVar tv
emptyCache :: TVar (HMS.HashMap CacheKey a) -> IO ()
emptyCache tv = atomically (writeTVar tv HMS.empty)
timedCacheInvalidation :: (HasLogger c, Cache c) => CacheKey -> c -> Int -> IO ()
timedCacheInvalidation k c ms = void $ forkIO (delayedInvalidation k c ms)
where
delayedInvalidation k c ms = threadDelay ms
>> try (invalidate k c)
>>= logTimedCacheInvalidation k c
logCacheHitOrMiss :: (HasLogger c, Cache c) => c -> CacheKey -> Maybe a -> IO (Maybe a)
logCacheHitOrMiss c k Nothing = logMsg c DEBUG ("Cache miss for key "<>show k) >> return Nothing
logCacheHitOrMiss c k res@(Just a) = logMsg c DEBUG ("Cache hit for key "<> show k) >> return res
logCacheUpdate :: (HasLogger c, Show a) => c -> a -> IO ()
logCacheUpdate c = logMsg c DEBUG . ("Cache Updated: "<>) . show
logTimedCacheInvalidation :: HasLogger c => CacheKey -> c -> Either SomeException () -> IO ()
logTimedCacheInvalidation k c (Left _) = logMsg c DEBUG ("Timed cache invalidation FAILED for key: " <> show k)
logTimedCacheInvalidation k c (Right _) = logMsg c DEBUG ("Timed cache invalidation SUCCESS for key: "<> show k)
class Cache c where
getCacheLogger :: c -> Maybe Logger
getCacheConfig :: c -> CacheConfig
-- ^ Connect to the cache backend
connectCacheBackend :: c -> IO (Either SomeException c)
-- ^ Look up value(s) from the cache
getJobListing :: CacheKey -> c -> IO (Maybe (PaginatedList JobWithCompany))
getCompanyStats :: CacheKey -> c -> IO (Maybe CompanyStats)
-- ... more getters :( ...
-- ^ Insert value(s) into the cache (will replace existing entries)
insertJobListing :: CacheKey -> c -> PaginatedList JobWithCompany -> IO ()
insertCompanyStats :: CacheKey -> c -> CompanyStats -> IO ()
-- ... more inserters :( ...
-- ^ Check whether a key has a value
hasValue :: CacheKey -> c -> IO Bool
-- ^ Invalidate a value already in the cache
invalidate :: CacheKey -> c -> IO ()
-- ^ Invalidate caches
invalidateAllJobListings :: c -> IO ()
invalidateCompanyListing :: c -> IO ()
-- ... slightly less but still more invalidators :( ...
instance HasLogger CacheBackend where
getComponentLogger = cacheLogger
instance Cache CacheBackend where
getCacheLogger = cacheLogger
getCacheConfig = cacheCfg
connectCacheBackend = return . Right -- This will have to chance once I have a non-local-memory type of CacheBackend
getJobListing k@(JobFTS _) c = getCacheEntry k (scJobFTS c)
>>= logCacheHitOrMiss c k
getJobListing k@(ActiveJobOnlyFTS _) c = getCacheEntry k (scActiveJobOnlyFTS c)
>>= logCacheHitOrMiss c k
getCompanyStats k@(Stats _) c = getCacheEntry k (scCompanyStats c)
>>= logCacheHitOrMiss c k
getCompanyStats _ _ = throw InvalidCacheKey
-- ... more pairs of getters :( ...
insertJobListing k@(JobFTS _) c v = insertCacheEntry k v (scJobFTS c) >>= logCacheUpdate c
insertJobListing k@(ActiveJobOnlyFTS _) c v = insertCacheEntry k v (scActiveJobOnlyFTS c) >>= logCacheUpdate c
insertJobListing _ _ _ = throw InvalidCacheKey
insertCompanyStats k@(Stats _) c v = insertCacheEntry k v (scCompanyStats c)
>>= \res -> timedCacheInvalidation k c invalidationTimeMs
>> logCacheUpdate c res
where
invalidationTimeMs = (cacheDefaultTimedInvalidationMs . getCacheConfig) c
insertCompanyStats _ _ _ = throw InvalidCacheKey
-- ... more pairs of inserters :( ...
hasValue k@(JobFTS _) = checkCacheEntry k . scJobFTS
hasValue k@(ActiveJobOnlyFTS _) = checkCacheEntry k . scActiveJobOnlyFTS
hasValue k@(Stats _) = checkCacheEntry k . scCompanyStats
-- ... more hasValue pattern completions ...
invalidate k@(JobFTS _) = removeCacheEntry k . scJobFTS
invalidate k@(ActiveJobOnlyFTS _) = removeCacheEntry k . scActiveJobOnlyFTS
invalidate k@(Stats _) = removeCacheEntry k . scCompanyStats
-- ... more invalidate pattern completions ...
invalidateAllJobListings c = emptyCache (scJobFTS c)
>> emptyCache (scActiveJobOnlyFTS c)
invalidateCompanyListing c = emptyCache (scCompanyListing c)
So it was pretty fun to implement all this, and I hope you learned from some of my mistakes at least and got a taste for what novice (hopefully amateur?) Haskell looks like! Feel free to drop me a line if you see a just absolutely horrid mistake/bad approach, I’d love to learn more and get to know what I’m doing wrong/right.