Adding Sqlite Powered Fts Search To A Servant Powered Haskell App

Adding light and easy Full Text Search with SQLite3 to my servant-powered web app

vados

17 minute read

tl;dr - You may not need a big robust beautifully crafted DB like Postgres every time you build an app. Sometimes SQlite is probably enough. SQLite even provides Full Text Search addons in the way of FTS3/4 and FTS5, so that’s cool too – skim through the article for code snippets on the why/how I’m using SQLite + FTS3/4.

If you’re unfamiliar with SQLite, it’s a pretty awesome light-weight SQL-compliant RDBMS. It’s often used as the first/starter database for bigger application frameworks like Django and Rails due to it’s simplicity and lack of dependencies (it comes installed on just about every distro, and keeps all data required for the database in a single file).

People often use SQLite only while initially building their app, and while these days it is indeed much easier to get bigger more robust RDBMSes like Postgres and MariaDB up and running easily thanks to tooling like containers, I’ve recently started wondering whether most applications I write need anything more than SQLite. As with most non-trivial questions, the answer is probably “it depends”, but after reading the guide on when to use SQLite, I’m more than a little convinced that no app I’ve written has ever needed more than SQLite had to offer me (at the very least I’ve never had that many users where the database became the issue). I really wanted to test the limits of SQLite, so my most recent project ( a job board application) in Haskell+Servant is actually using SQLite on the backend, and I plan on using it in production. There are a bunch of reasons I think it’s OK, and maybe these will convince you, maybe they won’t:

  1. At the beginning (at the very least), my DB workload will be VERY read heavy, it’s not like I’ll have users writing tons of state at the same time
  2. Caching (in-memory at the least, possibly redis) WILL be implemented before launch (I actually just finished implementing it today, post will come later)
  3. I don’t expect a crazy amount of traffic
  4. I don’t expect reads at the DB layer to be the bottleneck on the application, especially with a small cache in front

One of the biggest weaknesses of SQLite is the difficulty of horizontally scaling it (multiple “connections” to a single open SQLite file are not really a safe thing – but I think early on this won’t be such an issue. If I find that my application is being overloaded by what would probably be hundreds or thousands of requests a second and is falling down due to the database, after being dissapointed at being wrong I’d be pretty darn happy while quickly fixing the issue.

Either way, I’m using SQLite for my app and doubt I’ll need to change over anytime soon (if you’re still shaking your head, don’t worry, it’s probably not that hard to convert to using Postgres, I’ve built a reasonable DB migration scheme and changing the queries wouldn’t be too bad).

In another fit of YAGNI-inspired madness, I’ve chosen to eschew solutions like ElasticSearch or Solr and Full Text Search support in Postgres in favor of Full Text Search for SQLite3 – choosing the FTS3/4 extension over FTS5 (since the haskell driver doesn’t have FTS5 just yet, though it’s probably only a .cabal file change away. The haskell driver for sqlite does support FTS3/4 though, so I went with that for now.

Without much more ado, here’s how I went about implementing it in my own application. Note that you could use this scheme to setup a very small search compnent implementation for your own application without committing to SQLite as your main database.

Step 0: RTFM

As all great adventures do, I generally start by trying to find and Read The Fucking Manual. I use sqlite-simple, so that’s the first place I looked for any mention of FTS3/4 usage. Since nothing was mentioned there, I went one level deeper and looked at direct-sqlite. FTS3/4 (and FTS5) are just extensions to SQLite that have to be compiled in at sqlite3 lib compilie time, so I looked around for mention of those options and found it (It looks like after that, someone opened up a ticket and the very next PR in the repo will add FTS5 support). So with that, I know now that FTS3/4 at the very least are supported with the direct-sqlite library (and sqlite-simple by proxy). At first I thought I would need to compile sqlite3 libraries myself and do some compilie-time linking or what not but this saves me a ton of time.

Note that SQLite3 that exists on your system (if you go into a console and run sqlite3 <file>) and the libraries that haskell uses are not the same (like in the case of direct-sqlite) – just because FTS5 (or other plugins your distro might have compiled sqlite libraries with) is supported when you try things locally through sqlite3 doesn’t mean it’s supported from inside your haskell world.

Step 1: Start adding component machinery for a SQLiteFTS component (specific to my app structure)

This step is quite specific to how I’ve chosen to write my app so feel free to skip it.

For most code I write these days I embrace a pretty loose component pattern somewhat similar to stuartsierra’s component clojure library. The concept isn’t new, but thinking of the app as a system of components helps to give me much needed organizational structure, and generally helps me separate concerns – I’ve found this design pattern to be very practical and beneficial to me.

For example, answering the question “what goes into app startup” is pretty simple when you’re following a component-based pattern – all the components need to start. Once all the components are started, the “system” is started.

Here’s what the general structure of the search component looks like (relevant typeclasses, types, etc):

data SQLiteFTS = SQLiteFTS { searchCfg       :: SearchConfig
                           , searchLogger    :: Maybe Logger
                           , searchDBConn    :: Maybe S.Connection
                           , searchDBBackend :: SqliteBackend
                           }

makeConnectedSearchBackend :: SearchConfig -> Maybe Logger -> SqliteBackend -> IO (Either SomeException SQLiteFTS)
makeConnectedSearchBackend c maybeL db = connectSearch SQLiteFTS { searchCfg=c
                                                                 , searchLogger=maybeL
                                                                 , searchDBConn=Nothing
                                                                 , searchDBBackend=db
                                                                 }


data SearchBackendError = UnexpectedFailure
                        | MigrationFailure deriving (Eq)

instance Show SearchBackendError where
    show UnexpectedFailure = "An unexpected failure occurred"
    show MigrationFailure  = "Failed to migrate/setup search DB"

logErrAndReturn :: SQLiteFTS -> String -> SomeException -> IO SQLiteFTS
logErrAndReturn s msg err = logMsg s ERROR (msg <> ": " <> show err) >> return s

class SearchBackend s where
    getSearchLogger :: s -> Maybe Logger

    -- ^ Connect to the search backend
    connectSearch :: s -> IO (Either SomeException s)

    -- ^ Initialize the search backend database
    initialize :: s -> IO (Either SomeException s)

    -- ^ Search for jobs, given a job query (returns the list of jobIDs along with the total amount)
    searchJobs :: Massaged JobQuery -> s -> IO (Either SearchBackendError (PaginatedList JobID))

    -- ^ Disconnect
    disconnect :: s -> IO s

Here’s the early implementation of this type class (all methods not below were stubbed out with undefined at the start):

instance SearchBackend SQLiteFTS where
    getSearchLogger (SQLiteFTS _ l _ _) = l

    connectSearch s = logMsg s INFO ("Connecting to SQLITE FTS search backend @ (" <> addr <> ")")
                      >> try (S.open addr)
                      >>= either (logErrAndReturn s "Failed to connect to DB address") handleSuccess
                      >>= initialize
        where
          cfg = searchCfg s
          addr = searchAddr cfg
          traceQueries = searchTraceQueries cfg
          handleSuccess :: S.Connection -> IO SQLiteFTS
          handleSuccess conn = when traceQueries (S.setTrace conn (Just (logTextMsg s DEBUG)))
                               >> pure (s { searchDBConn=Just conn })

    initialize s = maybe (return (Right s)) setup (searchDBConn s)
        where
          handleSuccess :: () -> IO (Either SomeException SQLiteFTS)
          handleSuccess _ = logMsg s INFO "Search successfully migrated"
                            >> return (Right s)

          setup c = try (doMigration_ (searchCfg s) c)
                    >>= either (return . Left) handleSuccess

    -- ... lots of stubs below ...

Here’s what the configuration it takes looks like:

data SearchConfig = SearchConfig { searchAddr             :: String
                                 , searchLogLevel         :: Priority
                                 , searchTraceQueries     :: Bool -- this was added much later during some debugging sessions
                                 , searchFTSMigrationFile :: FilePath
                                 } deriving (Eq)

My configuration documentation/loading scheme is out of the scope of this blog post so I won’t talk about it today but maybe in a future post.

NOTE At the beginning working on the SqliteFTS integration, I chose the in-memory SQLite address :memory: for the database. It’s a pretty neat no-hassle way to mess around with a new important component with no pressure added, especially when trying to get the familiars of startup right for a component with somewhat ephemeral data (my search inidices get re-built at every app launch).

If you build your indices in a way that doesn’t rely on being in the same DB as the actual database (basically if you’re not just doing `INSERT INTO SELECT FROM ), this setup gets even more flexible – obviously, this would enable you to completely seperate the main store from the search component.

Step 2: Write some SQL that will build your corpus

Of course, to do any meaningful search, you need a corpus of documents to do searches on. The easiest way for me to do this was to have the SQLiteFTS component and the SqliteBackend component which stored all the entities for the application just share the same database. This is not necessarily a good engineering choice, but it was certainly an expedient one. This reduced my document-inserting code complexity to a INSERT INTO <somewhere> SELECT FROM <somewhere else> statement, and I certainly like that.

Here’s what the code to do the migrations looked like:

getMigrationSQL :: SearchConfig -> IO DT.Text
getMigrationSQL = fmap DT.pack . readFile . searchFTSMigrationFile

doMigration_ :: SearchConfig -> S.Connection -> IO ()
doMigration_ cfg c = getMigrationSQL cfg
                     >>= Database.SQLite3.exec (S.connectionHandle c)

Here’s what the first migration for the SQLiteFTS component looks like:

-- Create FTS table for active jobs
DROP TABLE IF EXISTS search_active_jobs;
CREATE VIRTUAL TABLE IF NOT EXISTS search_active_jobs USING fts4(title, description, industry, jobType, employerId, postingDate, applyLink, tags);

INSERT OR REPLACE INTO search_active_jobs(docid, title, description, industry, jobType, employerId, postingDate, applyLink, tags)
SELECT id, title, description, industry, jobType, employerId, postingDate, applyLink, tags
FROM jobs
WHERE isActive=1;

As you can see, it’s pretty simple – since the FTS database tables are built every time the app starts up, I only need to write one migration file (and use update/use it repeatedly to refresh the database even, in the most naive methodology).

At this point, I can at least run the files from the SQLite console directly and test out commands and see what goes in the tables – also, at this point I wrote enough code so the app could at lesat start up, with the SQliteFTS component starting up alongside everything else.

Step 3: Write the endpoint that will use the search

In a bit of a front-to-back approach, I designed the API/way I want to talk to the SQLiteFTS component first. This meant going into my handler code and creating various routes and endpoints to handle search and pretending that the SQLiteFTS was done and calling it as I would want to call it from the handlers of the API. This made it pretty easy to feel out a good design for the API of the SQLiteFTS.

I didn’t leave good notes on what the early endpoints looked like but here’s an endpoint that uses the FTS backend (simplified to remove features that came after):

-- ^ Job search (only checks active, becuase that's all that's indexed), using the available search backend
jobFTS :: Maybe String
       -> [JobIndustry]
       -> [CompanyID]
       -> Maybe Limit
       -> Maybe Offset
       -> [TagName]
       -> WithApplicationGlobals Handler (EnvelopedResponse (PaginatedList JobWithCompany))
jobFTS term is cs limit offset tags = getBackendWithSearch -- get various backends
                                      >>= \(dbBackend, searchBackend) -> searchJobs massagedJQ searchBackend
                                      >>= either (error "Job FTS failed") pure
                                      >>= hydrateSearchResultJobIDs dbBackend
                                      >>= ifNothingThrowError (Err.enveloped Err.jobSearchFailed)
                                      >>= pure . EnvelopedResponse "success" "Successfully completed search"
    where
      massagedJQ = massage $ JobQuery (trimSearchTerm term) is cs limit offset tags

As you can see, the interface I want is pretty simple – I want to be able to give the SearchBackend typeclass-compliant SQLiteFTS component a JobQuery and have it return a list of IDs. Then, that list of IDs gets served to the backend to be hydrated into a full result set. There are some fields that I don’t store in the FTS tables (also the fact that I generally need to provide company details along with jobs, and they’re not in FTS yet), which is why I went with the get-id-then-hydrate route.

Step 4: Write the functionality for the SearchBackend component

At this point, the search backend is up and running, properly migrates itself @ start, and the API is expecting data now from the jobFTS. The only problem now is that it doesn’t do anything useful yet. This is the perfect time to start writing the implementation of the SearchBackend typeclass for the SQLiteFTS type.

instance SearchBackend SQLiteFTS where
    getSearchLogger (SQLiteFTS _ l _ _) = l

    connectSearch s = logMsg s INFO ("Connecting to SQLITE FTS search backend @ (" <> addr <> ")")
                      >> try (S.open addr)
                      >>= either (logErrAndReturn s "Failed to connect to DB address") handleSuccess
                      >>= initialize
        where
          cfg = searchCfg s
          addr = searchAddr cfg
          traceQueries = searchTraceQueries cfg
          handleSuccess :: S.Connection -> IO SQLiteFTS
          handleSuccess conn = when traceQueries (S.setTrace conn (Just (logTextMsg s DEBUG)))
                               >> pure (s { searchDBConn=Just conn })

    initialize s = maybe (return (Right s)) setup (searchDBConn s)
        where
          handleSuccess :: () -> IO (Either SomeException SQLiteFTS)
          handleSuccess _ = logMsg s INFO "Search successfully migrated"
                            >> return (Right s)

          setup c = try (doMigration_ (searchCfg s) c)
                    >>= either (return . Left) handleSuccess

    searchJobs mjq@(Massaged jq) s = maybe (return (Left UnexpectedFailure)) handle $ searchDBConn s
        where
          (query, params) = SDBQ.makeJobQuerySQL jq
          logQuery = logMsg s DEBUG $ "Fn searchJobs query:" <> show query
          doSearch c = fmap S.fromOnly <$> (S.queryNamed c query params :: IO [S.Only JobID])
          getTotal c = extractSingleNumber <$> (S.queryNamed c (convertToCountQuery query) params :: IO [S.Only Int])
          handle c = logQuery
                     >> getTotal c
                     >>= \total -> doSearch c
                     >>= \rows -> pure $ Right (PaginatedList rows total)

    disconnect s = maybe (return s) (\c -> S.close c >> return s {searchDBConn=Nothing}) (searchDBConn s)

This is what the functionality looks like in it’s current form (removing some code from features what were developed later). It’s not the prettiest, but it’s not hideous (in my eyes anyway) either.

Step 3: Test things from the frontend

After getting this all to compile (with lots of help from GHC), it was time to do some quick testing with the front-end spun up. Testing methodology is pretty quick & dirty:

  • Start the app from GHCI
  • Open up a browser (Firefox Developer Edition because it’s the best, of course)
  • Hit a listing of the jobs on the site that I know hits the backend endpoint I expect
  • Watch app logs
  • Ensure that the IDs I expect to come back do come back (trying out searches that should return partial result-sets)

True to the usual promise of haskellers everywhere, “if it compiles, then it’s probably right”™.

Now it’s time to make it more robust, add tests, and some of the other stuff principled engineers do.

Add code that updates the search backend

One of the first things you might have noticed that this very naive implementation is lacking is any facilities to rebuil/update itself. This is an issue particularly because I’ve built quite a lot of administration into the application, but with the code as-is, the search index wouldn’t get update despite something relatively major like a new job being added. That’s obviously a pretty serious problem for a live API, so the first order of business was to fix that.

In the simplest case, I could just call the initialize method over and over again (since it reloads everything), but I wanted to take some time to write a slightly more efficient solution at the very least, since initialize will set up every possible index.

Here’s the code that I added to SearchBackend typeclass and the implementation that did the job:

class SearchBackend s where
    -- ... other declarations ...
    refreshJobsIndex :: s -> IO ()

instance SearchBackend SQLiteFTS where
    -- ... lots of declarations ...
    refreshJobsIndex s = maybe (pure ()) doRefresh (searchDBConn s)
        where
          doRefresh c = logMsg s DEBUG "Refreshing job search index..."
                        >> S.execute_ c SDBQ.refreshJobsIndex

Here’s what the SQL for that looks like:

-- ^ Refresh the jobs index
refreshJobsIndex :: ST.Query
refreshJobsIndex = ST.Query "INSERT OR REPLACE INTO search_active_jobs(docid, title, description, industry, jobType, employerId, postingDate, applyLink) \
                            \ SELECT id, title, description, industry, jobType, employerId, postingDate, applyLink \
                            \ FROM jobs \
                            \ WHERE isActive=1"

It’s basically identical to the initial loading migration – and as I add more indices, I’ll need to make more like these. The types aren’t too descriptive, but the method certainly does the job.

Write backend integration tests

Since the SearchBackend is a component, let’s test the whole thing with an integration test. I use HSpec for my tests – maybe check out their users manual/quick start guide if you’ve never heard of it/are curious.

Here are what some tests for it look like:

makeTestSearchConfig :: FilePath -> SearchConfig
makeTestSearchConfig path = cfg { searchAddr=path }
  where
    cfg = appSearchConfig defaultTestConfig

makeTestBackendConfig :: FilePath -> BackendConfig
makeTestBackendConfig path = cfg { dbAddr=path }
  where
    cfg = appBackendConfig defaultTestConfig

-- ^ Use a temporary system file to power the backend, passing the same file to search backend, since SQLiteFTS works on the same DB
withSearchBackend :: (SQLiteFTS -> IO a) -> IO a
withSearchBackend action = withSystemTempFile tempDBFileTemplate $ \path _ -> bracket (makeBackends path) disconnectBackends action
  where
    makeBackends path = startBackend (makeTestBackendConfig path)
                        >>= makeConnectedSearchBackend (makeTestSearchConfig path) Nothing
                        >>= either (\e -> error ("Failed to create backend:" ++ show e)) return
    disconnectBackends s = Search.SearchBackend.disconnect s
                           >> Types.disconnect (searchDBBackend s)

main :: IO ()
main = hspec spec

spec :: Spec
spec = around withSearchBackend $ do
  describe "setup" $ do
    it "works without a logger" $ \s -> isNothing (searchLogger s) `shouldBe` True
    it "has a DB connection" $ \s -> isJust (searchDBConn s) `shouldBe` True

  describe "initialization" $ do
    it "works with no data" $ initialize >=> containsRightValue
    it "works with data added to the backend" $ \s -> makeActiveTestJob (searchDBBackend s)
                                                      >> initialize s
                                                      >>= containsRightValue

--- ... more tests ...

  describe "search by company ID" $ do
    it "finds job posted by company" $ \s -> makeActiveTestJob (searchDBBackend s)
                                                >>= \(ModelWithID cid _, _) -> initialize s
                                                                                 >>= shouldBeRight
                                                                                 >>= searchJobs (Massaged (JobQuery "" [] [cid] Nothing Nothing []))
                                                                                 >>= shouldBeRight
                                                                                 >>= shouldMatchList [cid] . items


As you can see, I’ve got a bunch of helper methods (like containsRightValue and makeActiveTestJob) to help me make the tests as readable as possible. Writing the tests for the SQliteFTS component was a little interesting because I had to start up the DBBackend component to help it along – so that made the withSearchBackend function a little more interesting.

Of course, there are way more tests than this, but this gives a good overview of what the structure looks like. Early on, I only had search by term, so the test started off simple (like the implementation) and got progressively more complex. The stack of TODOs was formidable, but getting them all done felt great.

Make search more robust

As I briefly noted earlier, the earliest implementations of search were just by term only – of course, people will want to search by more than just FTS term, so I added things that were already in the original search like search by Company or search by industry, etc.

Write more tests

As the app got more complex, I wrote more tests – pretty self explanatory. Every new facet of search needed to at least get one happy-path test. Tests, at the very least regression tests, are so critical to moving fast and deveoping software, along with building confidence in your own codebase. While I don’t subscribe to the entirety of TDD, the renewed emphasis on tests-as-a-part-of-fundamentally-sound-engineeering is certainly something I endorse.

Switch the frontend over to the new system

As I finished the new search feature, it was ready for primetime so I switched to frontend endpoints all to use it. I was super pleased with the quicker search time as well as the separation gained by the new system. Also, as far as I’m concerned the simplicity of the system also stayed low, no need to add more local development/production resources for ElasticSearch or switch to a bigger database might be more performant.

So for now, Iv’e got a nice SQLite-powered FTS feature implemented – we’ll see if I come to regret choosing SQLite for this as time goes on.

Did you find this read beneficial? Send me questions/comments/clarifciations.
Want my expertise on your team/project? Send me interesting opportunities!