tl;dr - Building a simple URL bouncer with Servant isn’t that hard, and the usual warm fuzzies you get from well-typed functions, interfaces, and code still apply
If you’re not familiar with Haskell or Servant, the former is a programming language that focuses on pure functional concepts and the latter is one of the most interesting/popular frameworks for it that specializes in exposing your API as a type itself. A brief taste of both of these things is below:
-- A simple API declared with Servant that exposes one endpoint, /api/v1/users, to GET and POST requests
type API = "api" :> "v1" :> UsersAPI
type UsersAPI = "users" :> Get '[JSON] (PaginatedList User)
           :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] (EnvelopedResponse (ModelWithID User))
This is basically a snippet of code from my own codebase, but it should show you just how expressive Haskell is and how the utilities of Servant fit together to extend that expressiveness to the web-server domain. This isn’t an introductory post on either of those technologies, so if you’re looking for that you might want to check out some excellent resources already out there.
A lot of my work these days is with/through GAISMA, a Japanese company I run with a buddy of mine, on a fairly recent project we’re hoping to (fully) launch this year – a job board aimed at the flourishing bilingual market in Japan called The Start (project now defunct). At the outset of the project, TheStart was loaded up with a list of killer features that could set the project apart, and as I start to iterate on them I realize just how much is involved in actually executing even the simplest of goals in a reasonable, well-engineered manner. Even the simple idea of “get a job board up” has taken me a lot more time than I thought. This blog post is an in-some-kind-of-depth look at just one of the features I thought was a footnote, but took some time to think through – a URL bouncer for the early version of the site (v1.1).
Of course, while dreaming up all the awesome features of this new job board (while job boards aren’t terribly exciting ideas, there is definitely some room left to innovate), the reasonable expectation that bounces occurring on the site should be tracked was forgotten. I’m not referring to bounces in the active user/traffic sense (which is usually the term for when a user comes to your site, hangs around for a bit, then leaves to browse the rest of the internet); rather, I’m referring to what others might call referrals/conversions, which is when someone sees a job posting they’re interested in on the site and clicks through to apply.
From the beginning I aimed to build a fairly transparent platform, where business users who made accounts as company reps were able to see just how their jobs were doing, how many clicks/conversions they were garnering, and see just how useful the platform was to them. I think anything less than that level of transparency is just hoping for under-informed customers and I don’t think that makes a good business strategy (I also haven’t studied business at length in any academic capacity so take that statement with a boulder of pure salt). The first obvious thing to do to increase this transparency is of course to count the basic value offer of our product (the job board) – getting people to check out the jobs themselves!
As with anything that seems simple at the start but gets more complicated as you unpack it, I realized we needed to use bounce URLs in very different ways:
Having these statistics would increase the level of analysis (from 0 to something, I guess) we could perform in any one of these areas, and possibly make a huge difference down the line when someone tried to pull insight from the madness. Questions like “what are the most popular job postings in a given industry?” or “what are the most popular job postings of all time, and which companies do they belong to?” are obviously good ones to ask/be able to answer – it’s a no-brainer.
I can’t say there was much critical thought put into this decision – entire companies have been built on the promise of “simple” link redirection with metrics. That’s a big indicator of the kind of dangers that could be lurking – those same companies have also made some somewhat subtle mistakes, ones that I was fooled by as well. However, the idea of a link redirection scheme seemed so simple that I chose to build it myself. I also (often to my own detriment) very highly value knowing as much of my “stack” as possible so I’m less surprised in the future, and I enjoy building/managing my own infrastructure, so it was an easy decision (whether it was right is a whole ’nother topic).
One piece of infrastructure that we already used that seemed like it could be up for the job was Piwik. I looked semi-desperately for a quick win there, wondering if Piwik somehow supported the creation of simple configurable/API-accessible URL redirects, but I could never find the feature. There was also the worry of coupling my pretty no-frills/simple application to Piwik (which is a much more complicated contraption). Spinning up the app in a completely separate environment normally requires just the database and the executable, but including this would require running Piwik (or some in-memory mock of it) as well – and that’s not such an easy task, even with tools like Docker/Docker Compose around.
While this feature is pretty large – architectural considerations (basically just trying not to paint myself into a corner), frontend, and backend work – here’s the 1000ft view of what needed to be done:
While the list looks pretty short written down, there’s a lot packed into each item, and lots of complexity to be avoided.
As the post progresses, I’ll cover how each of these portions was done, in the order that they were done. This should do a lot to show what my development process was like and, in general, what working with a Haskell codebase CAN be like.
After some light thought it was pretty obvious I’d need a type that holds IDs, URLs, and creation dates at the very least (immutability was also probably a plus here; I wouldn’t want history/stats for a certain URL to just suddenly disappear). I’d also probably want a wider type with other kinds of information that I could gather about hosts that were bounced (what IP? what country? referrer? device? etc., whatever is in the User-Agent that the browser shares with me).
Minutes/hours in and there’s already a relatively large structural dilemma to consider: How exactly should the bouncing be done? There are a bunch of different things you can collect depending on how you do it. Here are two basic approaches:
The crux of this approach is that instead of returning an HTTP 200 response (which means OK) with content that is (usually) a webpage to show or some sort of data, the server returns an HTTP 302 response (go to this place instead, for now). If it were a conversation it might go something like this:
Browser: Hey do you know where this /jobs/apply/1234 thing is? Do you have it?
Server: Hey yeah, we have that, but actually you should visit some-company.com/apply/artisanal-pickle-salesman-position for that information directly. Matter of fact, just leave here and go straight there
NOTE - DO NOT use HTTP 301s for non-permanent redirections. Want to know why? Go read the wiki or the HTTP spec, it’s good for you. Misusing 301 is VERY painful in production.
Simple 302s will yield you only the information kept in the actual web request, because the person basically never gets a chance to load a page on your server, but it’s the fastest, least scummy, and least off-putting approach for users. 99% of users never stop to consider the fact that when they click a link in Facebook Messenger or Google Hangouts it’s not a link to the content, but actually Facebook/Google’s link to the link to the content. This enables them to build data on what people are talking about, what’s trending, and whatever else they may be using it for.
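To make the 302 exchange a bit more concrete, here’s a minimal sketch of what the server sends back, modeled with plain data types rather than a real web framework. Response, redirectTo, and render are names I made up for illustration; only the 302 status code and the Location header come from HTTP itself.

```haskell
-- A toy model of an HTTP response (hypothetical, not a real library type).
data Response = Response
  { statusCode :: Int
  , reason     :: String
  , headers    :: [(String, String)]
  } deriving (Eq, Show)

-- Build a temporary redirect: status 302 plus a Location header.
redirectTo :: String -> Response
redirectTo url = Response 302 "Found" [("Location", url)]

-- Render the status line and headers roughly as they would appear on the wire
-- (real HTTP uses CRLF line endings; plain newlines keep the sketch readable).
render :: Response -> String
render (Response code msg hs) =
  unlines (("HTTP/1.1 " ++ show code ++ " " ++ msg) : [k ++ ": " ++ v | (k, v) <- hs])
```

The browser never renders anything from this response; it just reads the Location header and goes there.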
window.location
This approach is a little different from HTTP 302 redirection in that it actually loads a webpage, but then uses javascript to collect some more metrics (or maybe even ask if the person wants to be redirected, or whatever else), and do the eventual redirect, using the browser location API.
This basically amounts to adding a script tag like the following to the page you serve:
<script>
// ... Some other logic ...
window.location = "https://www.some-company.com/apply/artisanal-pickle-salesman-position";
</script>
There are a few more things you can collect/do when you go with the JavaScript approach. If it were a conversation, it would go like this:
Browser: Hey do you know where this /jobs/apply/1234 thing is? Do you have it?
Server: Hey yeah, we have that, check out this page
Browser: Cool, thanks
** seconds later browser is redirected to some-company.com/apply/artisanal-pickle-salesman-position… **
Since I really don’t need to collect that much information from browsers right now (and depending on what you do it can be pretty shady/undesirable), I figured I’d go with the basic 302 redirects for now. In the future if I want to do a different type, I can just do some parametrization/change-up the types and modify the code to show a page that will do the redirect. I think I successfully made this decision in a way that makes sense and doesn’t paint me into a corner, which feels like a win. Only time will tell!
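The “change up the types later” idea could look something like the sketch below. BounceStrategy, BounceResponse, and respond are hypothetical names, not code from the project; the point is only that the choice of redirect mechanism can live in a single type.

```haskell
-- How a bounce should be carried out (hypothetical type).
data BounceStrategy
  = Http302        -- plain HTTP redirect, no page is loaded from our server
  | ScriptRedirect -- serve a page that redirects via window.location
  deriving (Eq, Show)

-- What the server would send back, in simplified form.
data BounceResponse
  = Redirect String -- value for the Location header of a 302
  | Page String     -- HTML body served with a 200
  deriving (Eq, Show)

-- Pick the response shape based on the strategy.
-- `show url` wraps the URL in double quotes; real code would need proper
-- JavaScript/HTML escaping instead.
respond :: BounceStrategy -> String -> BounceResponse
respond Http302 url = Redirect url
respond ScriptRedirect url =
  Page ("<script>window.location = " ++ show url ++ ";</script>")
```

Adding a third strategy later would mean adding a constructor and letting the compiler point at every place that needs to handle it.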
Haskell places a focus on types. Types help you think more clearly, and thinking clearly about what’s happening in a program helps you write better code. With that massive over-simplification of the benefits of Haskell out of the way, here’s what the first draft (relatively close to the final draft as well) of the types looks like:
data URLBounceConfig = URLBounceConfig
    { targetUrl :: String
    , name :: Maybe String
    , bounceCreatedAt :: DateTime
    } deriving (Eq, Show)

data URLBounceInfo = URLBounceInfo
    { bouncedHost :: String
    , bouncedAt :: DateTime
    , referer :: Maybe String
    , userAgent :: Maybe String
    } deriving (Eq, Show)

-- .... a few lines down ...
$(deriveJSON defaultOptions ''URLBounceConfig)
$(deriveJSON defaultOptions ''URLBounceInfo)
It’s all pretty self-explanatory, and I’m not even using the most specific types I could, for simplicity (I could replace String with Hostname to be a bit more forthright, or replace targetUrl’s type with a proper URI/URL type). Maybe in the future I’ll revisit and add stronger type restrictions. You can basically ignore deriveJSON for now, but just think of it as a super useful thing that writes a whole lot of boilerplate that makes it easy to read these Haskell types to/from JSON objects automagically.
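As a rough idea of what “stronger type restrictions” could mean here, this is a sketch of a newtype with a smart constructor. TargetUrl and mkTargetUrl are hypothetical names, and the validation is deliberately naive (it only checks the scheme prefix); a real URI type from a library would do far more.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical wrapper: a String that has been checked to look like a web URL.
newtype TargetUrl = TargetUrl String
  deriving (Eq, Show)

-- Smart constructor: the intended way to build a TargetUrl.
-- Returns Nothing unless the string starts with an http(s) scheme.
mkTargetUrl :: String -> Maybe TargetUrl
mkTargetUrl s
  | any (`isPrefixOf` s) ["http://", "https://"] = Just (TargetUrl s)
  | otherwise                                    = Nothing
```

With something like this in place, a function that takes a TargetUrl can’t accidentally be handed a hostname or a user agent string, which is exactly the kind of mix-up that plain String fields allow.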
-- Create the url bounce table
CREATE TABLE IF NOT EXISTS url_bounces (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE,
targetUrl TEXT,
isActive INTEGER,
createdAt TEXT);
CREATE TABLE IF NOT EXISTS url_bounce_info (
id INTEGER PRIMARY KEY,
bouncedHost TEXT,
bouncedAt TEXT,
referer TEXT,
userAgent TEXT);
CREATE INDEX IF NOT EXISTS urlBounceName_idx ON url_bounces (name);
This SQL is actually written to be run on SQLite. There’s a lot I could say about the choice to use SQLite instead of spinning up Postgres, but I’m not going to say too much here (maybe I’ll lay it out in a future blog post). The simple gist is that when I think about projects I’ve done in the past, I can’t remember one that’s ever grown past a scale that SQLite could (probably) handle. Reading SQLite’s case on when to use it really illuminated the idea that I might not need more than SQLite (especially at the prototype/build phase), and I’m using this project as a chance to really test just how far one can get with SQLite.
That said, I’ve written my software in the usual componentized-interface + implementation pattern (that’s not a real term, I just made it up) – meaning that I have a Backend typeclass (interface) that has a SQLiteBackend implementation (so I could easily write a PostgresBackend implementation if the need to switch ever arises). Yes, this reeks of YAGNI, and in practice, this kind of pattern is rarely ever actually used (not many people seem to ACTUALLY switch their main database around, to almost no one’s surprise) – but in this case I think the abstraction is worth it.
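The interface + implementation idea can be sketched in a few lines. This is not the project’s Backend typeclass; Store, MemoryStore, and NullStore are made-up stand-ins that only show how code written against a typeclass works with any implementation of it.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical, much smaller stand-in for a Backend-style typeclass.
class Store s where
  saveUrl   :: s -> String -> IO ()
  countUrls :: s -> IO Int

-- One implementation: keeps everything in memory.
newtype MemoryStore = MemoryStore (IORef [String])

instance Store MemoryStore where
  saveUrl (MemoryStore ref) url = modifyIORef' ref (url :)
  countUrls (MemoryStore ref)   = length <$> readIORef ref

-- Another implementation: throws everything away.
data NullStore = NullStore

instance Store NullStore where
  saveUrl _ _ = return ()
  countUrls _ = return 0

-- Code written against the class works with either implementation.
recordTwo :: Store s => s -> IO Int
recordTwo s = do
  saveUrl s "https://a.example"
  saveUrl s "https://b.example"
  countUrls s
```

Swapping SQLite for Postgres would, in principle, be the same move: a new instance, with callers like recordTwo left untouched.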
The code I ended up writing for the backend was pretty well split into different concerns/important sections in the files, so I’ve split it up similarly in this post:
The code to make sure API endpoints were actually reachable:
-- ... a bunch of other routes ...
-- ^ URL bouncing related endpoints
type URLBounceConfigsAPIV1 = "url-bounce-configs" :> CookieAuth :> Get '[JSON] (EnvelopedResponse (PaginatedList (ModelWithID URLBounceConfig)))
    :<|> "url-bounce-configs" :> CookieAuth :> ReqBody '[JSON] URLBounceConfig :> Post '[JSON] (EnvelopedResponse (ModelWithID URLBounceConfig))
    :<|> "url-bounce-configs" :> CookieAuth :> Capture "bounceConfigId" URLBounceConfigID :> Get '[JSON] (EnvelopedResponse (ModelWithID URLBounceConfig))
    :<|> "url-bounce-configs" :> CookieAuth :> Capture "bounceConfigId" URLBounceConfigID :> "activate" :> Post '[JSON] (EnvelopedResponse (ModelWithID URLBounceConfig))
    :<|> "url-bounce-configs" :> CookieAuth :> Capture "bounceConfigId" URLBounceConfigID :> "deactivate" :> Post '[JSON] (EnvelopedResponse (ModelWithID URLBounceConfig))
    :<|> "url-bounce-configs" :> CookieAuth :> Capture "bounceConfigId" URLBounceConfigID :> "info" :> QueryParam "pageSize" Limit :> QueryParam "offset" Offset :> Get '[JSON] (EnvelopedResponse (PaginatedList (ModelWithID URLBounceInfo)))
    :<|> "url-bounce-configs" :> CookieAuth :> Capture "bounceConfigId" URLBounceConfigID :> "bounce-count" :> Get '[JSON] (EnvelopedResponse Int)
There are a lot of endpoints here, and implementation details, but one of the best things about using Servant’s declarative API routing is that for the most part, it’s actually very readable. With that excuse, I’m going to hold off on explaining too much of what’s happening here, and let you just read for yourself!
Also, you might ask: “Why in the world would you allow so much duplication? Surely you could factor "url-bounce-configs" :> CookieAuth :> Capture ... out and make this cleaner?!” You’d be absolutely right – this code is kinda dirty.
The code to do CRUD (mostly creating, reading) operations:
addURLBounceConfig :: b -> URLBounceConfig -> IO (Maybe (ModelWithID URLBounceConfig))
addURLBounceConfig b bounceCfg = maybe (return Nothing) handle (backendConn b)
  where
    handle c = getCurrentTime >>= \now -> insertEntity_ DBQ.insertURLBounceConfig bounceCfg { bounceCreatedAt=now } c
setURLBounceConfigActivity :: b -> Bool -> URLBounceConfigID -> IO (Maybe (ModelWithID URLBounceConfig))
setURLBounceConfigActivity b bIsActive bid = maybe (return Nothing) (updateSimpleFieldAndReturnEntityOnSuccess_ DBQ.urlBounceConfigsTableName bid DBQ.genericIsActiveFieldName bIsActive) (backendConn b)
findURLBounceConfigByID :: b -> URLBounceConfigID -> IO (Maybe (ModelWithID URLBounceConfig))
findURLBounceConfigByID b cid = maybe (return Nothing) (getRowBySimpleField_ DBQ.urlBounceConfigsTableName DBQ.genericIDField cid) (backendConn b)
getAllURLBounceConfigs :: b -> IO (Maybe (PaginatedList (ModelWithID URLBounceConfig)))
getAllURLBounceConfigs = maybe (return Nothing) (getFullListOfEntity_ DBQ.urlBounceConfigsTableName) . backendConn
findURLBounceConfigByName :: b -> String -> IO (Maybe (ModelWithID URLBounceConfig))
findURLBounceConfigByName b = getRowBySimpleField DBQ.urlBounceConfigsTableName b DBQ.urlBounceConfigsNameFieldName
saveURLBounceInfo :: b -> URLBounceInfo -> IO (Maybe (ModelWithID URLBounceInfo))
saveURLBounceInfo b info = maybe (return Nothing) (insertEntity_ DBQ.insertURLBounceInfo info) (backendConn b)
findURLBounceInfoForConfig :: b -> URLBounceConfigID -> Maybe Limit -> Maybe Offset -> IO (Maybe (PaginatedList (ModelWithID URLBounceInfo)))
findURLBounceInfoForConfig b cid limit offset = maybe (return Nothing) (getRowsBySimpleField_ DBQ.urlBounceInfoTableName DBQ.urlBounceConfigFKName cid limit offset) (backendConn b)
getNumberHitsForBounceConfigByID :: b -> URLBounceConfigID -> IO (Maybe Int)
getNumberHitsForBounceConfigByID b cid = maybe (return Nothing) (getRowCountBySimpleField_ DBQ.urlBounceInfoTableName DBQ.urlBounceConfigFKName (Just cid)) (backendConn b)
This code lives in both Types.hs (where the typeclass and function signatures are) and SqliteBackend.hs (where the implementation is). There is lots I could go into about the code, but here are the most interesting (to me) higher level points that should help with understanding it.
You might ask “Why in the world would you still be writing CRUD code for your endpoints yourself?” – That’s a good question.
The Backend typeclass
Here are some light notes on the Backend typeclass and what it means:
The b you see passed in everywhere is the Backend object itself. It’s similar to, but not quite the same as, getting self as the first argument to methods in a language like Python. Somewhere in Types.hs there is a declaration that goes like this:
class Backend b where
    getBackendLogger :: b -> Maybe Logger
    connect :: b -> IO b
    disconnect :: b -> IO b
    -- ... lots more stuff ...
Super quick & dirty Haskell 102 (since typeclasses are a little further ahead than 101 but maybe not quite 201): getBackendLogger is a function that “takes” a b, where b conforms to the Backend typeclass, and produces a value with type Maybe Logger. Read that sentence over and over until it seems to make sense.
Basically, what it’s saying is that if you give getBackendLogger, the function, a Backend, it will give you a thing that is a Maybe Logger. If you don’t know what Maybe Logger is, don’t even worry about it, just think about it in the abstract sense; use a red bowling ball if you want. If you’re interested in what a Maybe is, check out this SO post that goes into it a bit. Don’t spend too much time on it though: there are things called Monads that are often discussed at the same time as Maybe because they’re related concepts, and you might be tempted to try and figure out what they are, but that’s a bad move super early on if you’re just beginning. It’s one of those things you have to use a bit, then research/read up on, then use a bit more, then research/read up on/watch videos and lectures about, before you fully understand it and feel super comfortable with it.
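If a concrete example helps more than a bowling ball, here is a tiny one. The lookup table and function names are made up; lookup and maybe are both ordinary Prelude functions.

```haskell
-- A made-up table of bounce names to target URLs.
targets :: [(String, String)]
targets =
  [ ("pickles", "https://www.some-company.com/apply/artisanal-pickle-salesman-position") ]

-- lookup gives Nothing when the key is missing and Just url when it is present.
findTarget :: String -> Maybe String
findTarget name = lookup name targets

-- Handling both cases with a pattern match...
describe :: String -> String
describe name = case findTarget name of
  Nothing  -> "unknown bounce"
  Just url -> "bounce to " ++ url

-- ...or with the Prelude's maybe: a default for Nothing, a function for Just,
-- and finally the Maybe value itself.
describe' :: String -> String
describe' name = maybe "unknown bounce" ("bounce to " ++) (findTarget name)
```

The maybe (return Nothing) handle (backendConn b) calls in the CRUD code have the same shape as describe': if there is no connection the result is Nothing, otherwise the handler runs against the connection.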
So, with all that in mind, the sentence that describes saveURLBounceInfo would be something like: saveURLBounceInfo is a function that takes some b (that is a Backend), a URLBounceInfo value, and produces an IO (Maybe (ModelWithID URLBounceInfo)) value. That’s certainly a mouthful, but once you spend more time with it and start to understand the meaning behind these signatures (and let them sink in even more), the clarity of mind you get is very very rewarding.
One thing that might be confusing to beginners is this IO SOMETHING thing… Again, this isn’t a monad tutorial, but you can just read that to mean that it represents “an action that when run will produce the SOMETHING”. So an IO (Maybe (ModelWithID URLBounceInfo)) is “an action that when run will produce a Maybe (ModelWithID URLBounceInfo) value”. In this context, it’s obvious what this action is doing from the types alone (even though it could technically do almost anything): it’s putting the URLBounceInfo you gave it in the database! That’s clearly why what you get out is a Maybe (ModelWithID ...): you basically got the ID when you put the row in the database.
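Here is a small sketch of the “action that when run will produce a value” reading, using a mutable list as a pretend database table. Table, saveRow, and demo are hypothetical names and this is not how the real backend stores anything; it only illustrates that an IO value does nothing until it is run.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Stand-in for a database table: a mutable list of rows (hypothetical).
type Table = IORef [String]

-- An action that, when run, stores the row and produces its ID.
-- Nothing would signal a failed insert; this toy version never fails.
saveRow :: Table -> String -> IO (Maybe Int)
saveRow table row = do
  modifyIORef' table (++ [row])
  rows <- readIORef table
  return (Just (length rows))

-- Naming an action does not run it; running it twice inserts twice.
demo :: IO (Maybe Int, Maybe Int, Int)
demo = do
  table <- newIORef []
  let action = saveRow table "some bounce info" -- nothing has happened yet
  first  <- action
  second <- action
  total  <- length <$> readIORef table
  return (first, second, total)
```

Running demo produces (Just 1, Just 2, 2): the same action value was run twice, so two rows went in and each run handed back a fresh ID.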
The code is pretty easy to read and expressive, so even if you don’t understand these concepts thoroughly don’t despair – you don’t have to worry too much about this code, that’s my job :).
Maybe that doesn’t mean anything to you? If so I guess now would be a good time to talk about…
Hooking up the code that was actually going to make the routes DO stuff looks something like this:
-- ... sometime after the API declaration ...
urlBounceConfigsAPIServer :: ServerT URLBounceConfigsAPIV1 (WithApplicationGlobals Handler)
urlBounceConfigsAPIServer = allURLBounceConfigs
    :<|> createURLBounceConfig
    :<|> getURLBounceConfigByID
    :<|> changeURLBounceConfigActivity True
    :<|> changeURLBounceConfigActivity False
    :<|> getInfoForURLBounceConfig
    :<|> getBounceCountForURLBounceConfig
-- ... lots of controller-type code in between ...
createURLBounceConfig :: Auth.WAISession -> URLBounceConfig -> WithApplicationGlobals Handler (EnvelopedResponse (ModelWithID URLBounceConfig))
createURLBounceConfig s bounceConfig = do
    _ <- requireRoleFromRawSession Administrator s
    backend <- getBackendOrFail
    added <- liftIO $ addURLBounceConfig backend bounceConfig
    case added of -- Oh look, a pattern match on a Maybe!!
        Nothing -> throwError Err.failedToAddResource
        Just r -> return $ EnvelopedResponse "success" "Successfully created url bounce config" r
getURLBounceConfigByID :: Auth.WAISession -> URLBounceConfigID -> WithApplicationGlobals Handler (EnvelopedResponse (ModelWithID URLBounceConfig))
getURLBounceConfigByID s cid = do
    _ <- requireRoleFromRawSession Administrator s
    backend <- getBackendOrFail
    bounceCfg <- liftIO $ findURLBounceConfigByID backend cid
    case bounceCfg of
        Nothing -> throwError Err.failedToRetrieveResource
        Just c -> return $ EnvelopedResponse "success" "Successfully retrieved url bounce config" c
-- ... more URL bouncing controller code ...
doBounce :: Request -> Maybe (ModelWithID URLBounceConfig) -> WithApplicationGlobals Handler String
doBounce req = maybe (throwError Err.unknownBounce) bounce
  where
    bounce (ModelWithID cid c) = do
        backend <- getBackendOrFail
        now <- liftIO DT.getCurrentTime
        _ <- liftIO $ saveURLBounceInfo backend (makeURLBounceInfoFromRequest cid now req)
        throwError err302 { errHeaders = [("Location", B8.pack (bounceTargetUrl c))] }
These are only a few functions, but they should give you an idea of the machinery that connects the user-facing route to the work the backend has to do. This is the “controller”-type code, if you’re familiar with the MVC design paradigm as commonly applied to web servers.
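doBounce leans on makeURLBounceInfoFromRequest, which isn’t shown in this post. As a rough guess at the general shape of such a function (not the project’s actual code), here’s a sketch that uses simplified stand-ins: headers as a plain key/value list and the timestamp as a String. Headers, BounceInfo, and mkBounceInfo are all hypothetical names.

```haskell
-- Simplified stand-in for request headers (hypothetical).
type Headers = [(String, String)]

-- Simplified stand-in for URLBounceInfo, with the time kept as a String.
data BounceInfo = BounceInfo
  { bouncedHost :: String
  , bouncedAt   :: String
  , referer     :: Maybe String
  , userAgent   :: Maybe String
  } deriving (Eq, Show)

-- Pull the optional fields out of the headers; missing headers become Nothing.
-- Real header names are case-insensitive; this sketch matches exact case only.
mkBounceInfo :: String -> String -> Headers -> BounceInfo
mkBounceInfo host now hs = BounceInfo
  { bouncedHost = host
  , bouncedAt   = now
  , referer     = lookup "Referer" hs
  , userAgent   = lookup "User-Agent" hs
  }
```

This is also why referer and userAgent are Maybe String in the real type: a browser is free to leave either header out.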
As you can see, this Haskell code reads very imperatively, and very clearly as to what it’s doing. Some of the conventions (like _ <- ....) might not be so crystal clear, but it should be clear what each line is at least trying to accomplish.
There’s a bit to sift through, but this is basically what it took to get started with building a semi-complete and semi-production-ready URL bouncer in Haskell with Servant. I hope you enjoyed reading this code (and didn’t get too lost in the application specifics everywhere).
These days a lot of my Haskell code looks different – less do and more >>= (basically this is like composing functions together instead of calling them one by one on separate lines and shuffling inputs around), but trying to get into explaining >>= and Monads and all of it would most certainly make this a monad tutorial and lots more, and lord knows there are enough of those on the internet as is.
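Without turning this into a monad tutorial, here’s a tiny side-by-side of the two styles. fetchName and shout are made-up actions; the only point is that withDo and withBind do the same thing.

```haskell
import Data.Char (toUpper)

-- Two small actions to chain together (made up for this example).
fetchName :: IO String
fetchName = return "servant"

shout :: String -> IO String
shout s = return (map toUpper s ++ "!")

-- do-notation: one step per line, with intermediate results given names.
withDo :: IO String
withDo = do
  name <- fetchName
  loud <- shout name
  return loud

-- The same thing with >>=: the result of one action is fed straight into the next.
withBind :: IO String
withBind = fetchName >>= shout
```

Both produce "SERVANT!" when run; the >>= version just skips naming the values in between.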
It was super fun, and the solution I got at the end doesn’t seem terrible sooooo maybe everything’s good? If everything goes terribly wrong, I’ll make sure to update this blog post!