The Deployment Spectrum

tl;dr - Manual SSH, then automated SSH (ex. Ansible) and friends (Salt/Puppet/Chef), Cloud-init (AKA cram-it-in-userData), pre-built VMs (ex. Packer), infrastructure-as-code (CloudFormation, Terraform, Pulumi, etc) and finally containers and container orchestrators (ex. Kubernetes, Nomad)

The trends in application deployment have been pretty identifiable over the years and I rarely see them discussed, so I figured I’d take a stab. Back in 2019 I gave a related presentation on the evolution of the backend, but I want this post to focus on the deployment aspect. The evolution of application deployment mechanisms is fascinating, and while it’s historical it’s also very much related to the present – so much so that I think of it as a current spectrum – the “Deployment spectrum”.

Unlike others in my field I can’t say that I’ve been here since the start of computing (that’s how young this field is – imagine being able to say you were there “since the beginning” of medicine or physics, etc), but I’ve certainly seen enough paradigms come and go to write about some of them.

Context: Why do people even use servers?

The concept is really quite simple, allow me to bore you with a rough analogy. Let’s imagine you’ve put together a cool website for showing cat pictures that you want to show to everyone else (in the world). You have a laptop computer which you can run the website on, but there’s a problem – you turn the computer off when you go to sleep, so the website will be sleeping when you are! If you want a computer that’s always up, maybe the one at your home that you shut down every night isn’t the best bet! So you pay a stranger that doesn’t go to sleep to use their computer that’s always up, so everyone else in the world can always get their cat pictures from your awesome site. SSH is the equivalent of you being able to sit in their house, at that spare laptop, and run commands on “your” (rented) computer.

Today we call these “bare metal servers”, but what the “server” looks like can range from a “dedicated” server (full access to and control over a computer sitting on someone’s shelf) to a Kernel-based Virtual Machine (KVM)-backed Virtual Private Server (VPS) (a virtualized segment of a computer sitting on someone’s shelf). There are even some setups where you’re given limited access to an existing computer alongside other users (almost like the early time-sharing models), and the only thing stopping you from deleting other people’s files/data is the permissions of the operating system you’re running on – an example of this is libvirtuozzo-powered shared hosting (think “wordpress hosting”, etc). I’ll refer to these as shared-kernel (think users on the same laptop).

Said another way:

  • “VPS” -> Shared kernel container-like existence – libvirtuozzo (?), you can rarely install new programs but you can usually use what’s already installed (ex. Apache, or PHP)
  • “KVM VPS” -> Virtual machine powered by the Linux KVM kernel module and associated machinery (this stuff is fascinating)
  • “dedicated server” -> complete control of a physical machine that someone has hooked up

Anyway, now that we’ve discussed a little bit of the landscape let’s get into the ways we manage these servers.

SSH based deployments

Well now you’ve bought a machine (whether dedicated or shared-kernel) – how do you get your awesome website from your computer to that computer that’s always up? Well you can use this nifty tool called ssh which gives you terminal-level access to the always-running machine. There are other tools out there for making this connection or giving you access to remote machines but SSH is almost certainly the most commonly used.

How would you set up your website on this new computer with this knowledge? Well it’s easy – just log in to that far-away computer and run all the stuff you set up to get things working locally! Type out all the same files (or if you’re smart, scp them), and make this far-away computer look enough like yours for your site to run. This usually means:

  • Installing a web server (ex. nginx, apache)
  • Copying over the files for your website
  • Ensuring the web server always runs (sometimes that always-running computer may reboot!)
  • Setting up a firewall so the only traffic allowed in is for your website and crucial services

How you accomplish these tasks is up to you completely (there are lots of choices to make), but it all happens at the command line, essentially pretending that you’re there sitting in front of that computer, and installing things as you see fit.
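To make the first three steps above concrete, here is a minimal sketch of what that manual session boils down to. It assumes a Debian/Ubuntu server with the default /var/www/html web root; the host name and local directory are made up for illustration, and the `run` helper only prints the commands unless you explicitly set DRY_RUN=0. The firewall step is left out here.

```shell
#!/bin/sh
# Sketch only: host name and paths are hypothetical; Debian/Ubuntu assumed.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# deploy_site HOST LOCAL_DIR
deploy_site() (
  host="$1"      # e.g. deploy@example.test
  site_dir="$2"  # local directory holding the website files
  # 1. Install a web server.
  run ssh "$host" "sudo apt-get update && sudo apt-get install -y nginx"
  # 2. Copy the site to a staging directory, then into the web root.
  run scp -r "$site_dir" "$host:/tmp/site"
  run ssh "$host" "sudo cp -r /tmp/site/. /var/www/html/"
  # 3. Start nginx now and on every boot.
  run ssh "$host" "sudo systemctl enable --now nginx"
)
```

Calling `deploy_site deploy@example.test ./public` prints the four commands; with DRY_RUN=0 it would actually run them against the remote machine.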

Pitfalls

SSHing in and running commands is really easy and very powerful, so there are the requisite responsibilities to be aware of, the most important one being that you are in charge of your security, and of “hardening” your machine to be at least harder for people to attack. A few key points that generally cover it:

  • Disable password-based SSH access, only use SSH keys
  • Close all unused ports and stop services you don’t need; you should be able to look at top and see fewer than 30 processes
  • If you run an email server, make sure you’re not an open relay
  • Install ufw or an equivalent firewall
  • Install fail2ban
  • nmap (port scan) your machine from the outside – every program that is accessible from the outside world is a vulnerability (yes, even sshd though compromises are rare)
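Most of that checklist can be sketched in a handful of commands. The version below is a rough outline, not a complete hardening guide: it assumes Debian/Ubuntu, ufw as the firewall, an SSH service unit named ssh, and an sshd_config that includes /etc/ssh/sshd_config.d/*.conf. The drop-in file name is one I made up. As before, `run` only prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: Debian/Ubuntu, ufw and the sshd_config.d include are assumed.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

harden_server() (
  # SSH keys only: disable password logins, then reload sshd.
  run sudo sh -c "echo 'PasswordAuthentication no' > /etc/ssh/sshd_config.d/00-no-passwords.conf"
  run sudo systemctl reload ssh
  # Firewall: deny inbound traffic except SSH and web ports.
  run sudo apt-get install -y ufw fail2ban
  run sudo ufw default deny incoming
  run sudo ufw default allow outgoing
  run sudo ufw allow 22/tcp
  run sudo ufw allow 80/tcp
  run sudo ufw allow 443/tcp
  run sudo ufw --force enable
  # fail2ban bans addresses that repeatedly fail to log in.
  run sudo systemctl enable --now fail2ban
)

# Run this from a different machine to see what is reachable from outside.
scan_from_outside() (
  run nmap -Pn "$1"
)
```

Make sure your SSH key login works before disabling passwords, otherwise you can lock yourself out.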

It’s your server, so if you want the organization to be an absolute mess or install and uninstall things willy nilly, it’s up to you (unless of course you’re on a non-KVM VPS) – security when dealing with the external world is usually most important.

Automated (SSH-based) deployment

Of course, once you SSH into and upgrade/fix one or more machines more than 5 times, you start to get ideas. Ideas about a day when you don’t have to do that anymore, and someone else can do it – or even better, a machine can do it! You can get going pretty quickly with bash scripts and other things, and people did this for a while, but some tools sprung up in higher level languages to make the process a little more declarative – Ansible is one of those tools.

Ansible does imperative systems management (you tell ansible what commands to run in what sequence), but it starts to raise the level of abstraction around the operations themselves – machines have “roles” and tasks can be grouped, etc. Ansible also does some provisioning (check out the listing) but we won’t talk about that just yet.
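For comparison, the bash-script stage that came before tools like Ansible can be as small as a loop over host names. This is a sketch with hypothetical hosts; it has none of the grouping, roles, or error handling that Ansible adds on top, and `run` prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: host names are hypothetical.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# run_everywhere CMD HOST...: run one command on every host, in order.
run_everywhere() (
  cmd="$1"
  shift
  for host in "$@"; do
    run ssh "$host" "$cmd"
  done
)
```

Something like `run_everywhere "sudo apt-get upgrade -y" web1.example.test web2.example.test` covers the "upgrade every machine" case, which is roughly where the itch for a real tool begins.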

Cloud Init

SSHing in and doing lots of steps to configure your computer is one thing, but what if you could instead run those scripts on a machine just after it booted? You could save yourself some time sitting waiting for ansible to finish if you compressed those commands down to a few lines, for example:

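# -y answers the install prompt, since nobody is at the keyboard
sudo apt-get install -y nginx
# let's say your site was available somewhere online
# (/var/www/html is the default web root of the Debian/Ubuntu nginx package)
scp -r 'user@some-other-machine:/path/to/your/website/files/*' /var/www/html/
sudo systemctl enable nginx
sudo systemctl start nginx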

I’ve left a little bit out of this (scp needs an identity to access the remote machine, etc), but you can imagine what this looks like. Why bother with ansible if you know that all you really want done is the above four lines and you’ll never log into the machine again?

Lucky for us, a standard called Cloud Init (cloud-init docs) came along and made this very easy. Cloud Init can be used to provision disks, set up users, run scripts, set up networking and tons of other things that people often want to do right after booting a VM – and it has been adopted far and wide (even Alpine Linux supposedly now supports it!).
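One of the simplest things Cloud Init accepts as user data is a plain script: if the user data starts with `#!`, it gets run as root on the first boot. Below is a small sketch that renders such a script. Note that I have swapped the scp step for downloading a tarball with curl, so no SSH identity is needed on the new machine; the tarball URL is a hypothetical argument, and Debian/Ubuntu is assumed.

```shell
#!/bin/sh
# Sketch only: the tarball URL is hypothetical; Debian/Ubuntu assumed.

# render_user_data SITE_TARBALL_URL
# Prints a first-boot script suitable for passing as cloud-init user data.
render_user_data() (
  site_url="$1"
  cat <<EOF
#!/bin/bash
set -eu
apt-get update
apt-get install -y nginx curl
curl -fsSL "$site_url" | tar -xz -C /var/www/html
systemctl enable --now nginx
EOF
)
```

You would save the output with something like `render_user_data https://example.test/site.tar.gz > user-data.sh` and hand that file to your provider when creating the machine (how you do that depends on the provider).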

This is probably one of the first steps towards the coveted immutable infrastructure ideology, but we won’t talk about that just yet.

A Shift: VM technology becomes easier to run locally

A confluence of things probably contributed to this shift, but once the ability to virtualize machines (KVM technology) became more widespread and understood, people began to think about a different way to deploy. What if you could build that “remote laptop” your server provider was running for you, and give it to them to run? If you can build a virtual machine on your machine, SSH into it and completely prepare it – you could give a serialized version of that virtual machine (a virtual machine “image”) to your cloud provider and they could run it!

Using our previous methods, we can do something like this:

  • Start a virtual machine on your own computer
  • SSH into that machine (a virtual computer, running on our “real” computer)
  • Run SSH commands or use Ansible to run the setup for your application

One of the tools people used to really get this going was HashiCorp Vagrant – it made running local VMs much easier (as did VirtualBox), which meant it was easy to get the power of the cloud “at home”.
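With Vagrant, the local build-a-VM loop described above looks roughly like the sketch below. The box name is an assumption (any Debian/Ubuntu box would do), the output file name is made up, and `run` prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: box name and output file are assumptions.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_local_vm() (
  box="${1:-debian/bookworm64}"
  # Create a Vagrantfile for the box and boot the VM.
  run vagrant init "$box"
  run vagrant up
  # Prepare the VM over SSH, just like a remote server.
  run vagrant ssh -c "sudo apt-get update && sudo apt-get install -y nginx"
  # Serialize the prepared VM into a reusable .box file.
  run vagrant package --output site.box
)
```

The .box file is only one form of serialized VM; whether a given provider can take it directly is a separate question.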

Pre-built VMs, and the rise of Immutable Infrastructure

OK so building images is one step, but some clouds of course don’t support uploading raw images, so what do you do? Well, you use a tool like HashiCorp Packer: you build your VM in the cloud, and use your cloud’s own support for saving the image. Packer was (and still is) one of the biggest tools in this space and is fantastic (HashiCorp is awesome).

The idea has an easily recognizable name at this point, but I think this is the kind of change in thinking that gave rise to “Immutable Infrastructure”. Packer is still an active choice in many stacks and has a place in 2021. While this approach helps get one started with immutable infrastructure, note that it is not a full-on infrastructure-as-code approach – you’ll still need to click around in a console or two to create things like DNS records and load balancers if you’re only using Packer.

Obviously VMs that work as soon as they boot are preferable to those that need to be set up after being instantiated.

Automated Fleet management with Puppet/Chef/Salt

While up until now I’ve only really discussed Ansible, there are a bunch of other well known tools that popped up at around the same time and perform a somewhat similar function:

  • Puppet
  • Chef
  • Salt

These tools all offer way more than ansible (not including AWX or Ansible Tower), and they can do continuous deployment of your machines and VMs as well.

While these tools do infrastructure-as-code and various advanced features now, as I remember it they were mostly upstaged by a relatively new tool in the field…

Infrastructure as Code

CloudFormation

Probably not the first tool that comes to mind with regards to infrastructure as code, but CloudFormation was one of the first on the scene – it just took a very regrettable route in putting too much of its functionality inside YAML, including dynamic operations specified with YAML. I won’t spend too much time on this, but early on CloudFormation was an advanced way to do repeatable infrastructure, and it came with CloudFormation Designer, which promised to free us from writing said YAML (and didn’t really deliver).

Terraform

Terraform burst on the scene and offered a way to deploy your non-machine infrastructure (DNS records, load balancers, etc) in an automated manner, with a slightly more fitting DSL – HCL. HCL was a breath of fresh air compared to CloudFormation YAML and promised to work on more than just one cloud, so it was quickly adopted by the community. Now, when deploying some infrastructure, you could “simply” run terraform apply and the requested resources would be deployed!

With a pre-built VM (that’s already been pushed to AWS) you’re 90% of the way to the ideal automated setup. Even if you weren’t quite there yet as far as building VMs, userData was always available as well.

Pulumi

Pulumi fits in the exact same conceptual space as Terraform but offers a few very important features:

  • Infrastructure as Code as code (that is to say, no DSLs)
  • An Automation API
  • Pure code abstraction methods (custom resources, rather than writing custom providers)

This is enough to make it what I use (over Terraform) for my projects and my most highly rated infrastructure-as-code project. Being able to call on the entire ecosystems of languages like Node.js (TypeScript), Python, and Go makes a huge difference. Other groups have noticed how useful this is and have released similar tools (ex. the AWS CDK), but Pulumi was one of the first to realize how important having a code-driven interface was.

ASIDE: Are we at a local maximum for Ops?

For the conscientious operations engineer who has all these tools firmly grasped, a cluster of servers under management to serve a website (let’s say funnycatpictures.com) might entail bringing all of these tools to bear:

  • Pulumi for managing infrastructure
    • DNS records
    • Load balancers
    • Cloud memory stores (AWS ElastiCache, GCP Memorystore, Azure Cache for Redis)
    • Cloud data stores (AWS RDS, GCP Cloud SQL, Azure Database for PostgreSQL)
  • Packer for building VMs (AWS AMIs, GCP Machine Images, Azure images)
  • Ansible for rare emergency rollout situations (let’s imagine there’s a situation where you absolutely can’t rebuild the VM)

So you might ask yourself, is that it? Are we at a local maximum for ops? Well no, not quite yet – there’s one more piece that is conspicuously missing that is very common these days.

Containers for higher density deployment

The next big wave to hit the deployment space is the popularization of containers – once you’ve got VMs deploying to the cloud, there’s a question of what do you run on them and how you maintain them, and containers offer a very compelling answer. The conscientious engineer will note that “containers” are really just namespaced and sandboxed processes – the end result of these applied kernel features being easier-to-deploy applications. Developers can package their dependencies right alongside their apps, and now the operations people don’t have to maintain multiple versions of python or node or other system packages and libraries to ensure that deployed programs will run.

Containers are convenient because they offer consistent and relatively uniform packaging for applications, and a uniform way to run them. A VM with NGINX listening on port 443 (containerized, possibly) in front of one or more container instances of an application listening on ports 3000 through 3003 would easily constitute the load-balanced web tier of a 3-tier application. If you are OK running your database alongside your application you can even pop a postgres:alpine container on there and get your database running right alongside.
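Here is a rough sketch of that single-VM setup: one function prints the docker commands for the app instances, the other renders an NGINX config that balances across them. The image name is hypothetical, I am assuming the app listens on port 3000 inside its container, and I have left TLS on port 443 out (plain port 80) to keep the config short.

```shell
#!/bin/sh
# Sketch only: image name and container port are assumptions; TLS omitted.

# Print (not run) one docker command per app instance.
print_docker_commands() (
  image="$1"
  port="${2:-3000}"
  last_port="${3:-3003}"
  while [ "$port" -le "$last_port" ]; do
    echo "docker run -d --restart unless-stopped -p 127.0.0.1:$port:3000 $image"
    port=$((port + 1))
  done
)

# Print an NGINX config that load-balances across the local app ports.
render_nginx_conf() (
  port="${1:-3000}"
  last_port="${2:-3003}"
  echo "upstream app {"
  while [ "$port" -le "$last_port" ]; do
    echo "    server 127.0.0.1:$port;"
    port=$((port + 1))
  done
  echo "}"
  cat <<'EOF'
server {
    listen 80;
    location / {
        proxy_pass http://app;
    }
}
EOF
)
```

The rendered config is meant for a file such as /etc/nginx/conf.d/app.conf, which the stock nginx.conf includes inside its http block.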

After getting used to running containers, the natural next step is to be able to run fleets of containers – or to treat a group of machines as just a pool of running containers – being able to have a container run without being too occupied with which machine it runs on. Since the “old” Docker Swarm has been end-of-lifed (I haven’t gotten a chance to try out swarmkit), we have to look elsewhere…

Orchestrating your containers: Nomad & Kubernetes

Nomad and Kubernetes both offer answers to the container orchestration problem – once you have either of these systems set up across a pool of machines you can (more or less) give the systems a container to run, and some information about how to run it and you’re off to the races.
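On the Kubernetes side, "give the system a container and some information about how to run it" can be sketched with three kubectl commands. This assumes kubectl is already pointed at a working cluster, the image name is hypothetical, and the app is assumed to listen on port 3000; `run` prints commands unless DRY_RUN=0 is set. I have not sketched the Nomad equivalent.

```shell
#!/bin/sh
# Sketch only: image name and ports are assumptions; needs a working cluster.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# deploy_to_kubernetes NAME IMAGE [REPLICAS]
deploy_to_kubernetes() (
  name="$1"
  image="$2"
  replicas="${3:-3}"
  # Ask for N copies of the container; the scheduler picks the machines.
  run kubectl create deployment "$name" --image="$image" --replicas="$replicas" --port=3000
  # Put a stable in-cluster address (a Service) in front of the copies.
  run kubectl expose deployment "$name" --port=80 --target-port=3000
  # Wait until the rollout has finished.
  run kubectl rollout status "deployment/$name"
)
```

Notice that nothing here names a machine, which is the whole point of treating the cluster as a pool.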

I’m biased towards Kubernetes, and haven’t given myself enough time to take a stab at running and maintaining Nomad (though I hear great things about it, Fly.io is built on Nomad). For either of these projects taking time up front to really understand the concepts (and the world that made them necessary) is very important – along with taking your time to feel around the solution and stand it up/tear it down a few times. Things can and will go wrong, but if you’ve got a good understanding of the underlying fundamentals of the problem and the system you’ve chosen you should be alright, and be able to perform dynamic debugging.

Wrapup

Well I hope you enjoyed this not-quite-chronological writeup on the history, major paradigms and current options in server deployment. Feel free to send me a diatribe on how I’m wrong if I am, I’d love to hear it.
