Ansible Is Awesome

Ansible is just the tool I was looking for, for my current level of ops mastery.

vados

tl;dr - There are lots of ways to get smarter about how you deploy. Ansible is one choice; it’s not the fanciest, but it’s amazing.

NOTE: This is not an introduction to Ansible; please check out the official documentation if you want that.

For most of my projects, I use a build process based on GNU Make. I do that because it’s cross-platform, pretty well supported/known (for people who build software), and easy to standardize on no matter what project I’m working on. For example, if I’m working on a JS project and have a bunch of grunt tasks, I proxy the important top-level ones through make targets, so that I have a unified flow like make build that matches another project that might be in some other language (which doesn’t have grunt). It’s another layer of abstraction much of the time (as people love to build tools for their favorite languages), but for me it’s worth it.

(Necessary?) Context: All aboard the container hype train

Over the last year, I’ve boarded the container hype train (all the innovation enabled by LXC: Docker, rkt, etc.), and am using containers to deploy my applications. This hype train was indeed worth getting on: as more and more languages include great libraries to run servers of various kinds, reducing my deployments to a light virtual machine that just runs a server I wrote for the program is getting easier and easier. Switching to a container-driven deployment process has saved me from having to worry about putting my source code on the server, setting up dependencies, and doing a lot of other tedious work that differs from language to language (e.g. pyenv/virtualenv/bundler/npm/go get/etc). Now I just put a container on the server that runs a program that listens on port 5000 (or whatever), and as long as some other program on the server is listening on port 80 and forwarding traffic to port 5000, I’m good. In the simplest instance, I just run NGINX, tell it that there’s something at localhost:5000 that it should proxy to, and I’m off to the races.

I currently use Docker as my container runtime of choice because I tried rkt once (twice? maybe once), and couldn’t get the hello world example to run. There are features of rkt that make it attractive to me, but I didn’t (and still don’t) have the patience to work with software that makes me scratch my head at “Hello World” unless it’s the only option. Since I already had experience with Docker and it’s got some very good ergonomics, I stuck with it. I’m definitely glad that there’s an alternative out there though; it should help keep the Docker crew honest/on their toes and innovating. Maybe I’ll try rkt again in the future.

Things I ran into while boarding the hype train

  1. Don’t use devicemapper if you have any other option. One of my servers runs ArchLinux and I was able to switch to the overlay file system, and my life managing docker containers became so much better.

  2. Union filesystems are super cool. If I were to give you a bad, probably wrong explanation: they basically express your hard-drive state as a series of deltas built on top of one another. If you start at state 0 with an empty hard drive and create a file (let’s say “README.txt”), a new state is built (let’s call it state 1), which says “take state 0, and add README.txt”. From an efficiency point of view there are some drawbacks, but the deduplication features and the ability to do things like going back in time (just going to a previous state of your hard drive) are amazing.

Union-based file systems are also one of the only good answers to those crypto locker schemes that are running around the internet today – if someone started encrypting your hard drive yesterday, theoretically all you’d have to do is roll back the contents of your hard drive! Obviously if the attacker compromised your ability to roll back or messed up the hard-drive management code in some way then you couldn’t, but read-only access and hardware protections are options to mitigate that. Anyway, this article isn’t even about filesystems so I’ll stop there.

(Necessary?) Context: One step/push-button deploys are really important to me

In my opinion, if you’re doing things right in the software world in 2017, your application should be deployed in the span of ONE button press or shell command. For me, on many of my projects right now, that “one step” looks like this:

make build deploy SSH_ADDR=<server ip>

As previously described, I use make a lot across my projects, and I generally try to make this one line do everything necessary to build the application and deploy it, by whatever means necessary. Here’s what this looks like for this blog itself:

  1. Build the application with Hugo – the static site content is written to a local folder
  2. Using a Dockerfile stored in the repo, build a Docker image based on NGINX with the static site content
  3. Push the image to the remote server
  4. Replace a possibly running container on the remote server with a new container, built from the new image

I should probably go into how exactly I do this in another post (bits are sprinkled around this blog already), but that’s a topic for another day.

Either way, the steps above can very easily differ from project to project, so it’s really important to me to be able to have a command that just does what needs to be done, end of story (unless something goes wrong of course).

Context: What it takes to set up a server

If you go out and get a VPS (I use a company called INIZ), or get a super cheap dedicated server like I did recently, there are a bunch of things you need to do to get that server to a state where it can be used for running applications (and other services, like email) in production in a reasonably secure/responsible manner. Here are just some of the things:

  • Disable Password login/remote access to root in favor of SSH keys
  • Install Fail2Ban
  • Update sudoers
  • Update the machine’s packages (whether apt-get or pacman)
  • Install docker
  • Set up users (if you want access control for apps at the user level)
  • Set up email
  • Set up databases that will power your apps
  • … more things …
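
To make a few of those items concrete, here is a minimal sketch of what they could look like as Ansible tasks. Everything in it is an assumption for illustration: the file location, a Debian/Ubuntu host (hence apt), and the name of the SSH service. It only covers package updates, Fail2Ban, and the SSH settings, and it is not the configuration I actually run.

```yaml
---
# Hypothetical tasks file, e.g. roles/base/tasks/main.yml (path is an assumption).
# Assumes a Debian/Ubuntu host; on Arch you would use the pacman module instead.

- name: update the package cache and upgrade installed packages
  become: true
  apt:
    update_cache: yes
    upgrade: dist

- name: install fail2ban
  become: true
  apt:
    name: fail2ban
    state: present

- name: disable SSH password logins in favor of keys
  become: true
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PasswordAuthentication'
    line: PasswordAuthentication no

- name: only allow root to log in over SSH with a key
  become: true
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PermitRootLogin'
    line: PermitRootLogin prohibit-password

# Restarting on every run is the blunt approach; a handler that only fires
# when sshd_config actually changed would be tidier.
- name: restart the SSH daemon so the new settings take effect
  become: true
  service:
    name: ssh   # assumption: the service is called "sshd" on some distributions
    state: restarted
```

Make sure key-based login already works before running something like this, otherwise it will happily lock you out.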

Whenever there is a thing to do, there is more than likely a hierarchy of how to do it intelligently. I refer to this as “operational intelligence”. If we take “setting up a server” to be the thing to do, here are some of the different levels of the hierarchy of operational intelligence I’ve seen (from what I’ve seen at companies, to what I do in my own projects, to what’s out there that I know of):

  • Make Joe from Server Operations do it: personally ask “someone in ops” (Joe) to set up a server
  • File a ticket to Susie in Server Operations: file a request with “someone in ops” (Susie, this time) to set up a server every time you need one
  • Do it yourself: SSH into the machine and set up the server manually
  • Let a program do it: use a bash script/makefile/whatever to SSH in and set up the server
  • Let a program someone else wrote do it, e.g. Fabric - arguably the same as the previous, plus some ability to leverage shared libraries and deduplicate work + the general advantage/disadvantage of a fully featured programming language
  • Let a program lots of people work on do it, e.g. Ansible - more shared libraries, lots of built-in components, and support for many features you might have had to write in Python yourself, were you using Fabric. TONS of modules for working with things like systemd that you can just use without thinking about it
  • Let an “enterprise-scale” program do it, e.g. Chef/Puppet/Salt - a (somewhat opinionated) step up from Ansible; these tools try to manage your infrastructure (notice the buzzwords that have started to appear) at enterprise scale. This often manifests itself as needing an extra server that just sends commands (though these tools don’t necessarily require that).
  • Put it in “the cloud”, e.g. AWS, Google Cloud Platform, and Azure all have offerings that will put your app up, as long as you learn how to click/type through their interfaces and pass them the zip archives or source code you have created.

NOTE: Getting to the “put it in the cloud” level starts to bring about this really interesting concept of treating your services/servers as cattle, NOT pets.

  • Manage the atmosphere that creates “the cloud”, e.g. Terraform - Terraform is similar to the previous few steps (you could use Ansible or Fabric to manage AWS, for example), but different in that it works at a higher level of abstraction. Terraform will deploy to wherever you want it to. If AWS is “the cloud”, I can only assume tools like Terraform manage the planet, or the atmosphere (which is kind of why they named it Terraform, I believe). Also, side note: HashiCorp makes some amazing tools.

Looking at options for my own projects

So with this knowledge of what it takes, and what’s out there, I started to research my options for a project I’ve been working on (and my projects going forward). Since this is basically a chance to rethink how I deploy, I started to look at how I could get as close to the “servers as cattle” mantra as possible, because it seems like the way forward.

I started mulling over a few things:

Fabric/Ansible - Mentioned above. I actually had a bunch of experience with Fabric at the time of making this decision (the last time I used it, I was at the point where I started imagining how to extend it and build a bigger provisioning tool out of it), but Ansible was a big question mark.

NixOS - “Nix” can refer to a bunch of things – an OS, a package manager, and a programming language – but it’s all based on reproducibility, and some of that cool union filesystem stuff (and a bunch of other cool concepts), and seems to be just about the pinnacle of the if-it-works-once-it-works-forever life.

Terraform - I covered it a little bit before, but I really like the idea of a meta tool that manages my cloud for me.

Choosing Ansible

In the end, I went with Ansible, for a few reasons:

Reason #1: I didn’t want to spend the next week/few weeks understanding and learning to properly use Nix. I wanted to get up and running quickly, and unfortunately that meant trading the features and benefits of Nix for something a little closer to what I was familiar with. It’s spoiled of me to expect/require this, but getting started with Nix took more than the 10 minutes I’m willing to spend on a “hello world”, and that’s often the level of patience I have with relatively mature open source projects these days. If you want me to get on your hype train (it’s fine if you don’t, you probably don’t want too many bandwagoners on your train), shorten the on-ramp. I still firmly believe that I’ll be revisiting Nix or something like it in the future, because it really seems like the way forward – determinism is one of the sexiest things in computing… That feeling of something working, then working the same every time is amazing.

Reason #2: Python’s pretty good for scripting/getting dirty when you have to. I no longer expect anything to go smoothly, so I prioritize solutions that degrade in the user-failure case to something that’s full-featured, powerful, and that I likely already understand. Consider how using AWS degrades - if something goes wrong while you’re working with AWS, you can:

  • Start checking AWS documentation
  • Use the AWS CLI tool
  • Google AWS issues
  • SSH in (if you could get the server started) and do stuff
  • File a support ticket

That’s just an example of what I mean by considering how some tool degrades in user-failure cases. Using that analogy, I like how ansible degrades, because I have two steps:

  • Read the Ansible documentation and figure out how to do what I want idiomatically
  • Write a hack in python to get it done, despite it being the wrong way

Reason #3: It’s got more stuff built in than Fabric. For example, managing systemd was/is a big part of my management flow – and Ansible has great support for managing systemd.

Reason #4: It doesn’t require a base-server, but has the option if you need it. I’m a super small operation, I only have one server to manage – I like that ansible scales up if I need it to, but starts small and simple, with one server.

Starting to use ansible, and loving it

After all this text, we finally start getting to the real content – I found Ansible to be VERY enjoyable to use. After casually reading through the awesome documentation, I started trying to get it set up on my servers, which required reaching out to a few other sources:

These articles helped me solidify my knowledge of Ansible, in particular the directory structure and what the files were supposed to look like. Once I became comfortable with those aspects of the workflow, my productivity skyrocketed. Here are just a few reasons why I love it:

  1. It Just Works ™ - honestly, half of the things that I tried just worked.
  2. Configuration files are very readable, and the folder convention is not over-the-top.
  3. Module configuration is really succinct and makes sense - want to daemon-reload systemd? ez pz, add daemon_reload: yes to a systemd task and it’ll get done.
  4. I can run the commands as many times as I want and things mostly just work (idempotency), especially if I write my tasks carefully.
  5. Docker support is a thing. Check out the docker_image module and the docker_container module.
  6. Excellent documentation for just about every module; the included examples are a lifesaver.
  7. On the soft side, it just feels productive. I took a break to sort of let everything sink in, but I haven’t really felt like I was trudging into a swamp of complexity. Adding and changing stuff has been pretty easy.
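
To make the idempotency and Docker points a bit more concrete, here is a small sketch of a docker_container task. The container name, image, and port are placeholders I made up for illustration (port 5000 just matches the example from earlier in the post), so treat it as an assumption-laden sketch rather than my real setup.

```yaml
# Hypothetical task: the name, image, and port are placeholders.
- name: run the webapp container
  docker_container:
    name: webapp
    image: "registry.example.com/webapp:1.0.0"
    state: started              # describe the desired state, not the steps to reach it
    restart_policy: always
    published_ports:
      - "127.0.0.1:5000:5000"   # only reachable locally, e.g. from an NGINX proxy
```

Because the task describes a desired state rather than a command to run, re-running it when a matching container is already up should report no change instead of starting a second copy.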

Some rough patches

Of course no tool is without its warts, and here are some things I found while using Ansible that I was less than excited about:

  • It was a little confusing at first to figure out the relationship between a playbook, a piece of “inventory”, a role, and the actual tasks and templates that were being run/used. If I were to try and put it succinctly:

A Playbook is a recipe for transforming a piece of inventory into the state you want it to be in. “The state you want it to be in” is defined in the playbook by specifying the Roles you want that piece of inventory to play. Roles have tasks assigned to them, which are what needs to happen for the server to play the role (e.g. it should have nginx installed, it should have the web-app files, etc). Templates can be used in tasks (but don’t have to be).
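
Here is roughly how those pieces fit together on disk, as a minimal sketch. The file names inventory.yml and site.yml, the webservers group, and the IP address are all made up for illustration; only the roles/webapp/tasks/main.yaml path matches what I actually had.

```yaml
# --- file 1: inventory.yml (hypothetical name) ---
# The inventory is just the list of machines, optionally grouped.
all:
  children:
    webservers:
      hosts:
        203.0.113.10:   # placeholder address
---
# --- file 2: site.yml (hypothetical name) ---
# The playbook says which roles a group of inventory should play.
- hosts: webservers
  become: true
  roles:
    - webapp   # tasks are read from roles/webapp/tasks/main.yaml,
               # templates from roles/webapp/templates/
```

With two files like these, the whole thing would be run with something like ansible-playbook -i inventory.yml site.yml.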

Wrapping up

I was so happy using ansible that I wanted to make this blog post. Now that I’ve spilled those happy feelings all over the internet, I’ll leave you with a small (old, so pardon if it’s missing some key conventions) configuration I wrote that showcases how simple it was for me:

---

- name: login to the docker registry that contains the webapp container
  shell: docker login -u gitlab-ci-token -p "{{registry_access_token}}" "{{registry}}"

- name: get version {{webapp_version}} of the the-start-webapp container
  docker_image:
    name: "{{webapp_container_name}}"
    tag: "{{webapp_version}}"
    state: present

- name: add webapp systemd service
  become: true
  template:
    src: webapp.service.j2
    dest: /etc/systemd/system/webapp.service

- name: start & enable the webapp systemd service
  become: true
  systemd:
    name: webapp
    state: started
    enabled: yes
    daemon_reload: yes

As you can see, this short list of tasks (which was stored at roles/webapp/tasks/main.yaml in my infrastructure-management repo) does everything necessary to get a web app started, leveraging systemd for starting it and managing it later. I wrote this in minutes with a little bit of alt-tabbing back and forth to the documentation, and it just worked. That was huge for me.
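
One of the conventions that old configuration is probably missing: the first task shells out to docker login, which puts the token on a command line. A possible alternative, sketched here under the assumption that the docker_login module is available in your Ansible install (on newer versions it lives in the community.docker collection), reuses the same variables:

```yaml
# Sketch of an alternative to the shell-based login task.
- name: login to the docker registry that contains the webapp container
  docker_login:
    registry_url: "{{registry}}"
    username: gitlab-ci-token
    password: "{{registry_access_token}}"
  no_log: true   # keep the token out of Ansible's output
```

I haven’t swapped this into the role above, so take it as a starting point rather than a tested drop-in.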

Of course, when I logged into the server to confirm (trust but verify), I found an error, but it was actually my fault! The registry name was invalid. I actually believed in Ansible enough that I took down the running production container and re-ran the Ansible task to put it back after fixing it. Of course, the application isn’t highly used, and in general you don’t want downtime, but I was feeling particularly scrappy at that moment so I did it (no ragrets). There’s a bit more I could say about it, but I think that shows just how much I now trust this tool in my toolbox.
