tl;dr If you or your team haven’t caught on to the wonders of CI yet, you should check it out. An easy way to get started is Gitlab, which is self-hostable and has a free tier at Gitlab.com. There is a lot of cool stuff you can automate with CI; all you need is some scripting chops and the patience to figure out what works and what doesn’t.

It’s no secret that I’m a huge fan of Gitlab. They don’t pay me (this isn’t “sponsored” content), I’m just a rabid fan. One of the biggest criticisms of Gitlab is that it tries to do too much, but I appreciate the direction they’re going in because just about everything they do is done at a Good Enough(tm) level. That means I get awesome features like a repo-adjacent docker registry, kubernetes integration, and easy-to-use continuous integration for free, all in the same tool. There’s no other tool that I more wholeheartedly suggest to new or up-and-coming startups. Who says startups can’t have good engineering practices while delivering fast?

In some recent work for a startup, as well as on personal projects, I’ve worked through some pretty fun little CI scripts that solved some problems for me, so I thought I should share them. I think they’re significant because they reduce the need for manual human maintenance/intervention on a codebase, which is what automation is all about:

Recognizing and tagging new versions
Building and publishing an image to a docker repo
Testing out migrations (checking for migration collisions against what’s landed in master)
Deploying to staging
Updating a foreign repo (case study: generating an SDK with Swagger CodeGen)

These tasks are vaguely listed from easy to intermediate, but they only scratch the surface – there’s a lot more possible!

Recognizing and tagging new versions

While there’s a lot to be said about versioning (mostly arguments about [semver (Semantic versioning)](https://semver.org/), and why you should/shouldn’t be using it), for most projects (whether personal or corporate) I end up tagging “releasable” versions of code with some vX.X.X type tag. I’ve also experimented with tags like deployed-production or deployed-staging on faster moving projects, just to keep track of what was actually out on servers. Part of “cutting” a release version usually involved tagging the appropriate commit (usually after bumping all the metadata files in the repo), but I’ve found that I can easily and safely remove that step with CI. The concept is simple: instead of manually tagging versions, have CI check whether the version identified in a metadata file already has a corresponding tag in the repo. Something like this:

1. Retrieve version from metadata file
2. Check git repo’s tags for a v-prefixed tag with the same name
3. If the tag doesn’t exist, create it

Of course, step 3 requires a bit of spooky action: pushing to a git repo from inside a CI step run on the same git repo, possibly causing a never ending loop of tag pushes (well at least until the second one fails since it already exists). Luckily this isn’t so spooky – as long as you authorize your CI runner to make pushes to your repo, and optionally ensure that the job that’s adding the tags doesn’t run when a tag is pushed, everything is smooth sailing.

Here’s what this looks like for a haskell project I work on:

image: alpine

stages:
  - build
  - test
  - extra

build:
  stage: build
  script:
    - apk --update add make
    - make build # let's say you're using make for your build tool

test:
  stage: test
  script:
    - apk --update add make
    - make test-unit test-integration test-e2e

tag_new_version:
  stage: extra
  only:
    - branches
  script:
    # Install git & SSH (the alpine image ships with neither), then set up ~/.ssh
    - apk add --update git openssh
    - mkdir -p ~/.ssh && chmod 700 ~/.ssh && touch ~/.ssh/known_hosts
    - eval $(ssh-agent -s)
    - ssh-keyscan -t rsa gitlab.com >> ~/.ssh/known_hosts
    - echo "$CI_DEPLOY_PRIVATE_KEY" | tr -d '\r' | ssh-add - > /dev/null
    # Add gitlab remote
    - git remote add gitlab git@gitlab.com:mrman/job-board-api.git
    # Get version, exit early if tag already exists
    - export VERSION=v$(awk '/^version:[[:space:]]+[0-9.]+/{print $2}' job-board-api.cabal) # note that the version is prefixed with "v" (e.g. v1.0.0) for the tag name
    - export VERSION_TAG_EXISTS=$(git ls-remote --tags gitlab "refs/tags/$VERSION" | wc -l) # match the exact tag, so v1.0.1 does not match v1.0.10
    - if [ "$VERSION_TAG_EXISTS" -ge 1 ]; then exit 0; fi # exit early if the version tag already exists
    # Push new tag
    - git tag $VERSION
    - git push gitlab $VERSION

With this, all you’ll have to update in your actual repo are the metadata file(s) that determine what version your project is at. For node this might be package.json, for Haskell this might be project.cabal, etc.
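To make the "retrieve the version, check for an existing tag" steps a bit more concrete outside of the CI file, here is a minimal sketch of some shell helpers. The function names are made up for illustration, and the parsing assumes a simply formatted package.json or .cabal file (a single version field on its own line), which may not hold for your project.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any tool: adjust the parsing to your metadata files.

# Print the "version" field of a package.json-style file (first match only).
version_from_package_json() {
  sed -n 's/^[[:space:]]*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Print the version field of a .cabal file (ignores cabal-version and friends).
version_from_cabal() {
  awk 'tolower($1) == "version:" { print $2; exit }' "$1"
}

# Read `git ls-remote --tags` style output on stdin and succeed only if
# the tag named $1 is listed exactly (v1.0.1 must not match v1.0.10).
tag_in_ls_remote() {
  awk -v ref="refs/tags/$1" '$2 == ref { found = 1 } END { exit !found }'
}
```

In a job this could be used as `git ls-remote --tags gitlab | tag_in_ls_remote "v$(version_from_cabal job-board-api.cabal)"`, pushing a new tag only when that command fails.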

Pushing to a Docker repo

Making this work is pretty well documented by Gitlab, but I’ll note how I’ve made it work here:

image: alpine

stages:
  - build
  - test
  - publish

build:
  stage: build
  script:
    - apk --update add make
    - make build # let's say you're using make for your build tool

test:
  stage: test
  script:
    - apk --update add make
    - make test-unit test-integration test-e2e

publish_docker_image:
  stage: publish
  image: docker
  services:
  - docker:dind
  only:
  - /^v[0-9.]+$/
  except:
  - branches
  script:
    # assuming you keep your version in a file called `.version` (this will likely look different for your project)
    - export VERSION=$(cat .version) # VERSION is assumed to look like X.X.X (e.g. 1.0.0)
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN registry.gitlab.com
    - docker build -t registry.gitlab.com/group/project/image:$VERSION .
    - docker push registry.gitlab.com/group/project/image:$VERSION

This is pretty much exactly like the Gitlab-provided documentation, with a few tweaks, taking advantage of the $VERSION variable, and ensuring that this step only runs when a non-branch ref (usually a tag) with a vX.X.X-like name is pushed.
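Since the job is triggered by the tag but reads the version from a file, the two can drift apart. Below is a small sketch of a guard that could run before `docker build`. The function name is hypothetical; `CI_COMMIT_TAG` is the variable Gitlab sets for tag pipelines, and the `.version` file is the same assumption as in the job above.

```shell
#!/bin/sh
# check_tag_matches_version TAG VERSION
# Succeeds only if TAG looks like vX.X.X and, minus the "v", equals VERSION.
check_tag_matches_version() {
  tag="$1"
  version="$2"
  case "$tag" in
    v[0-9]*) ;;  # looks like a version tag, keep going
    *) echo "not a version tag: $tag" >&2; return 1 ;;
  esac
  # ${tag#v} strips the leading "v" so v1.0.0 compares against 1.0.0
  if [ "${tag#v}" != "$version" ]; then
    echo "tag $tag does not match .version ($version)" >&2
    return 1
  fi
}
```

Calling `check_tag_matches_version "$CI_COMMIT_TAG" "$(cat .version)"` as the first script line would fail the job before anything gets pushed to the registry.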

Testing out migrations

Now we’re on to some more interesting uses of CI. If you’re using a database for your application, it’s likely that you’re using some sort of migration library or framework to ensure that, as you deploy new versions, the database moves in an ordered, reliable way into the state you expect.

There are at least two subtle problems that can happen when working on an application that does migrations on a backend database:

Migrations that are checked in could be bad (not work), especially if they’re supposed to have up and down (forward/backward) components
Migrations that are checked in to a branch could conflict with another branch (or more importantly, master) before they’re merged in

We can actually test both of these things automatically with the help of a relatively smart CI script. It likely won’t fix every problem you’ll ever have with database state, but it can at least give you confidence that a branch which passes a fresh CI run won’t blow up your deploy process once it’s merged into master and deployed.

I ran into this problem when working on a project that used typeorm, an excellent ORM for Typescript. It has a very simple migration model which I love, and it uses timestamps as a simple way of coordinating migrations. typeorm will throw an error (as it should) if it finds a new migration that is older than a previous already-applied migration. Of course, when working in a team or even by yourself, it’s easy to get behind while working and have someone else check in a migration that might conflict with your view of the system, so this is very much a welcomed feature of typeorm.

While this will certainly look different for your project, here’s the basic pseudo code:

image: alpine

stages:
  - build
  - test
  - sanity_check

build:
  stage: build
  script:
    - apk --update add make
    - make build # let's say you're using make for your build tool

test:
  stage: test
  script:
    - apk --update add make
    - make test-unit test-integration test-e2e

bad_migration_check:
  stage: sanity_check
  script:
    -  # find out how many migrations are new, in comparison with master branch
    -  # apply the new migration(s)
    -  # reverse the new migration(s)

migration_collision_check:
  stage: sanity_check
  script:
    - # rename the local migrations folder
    - # git pull the migrations folder from master
    - # find out which migrations are only available locally, merge them into the migrations folder pulled from master
    - # attempt to migrate (assuming your migration tool should error if the new migrations somehow don't mesh with the ones from master)

The actual logic here is pretty difficult to do in consecutive bash/sh statements, so it’s probably better to write a utility script (or Makefile target) to handle this.
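As one possible shape for such a utility script, here is a minimal sketch of the collision check for timestamp-prefixed migrations. It assumes file names like `1528400000000-AddUsers.ts` and takes two plain-text lists of file names (one from master, one from the branch); the function name and the list files are made up for illustration.

```shell
#!/bin/sh
# stale_migrations MASTER_LIST BRANCH_LIST
# Each argument is a file with one migration file name per line, where the
# name starts with a numeric timestamp followed by "-".
# Prints branch-only migrations that are OLDER than the newest migration on
# master, i.e. the ones a timestamp-ordered migration tool would reject.
stale_migrations() {
  master_list="$1"
  branch_list="$2"
  # Newest timestamp already landed on master
  newest_master=$(sed 's/-.*//' "$master_list" | sort -n | tail -n 1)
  if [ -z "$newest_master" ]; then
    return 0  # no migrations on master, nothing can collide
  fi
  # Keep only names missing from master, then filter by timestamp
  { grep -vxFf "$master_list" "$branch_list" || true; } |
    awk -F- -v newest="$newest_master" '$1 + 0 < newest + 0 { print }'
}
```

In CI the lists could come from something like `git ls-tree -r --name-only origin/master -- src/migration | sed 's#.*/##'` and `ls src/migration` (the folder name is an assumption), with the job failing whenever the function prints anything.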

Here’s what the bad_migration_check step (checking if the current migrations are valid) looks like for a project of mine that uses Make:

migrate:
  stage: sanity_check
  variables:
    STACK_BUILD_OPTS: "--fast --no-terminal"
    STACK_TEST_OPTS: "--fast"
  script:
    - make local-dev-db

There’s not much to look at, but that step actually expands to the following chain of Make targets:

local-dev-db: clean-db create-db migrate-local-db db-fixture-dev

This project uses SQLite (at least for now) and only does forward migrations, so this is more than enough to test.

Deploying to a staging environment

Moving closer to the current holy grail of DevOps, Continuous Delivery: with all this CI power in your hands, you’re eventually going to want to deploy to a test/staging environment whenever something good enough to deploy is checked into the repository. It might have struck you earlier, with the use of ssh-agent, that you can securely SSH to just about any machine you want from CI, provided you set up credentials properly. At least for the early definitions of “deploy”, if we can SSH, it’s not hard to imagine a situation where we’re deploying!

There are at least 4 types of deployments that are very easy to do from a CI environment:

1. Heroku/Dokku-type push-to-deploy
2. SSH-powered log-on-to-a-machine-and-run-commands
3. Kubernetes-powered apply-a-configuration-to-a-cluster-of-servers
4. AWS CloudFormation/Terraform-powered apply-a-configuration-to-the-cloud

I’ve more or less covered (1) and (2) already by showing how to push to another repo and set up ssh-agent, so all you need is a little imagination to start banging out a deployment flow. (4) is also relatively straightforward, as long as you install the relevant command line tools in the job and write a script that drives them. That leaves (3): Kubernetes is very well supported/integrated into Gitlab, so the documentation is a great place to start.

After setting up a k8s cluster and adding it to Gitlab (see the k8s integration documentation), here’s roughly what this would look like:

deploy_staging:
  stage: deploy
  environment:
    name: staging
  only:
    refs:
      - /^v[0-9.]+$/
    kubernetes: active
  except:
    - branches
  script:
    - envsubst < infra/k8s/project.yaml.pre > infra/k8s/project.yaml
    - kubectl apply -f infra/k8s/project.yaml
    - kubectl rollout status deployments/project

Assuming you have your project’s k8s integration set up properly, those steps should generate a single resource configuration file (the imaginary infra/k8s/project.yaml) from its template and apply it. If you want to do better than putting all your secrets in Gitlab secret variables, you can put your secrets in your kubernetes cluster (or some other credential store) and ensure that Kubernetes is configured to look there in the resource config files (e.g. using envFrom with a secretRef in your pod config).
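One gotcha with the envsubst step is that a variable which isn’t set gets replaced with an empty string, so a typo in a variable name can quietly produce a broken manifest. Here is a minimal sketch of a guard to run first; the function name and the variable names in the usage line are hypothetical.

```shell
#!/bin/sh
# require_vars NAME...
# Fails (and lists the offenders on stderr) if any named environment
# variable is unset or empty. Only pass trusted, literal variable names,
# since the lookup goes through eval.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script line such as `require_vars VERSION DB_HOST && envsubst < infra/k8s/project.yaml.pre > infra/k8s/project.yaml` would then stop the deploy before a half-filled file ever reaches kubectl.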

Updating a foreign repository after a release

One of the reasons tools like Swagger are becoming more popular is the advanced automation they allow. Once you go through the trouble of tagging your API with metadata (whether that’s adding @ annotations or maintaining YAML/JSON yourself), you can do things like auto-generate client SDKs for your service, saving your team (and other teams) lots of time. So where does CI fit in? Well first, you’d probably love it if every check-in ensured that the generated swagger documentation was valid. To improve from there, wouldn’t it be nice if releasing a new version of your service also generated the SDKs used to access the service at that version?

Here’s what this step looks like for one of my projects (it’s pretty long but it all makes concrete sense):

release_generated_sdk:
  stage: extra
  image: docker
  only:
    - /^v[0-9.]+$/
  except:
    - branches
  variables:
    DOCKER_DRIVER: overlay
  services:
    - docker:dind
  script:
    - apk add --update git openssh gettext make
    # Setup SSH
    - mkdir ~/.ssh && chmod 700 ~/.ssh && touch ~/.ssh/known_hosts
    - eval $(ssh-agent -s)
    - ssh-keyscan -t rsa gitlab.com >> ~/.ssh/known_hosts
    - echo "$JS_SDK_DEPLOY_PRIVATE_KEY" | tr -d '\r' | ssh-add - > /dev/null
    # Get version
    - export VERSION=`make print-release-version` # version without v here for use by swagger (in package.json)
    - 'echo -e "VERSION: $VERSION"'
    # Run swagger code gen, save location of generated code
    - mkdir swagger/swagger-generated-api
    - envsubst < swagger/api-js-config.json.pre > swagger/api-js-config.json
    - |
      docker run --rm -v ${PWD}:/local swaggerapi/swagger-codegen-cli generate \
      -c /local/swagger/api-js-config.json \
      -i /local/swagger/swagger.json \
      -l javascript \
      -o /local/swagger/swagger-generated-api      
    - cd swagger/swagger-generated-api && export GENERATED_CODE_PATH=`pwd`
    # Make a folder in /tmp and push to other repo
    - cd /tmp && git clone $JS_SDK_REPO_SSH sdk-repo
    - cd sdk-repo && cp -r $GENERATED_CODE_PATH/* .
    - git config user.email "user@example.com"
    - git config user.name "CI Bot"
    - git add .
    - git commit -am "Release v$VERSION (automated)"
    - git push origin master
    - git tag v$VERSION
    - git push origin v$VERSION

As you might have noticed, I needed to create a Gitlab CI variable called JS_SDK_REPO_SSH that points to the remote repo, along with JS_SDK_DEPLOY_PRIVATE_KEY for the key to use with SSH.
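One edge case worth guarding against: `git commit` exits non-zero when there is nothing to commit, so a release that doesn’t change the generated SDK would fail the job. A minimal sketch of a wrapper follows; the function name is made up, and whether a no-op release should count as success is a judgment call for your project.

```shell
#!/bin/sh
# commit_if_changed MESSAGE
# Stages everything in the current repo and commits only when the working
# tree actually differs from HEAD; a no-op is treated as success.
commit_if_changed() {
  git add -A
  if [ -z "$(git status --porcelain)" ]; then
    echo "nothing to release, skipping commit"
    return 0
  fi
  git commit -q -m "$1"
}
```

In the job above this would stand in for the `git add .` and `git commit` lines, e.g. `commit_if_changed "Release v$VERSION (automated)"`.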

Wrapping up

So, one of the things I love about Gitlab is the ease with which it brings disciplined engineering to any team. With a docker repo, CI, Kubernetes, and various other features so easily within reach, it’s so easy to start automating parts of the process, and that’s awesome.

One of the only downsides to working on this was dealing with super long build times with Haskell… A problem I’m going to revisit someday hopefully soon.
