Better K8s Monitoring Part 1: Adding Prometheus

Adding better monitoring for applications running in my k8s cluster using Prometheus.

vados

10 minute read

It’s been a while since I learned of the wonders (and cleared up my misconceptions) of dedicated hosting and set up a “Baremetal” CoreOS single-node k8s cluster. For a while now I’ve maintained a single large (by my standards) machine that has been running Kubernetes, and purring right along – outside of the occasional restart or operator error, it hasn’t gone down and has kept my applications running. While most of the applications don’t get much traffic (as most are projects that haven’t launched), it does keep up some more important things like my mail servers for some domains I hold.

tl;dr Setting up bespoke Prometheus on my k8s cluster was pretty easy – read on to find out how I did it.

Observability of infrastructure is critical to being good at ops, and by extension dev-ops. While I don’t have any particularly “web-scale” applications to run, the leverage that Docker + Kubernetes have given me has allowed me to start worrying more about how to maintain infrastructure reliably. Now that I have a reliable, scalable, and easy abstraction for running X applications on N nodes in a sane way, I am freed up to start thinking about the next set of questions – things like “I know the application is running, but what’s happening to the requests that are hitting it?” or “where are the bottlenecks in my application?”. I can’t stress enough how nice it is to move on to these questions without worrying about what’s below (getting the X applications running on N nodes).

This is going to be a series of posts where I seek to address those questions on my tiny cluster – improving monitoring (AKA observability) for my cluster:

  1. Part 1 (this post)
  2. Part 2: Setting up the EFK stack (Elasticsearch + Fluentd + Kibana) for logs

This post will cover the first step – adding Prometheus to my tiny cluster, and getting applications to start sending up some metrics.

SIDE NOTE: I’m intentionally approaching this as a series of blog posts where I set up the individual pieces of observability using one of the applications I’m developing. There are solutions out there that can do all of this for you with a kubectl apply of a single file, but before I reach for solutions like those (in particular Istio), I like to set up the components myself to get a feel for how they run and get comfortable with at least the basic concepts they involve. I will likely also do a post at the end of the series where I kick the tires on Istio and see just how easily it can do everything.

Step 0: Prep & RTFM

The metrics collection/monitoring space has both immense depth and breadth. There are tons of tools this post could have been about, and tons of methodologies, configurations, and processes that could have been discussed. Every tool has its tradeoffs, and there is a wealth of information (and misinformation) out there on the various tools in this space. I was initially kind of confused when first reading about metrics collection systems and Prometheus – I was unsure whether I could use Prometheus for alerting, or whether I would need to perform my own averaging before inserting metrics. A proper read through the documentation answered these questions (along with a little bit of googling to see who Prometheus’s competitors are and how it fits into the ecosystem).

Since I’m using Kubernetes, I’ve chosen to try out one of the CNCF’s banner projects, which has been touted to work very well with Kubernetes – Prometheus. The plan is simple – run Prometheus in a DaemonSet on the cluster, and ensure that pods (and the containers therein) expose metrics that the Prometheus instance can scrape.

Caveats:

  - I’m not using statsd, but I could (and arguably should) be – logs are currently viewed by going to pods one by one (and errors are sent over email).
  - I’m only running one node, so I’m not employing a High Availability setup – if the one Prometheus instance goes down, I’m SOL.

The first obvious step is to read the documentation for Prometheus. It’s OK, I’ll wait.

Step 1: Adding Prometheus to the cluster

The most important Kubernetes Resources (feel free to revisit the Kubernetes concepts) I think I’ll need are:

  - A DaemonSet to run Prometheus itself
  - A PersistentVolumeClaim to back Prometheus’s storage

One important thing to note (which I realized shortly after typing the above two bullet points) is that since it’s a DaemonSet, you actually don’t need a PersistentVolumeClaim – hostPath will work just fine, since you’re guaranteed to have a single instance on each node.
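Since hostPath came up, here’s roughly what persisting Prometheus’s TSDB across pod restarts could look like. This is a sketch, not part of my actual config – the /prometheus data path (the default in the prom/prometheus image, as far as I know) and the /var/prometheus host directory are assumptions:

```yaml
# Hypothetical additions to the DaemonSet pod spec
containers:
- name: prometheus
  image: prom/prometheus
  volumeMounts:
  - name: prometheus-data
    mountPath: /prometheus      # default --storage.tsdb.path in the official image
volumes:
- name: prometheus-data
  hostPath:
    path: /var/prometheus       # any stable directory on the node
```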

This part is pretty easy, I only need to write the Kubernetes resource configuration that should spawn the resources I want, and then I can spin up Prometheus and see that it’s accessible internally (and for example displays a non-broken UI).

Here’s the first set of resource configuration files that I applied with kubectl apply:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yaml: |-
    global:
      scrape_interval: 15s

      external_labels:
        monitor: prm-monitor

    scrape_configs:
      - job_name: prometheus
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9090']

---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - protocol: TCP
      port: 9090

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        ports:
        - containerPort: 9090
          protocol: TCP
        resources:
          limits:
            memory: 4Gi
          requests:
            cpu: 200m
            memory: 1Gi
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
        - name: var-log
          mountPath: /var/log
        - name: var-lib-docker-containers
          mountPath: /var/lib/docker/containers
      terminationGracePeriodSeconds: 30
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: var-log
        hostPath:
          path: /var/log
      - name: var-lib-docker-containers
        hostPath:
          path: /var/lib/docker/containers

As you can see, there’s a ConfigMap to handle configuring Prometheus (this actually lives in a different file entirely, but is included here for completeness), a Service to expose Prometheus to the kube-system namespace (at least the admin interface), and a DaemonSet to run the actual thing.

While this configuration was successfully applied, the container crashed. Here’s what the logs looked like:

level=info ts=2018-02-04T11:39:54.584738852Z caller=main.go:225 msg="Starting Prometheus" version="(version=2.1.0, branch=HEAD, revision=85f23d82a045d103ea7f3c89a91fba4a93e6367a)"
level=info ts=2018-02-04T11:39:54.584781969Z caller=main.go:226 build_context="(go=go1.9.2, user=root@6e784304d3ff, date=20180119-12:01:23)"
level=info ts=2018-02-04T11:39:54.584802233Z caller=main.go:227 host_details="(Linux 4.14.16-coreos #1 SMP Thu Feb 1 20:38:35 UTC 2018 x86_64 prometheus-gfjdm (none))"
level=info ts=2018-02-04T11:39:54.584817176Z caller=main.go:228 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-02-04T11:39:54.586999373Z caller=main.go:499 msg="Starting TSDB ..."
level=info ts=2018-02-04T11:39:54.587056932Z caller=web.go:383 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-02-04T11:39:54.592341395Z caller=main.go:509 msg="TSDB started"
level=info ts=2018-02-04T11:39:54.592375106Z caller=main.go:585 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-02-04T11:39:54.592399828Z caller=main.go:386 msg="Stopping scrape discovery manager..."
level=info ts=2018-02-04T11:39:54.592410242Z caller=main.go:400 msg="Stopping notify discovery manager..."
level=info ts=2018-02-04T11:39:54.592418388Z caller=main.go:424 msg="Stopping scrape manager..."
level=info ts=2018-02-04T11:39:54.592430374Z caller=main.go:382 msg="Scrape discovery manager stopped"
level=info ts=2018-02-04T11:39:54.59244178Z caller=manager.go:59 component="scrape manager" msg="Starting scrape manager..."
level=info ts=2018-02-04T11:39:54.592706692Z caller=main.go:396 msg="Notify discovery manager stopped"
level=info ts=2018-02-04T11:39:54.592754015Z caller=main.go:418 msg="Scrape manager stopped"
level=info ts=2018-02-04T11:39:54.592897677Z caller=manager.go:460 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-02-04T11:39:54.592936645Z caller=manager.go:466 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-02-04T11:39:54.592960792Z caller=notifier.go:493 component=notifier msg="Stopping notification manager..."
level=info ts=2018-02-04T11:39:54.592989281Z caller=main.go:570 msg="Notifier manager stopped"
level=error ts=2018-02-04T11:39:54.593063704Z caller=main.go:579 err="Error loading config couldn't load configuration (--config.file=/etc/prometheus/prometheus.yml): open /etc/prometheus/prometheus.yml: no such file or directory"
level=info ts=2018-02-04T11:39:54.593121987Z caller=main.go:581 msg="See you next time!"

Well it looks like the error is pretty obvious – /etc/prometheus/prometheus.yml didn’t get mounted properly – the typo strikes again! (I used .yaml instead of .yml).

After fixing that small typo and a copy-paste error, Prometheus is running! Again, this is only SO easy because of the shoulders of the giants we’re standing on – Docker + Kubernetes come in big here to make deploying this fully featured application super easy.

All that was left was to do the steps in the Getting Started guide and get more intuition with using Prometheus.
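If you’re following along there, a couple of queries against the metrics Prometheus exports about itself make a nice sanity check in the expression browser – for example (both metric names come from Prometheus 2.x’s own instrumentation):

```
# samples ingested by the local TSDB, per second
rate(prometheus_tsdb_head_samples_appended_total[1m])

# how close actual scrape intervals come to the configured ones
prometheus_target_interval_length_seconds{quantile="0.99"}
```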

Step 2: Instrumenting an application to throw metrics at Prometheus

I am currently working on a Haskell application that powers a job-board (one of those un-finished projects I was talking about earlier). Luckily for me, someone has already done the hard work of writing a prometheus library and I just have to use it.

I chose to use the wai-middleware-prometheus library since it would be light on the code required, and since I use Servant, which runs on WAI, I could get access to a bunch of metrics relatively easily – like request latency (which is pretty important).

Here’s the code it took to make it work (note that some of it has to do with how my code is structured and isn’t very general):

{-# LANGUAGE OverloadedStrings    #-}

module Metrics.MetricsBackend ( makeConnectedMetricsBackend
                              , MetricsBackend(..)
                              , Metrics(..)
                              ) where

import           Config (MetricsConfig(..))
import           Data.Maybe (Maybe, fromMaybe)
import           Data.Monoid ((<>))
import           Network.Wai (Application)
import           Network.Wai.Middleware.Prometheus (PrometheusSettings(..), prometheus, instrumentApp, instrumentIO)
import           System.Log.Logger (Logger, Priority(..))
import           Types (HasLogger(..))

data MetricsBackend = MetricsBackend { metricsCfg      :: MetricsConfig
                                     , metricsLogger   :: Maybe Logger
                                     , promConfig      :: PrometheusSettings
                                     }

type MetricsSettings = PrometheusSettings

makeConnectedMetricsBackend :: MetricsConfig -> Maybe Logger -> IO MetricsBackend
makeConnectedMetricsBackend cfg maybeL = pure MetricsBackend { metricsCfg=cfg
                                                             , metricsLogger=maybeL
                                                             , promConfig=makePromSettings cfg
                                                             }

makePromSettings :: MetricsConfig -> PrometheusSettings
makePromSettings cfg = PrometheusSettings { prometheusEndPoint=metricsEndpointURLs cfg
                                          , prometheusInstrumentApp=metricsAppWide cfg
                                          , prometheusInstrumentPrometheus=metricsIncludeMetricsEndpoint cfg
                                          }

class Metrics m where
    getMetricsLogger :: m -> Maybe Logger
    getMetricsConfig :: m -> MetricsConfig

    instrument :: m -> Application -> IO Application
    instrumentAction :: m -> String -> IO a -> IO a

instance HasLogger MetricsBackend where
    getComponentLogger = metricsLogger

instance Metrics MetricsBackend where
    getMetricsLogger = metricsLogger
    getMetricsConfig = metricsCfg

    instrument m app = logMsg m INFO "Instrumenting application..."
                       >> pure (prometheus (promConfig m) app)

    instrumentAction m metric action = logMsg m DEBUG ("Instrumenting endpoint " <> metric)
                                       >> instrumentIO metric action

As you can see, most of the heavy lifting is done by the excellent wai-middleware-prometheus library; most of the code is just me fitting it into the way I’ve architected my app (and adding a tiny bit of logging).
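For completeness, here’s roughly how this gets wired into the app’s entry point. This is a sketch rather than my actual startup code – `metricsConfig` and `app` are hypothetical stand-ins for the real MetricsConfig value and WAI Application, and the port is arbitrary:

```haskell
import qualified Network.Wai.Handler.Warp as Warp

import Metrics.MetricsBackend (makeConnectedMetricsBackend, Metrics(..))

main :: IO ()
main = do
  -- `metricsConfig` and `app` stand in for the real config and Application
  metrics <- makeConnectedMetricsBackend metricsConfig Nothing
  instrumented <- instrument metrics app
  Warp.run 8080 instrumented  -- the middleware now answers GET /metrics
```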

Step 3: Update prometheus config to start scraping

After checking the /metrics endpoint and ensuring it was available locally inside the pod (but not globally on the internet), it’s time to set up communication between Prometheus and the pod!
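For reference, what Prometheus actually scrapes from /metrics is plain text in the exposition format – something along these lines (metric names are illustrative of a latency histogram like the one the WAI middleware records; the values are fabricated):

```
# HELP http_request_duration_milliseconds distribution of request latencies
# TYPE http_request_duration_milliseconds histogram
http_request_duration_milliseconds_bucket{handler="app",le="50.0"} 1201
http_request_duration_milliseconds_bucket{handler="app",le="+Inf"} 1337
http_request_duration_milliseconds_sum{handler="app"} 45013.0
http_request_duration_milliseconds_count{handler="app"} 1337
```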

The first thing to do is update the Prometheus ConfigMap, which was really easy! I thought I might have to mess with the restrictive NetworkPolicy I had set for the app, but since I used the DNS-provided service name (app.namespace.svc.cluster.local), I could nslookup the app service just fine from the prometheus pod.

Here’s the updated ConfigMap, pretty standard:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s

      external_labels:
        monitor: prm-monitor

    scrape_configs:
      - job_name: prometheus
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9090']

      - job_name: app
        scrape_interval: 5s
        static_configs:
          - targets: ['app-svc.namespace.svc.cluster.local:5001']

I did need to restart Prometheus (by forcefully killing its pod) to pick up the new config, though… pretty sure you’re not supposed to do that with DaemonSets.
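A gentler option (which I didn’t use here) is Prometheus’s built-in config reload: it re-reads its configuration on SIGHUP, or over HTTP if the lifecycle endpoint is enabled. Something along these lines should work – the pod name is a stand-in, and /-/reload requires starting Prometheus with --web.enable-lifecycle:

```
# Re-read prometheus.yml without killing the pod
kubectl -n kube-system exec <prometheus-pod> -- kill -HUP 1

# Or, with --web.enable-lifecycle set:
curl -X POST http://prometheus.kube-system:9090/-/reload
```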

Whoops, looks like I DID indeed have to update my NetworkPolicy – this is one pod accessing another, inside the cluster.

---
#
# metrics things can hit it
#
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics
  namespace: app
spec:
  podSelector:
    matchLabels:
      tier: api
  policyTypes:
    - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: TCP
      port: 8080

Easy as pie! With this, I instantly saw the /targets page in Prometheus light up green, indicating that the scraper was working.

The data I’m getting now is thoroughly unexciting and mostly fabricated, but it’s great to finally be doing this part of my infrastructure at least somewhere close to right. As time goes on I’ll add more (and more important) metrics, and get some nice Grafana dashboards up.

Wrapping up

In the next posts, I’ll go into (and hopefully easily through) setting up cluster-wide logging infrastructure, tracing, and service-mesh powered traffic analysis.

It took me entirely too long to get through this post mostly due to a backlog of tickets on the service that I instrumented, but I’m glad I finally got to it, and joined the cool kids’ club!

Did you find this read beneficial? Send me questions/comments/clarifications.
Want my expertise on your team/project? Send me interesting opportunities!