After trying to restart my cluster I ran into issues – kube-router couldn't find 10.96.0.1 (the API server) and CoreDNS couldn't start because it couldn't find anything (since kube-router was down). It's a chicken-and-egg problem, and though this PR was supposed to solve it, it's certainly still a problem. Not sure if this is because I'm on older versions of kube-router (1.0.1) and Kubernetes itself (1.17), but I got some nice unscheduled downtime after a machine restart.
In the reddit discussion, a better solution was suggested – using a hard-coded kubeconfig that references the mounted pod-local service account file. An even better solution showed up after that, when murali-reddy hopped in to note that this was actually an upstream Kubernetes issue which has since been solved – kube-router could likely be updated to rely on the pod-mounted credentials without specifying a kubeconfig at all. I removed --kubeconfig from my DaemonSet and it worked great for me, so I can wholeheartedly recommend that solution.
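For reference, removing the flag looked roughly like the following – the DaemonSet name, namespace, and label are assumptions based on the stock kube-router manifest, so adjust them to whatever your deployment uses:

```
# Check whether the kube-router DaemonSet passes an explicit kubeconfig
kubectl -n kube-system get ds kube-router -o yaml | grep -n kubeconfig

# Edit the DaemonSet and remove the --kubeconfig=... argument so the pod
# falls back to the in-cluster service account credentials mounted into it
kubectl -n kube-system edit ds kube-router

# Watch the pods roll and make sure they come back healthy
kubectl -n kube-system rollout status ds/kube-router
kubectl -n kube-system logs -l k8s-app=kube-router --tail=20
```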
tl;dr - If you’re running kube-router (I run version 1.0.1), make sure to update the kubeconfig it uses after credential rotations, otherwise spooky pod->service (but not pod->pod) communication issues can occur.
Recently, while working on some unrelated issues, I discovered that the kubeconfig that kube-router uses can indeed be stale. I ran into some issues with service->service communication and root-caused the issue after a bunch of head scratching – the fix was to simply copy the newer kubeconfig over to the right directory on the host for kube-router to pick up. This wasn’t a really satisfying fix, but it certainly was enough to get me going again. I probably won’t be using kube-router for much longer (in favor of Cilium or Calico), so for now it was good enough.
Here’s the rough gist of what I did to figure out what was wrong (the concrete commands are sketched after the list):
- Check for any NetworkPolicy objects involved (and blocking the requests)
- kubectl exec into the container and do an nslookup of the service name (in the output you should see a <service>.<namespace>.svc.cluster.local entry for the service in question)
- curl the IP of the pod backing the service (an Endpoint of the Service) directly (you can get pod IPs via kubectl get pods -o wide)
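In command form it looks something like this – the namespace, pod, service name, and pod IP below are made up for illustration:

```
# 1. Look for NetworkPolicy objects that could be blocking the traffic
kubectl get networkpolicy --all-namespaces

# 2. Exec into a client pod and resolve the service name via DNS
#    (namespace/pod/service names here are hypothetical)
kubectl -n my-namespace exec -it my-client-pod -- nslookup my-service
# expect an entry like: my-service.my-namespace.svc.cluster.local

# 3. Get the pod IPs backing the service, then curl one of them directly
kubectl -n my-namespace get pods -o wide
kubectl -n my-namespace exec -it my-client-pod -- curl -v http://10.244.1.23:8080/
```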
If all of the above goes well, we know at this point that the problem at least isn’t DNS, and pod->pod communication is working as we expect, so the problem lies elsewhere.
This is the point where I started to suspect that something with the CNI (kube-router) wasn’t working properly, so I took a look at the logs of my kube-router pod and found the answer:
E0107 19:06:25.099257 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.NetworkPolicy: Unauthorized
I0107 19:06:25.099907 1 reflector.go:240] Listing and watching *v1.Pod from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:25.100317 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Pod: Unauthorized
I0107 19:06:25.101082 1 reflector.go:240] Listing and watching *v1.Namespace from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:25.101501 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Namespace: Unauthorized
I0107 19:06:26.096717 1 reflector.go:240] Listing and watching *v1.Endpoints from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:26.097421 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Endpoints: Unauthorized
And there it is – one of the most common problems I run into: something trying to talk to the Kubernetes API and not being able to authenticate. Of course, I didn’t believe these messages immediately – “Why in the world would it be Unauthorized? kube-router definitely has credentials” – so I start going back and looking through my configuration for the kube-router DaemonSet. Then it dawns on me – “how does kube-router get its credentials again?…” – does it use a serviceAccount? The serviceAccount seemed to be set correctly, but there’s one thing I didn’t consider – it turns out kube-router uses a kubeconfig. This is where I found the actual problem – I have a hostPath entry that points to a file on disk, and after the credentials were rotated (during a recent cluster upgrade/change), that file became stale, so I updated it, and everything started working again.
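The check and the (admittedly manual) fix look roughly like this – the kubeconfig path under /var/lib/kube-router and the source path for the fresh kubeconfig are assumptions about my setup, not anything universal:

```
# See where the kube-router DaemonSet mounts its kubeconfig from
# (DaemonSet name/namespace assumed to match the stock manifest)
kubectl -n kube-system get ds kube-router -o yaml | grep -B2 -A4 kubeconfig

# On the affected node, confirm the old kubeconfig really is stale --
# an "Unauthorized" error here matches what kube-router was logging
kubectl --kubeconfig /var/lib/kube-router/kubeconfig get nodes

# Copy the freshly generated kubeconfig over the stale one
# (source path is hypothetical -- use wherever your new credentials live)
cp /etc/kubernetes/kube-router.kubeconfig /var/lib/kube-router/kubeconfig
```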
I don’t particularly like how manual the solution was (copying over a file), but since I plan on moving to a different CNI in the near future it’s good enough for me, for now.