After trying to restart my cluster I ran into issues from not having the --kubeconfig option. kube-router couldn't reach 10.96.0.1 (the in-cluster service IP for the API server) and CoreDNS couldn't start because it couldn't reach anything (since kube-router was down). It's a chicken-and-egg problem, and though this PR was supposed to solve it, it's certainly still a problem. I'm not sure if this is because I'm on older versions of kube-router (1.0.1) and Kubernetes itself (1.17), but I got some nice unscheduled downtime after a machine restart.
In the Reddit discussion, a better solution was suggested -- using a hard-coded kubeconfig that references the pod-mounted service account files. An even better solution showed up after that, when murali-reddy hopped in to note that this was actually an upstream Kubernetes issue which has since been solved; kube-router could likely be updated to rely on the pod-mounted credentials without specifying a kubeconfig at all. I removed --kubeconfig from my DaemonSet and it worked great for me, so I can wholeheartedly recommend that solution.
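For reference, here's roughly what that hard-coded kubeconfig might look like -- a sketch of the idea rather than anything lifted from the thread. The host path and server address are assumptions; the certificate and token paths are the standard locations Kubernetes mounts a pod's service account credentials at:

```shell
# Write a kubeconfig that leans entirely on the pod-mounted service account files
# (the host path is an assumption -- use whatever your DaemonSet's hostPath points at)
cat <<'EOF' | sudo tee /var/lib/kube-router/kubeconfig > /dev/null
apiVersion: v1
kind: Config
clusters:
- name: cluster
  cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # 10.96.0.1 is the in-cluster API server service IP mentioned above; if you hit
    # the chicken-and-egg problem on restart, point this at an address that is
    # reachable before kube-router comes up (e.g. the API server's own host)
    server: https://10.96.0.1:443
users:
- name: kube-router
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
contexts:
- name: kube-router
  context:
    cluster: cluster
    user: kube-router
current-context: kube-router
EOF
```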
tl;dr - If you’re running kube-router (I run version 1.0.1), make sure to update the kubeconfig that it uses after credential rotations, otherwise spooky pod->service (but not pod->pod) communication issues could occur.
Recently, while working on some unrelated issues, I discovered that the kubeconfig that kube-router uses can indeed be stale. I ran into some issues with service->service communication and root-caused the issue after a bunch of head scratching – the fix was to simply copy the newer kubeconfig over to the right directory on the host for kube-router to pick up. This wasn’t a really satisfying fix, but it certainly was enough to get me going again. I probably won’t be using kube-router for much longer (in favor of Cilium or Calico), so for now it was good enough.
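The "fix" itself was nothing fancy – roughly the following, run on the affected node. The paths and the pod label are assumptions based on a typical kube-router install, so adjust them to your own manifests:

```shell
# Copy the freshly-generated kubeconfig over the stale one that kube-router's
# hostPath volume points at (both paths are assumptions -- check your setup)
sudo cp /etc/kubernetes/kube-router.kubeconfig /var/lib/kube-router/kubeconfig

# If kube-router doesn't pick the new file up on its own, bounce the pod on this
# node (label and node name are assumptions)
kubectl -n kube-system delete pod -l k8s-app=kube-router \
  --field-selector spec.nodeName="$(hostname)"
```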
Here’s the rough gist of what I did to figure out what was wrong:
1. Check for NetworkPolicy objects involved (and blocking the requests)
2. kubectl exec into the container and do an nslookup of the service name (in the output you should see a <service>.<namespace>.svc.cluster.local entry for the service in question)
3. kubectl exec and curl the IP of the pod backing the service (an Endpoint of the Service) directly (you can get pod IPs via kubectl get pods -o wide)

If all the above goes well, we know at this point that the problem is not DNS at least, and pod->pod communication is working as we expect, so the problem lies elsewhere.
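For reference, the commands behind those checks look something like the following (the namespace, pod and service names, and the pod IP are made up for illustration):

```shell
# 1. Look for NetworkPolicy objects that could be blocking the traffic
kubectl get networkpolicy --all-namespaces

# 2. DNS check from inside a client pod -- the service name should resolve to a ClusterIP
kubectl -n my-namespace exec -it my-client-pod -- \
  nslookup my-svc.my-namespace.svc.cluster.local

# 3. Hit a backing pod directly (pod->pod), bypassing the Service VIP
kubectl -n my-namespace get pods -o wide          # grab a pod IP, e.g. 10.244.1.23
kubectl -n my-namespace exec -it my-client-pod -- curl -v http://10.244.1.23:8080/
```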
This is the point where I started to suspect that something with the CNI (kube-router) wasn’t working properly, so I took a look at the logs of my kube-router pod and found the answer:
E0107 19:06:25.099257 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.NetworkPolicy: Unauthorized
I0107 19:06:25.099907 1 reflector.go:240] Listing and watching *v1.Pod from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:25.100317 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Pod: Unauthorized
I0107 19:06:25.101082 1 reflector.go:240] Listing and watching *v1.Namespace from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:25.101501 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Namespace: Unauthorized
I0107 19:06:26.096717 1 reflector.go:240] Listing and watching *v1.Endpoints from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:26.097421 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Endpoints: Unauthorized
And there it is – one of the most common problems I run into: something trying to talk to the Kubernetes API and not being able to authenticate. Of course, I didn’t believe these messages immediately – “Why in the world would it be Unauthorized? kube-router definitely has credentials” – so I started going back and looking through my configuration for the kube-router DaemonSet (ConfigMaps, etc.).
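The config spelunking amounted to something like this (the namespace and object names are what I'd expect from a typical kube-router install, not necessarily yours):

```shell
# Dump the DaemonSet and ConfigMaps to comb through them
kubectl -n kube-system get daemonset kube-router -o yaml
kubectl -n kube-system get configmaps
```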
Then it dawns on me – “how does kube-router get its credentials again?…” – does it use a ServiceAccount? The serviceAccount seemed to be set correctly, but there’s one thing I didn’t consider – it turns out kube-router uses a kubeconfig. This is where I found the actual problem – I have a hostPath entry that points to a file on disk, and after the credentials were rotated (during a recent cluster upgrade/change), that file became stale. I updated it, and everything started working again.
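A quick way to confirm the on-host kubeconfig really is the culprit is to point kubectl at it directly – if the credentials in it have been rotated away, the API server will reject it outright (the path is an assumption, use whatever your hostPath volume points at):

```shell
# Should fail with "You must be logged in to the server (Unauthorized)"
# if the file holds rotated-out credentials
kubectl --kubeconfig /var/lib/kube-router/kubeconfig get nodes
```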
I don’t particularly like how manual the solution was (copying over a file), but since I plan on moving to a different CNI in the near future it’s good enough for me, for now.