After trying to restart my cluster I ran into issues again: kube-router couldn't reach 10.96.0.1 (the API server), and CoreDNS couldn't start because it couldn't resolve anything (since kube-router was down). It's a chicken-and-egg problem, and though this PR was supposed to solve it, it's certainly still present. Not sure if this is because I'm on older versions of kube-router (1.0.1) and Kubernetes itself (1.17), but I got some nice unscheduled downtime after a machine restart.
In the Reddit discussion, a better solution was suggested – using a hard-coded kubeconfig that references the mounted pod-local service account file.
An even better solution showed up after that, when murali-reddy hopped in to note that this was actually an upstream Kubernetes issue which has since been solved – meaning kube-router could likely be updated to rely on the pod-mounted credentials without specifying a kubeconfig at all. I removed --kubeconfig from my DaemonSet and it worked great for me, so I can wholeheartedly recommend that solution.
tl;dr - If you're running kube-router (I run version 1.0.1), make sure to update the kubeconfig it uses after credential rotations; otherwise spooky pod->service (but not pod->pod) communication issues can occur.
Recently, while working on some unrelated issues, I discovered that the kubeconfig that kube-router uses can indeed be stale. I ran into some issues with service->service communication and root-caused the issue after a bunch of head scratching – the fix was to simply copy the newer kubeconfig over to the right directory on the host for kube-router to pick up. This wasn't a very satisfying fix, but it certainly was enough to get me going again. I probably won't be using kube-router for much longer, in favor of Cilium or Calico, so for now it was good enough.
Here’s the rough gist of what I did to figure out what was wrong:
- Check for any NetworkPolicy objects involved (and blocking the requests)
- kubectl exec into the container and do an nslookup of the service name (in the output you should see a <service>.<namespace>.svc.cluster.local entry for the service in question)
- curl the IP of the pod backing the service (an Endpoint of the Service) directly (you can get pod IPs via kubectl get pods -o wide)
If all of the above goes well, we know at this point that the problem is not DNS at least, and that pod->pod communication is working as we expect, so the problem lies elsewhere.
This is the point where I started to suspect that something with the CNI (kube-router) wasn't working properly, so I took a look at the logs of my kube-router pod and found the answer:
E0107 19:06:25.099257 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.NetworkPolicy: Unauthorized
I0107 19:06:25.099907 1 reflector.go:240] Listing and watching *v1.Pod from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:25.100317 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Pod: Unauthorized
I0107 19:06:25.101082 1 reflector.go:240] Listing and watching *v1.Namespace from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:25.101501 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Namespace: Unauthorized
I0107 19:06:26.096717 1 reflector.go:240] Listing and watching *v1.Endpoints from github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73
E0107 19:06:26.097421 1 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Endpoints: Unauthorized
And there it is – one of the most common problems I run into: something trying to talk to the Kubernetes API and not being able to authenticate. Of course, I didn't believe these messages immediately – "Why in the world would it be Unauthorized? kube-router definitely has credentials" – so I started going back through my configuration.
Then it dawns on me – "how does kube-router get its credentials again?…" – does it use a serviceAccount? The serviceAccount seemed to be set correctly, but there's one thing I didn't consider – it turns out kube-router uses a kubeconfig. This is where I found the actual problem – I have a hostPath entry that points to a file on disk, and after the credentials were rotated (during a recent cluster upgrade/change), the file became stale. I updated it, and everything started working again.
I don’t particularly like how manual the solution was (copying over a file), but since I plan on moving to a different CNI in the near future it’s good enough for me, for now.