A year ago I set up a Kubernetes cluster with 3 master nodes and 1 worker node, and today I got the following error.
I didn't know that the certs generated for the cluster only have a 1-year lifespan. The documentation says that upgrading the cluster version automatically renews the certs, but I didn't have the time…
| kubectl get pods --all-namespaces
kubectl: Unable to connect to the server: x509: certificate has expired or is not yet valid
|
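To confirm that expiry really is the cause, the certificate's end date can be compared with the current time. The following is a minimal sketch, not the method from any article referenced here: `cert_expired` is a hypothetical helper name, it assumes GNU `date` (as shipped on Ubuntu), and the path in the usage comment is the default kubeadm location, which may differ on your cluster.

```shell
# cert_expired: hypothetical helper. Takes the date string printed by
# `openssl x509 -enddate` (the part after "notAfter=") and returns 0 if
# that date is in the past, 1 if it is still in the future.
# Assumes GNU date, which understands `-d "<date string>"`.
cert_expired() {
  _end_epoch=$(date -d "$1" +%s) || return 2
  _now_epoch=$(date +%s)
  [ "$_end_epoch" -le "$_now_epoch" ]
}

# Usage on a control plane node (the path is the kubeadm default, an assumption):
#   end=$(openssl x509 -enddate -noout -in /etc/kubernetes/pki/apiserver.crt | cut -d= -f2)
#   cert_expired "$end" && echo "apiserver.crt expired on $end"
```

kubeadm also ships a `kubeadm certs check-expiration` subcommand (under `kubeadm alpha certs` in older releases) that lists the expiry of every certificate it manages.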
Update process
I found the following article. It solved the certificate error, but…
Please follow this link to complete the certificate update.
After update
I did a health check and saw this:
| ubuntu@k8s-master02:/etc/kubernetes$ kubectl get nodes
NAME           STATUS     ROLES                  AGE    VERSION
k8s-master01   Ready      control-plane,master   367d   v1.20.5
k8s-master02   NotReady   control-plane,master   367d   v1.20.5
k8s-master03   Ready      control-plane,master   367d   v1.20.5
k8s-node01     Ready      <none>                 367d   v1.20.5
|
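On a larger cluster it is easier to filter the node list than to read it. A small sketch, assuming the default column layout of `kubectl get nodes` (name first, status second); `not_ready_nodes` is a hypothetical helper name.

```shell
# not_ready_nodes: hypothetical helper. Reads `kubectl get nodes --no-headers`
# output on stdin and prints the name of every node whose STATUS column is
# not exactly "Ready" (so cordoned nodes such as "Ready,SchedulingDisabled"
# are printed as well).
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage:
#   kubectl get nodes --no-headers | not_ready_nodes
```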
I logged in to k8s-master02 and ran journalctl -u kubelet, which showed the following error:
| Attempting to register node k8s-master02
Apr 09 05:25:17 k8s-master02 kubelet[744]: E0409 05:25:17.570172 744 kubelet_node_status.go:93] Unable to register node with API server: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope
|
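`system:anonymous` is the identity the API server assigns to requests that were not authenticated, so the kubelet on this node was apparently not presenting a client certificate the API server accepted. To see which identities are being rejected without scrolling through the whole journal, something like the sketch below could help; `forbidden_users` is a hypothetical helper name and the pattern only matches messages shaped like the one above.

```shell
# forbidden_users: hypothetical helper. Reads kubelet log lines on stdin and
# prints each distinct user name that appears in a
# `... is forbidden: User "<name>" cannot ...` message.
forbidden_users() {
  sed -n 's/.*User "\([^"]*\)" cannot.*/\1/p' | sort -u
}

# Usage on the affected node:
#   journalctl -u kubelet --no-pager | forbidden_users
```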
Re-add master node
I searched Google, but it seems no one else has hit the same error, so I had to delete the k8s-master02 node and re-add it.
On k8s-master01
Before removing k8s-master02, I needed to find a way to add it back.
This Stack Overflow page was helpful:
https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
| # temporarily upload the control plane certs so that a new control plane node can join (the certificate key is valid for only 2 hours)
kubeadm init phase upload-certs --upload-certs
# ask kubeadm to generate the join command
kubeadm token create --print-join-command
|
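The two commands print two separate pieces: a certificate key and a worker-style join command. For a control plane node they have to be combined with `--control-plane` and `--certificate-key`. A minimal sketch of that assembly follows; `control_plane_join_cmd` is a hypothetical helper name, and taking the last output line of `upload-certs` as the key is an assumption about its output layout.

```shell
# control_plane_join_cmd: hypothetical helper.
#   $1 = join command printed by `kubeadm token create --print-join-command`
#   $2 = certificate key printed by `kubeadm init phase upload-certs --upload-certs`
# Prints the join command extended with the control plane flags.
control_plane_join_cmd() {
  printf '%s --control-plane --certificate-key %s\n' "$1" "$2"
}

# Usage on an existing control plane node, as root:
#   key=$(kubeadm init phase upload-certs --upload-certs | tail -n 1)
#   join=$(kubeadm token create --print-join-command)
#   control_plane_join_cmd "$join" "$key"
```

Review the printed command before running it on the joining node, since both the token and the certificate key are secrets.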
Remove k8s-master02:
| # if drain refuses because of DaemonSet-managed pods, add --ignore-daemonsets
kubectl drain k8s-master02
kubectl delete node k8s-master02
|
On k8s-master02
Now that I had the join command, I ran the following commands:
| kubeadm reset
kubeadm join 192.168.100.20:6444 --token pel8cz.mtvnu0n7q8aj6lx6 --discovery-token-ca-cert-hash sha256:xxxxxxx --control-plane --certificate-key xxxxxxxx
|
Another error came up…
| I1006 18:54:08.820819 2954 manifests.go:135] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[check-etcd] Checking that the etcd cluster is healthy
failed to dial endpoint https://192.168.100.22:2379 with maintenance client
|
Clear etcd
This page helped me troubleshoot: https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster
Back to k8s-master01
| # the etcd-client package has to be installed on Ubuntu first
ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
# sample list of etcd members (names, IDs and addresses will differ on your cluster)
5a4945140f0b39d9, started, sbg2-k8s001, https://192.168.208.12:2380, https://192.168.208.12:2379
740381e3c57ef823, started, gra3-k8s001, https://192.168.208.13:2380, https://192.168.208.13:2379
77a8fbb530b10f4a, started, rbx4-k8s001, https://192.168.208.14:2380, https://192.168.208.14:2379
# remove the stale k8s-master02 member; replace the ID below with the one shown for k8s-master02 in your own member list
ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove e073aa5a204b727d
|
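Copying the member ID by hand is error-prone, so the lookup can be scripted. This is only a sketch: `etcd_member_id` is a hypothetical helper name, it assumes the default comma-separated `member list` output shown above, and it assumes the etcd member name matches the node name.

```shell
# etcd_member_id: hypothetical helper. Reads `etcdctl member list` output
# (default "ID, status, name, peer URLs, client URLs" format) on stdin and
# prints the ID of the member whose name (third field) equals $1.
etcd_member_id() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Usage on a healthy control plane node (etcdctl flags as in the block above):
#   id=$(ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 \
#          --cacert /etc/kubernetes/pki/etcd/ca.crt \
#          --cert /etc/kubernetes/pki/etcd/server.crt \
#          --key /etc/kubernetes/pki/etcd/server.key \
#          member list | etcd_member_id k8s-master02)
#   echo "stale member id: $id"
```

Printing the ID and checking it against the list before running `member remove` is deliberate, since removing the wrong member would hurt a healthy node.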
Finally
With the stale etcd member removed, I was able to re-join k8s-master02, and all nodes are healthy now.
| NAME           STATUS   ROLES                  AGE    VERSION
k8s-master01   Ready    control-plane,master   367d   v1.20.5
k8s-master02   Ready    control-plane,master   31s    v1.20.5
k8s-master03   Ready    control-plane,master   367d   v1.20.5
k8s-node01     Ready    <none>                 367d   v1.20.5
|