What is a Kubernetes cluster made of?
In this session you will experience first-hand Kubernetes' architectural design choices.
The plan is as follows:
- You will create three nodes.
- You will bootstrap a cluster with kubeadm — a tool designed to create Kubernetes clusters.
- You will deploy a demo application with two replicas.
- One by one, you will take down each Node and inspect the status of the cluster.
Will Kubernetes recover from the failures?
There's only one way to know!
Make sure you have the following tools installed:
- Docker Desktop (or Podman Desktop)
- kubectl
- minikube
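A quick way to confirm the tools are in place is to print their versions (the exact output varies by version and platform):
$ docker version
$ kubectl version --client
$ minikube version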
Let's start by creating a three-node Kubernetes cluster with two worker nodes.
Instead of using a premade cluster, such as the one you can find on the major cloud providers, you will go through bootstrapping a cluster from scratch.
But before you can create a cluster, you need nodes.
You will use minikube for that:
$ minikube start --no-kubernetes --container-runtime=containerd --driver=docker --nodes 3
😄 minikube v1.29.0
✨ Using the docker driver based on user configuration
👍 Starting minikube without Kubernetes in cluster minikube
🚜 Pulling base image ...
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
📦 Preparing containerd 1.6.15
🏄 Done! minikube is ready without Kubernetes!
It may take a moment to create those Ubuntu instances depending on your setup.
You can verify that the nodes are created correctly with:
$ minikube node list
minikube 192.168.105.18
minikube-m02 192.168.105.19
minikube-m03 192.168.105.20
It's worth noting that those nodes are not vanilla Ubuntu images.
Containerd (the container runtime) is preinstalled.
Apart from that, there's nothing else.
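If you're curious, you can verify this yourself by checking the containerd service on one of the nodes; one way to do it (reusing the same minikube ssh command shown later in this section, output will vary):
$ minikube ssh --node minikube 'systemctl status containerd'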
It's time to install Kubernetes!
In this section, you will install Kubernetes on the master Node and bootstrap the control plane.
The control plane is made of the following components:
- etcd, a consistent and highly-available key-value store.
- kube-apiserver, the API you interact with when you use kubectl.
- kube-scheduler, used to schedule Pods and assign them to Nodes.
- kube-controller-manager, a collection of controllers used to reconcile the state of the cluster.
You can SSH into the primary node with:
$ minikube ssh --node minikube
If you find that terminal handling is not working well (e.g. resizing the terminal doesn't work, or the command prompt behaves oddly), you can try this alternative:
$ docker exec -it minikube su - docker
There are several tools designed to bootstrap clusters from scratch.
However, kubeadm is an official tool and the best supported.
You will use it to create your cluster.
To install kubeadm and a few more prerequisites, execute the following script on the primary node:
$ curl -s -o master.sh https://academy.learnk8s.io/master.sh
$ sudo bash master.sh auto
In a new terminal session, SSH into the second node with:
$ minikube ssh --node minikube-m02
And execute the following setup script:
$ curl -s -o worker.sh https://academy.learnk8s.io/worker.sh
$ sudo bash worker.sh auto
Repeat the same steps for the last node.
In a new terminal session, SSH into the third node with:
$ minikube ssh --node minikube-m03
And execute the following setup script:
$ curl -s -o worker.sh https://academy.learnk8s.io/worker.sh
$ sudo bash worker.sh auto
Those scripts:
- Download kubeadm, kubectl, and the kubelet.
- Install the shared certificates necessary to trust other entities.
- Create the systemd unit necessary to launch the kubelet.
- Create the kubeadm config necessary to bootstrap the cluster.
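For context, here is roughly what such a setup does on an Ubuntu node; this is a sketch of the usual kubeadm installation steps, not the actual contents of master.sh or worker.sh:
# 1. add the Kubernetes apt repository and its signing key (details omitted here)
# 2. install the three binaries and pin their version
$ sudo apt-get update
$ sudo apt-get install -y kubelet kubeadm kubectl
$ sudo apt-mark hold kubelet kubeadm kubectl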
Once completed, you can finally switch back to the terminal session for the primary node and bootstrap the cluster with:
$ sudo kubeadm init --config config.yaml
# truncated output
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control plane was initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.49.2:8443 --token nsyxx6.quc5x0djkjdr564u \
--discovery-token-ca-cert-hash sha256:3bc332011691454867232397bf837dbb73affc96…
Please make a note of the join command at the end of the kubeadm init output.
You will need it later to join the workers, and you don't have to run it now.
The command is similar to:
kubeadm join <master-node-ip>:8443 --token [some token] \
--discovery-token-ca-cert-hash [some hash]
The command is necessary to join other nodes in the cluster.
Please don't skip the previous step! Make a note of the command and write it down! You will need it later on.
In the output of kubeadm init, you will notice this part:
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
The above instructions are necessary to configure kubectl to talk to the control plane.
You should go ahead and follow those instructions.
Once you are done, you can verify that kubectl is configured correctly with:
$ kubectl cluster-info
Kubernetes control plane is running at https://<master-node-ip>:8443
CoreDNS is running at https://<master-node-ip>:8443/api/v1/namespaces/kube…
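kubeadm runs etcd, kube-apiserver, kube-scheduler and kube-controller-manager as static Pods: the kubelet starts them from manifest files on disk. On the control plane node you can peek at those manifests (assuming kubeadm's default path); you should see something like:
$ ls /etc/kubernetes/manifests
etcd.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml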
Let's check the pods running in the control plane with:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS
kube-system coredns-787d4945fb-knjnx 0/1 ContainerCreating
kube-system coredns-787d4945fb-vmnzn 0/1 ContainerCreating
kube-system etcd-minikube 1/1 Running
kube-system kube-apiserver-minikube 1/1 Running
kube-system kube-controller-manager-minikube 1/1 Running
kube-system kube-proxy-wtdc4 1/1 Running
kube-system kube-scheduler-minikube 1/1 Running
Why are the CoreDNS pods in the "ContainerCreating" state?
Let's investigate further:
$ kubectl describe pod coredns-787d4945fb-knjnx -n kube-system
Name: coredns-787d4945fb-knjnx
Namespace: kube-system
# truncated output
Events:
Type Reason Message
---- ------ -------
Warning FailedCreatePodSandBox Failed to create pod sandbox: ...(truncated)... network ...
The message suggests that the network is not ready!
But how is that possible?
You are using kubectl to send commands to the control plane, so shouldn't it be ready?
The message is cryptic, but it tells you that you still need to configure a network plugin.
In Kubernetes, there is no standard or default network setup.
Instead, you should configure your network and install the appropriate plugin.
You can choose from several network plugins, but for now, you will install Flannel — one of the simplest.
Kubernetes imposes the following networking requirements on the cluster:
- All Pods can communicate with all other pods.
- Agents on a node, such as system daemons, kubelet, etc., can communicate with all pods on that node.
Those requirements are generic and can be satisfied in several ways.
That allows you to decide how to design and operate your cluster network.
In this case, each node in the cluster has a fixed IP address, and you only need to assign Pod IP addresses.
Flannel is a network plugin that:
- Assigns a subnet to every node.
- Assigns IP addresses to Pods.
- Maintains a list of Pods and Nodes in the cluster.
In other words, Flannel can route the traffic from any Pod to any Pod — just what we need.
Let's install it in the cluster.
The master.sh script you executed earlier also created a flannel.yaml file in the local directory.
$ ls -1
config.yaml
flannel.yaml
master.sh
traefik.yaml
You can submit it to the cluster with:
$ kubectl apply -f flannel.yaml
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
It might take a moment to download the container and create the Pods.
You can check the progress with:
$ kubectl get pods --all-namespaces
Once all the Pods are "Ready", the control plane Node should transition to a "Ready" state too:
$ kubectl get nodes -o wide
NAME STATUS ROLES VERSION
minikube Ready control-plane v1.26.2
Excellent!
The control plane is successfully configured to run Kubernetes.
This time, CoreDNS should be running as well:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS
kube-system coredns-787d4945fb-knjnx 1/1 Running
kube-system coredns-787d4945fb-vmnzn 1/1 Running
kube-system etcd-minikube 1/1 Running
kube-system kube-apiserver-minikube 1/1 Running
kube-system kube-controller-manager-minikube 1/1 Running
kube-system kube-proxy-wtdc4 1/1 Running
kube-system kube-scheduler-minikube 1/1 Running
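Flannel typically relies on the Pod subnet (podCIDR) that Kubernetes assigns to each Node. You can inspect the assigned ranges with the following read-only command; the exact subnets will differ in your cluster:
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'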
However, a control plane alone is not enough to run workloads.
You also need worker nodes.
On the control plane node, keep an eye on the nodes with:
$ watch kubectl get nodes -o wide
The watch command executes kubectl get nodes at a regular interval.
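By default, watch refreshes every two seconds; you can change the interval with -n if you prefer, for example:
$ watch -n 1 kubectl get nodes -o wide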
You will use this terminal session to observe nodes as they join the cluster.
In the other terminal, first list the IP address of the second node:
$ minikube node list
minikube 192.168.105.18
minikube-m02 192.168.105.19
minikube-m03 192.168.105.20
Then, you should SSH into the first worker node with:
$ minikube ssh -n minikube-m02
You already downloaded and executed the prerequisites script (worker.sh) on this node earlier.
You should now join the worker Node to the cluster with the kubeadm join command you saved earlier.
The command should look like this (the only change is the added sudo in front):
$ sudo kubeadm join <master-node-ip>:8443 --token [some token] \
--discovery-token-ca-cert-hash [some hash]
Execute the command and pay attention to the terminal window in the control plane Node.
If you encounter a "Preflight Check Error", append the --ignore-preflight-errors=SystemVerification flag to the kubeadm join command.
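For example (keep the placeholders; the token and hash come from your own kubeadm init output):
$ sudo kubeadm join <master-node-ip>:8443 --token [some token] \
    --discovery-token-ca-cert-hash [some hash] \
    --ignore-preflight-errors=SystemVerification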
The worker node is provisioned and transitions to the "Ready" state.
As soon as the command finishes, execute the following command so the kubelet starts automatically after a worker node reboot:
$ sudo systemctl enable kubelet
And finally, you should repeat the instructions to join the second worker node.
First, list the IP address of the third node with:
$ minikube node list
minikube 192.168.105.18
minikube-m02 192.168.105.19
minikube-m03 192.168.105.20
Then, SSH into the node with:
$ minikube ssh -n minikube-m03
You already executed the prerequisites script (worker.sh) on this node earlier as well.
Join the node to the cluster with the same kubeadm join command you used earlier:
$ sudo kubeadm join <master-node-ip>:8443 --token [some token] \
--discovery-token-ca-cert-hash [some hash]
If you encounter a "Preflight Check Error", append the --ignore-preflight-errors=SystemVerification flag to the kubeadm join command.
You should see the second worker joining the cluster and transitioning to the "Ready" state as well.
And finally, enable the kubelet to start automatically on boot:
$ sudo systemctl enable kubelet
Excellent, you have a running cluster!
But there's one more nicety to add to this setup: an Ingress controller.
The Ingress controller is necessary to read your Ingress manifests and route traffic inside the cluster.
Kubernetes has no default Ingress controller, so you must install one if you wish to use it.
When you executed the master.sh script, it also created a traefik.yaml file.
$ ls -1
config.yaml
flannel.yaml
master.sh
traefik.yaml
Traefik is an ingress controller and can be installed with the following command:
$ kubectl apply -f traefik.yaml
namespace/traefik created
serviceaccount/traefik created
clusterrole.rbac.authorization.k8s.io/traefik created
clusterrolebinding.rbac.authorization.k8s.io/traefik created
daemonset.apps/traefik created
The Ingress controller is deployed as a Pod, so you should wait until the image is downloaded.
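You can follow the rollout in the traefik namespace that the manifest created; the output will look similar to the other Pod listings above:
$ kubectl get pods -n traefik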
If the installation was successful, you should be able to return to your host and curl the first worker Node:
$ curl <ip minikube-m02>
curl: (7) Failed to connect to 192.168.49.3 port 80: Operation timed out
Unfortunately, that IP address lives in the Docker network and is not reachable from your Mac or Windows (it's reachable if you are working on Linux).
But, worry not.
You can launch a jumpbox — a container with a terminal session in the same network:
$ docker run -ti --rm --network=minikube ghcr.io/learnk8s/netshoot:2023.03
From this container, you can reach any node of the cluster, so let's retrieve the Node's IP address and repeat the experiment:
$ curl <ip minikube-m02>
404 page not found
You should see a 404 page not found message.
404 page not found is not an error: it's the Ingress controller telling you that no routes are set up for this URL.
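You can confirm from the control plane node that no Ingress routes exist yet; the list should come back empty at this point:
$ kubectl get ingress --all-namespaces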
Is your cluster ready now?
Almost: there's one minor step left.
You have to be logged in to the control plane node to issue kubectl commands.
Wouldn't it be easier if you could send commands from your computer instead?
The kubeconfig file holds the credentials to connect to the cluster.
Currently, the file is saved on the control plane node, but you can copy the content and save it on your computer (outside of the minikube virtual machine).
You can retrieve the content with:
$ minikube ssh --node minikube cat '$HOME/.kube/config' >kubeconfig
Now the content is saved in a local file named kubeconfig.
If you are on Mac or Windows, you should apply one small change: replace <master-node-ip>:8443 with localhost and the correct port exposed by Docker.
First, list your nodes with:
$ docker ps
CONTAINER ID IMAGE PORTS NAMES
5717b8d142ac gcr.io/k8s-minikube/kicbase:v0.0.37 127.0.0.1:53517->8443/tcp minikube-m03
d5e1dbe9611c gcr.io/k8s-minikube/kicbase:v0.0.37 127.0.0.1:53503->8443/tcp minikube-m02
648efe712022 gcr.io/k8s-minikube/kicbase:v0.0.37 127.0.0.1:53486->8443/tcp minikube
Find the port that forwards to 8443 for the control plane (in the example above, it's 53486).
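If you'd rather not read the docker ps output by eye, docker port prints the host mapping for the control plane container directly:
$ docker port minikube 8443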
And finally, replace <master-node-ip>:8443 in your kubeconfig:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CR…
server: https://127.0.0.1:<insert-port-here>
name: mk
contexts:
Finally, navigate to the directory where the file is located and execute the following line:
export KUBECONFIG="${PWD}/kubeconfig"
Or if you are using PowerShell:
$Env:KUBECONFIG="${PWD}\kubeconfig"
You can verify that you are connected to the cluster with:
$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:53486
CoreDNS is running at https://127.0.0.1:53486/api/v1/namespaces/kube…
Please note that you should export the path to the kubeconfig whenever you create a new terminal session.
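Alternatively, you can point kubectl at the file explicitly on each invocation instead of exporting the variable:
$ kubectl --kubeconfig ./kubeconfig get nodes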
You can also store the credentials alongside the default kubeconfig file instead of changing environment variables.
However, since you will destroy the cluster at the end of this module, it's better to keep them separated for now.
If you are still convinced you should merge the details with your kubeconfig, you can find the instructions on how to do so here.
Congratulations!
You just configured a fully functional Kubernetes cluster!
Recap:
- You created three virtual machines using minikube.
- You bootstrapped the Kubernetes control plane node using kubeadm.
- You installed Flannel as the network plugin.
- You installed an Ingress controller.
- You configured kubectl to work from outside the control plane.
The cluster is fully functional, and it's time to deploy an application.
You can find the application's YAML definition in app.yaml.
The file contains three Kubernetes resources:
- A Deployment for the podinfo Pod. It is currently set to a single replica, but you should deploy two; one way to do that is shown below.
- A Service to route traffic to the Pods.
- An Ingress manifest to route traffic to the Pods.
You can create the resources with
$ kubectl apply -f https://raw.githubusercontent.com/learnk8s/devopdays-sg-ha/refs/heads/main/app.yaml
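The Deployment starts with a single replica; one way to bring it to the two replicas mentioned above is kubectl scale (the Deployment is named hello-world, as the Pod listings later in this section show):
$ kubectl scale deployment hello-world --replicas=2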
Verifying the deployment:
List the current IP addresses for the cluster from your host machine with:
$ minikube node list
minikube 192.168.105.18
minikube-m02 192.168.105.19
minikube-m03 192.168.105.20
If your app is deployed correctly, you should be able to:
- Execute kubectl get pods -o wide and see two Pods deployed, one on each worker node.
- Execute curl <ip minikube-m02> from the jumpbox and see the Pod hostname in the JSON output.
- Execute curl <ip minikube-m03> from the jumpbox and see the Pod hostname in the JSON output.
If your deployment isn't quite right, try to debug it using this handy flowchart.
Kubernetes is engineered to keep running even if some components are unavailable.
So you could have a temporary failure of a single component, such as the scheduler, and the cluster would still keep operating as usual.
The same is true for all other components.
The best way to validate this statement is to break the cluster.
What happens when a Node becomes unavailable?
Can Kubernetes gracefully recover?
And what if the primary Node is unavailable?
Let's find out.
Observe the nodes and pods in the cluster with:
$ watch kubectl get nodes,pods -o wide
NAME STATUS ROLES INTERNAL-IP
node/minikube Ready control-plane 192.168.105.18
node/minikube-m02 Ready <none> 192.168.105.19
node/minikube-m03 Ready <none> 192.168.105.20
NAME READY STATUS NODE
pod/hello-world-5d6cfd9db8-nn256 1/1 Running minikube-m02
pod/hello-world-5d6cfd9db8-dvnmf 1/1 Running minikube-m03
Observe how Pods and Nodes are in the "Running" and "Ready" states.
Let's break a worker node and observe what happens.
In another terminal session, shut down the second worker node with:
$ minikube node stop minikube-m03
✋ Stopping node "minikube-m03" ...
🛑 Successfully stopped node minikube-m03
Please note the current time and set an alarm for 5 minutes (you will understand why soon).
Observe the node almost immediately transitioning to a "Not Ready" state:
$ kubectl get nodes -o wide
NAME STATUS ROLES INTERNAL-IP
node/minikube Ready control-plane 192.168.105.18
node/minikube-m02 Ready <none> 192.168.105.19
node/minikube-m03 NotReady <none> 192.168.105.20
The application should still serve traffic as usual.
Try to issue a request from the jumpbox with:
$ curl <minikube-m02 IP address>
Hello, hello-world-5d6cfd9db8-nn256
However, there is something odd with the Pods.
Have you noticed?
$ watch kubectl get pods -o wide
NAME READY STATUS NODE
pod/hello-world-5d6cfd9db8-nn256 1/1 Running minikube-m02
pod/hello-world-5d6cfd9db8-dvnmf 1/1 Running minikube-m03
Why is the Pod on the second worker node still in the "Running" state?
And, even more puzzling, Kubernetes knows the Node is unavailable (it's "NotReady"), so why isn't it rescheduling the Pod?
You should stop the control plane:
$ minikube node stop minikube
Please note that, from this point onwards, you won't be able to observe the state of the cluster with kubectl.
Notice how the application still serves traffic as usual.
Execute the following command from the jumpbox:
$ curl <minikube-m02 IP address>
Hello, hello-world-5d6cfd9db8-nn256
In other words, the cluster can still operate even if the control plane is unavailable.
You won't be able to schedule or update workloads, though.
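If you try, kubectl simply fails to reach the API server; with the kubeconfig from earlier you will see a connection error similar to:
$ kubectl get nodes
The connection to the server 127.0.0.1:53486 was refused - did you specify the right host or port?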
You should restart the control plane with:
$ minikube node start minikube
Please note that minikube may assign a different forwarding port to this container, and you might need to fix your kubeconfig file.
You can easily verify whether the port has changed with:
$ docker ps
CONTAINER ID IMAGE PORTS NAMES
5717b8d142ac gcr.io/k8s-minikube/kicbase:v0.0.37 127.0.0.1:53517->8443/tcp minikube-m03
d5e1dbe9611c gcr.io/k8s-minikube/kicbase:v0.0.37 127.0.0.1:53503->8443/tcp minikube-m02
648efe712022 gcr.io/k8s-minikube/kicbase:v0.0.37 127.0.0.1:65375->8443/tcp minikube
In this case, the port used to be 53486
and now is 65375
.
You should amend your kubeconfig file accordingly:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CR…
server: https://127.0.0.1:<insert-new-port-here>
name: mk
contexts:
With this minor obstacle out of your way, halt the remaining worker Node with:
$ minikube node stop minikube-m02
✋ Stopping node "minikube-m02" ...
🛑 Successfully stopped node minikube-m02
You should verify that both Nodes are in the NotReady state:
$ kubectl get nodes -o wide
NAME STATUS ROLES INTERNAL-IP
node/minikube Ready control-plane 192.168.105.18
node/minikube-m02 NotReady <none> 192.168.105.19
node/minikube-m03 NotReady <none> 192.168.105.20
And the application is finally unreachable.
Neither curl <minikube-m02 IP address> nor curl <minikube-m03 IP address> from the jumpbox will work now.
Despite not having any worker nodes, you can still scale the application to 5 instances:
$ kubectl edit deployment hello-world
deployment.apps/hello-world edited
And change the replicas field to replicas: 5.
Monitor the pods with:
$ watch kubectl get pods -o wide
NAME READY STATUS
hello-world-5d6cfd9db8-2k7f9 0/1 Pending
hello-world-5d6cfd9db8-8dpgd 0/1 Pending
hello-world-5d6cfd9db8-cwwr2 0/1 Pending
hello-world-5d6cfd9db8-dvnmf 1/1 Terminating
hello-world-5d6cfd9db8-nn256 1/1 Running
hello-world-5d6cfd9db8-rjd54 1/1 Running
The Pods stay pending because no worker node is available to run them.
In another terminal session, start both nodes with:
$ minikube node start minikube-m02
$ minikube node start minikube-m03
It might take a while for the two virtual machines to start, but, in the end, the Deployment should have five replicas "Running".
You can test that the application is available from the jumpbox with:
$ curl <minikube-m02 IP address>
Hello, hello-world-5d6cfd9db8-8dpgd
But, again, there's something odd.
Have you noticed how the Pods are distributed in the cluster?
Let's pay attention to the Pod distribution in the cluster:
$ kubectl get pods -o wide
NAME READY STATUS NODE
hello-world-5d6cfd9db8-2k7f9 1/1 Running minikube-m02
hello-world-5d6cfd9db8-8dpgd 1/1 Running minikube-m02
hello-world-5d6cfd9db8-cwwr2 1/1 Running minikube-m02
hello-world-5d6cfd9db8-nn256 1/1 Running minikube-m02
hello-world-5d6cfd9db8-rjd54 1/1 Running minikube-m02
In this case, five Pods run on worker1 (minikube-m02) and none on worker2 (minikube-m03).
However, you might experience a slightly different distribution.
You could have any of the following:
- 5 Pods on worker1, 0 on worker2
- 0 Pods on worker1, 5 on worker2
- 3 Pods on worker1, 2 on worker2
- 2 Pods on worker1, 3 on worker2
And if you are lucky, you could also have:
- 4 Pods on worker1, 1 on worker2
- 1 Pod on worker1, 4 on worker2
Why isn't Kubernetes rebalancing the Pods?