Kubernetes: Architecture & Resource Management

Reeshabh Choudhary
Dec 25, 2023


👷‍♂️ Software Architecture Series — Part 13.

📌Kubernetes is an open-source container orchestrator (one of the largest and most popular open-source projects since its release in 2014) for deploying applications in a distributed cloud environment. Its fundamental aim is to give software developers more velocity, efficiency, and agility while building applications and deploying them seamlessly to the cloud. Application containers packaged in the Docker image format are easy to build, deploy, and distribute, and provide a useful layer of abstraction over the underlying infrastructure.

An image bundles a software program (or a service) and its dependencies into a single artifact under a root file system. It is a binary package containing all the files necessary to run a program inside a container. Containers can be viewed as instances of images running as processes on the host operating system. For example, Docker images become Docker containers when run on Docker Engine, which is now an industry standard and runs on various operating systems. The idea is to help software developers avoid “dependency hell” and give them the ability to run applications consistently across different infrastructures.

Docker & Kubernetes

While Docker is a container runtime engine, Kubernetes runs and manages containers from many container runtime engines. It unifies a cluster of machines into a single pool of resources and schedules containers to run on those machines based on available resources. Inside Kubernetes, the Pod is the basic unit of deployment; it groups one or more containers together. Apart from grouping containers, Kubernetes provides services such as compute scheduling, self-healing, service health monitoring, scaling, volume management, and secret and configuration management to provide a seamless cloud deployment ecosystem.
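As an illustration of that grouping, here is a minimal Pod manifest with two containers; the names and images are hypothetical placeholders, not from the article.

```yaml
# A minimal Pod grouping two containers that share the same network namespace.
# Names and images are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
    - name: web            # main application container
      image: nginx:1.25
      ports:
        - containerPort: 80
    - name: log-sidecar    # helper container running alongside the app
      image: busybox:1.36
      command: ["sh", "-c", "tail -f /dev/null"]
```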

Kubernetes works on a master-slave communication model, with one master and several worker nodes. The master node, usually called the control plane, manages and orchestrates the entire Kubernetes cluster. The worker nodes, usually called the data plane, provide container runtimes for running applications (images) and handle communication with the master node and networking. They are the primary execution units within Kubernetes. Each worker node can host multiple pods, and each pod contains one or more containers.

Let us look at its architectural components:

Kubernetes Architecture

Master Node/ Control Plane:

kube-apiserver: users and third-party components such as monitoring services interact with the cluster via this API server. Communication happens over a REST API. The API server is the only component that works directly with etcd, and it includes a built-in bastion API-server proxy, enabling external access to ClusterIP services.
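To make the REST surface concrete, the sketch below shows a Pod manifest together with (as comments) the API paths it maps to when applied; the resource names are hypothetical.

```yaml
# Applying this manifest (e.g. `kubectl apply -f pod.yaml`) results in a
# REST call to the API server, roughly:
#   POST /api/v1/namespaces/default/pods
# Reading it back corresponds to:
#   GET  /api/v1/namespaces/default/pods/demo-pod
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  namespace: default
spec:
  containers:
    - name: app
      image: nginx:1.25
```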

etcd: It is a distributed key-value store that holds data about the Kubernetes cluster, such as information about pods, their state, and namespaces. It is the sole stateful component in the control plane and helps the API server track any modification to a Kubernetes object’s state.

Scheduler: It schedules compute requests on the cluster. Upon deployment of a pod, the scheduler identifies the appropriate worker node in the cluster that matches the pod’s requirements and schedules the pod on that node.

kube-controller-manager: It runs the different controllers that create replicas of containers and ensure the cluster stays in the desired state (declared via manifest YAML files).
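As a sketch of that declarative model (the names and image are placeholders), a Deployment declares a desired replica count that the ReplicaSet controller, run by kube-controller-manager, keeps reconciling:

```yaml
# Desired state: three replicas of the application. If a pod dies, the
# ReplicaSet controller creates a replacement to restore the declared state.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: nginx:1.25
```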

cloud-controller-manager: It interacts with the underlying cloud infrastructure in a cloud-provider-specific way. Its primary role is to manage cloud-specific features, integrations, and resources within a Kubernetes cluster. Since different cloud providers offer unique services and functionalities, the cloud-controller-manager allows Kubernetes to abstract and interface with these cloud-specific services seamlessly. It decouples cloud-specific functionality from the core Kubernetes control plane components, ensuring that the main Kubernetes codebase remains generic and avoids becoming tightly coupled with individual cloud provider details. It continuously monitors the state of cloud resources and reconciles them with the desired state specified in Kubernetes objects (such as Services and PersistentVolumeClaims).

Worker Node/ Data Plane:

kubelet: It is an agent responsible for registering the worker node with the API server and acting on PodSpecs, primarily received from the API server. It manages the lifecycle of pods on the node and of the containers within each pod, mounts volumes by reading the pod configuration, and handles liveness, readiness, and startup probes. It monitors the node’s resources (CPU, memory, disk, network) and ensures that containers running on the node do not exceed their allocated resources, enforcing the resource limits specified in pod specifications. The kubelet interacts with the API server to fetch pod specifications, report node status, and handle various control-plane requests related to pod management.
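The probe handling performed by the kubelet can be sketched with a manifest like the following; the endpoints, ports, and timings are illustrative assumptions, not values from the article.

```yaml
# The kubelet runs these probes against the container: it restarts the
# container on liveness failures and withholds Service traffic until the
# readiness probe passes.
apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
    - name: app
      image: nginx:1.25
      ports:
        - containerPort: 80
      startupProbe:            # gives the app time to boot before other probes run
        httpGet:
          path: /healthz
          port: 80
        failureThreshold: 30
        periodSeconds: 2
      livenessProbe:           # container is restarted if this keeps failing
        httpGet:
          path: /healthz
          port: 80
        periodSeconds: 10
      readinessProbe:          # pod receives Service traffic only while this passes
        httpGet:
          path: /ready
          port: 80
        periodSeconds: 5
```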

kube-proxy: It is a networking component within the data plane that facilitates communication to and from Services running within the cluster. In Kubernetes, a Service is an abstract way to expose an application running on a set of pods as a network service; it provides a stable virtual IP address (ClusterIP) that represents the Service and a way to reach the pods backing it. kube-proxy runs as a DaemonSet on every node in the cluster, which ensures that an instance of kube-proxy is present on each node to manage the local network rules that route traffic for Services to their associated pods. It communicates with the API server to learn about Services and their respective pod IPs and ports.
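A Service of the default ClusterIP type can be sketched as follows; the label selector and ports are hypothetical and only meant to show the mapping that kube-proxy programs into node-local rules.

```yaml
# kube-proxy watches Services like this one and programs node-local rules
# so traffic to the ClusterIP on port 80 reaches matching pods on port 8080.
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  type: ClusterIP
  selector:
    app: demo          # pods carrying this label back the Service
  ports:
    - port: 80         # stable virtual port on the ClusterIP
      targetPort: 8080 # container port on the backing pods
```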

Container runtime: It provides the runtime environment in which containers run. It performs tasks such as pulling images from container registries, allocating and isolating resources for containers, and managing the entire lifecycle of a container on the host. Kubernetes supports multiple container runtimes compliant with the Container Runtime Interface (CRI), such as containerd and CRI-O (Docker Engine can be used via an adapter). The kubelet manages the container lifecycle through the runtime and reports container status back to the control plane.
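Where a cluster has more than one runtime handler configured, a RuntimeClass lets a pod choose between them; the handler name below is a hypothetical example that depends on how the node’s runtime is configured.

```yaml
# A RuntimeClass maps a name usable in pod specs to a handler configured
# in the node's container runtime (e.g. a sandboxed runtime).
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: sandboxed
handler: runsc        # hypothetical handler name configured on the nodes
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: sandboxed   # run this pod with the handler above
  containers:
    - name: app
      image: nginx:1.25
```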

When a request is made to Kubernetes (e.g., creating a pod), the API server receives the request, validates it, and updates the cluster state stored in etcd. The scheduler continuously watches for new and unscheduled pods. When it identifies an unscheduled pod, it queries the API server for the pod’s details and decides which node would be the best fit based on available resources and constraints. The scheduler does not interact directly with etcd during this decision-making process.

Once the kube-scheduler decides on the best node for the pod, it records the decision by setting the pod’s node assignment (the nodeName field) through the API server. The kubelet on the selected node then picks up this information and starts the container(s) associated with the pod.
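As an illustration of the kind of constraint the scheduler evaluates (the label key and value are hypothetical), a pod can restrict which nodes are eligible with a nodeSelector; after scheduling, the chosen node appears in the pod’s spec.nodeName field.

```yaml
# The scheduler will only bind this pod to nodes carrying the matching label.
# Once scheduled, the chosen node is visible via:
#   kubectl get pod constrained-pod -o jsonpath='{.spec.nodeName}'
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
spec:
  nodeSelector:
    disktype: ssd        # hypothetical node label
  containers:
    - name: app
      image: nginx:1.25
```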

Resource Management:

Although the primary motivation for adopting a Kubernetes-based architecture is reliable and seamless deployment in a distributed cloud system, Kubernetes also plays a vital role in efficiently managing the resources within the cloud infrastructure. It aims to increase the utilization metric (the ratio of actively used resources to the total resources purchased or allocated) of a deployed cloud infrastructure. Resource requests (the minimum amount of a resource required to run the application) and resource limits (the maximum amount of a resource an application may consume) are two settings that can be declared in a Kubernetes manifest YAML file.

Defining accurate resource requests and limits in pod specifications is crucial. By giving Kubernetes precise information about an application’s resource requirements (CPU, memory), the scheduler can make better decisions when placing pods on nodes. The scheduler uses a bin-packing algorithm to determine the best placement of pods on nodes based on resource requirements and availability. Kubernetes also allows resource sharing among pods within a node. Resources are requested per container, not per pod; the total amount requested by a pod is the sum of the requests of all its containers, since different containers often have very different requirements. If one pod is not using its full allocation, those resources can be made available to other pods on the same node. This efficient sharing helps maximize overall cluster utilization.
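A sketch of per-container requests and limits follows; the numbers are arbitrary example values. The pod below requests a total of 300m CPU and 192Mi memory, the sum over its two containers.

```yaml
# Requests are what the scheduler reserves on a node; limits are the hard cap
# enforced at runtime. Totals for the pod are the sums over its containers.
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"       # 0.25 of a CPU core
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"
    - name: log-sidecar
      image: busybox:1.36
      command: ["sh", "-c", "tail -f /dev/null"]
      resources:
        requests:
          cpu: "50m"
          memory: "64Mi"
        limits:
          cpu: "100m"
          memory: "128Mi"
```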

If a container exceeds its memory limit in Kubernetes, the Linux kernel’s OOM killer terminates the offending container to free up resources for other critical system processes or containers. The OOM killer selects the container to terminate based on various criteria, such as memory usage and container priority. Within the container, processes attempting to allocate more memory than the specified limit will encounter failures; for instance, if an application tries to allocate memory via malloc or a similar operation and the allocation would exceed the set limit, the allocation fails, resulting in errors or crashes within the application.

Upon hitting memory limits, a container might experience performance degradation, unresponsiveness, or other unexpected behavior. The container runtime records the events and logs generated by memory-limit breaches, which can later help with debugging and with understanding resource usage patterns.
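When the limit is breached, the termination reason is visible in the pod’s status; the sketch below (the pod name, image, and memory-hungry command are illustrative assumptions) shows where to look.

```yaml
# A container that tries to exceed its 64Mi limit will be killed; after a
# restart the reason shows up in status, e.g.:
#   kubectl get pod oom-demo \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
#   -> OOMKilled
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo
spec:
  containers:
    - name: hog
      image: busybox:1.36
      command: ["sh", "-c", "tail /dev/zero"]   # rapidly consumes memory
      resources:
        limits:
          memory: "64Mi"
```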

In Kubernetes, it is not possible to dynamically increase the memory limit of a running container once it has been set; the limit remains fixed for the container’s lifetime. However, resource usage can be analyzed with a VerticalPodAutoscaler (VPA) object, which can recommend or set resource requests for pods based on observed metrics, or be configured for automatic vertical scaling. When the VPA’s updateMode is set to Auto and it determines that a pod’s resource requests need to change based on observed usage patterns and the VPA configuration, it triggers eviction of that pod; the pod is terminated and then recreated by Kubernetes with the updated resource requests specified by the VPA.
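The VPA ships as an add-on rather than a core API; assuming it is installed, a minimal object might look like the sketch below (the target Deployment name is hypothetical).

```yaml
# Requires the Vertical Pod Autoscaler add-on to be installed in the cluster.
# In "Auto" mode the VPA evicts and recreates pods with updated requests.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-deployment
  updatePolicy:
    updateMode: "Auto"   # use "Off" to only receive recommendations
```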

However, if the application faces a sudden increase in resource usage that needs to be handled effectively, the Horizontal Pod Autoscaler (HPA) comes to the rescue. It automatically adjusts the number of pod replicas in a Kubernetes Deployment, replication controller, or ReplicaSet based on observed metrics like CPU utilization, memory consumption, or custom metrics. When workload demand increases, it scales out by adding more pods, and when demand decreases, it scales in by reducing the number of pods. When the cluster itself runs short of compute capacity for the additional pods, a separate component, the Cluster Autoscaler, can provision new nodes to accommodate them and remove nodes again when resources are underutilized.

To work with the Horizontal Pod Autoscaler, the first step is to ensure that the Kubernetes Metrics Server is installed in the cluster; it collects resource utilization metrics (such as CPU and memory usage) from the cluster. The second step is to specify, in the manifest YAML file, the metrics to scale on; by default, Kubernetes supports CPU utilization metrics. The HorizontalPodAutoscaler object is then defined, along with the minimum and maximum number of replicas. This object continuously monitors the defined metrics, and when they breach the defined thresholds (e.g., high CPU utilization), the HPA automatically adjusts the number of replicas of the deployment to meet the demand.
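Putting those steps together, here is a sketch of an HPA object targeting the earlier hypothetical Deployment; the 70% CPU target and replica bounds are arbitrary example values.

```yaml
# Requires the Metrics Server so the HPA can read CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```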

🛒In summary, Kubernetes extends its role beyond container orchestration to that of a full-blown tool for effectively managing the resources of a cloud infrastructure. It ensures fair and optimized usage of compute resources like CPU and memory across the cluster by scheduling pods onto suitable nodes and allocating resources to pods based on defined requests and limits. Through Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA), Kubernetes dynamically adjusts the number of pod replicas and their resource allocations in response to workload demands. Kubernetes also provides monitoring capabilities, and its interfaces integrate easily with third-party tools to give better visibility into resource consumption at various levels (cluster, node, pod). It even allows administrators to set quotas and limits on resource usage per namespace, ensuring that specific workloads or teams do not consume excessive resources and impact others within the cluster.

😇No wonder Kubernetes has been one of the most popular and fastest-growing open-source platforms of the last decade!

Written by Reeshabh Choudhary

Software Architect and Developer | Author: Objects, Data & AI.
