In recent years, the field of machine learning has made significant progress. Large language and vision models with hundreds of billions of parameters have been released. These models are great at what they do, but the software engineering and operations aspects of running them in real time are less well-known.
As an ML engineer, building a great model is becoming easier, but deploying it remains a major challenge. As part of this series, we will explore Kubernetes as a tool for MLOps.
- Kubernetes is an open-source container orchestration tool that helps you deploy and manage containerized applications. It automates the deployment, scaling, and management of containers, so that you can focus on building great applications.
- With the software industry moving from monolithic to microservice-based architectures, every feature is hosted as a separate service running independently in its own containers.
- Managing hundreds or thousands of services across thousands of containers led to chaos. Kubernetes was born to overcome this challenge.
Machine learning systems are evolving and increasingly need to be highly performant, highly available, and low-latency. Consider recommender systems serving relevant ads right after a product search, or Uber optimizing the price of a ride based on demand during peak hours. Many such ML systems require tools that provide:
- High availability: zero downtime.
- Scalability: high performance as load grows.
- Disaster recovery: backup and restore.
In Part I of this series, we'll keep the blog quite simple and talk only about Nodes. But before we start, note that installing Kubernetes gives you a Kubernetes cluster, which must have at least one node.
Nodes are machines, either physical machines or virtual (cloud) instances. In a Kubernetes cluster there are two types of nodes, broadly known as Master and Worker nodes, that run containerized applications. Master nodes direct the worker nodes to run the containerized applications inside Pods. A Pod is an abstraction over one or more containers in Kubernetes and has its own compute, storage, and network resources.
Usually, a Kubernetes cluster consists of one master node and multiple worker nodes. The master node runs the Control Plane, which manages the worker nodes and the cluster state.
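To make the Pod abstraction concrete, here is a minimal sketch of a Pod manifest (the names and image are illustrative, not from a real deployment); with a running cluster, you would hand it to the master node via kubectl:

```shell
# Write a minimal Pod manifest (name and image are illustrative).
cat <<'EOF' > nginx-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-demo
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
EOF

# With a running cluster, ask the master node to schedule it:
#   kubectl apply -f nginx-pod.yaml
#   kubectl get pods
```

The `kubectl` commands are commented out because they require a configured cluster; the manifest itself is what the master node's components act on.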
The master node is responsible for many processes that help run and manage the Kubernetes cluster properly. Some of the key components are as follows:
- The API Server itself runs as a container and is the entry point for application deployment in a Kubernetes cluster.
- It acts as the cluster gateway: any request to update the cluster reaches it first.
- It also acts as a channel for communication between Kubernetes clients, such as the Kubernetes UI (Dashboard) or kubectl, and the cluster.
- The Scheduler assigns Pods to nodes.
- API Server → Scheduler → the Scheduler only decides where the pod should be placed; it does not start it.
- It is also considered a smart service because, when a request for pod assignment comes in, it looks at node utilization in terms of CPU and memory and places the pod on a node that has more free resources.
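The resource-aware placement described above is driven by the CPU and memory requests declared in the Pod spec. A sketch, with illustrative values and a hypothetical image name:

```shell
# Pod manifest declaring resource requests; the Scheduler uses these
# to pick a node with enough free CPU and memory (values illustrative,
# the image name is hypothetical).
cat <<'EOF' > requests-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  containers:
    - name: server
      image: registry.example.com/model-server:latest
      resources:
        requests:
          cpu: "500m"      # half a CPU core
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
EOF

# kubectl apply -f requests-pod.yaml   # requires a running cluster
```

A pod whose requests cannot be satisfied by any node stays Pending, which is often the first scheduling issue you hit with resource-hungry ML workloads.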
- The Controller Manager detects when a pod dies on a node and helps reschedule it.
- It detects state changes in the cluster.
- The flow of pod rescheduling:
Controller Manager → Scheduler (decides on which node the pod should be placed) → Kubelet (restarts the pod).
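You can see this rescheduling flow in action with a Deployment: if one of its pods dies, the Controller Manager notices the mismatch with the desired replica count and triggers the flow above. A sketch with illustrative names:

```shell
# Deployment asking for 2 replicas (names and image are illustrative).
# If one pod dies, the Controller Manager detects it and a replacement
# is scheduled and started via the Scheduler and Kubelet.
cat <<'EOF' > app-b-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-b
spec:
  replicas: 2
  selector:
    matchLabels:
      app: b
  template:
    metadata:
      labels:
        app: b
    spec:
      containers:
        - name: b
          image: nginx:1.25
EOF

# kubectl apply -f app-b-deployment.yaml   # requires a running cluster
# Deleting one of its pods causes a replacement to appear shortly after.
```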
- etcd is the key-value store for the cluster.
- It stores details of cluster changes, like terminated pods, scheduled pods, etc.
- The Scheduler and the Controller Manager both rely on the data stored in etcd.
- Snapshots of etcd capture the cluster state, which helps in backing up and restoring the cluster.
- It answers questions like the ones below:
How does the Scheduler know where resources are available?
How is the cluster's health status tracked?
How does the Controller Manager know that the cluster state has changed, e.g. that the Kubelet has restarted a pod?
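The snapshot-based backup mentioned above is typically taken with `etcdctl`. A sketch of a backup script, assuming `etcdctl` is installed; the endpoint and certificate paths are illustrative and must be adjusted to your cluster:

```shell
# Sketch of an etcd backup script (endpoint and cert paths are
# illustrative; running it requires etcdctl and access to etcd).
cat <<'EOF' > etcd-backup.sh
#!/bin/sh
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
EOF
chmod +x etcd-backup.sh

# Restoring later would use: etcdctl snapshot restore /backup/etcd-snapshot.db
```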
Worker nodes do the actual work in the Kubernetes cluster: they run the containers and the applications inside them. Three processes must be installed on every worker node:
- A container runtime: the runtime environment for the containers (for example, containerd or Docker).
- kube-proxy is responsible for communication between Services within the cluster.
- It forwards requests from Services to Pods.
- It is also sensible about where it sends requests. Consider pod A sending a request to service B, where replicas of B run on both Node_1 and Node_2; if A runs on Node_1, kube-proxy forwards the request to the replica of B on Node_1 rather than Node_2, reducing network overhead.
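The Service that kube-proxy routes for is itself just a small manifest selecting Pods by label. A minimal sketch, with illustrative names matching the A/B example above:

```shell
# Minimal Service manifest (names are illustrative): kube-proxy routes
# traffic sent to this Service to any Pod labeled app=b.
cat <<'EOF' > service-b.yaml
apiVersion: v1
kind: Service
metadata:
  name: service-b
spec:
  selector:
    app: b
  ports:
    - port: 80         # port the Service exposes
      targetPort: 8080 # port the Pod's container listens on
EOF

# kubectl apply -f service-b.yaml   # requires a running cluster
```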
- The Kubelet starts the pods that the Scheduler has assigned to its node (it does not do the scheduling itself).
- It communicates between the containers and the node, interfacing with the container runtime.
- It assigns node resources, such as CPU and memory, to the containers.
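One concrete way the Kubelet manages containers is via liveness probes: the Kubelet on the node runs the configured check and restarts the container when it fails. A sketch with illustrative names and timings:

```shell
# Pod with a liveness probe (name, image, and timings are illustrative).
# The Kubelet performs the HTTP check and restarts the container
# if the check fails.
cat <<'EOF' > liveness-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 10
EOF

# kubectl apply -f liveness-pod.yaml   # requires a running cluster
```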