Innovation in Scalability: Event-Driven Autoscaling in Kubernetes

Learn more about Kubernetes Event-Driven Autoscaling for enhanced scalability in microservices and optimal resource utilisation.

Written by Mircea Talu (Junior DevOps Engineer), in the January 2024 issue of Today Software Magazine. Read the article in Romanian here.

Kubernetes Event-Driven Autoscaling (KEDA) is an open-source tool that enables automatic scaling of event-driven workloads in Kubernetes. But what does this mean for us as application developers or administrators? Let's have a look!

What do We Mean by Scalability?

Scalability is one of the most relevant attributes of an application today. The ability to scale the number of instances up or down, depending on the traffic we receive, is necessary both for efficient use of computational resources and to ensure application availability.

The importance of efficient resource usage lies in financial as well as environmental considerations, with availability being crucial for a good user experience.

How Can We Achieve Scalability?

Firstly, to scale a workload efficiently, it needs to be broken down into the smallest possible chunks. For example, if we have many users on the front-end and want to scale it, we cannot scale it independently if we have a monolithic structure (front-end and back-end in the same container).

What Is a Container?

A container is a standard unit of software that packages code and its dependencies so that the application can run quickly and reliably.

One of the great advantages of containers is that they virtualise the operating system and create an isolated space for the application. In this way, we can run containers on any machine, regardless of the operating system, its version, and the runtime environment.

As a result, it is much easier to launch new instances of an application if it is containerised.

How Do We Manage a Fleet of Containers?

With the growing popularity of containers, software engineers faced a new problem: how can container fleets be managed efficiently and automatically? On 9 September 2014, Google released a revolutionary solution: Kubernetes.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Let's briefly go through the core components of this system.

What is a Pod?

A Pod is the smallest unit that can be created and managed in Kubernetes: a group of one or more containers, with shared storage and networking, and a specification that describes how the containers should be run.

We can think of a Pod as a wrapper for containers, which makes them compatible with the Kubernetes cluster.
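As a minimal sketch (the names below, such as nginx-pod, are illustrative and not from the original article), a Pod wrapping a single container can be described like this:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod          # illustrative name
spec:
  containers:
    - name: nginx
      image: nginx:1.25    # any containerised application image would work here
      ports:
        - containerPort: 80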

For that reason, the Pod is the unit we want to scale according to the workload it must process.

What Does a HorizontalPodAutoscaler Do?

In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), scaling the number of Pods to match demand. It is a native Kubernetes resource and can be used directly, without the need for an extension.

Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean allocating more resources - for example, memory or CPU - to the Pods that are already running.
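For illustration, a minimal HorizontalPodAutoscaler that scales a hypothetical Deployment called web-app on average CPU usage might look like this (the name and the thresholds are assumptions, not from the article):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app              # the workload resource being scaled (hypothetical)
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # add Pods when average CPU usage exceeds 50%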

What Are the Limitations of the HorizontalPodAutoscaler?

Despite its importance, the main limitation of the HorizontalPodAutoscaler is that it only scales Pods based on how much CPU and memory they use. For example, if the average CPU usage for our set of Pods is higher than the desired value, we'll create more Pods to distribute the workload. If the average is lower, we’ll remove several Pods to use resources efficiently.

This way of scaling is useful, but it may not cover all use cases exactly as we would like.

Since the HorizontalPodAutoscaler is native to Kubernetes, it only looks inside the cluster. As a result, it can't solve our problem of scaling based on external stimuli.

What is Kubernetes Event-Driven Autoscaling?

Kubernetes Event-Driven Autoscaling (KEDA) is a lightweight component that can be added to any Kubernetes cluster. KEDA works alongside standard Kubernetes components such as the HorizontalPodAutoscaler and can extend its functionality.

As the name implies, KEDA allows us to scale based on events. To understand its necessity, we'll consider a case inspired by everyday life.

Imagine we are the manager of a supermarket, and we have to decide when to open or close checkouts. Initially, we have only one checkout open because traffic is low in the morning. But as we move towards peak hours, the supermarket gets more and more shoppers. Based on which metric will we decide if we need to open a new checkout?

If we were to go by the HorizontalPodAutoscaler, we would open a new checkout when our resource usage (cashier patience) reached its limit. This does not seem like an optimal solution in this case.

We would rather choose to open a new checkout when the number of shoppers waiting in line increases above a certain number (10, for example). Thus, we scale according to our ability to process shoppers, which is what we actually aim to do.

If the number of shoppers waiting in line rises above 10, we clearly don't have enough cashiers to process them.

Returning from this thought exercise, we will try to translate the problem into a software application. We replace shoppers with messages to be processed, cashiers with our containers, and the waiting line with a queue (Kafka, RabbitMQ, etc.).

[Figure 1: Supermarket - application analogy]

To achieve the same kind of scaling, based on messages (events), we need to find a way to extend the HorizontalPodAutoscaler. For this, we will use KEDA.

How Do We Integrate KEDA?

The first step we'll take is to install KEDA in our Kubernetes cluster. The installation defines a CustomResource of type ScaledObject for us. We will connect this resource to the Deployment of our Pods, and it will define the scaling process via its configuration.

Installation is not a difficult process - we can do it with a Helm chart.

Once we have KEDA installed in the cluster, we can use the resources it gives us. In the example below, we will use a ScaledObject that we will connect to our containerised application called function. This application processes messages from a queue (Azure Service Bus Queue) and displays them on the screen.
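For context, here is a minimal sketch of what the Deployment behind this function application might look like; the container image, the Secret, and the environment variable name are assumptions used only for illustration, and only the application name comes from the article:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: function
spec:
  replicas: 1
  selector:
    matchLabels:
      app: function
  template:
    metadata:
      labels:
        app: function
    spec:
      containers:
        - name: function
          image: myregistry.example.com/function:latest    # hypothetical image
          env:
            - name: SERVICEBUS_CONNECTION_STRING           # consumed later via connectionFromEnv
              valueFrom:
                secretKeyRef:
                  name: servicebus-secret                  # hypothetical Secret
                  key: connection-string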

Below, we have the configuration of the ScaledObject resource.

[Code 1: ScaledObject configuration]
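The original article shows this configuration as an image. Below is a hedged reconstruction of what such a ScaledObject might look like, using the values described in the following paragraphs; the resource name and the pollingInterval value are assumptions:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: function-scaledobject        # hypothetical name
spec:
  scaleTargetRef:
    name: {{ include "function.fullname" . }}   # Helm template resolving to the function Deployment
  pollingInterval: 5                 # seconds between two KEDA checks (value assumed)
  cooldownPeriod: 30                 # queue empty for 30 s -> enter the idle state
  idleReplicaCount: 0                # scale to zero when idle
  minReplicaCount: 1                 # minimum instances when not idle
  maxReplicaCount: 10                # maximum instances when not idle
  # the triggers section is shown further below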

As we can see, we are dealing with a few parameters that need to be set:

scaleTargetRef is used to identify the resource we want to scale. In our case, this is obtained from the "function.fullname" Helm template and represents the Deployment of our function application.

pollingInterval is the time interval, in seconds, between two consecutive checks made by KEDA. These checks determine the length of the queue, which we need to know in order to decide if and how to scale.

cooldownPeriod is the period KEDA waits to enter the idle state, and idleReplicaCount is the number of instances (Pods) desired when idle.

The idle state occurs when our queue stays empty for that specified period (30 seconds in our case). In the configuration, idleReplicaCount is set to 0, which means we will stop all instances of our application when we are in the idle state.

This procedure is called scaling to zero and is a desirable attribute for any application.

Traditionally, without KEDA, the HorizontalPodAutoscaler needs at least one running instance in order to scale, because of the way it was built. With KEDA, we can stop this last instance and start it again only when traffic returns, as if nothing had happened.

minReplicaCount and maxReplicaCount are two self-explanatory parameters: they set the minimum (1) and maximum (10) number of instances we can have when not idle.

Excellent! We managed to set up scaling with just 5 numbers and a word!

However, what remains for us to do is to tell KEDA which queue to listen to and when to trigger scaling.

[Code 2: The triggers section of the ScaledObject]

For this, we will use the triggers section. Here we specify the queue that KEDA listens to (an Azure Service Bus Queue) and how we connect to it (connectionFromEnv), which is actually an environment variable holding the connection string.

The last important parameter is messageCount, the number of messages at which we take action. If more than 5 messages have gathered in the queue, we consider the processing speed insufficient and scale horizontally, adding Pods.
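Putting it together, a hedged reconstruction of that triggers section might look like this; the queue name and the environment variable name are assumptions:

# Fragment of the ScaledObject spec shown above (indented as it would appear under spec):
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: function-queue                         # hypothetical queue name
        messageCount: "5"                                  # more than 5 waiting messages triggers a scale-out
        connectionFromEnv: SERVICEBUS_CONNECTION_STRING    # env variable holding the connection string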

As we can see, KEDA is vendor-agnostic, supporting a wide range of vendors and event triggers, Azure Service Bus Queue being just one of them.

How Does Scaling Look in the Cluster?

The only remaining step is to test whether scaling works as expected. To do so, we'll send an enormous number of messages (10,000) to the queue at once, from the Azure portal.

After a few seconds, caught off guard by the latest events, KEDA notices the significant change in queue length and reacts automatically and quickly, spinning up an entire fleet of Pods for processing.

In the picture below, we have increased the maximum number of Pods to 50 to make it easier to see the "cherry on top".

[Figure 2: Visualising the results in K9s]

After all messages are processed and the queue becomes empty, KEDA will stop the Pods and we will not be left with any, since we will be in the idle state.

How Do We End This Chapter, and What Does Event-Driven Autoscaling Mean For the Future?

In an era of microservices, where scalability is a key issue and asynchronous communication via queues, topics, or other message delivery modes is common in most software applications, KEDA gives us a simple and scalable solution.

KEDA thus facilitates asynchronous communication, the best way to communicate between microservices. This mode of communication does not require microservices to wait on each other as in a Request-Reply model.

On the other hand, efficient use of resources has both a significant financial and environmental impact.

Finally, we can't forget the biggest advantage KEDA offers us: the unparalleled satisfaction of watching in real-time as dozens or even hundreds of Pods instantly start up and devour our entire queue full of messages.