As applications evolve into more dynamic and complex systems, the need for intelligent scaling solutions becomes critical. Kubernetes provides robust orchestration, but scaling based on traditional metrics alone (CPU and memory) can fall short in real-time event-driven environments. Enter KEDA (Kubernetes Event-Driven Autoscaling), a groundbreaking framework that transforms how we think about scaling in Kubernetes.
Imagine a Live Sports Streaming Platform: During major sporting events, such as the World Cup, user engagement spikes dramatically as fans flock to watch live broadcasts. If the platform fails to adapt, viewers may experience buffering or dropped connections, leading to frustration and churn.
KEDA Solution: By monitoring real-time metrics like active viewer counts and stream quality, KEDA dynamically scales the backend services. As traffic surges, it increases the number of streaming servers to maintain a seamless viewing experience. Once the event concludes and traffic subsides, KEDA gracefully scales back, ensuring resource efficiency without compromising user satisfaction.
KEDA’s innovative architecture, driven by Custom Resource Definitions (CRDs), empowers users to define scaling behaviors based on real-time metrics. This transcends traditional CPU-centric approaches, fostering a truly responsive ecosystem. Here are some key features that make KEDA a standout solution:
ScaledObject: A fundamental component of KEDA that defines dynamic scaling behaviors for deployments based on external triggers, allowing customization to meet specific application needs.
ScaledJob: Designed for batch workloads, this feature efficiently manages scaling for event-driven jobs, optimizing resource usage for scheduled tasks.
Scalers: These components monitor external metrics and systems (such as message queues and databases) to make real-time, informed scaling decisions, ensuring application responsiveness and efficiency.
KEDA Operator: The central orchestration component of KEDA that manages the lifecycle of Custom Resource Definitions (CRDs) and scaling logic, ensuring seamless operation.
KEDA employs a scaling algorithm akin to the Horizontal Pod Autoscaler (HPA), with the default formula being:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
This formula adjusts the number of replicas based on the ratio of current to desired metric values, facilitating responsive scaling.
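For instance, if 2 replicas are currently running and a trigger reports a current metric value of 300 against a desired value of 100, the formula yields ceil[2 * (300 / 100)] = 6 replicas (illustrative numbers, always bounded by the configured minimum and maximum replica counts).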
KEDA enhances this scaling mechanism with several advanced features that allow for greater customization.
Scaling behavior: exposed on a ScaledObject as advanced.horizontalPodAutoscalerConfig.behavior, this setting enables you to modify how scaling actions are applied. For example, you can add stabilization windows that delay scaling up or down, or set policies that cap how many replicas change in a single action. This helps fine-tune scaling to respond more appropriately to fluctuations in demand.
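A minimal sketch of such a behavior override on a ScaledObject (the window and policy values are illustrative):
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300  # wait 5 minutes before acting on a lower metric value
          policies:
            - type: Percent
              value: 50                    # remove at most 50% of current replicas per period
              periodSeconds: 60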
Scaling strategy: on a ScaledJob, the scalingStrategy setting defines the overall approach KEDA uses to scale batch workloads. You can choose an eager strategy that quickly adds job instances to meet demand, or more conservative strategies that prioritize stability and minimize fluctuations. The selected strategy influences both the frequency of scaling actions and the thresholds at which scaling triggers occur.
Multiple-scaler aggregation: when a ScaledJob leverages multiple scalers, the multipleScalersCalculation setting determines how metrics from different scalers are combined into a single scaling decision. Options include taking the maximum, minimum, average, or sum of the metrics from all active scalers. This flexibility allows you to tailor scaling behavior to the diverse needs of your application and the interaction of different workloads (see the sketch below).
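Both of these knobs live under a ScaledJob’s scalingStrategy; a minimal sketch (values are illustrative, and available strategy names vary by KEDA version):
spec:
  scalingStrategy:
    strategy: "eager"                  # scale out aggressively, up to maxReplicaCount
    multipleScalersCalculation: "max"  # act on the highest value reported by any scaler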
Automate the scaling down of development and QA environments during weekends and off-hours. KEDA can be configured to use the Cron Scaler, triggering scaling events based on a cron schedule. This ensures efficient resource utilization and cost savings when demand is low.
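A sketch of a cron trigger for this scenario (schedule, timezone, and replica count are illustrative); combined with minReplicaCount: 0, the environment sleeps outside working hours:
triggers:
  - type: cron
    metadata:
      timezone: Etc/UTC
      start: 0 8 * * 1-5      # scale up at 08:00, Monday through Friday
      end: 0 18 * * 1-5       # scale back down at 18:00
      desiredReplicas: "5"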
In applications relying on Apache Kafka, KEDA employs the Kafka Scaler to adjust consumer applications based on consumer group lag. When lag builds up on a topic, KEDA can automatically scale up the number of consumers to handle the load, then scale down as it decreases, maintaining performance without overprovisioning.
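A sketch of a kafka trigger (the broker address, topic, and consumer group names are hypothetical):
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-broker.kafka:9092
      consumerGroup: video-consumers
      topic: video-events
      lagThreshold: "50"      # target lag per replica before scaling out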
For web applications with variable traffic patterns, KEDA uses the Prometheus Scaler to scale HTTP servers based on custom metrics like request rates or response times. This ensures that the application can efficiently manage traffic spikes, enhancing user experience during peak hours.
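A sketch of a prometheus trigger that scales on request rate (the server address and query are placeholders for your own setup):
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      query: sum(rate(http_requests_total{app="web"}[2m]))  # requests per second
      threshold: "100"        # target value per replica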
Prerequisites: A running Kubernetes cluster and Helm installed.
Step 1: Add the KEDA Helm Repository
helm repo add kedacore https://kedacore.github.io/charts && helm repo update
Step 2: Install KEDA
helm install keda kedacore/keda --namespace keda --create-namespace
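Verify that the KEDA operator and metrics API server pods are running before moving on:
kubectl get pods --namespace keda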
KEDA introduces two primary CRDs: ScaledObject and ScaledJob. These allow developers to specify how scaling should occur based on external metrics.
Here’s an example of a ScaledObject that utilizes multiple triggers:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: video-stream-processor
  namespace: video-stream
spec:
  scaleTargetRef:
    name: video-processor-deployment  # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  pollingInterval: 30                 # check the triggers every 30 seconds
  cooldownPeriod: 60                  # seconds to wait before scaling back to zero (applies only when scaling to 0)
  triggers:
    # Trigger 1: scale on the number of blobs waiting in an Azure Blob Storage container
    - type: azure-blob
      metadata:
        blobContainerName: video-uploads
        accountName: storage-account
        connectionFromEnv: AZURE_STORAGE_CONNECTION  # connection string read from an env var
        blobCount: "5"                # target blob count per replica
        blobPrefix: "uploads/"
        cloud: AzurePublicCloud
    # Trigger 2: scale on the length of a RabbitMQ queue
    - type: rabbitmq
      metadata:
        protocol: auto
        mode: QueueLength
        value: "100"                  # target queue length per replica
        activationValue: "10"         # queue length required to activate the scaler
        queueName: processing-queue
        hostFromEnv: RABBITMQ_HOST    # connection string read from an env var
        unsafeSsl: "true"
NOTE: This ScaledObject example dynamically scales the video-processor-deployment based on two triggers: Azure Blob Storage and RabbitMQ. It monitors the number of blobs in a specified container and the length of a RabbitMQ queue, allowing for responsive scaling between 1 and 20 replicas. This multi-trigger setup ensures that the video processing application can efficiently handle varying workloads, maintaining performance during peak usage and optimizing resource utilization.
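To deploy it, save the manifest and apply it with kubectl (the filename is illustrative):
kubectl apply -f video-stream-scaledobject.yaml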
KEDA represents a significant leap forward in Kubernetes autoscaling, especially for event-driven architectures. By enabling real-time scaling based on diverse external metrics, KEDA allows applications to dynamically respond to demand fluctuations, optimizing resource usage and enhancing user experience.
As organizations continue to embrace microservices and serverless architectures, KEDA provides the tools necessary to build responsive, efficient, and cost-effective applications in a rapidly changing landscape. Whether you’re managing message queues, HTTP traffic, or batch processing, KEDA empowers you to navigate the complexities of modern application scaling.