Advice on scaling strategy: 1 pod per transaction, each lasting 10–30s (with headroom) #6857

emanuelef · 2025-06-22T19:02:37Z

Hi everyone,

I'm working on a use case where I want to autoscale a service with KEDA based on transactions per second (TPS), with a twist:
Each transaction is handled by one pod and takes between 10 and 30 seconds to complete (median ~15s).
I want to maintain 1 pod per concurrent transaction, ideally with a small headroom (e.g., 20–30%) to absorb short bursts.
The transactions arrive via a RabbitMQ queue, so I'm evaluating the RabbitMQ scaler with either mode: QueueLength or mode: MessageRate.

My questions:

Is there a way to scale on message rate or queue length but apply a scaling factor or "headroom", so that I get somewhat more replicas than the exact number? I know the target value can be a float only when mode is MessageRate, so I cannot use 0.2 with QueueLength.
Is there a recommended approach to smoothing the metric to avoid over-scaling or flapping, such as a moving average, a custom scaler, or a delayed cooldown?

If my goal is:

1 TPS → 5 pods
5 TPS → 25 pods
10 TPS → 50 pods
…

what’s the cleanest way to implement that logic?
I want to avoid running Prometheus if possible, and ideally keep things lightweight and cloud-agnostic (GKE, EKS, AKS).

Thanks for any guidance or best practices you can share!

Answered by rickbrouwer (Collaborator)

Jun 23, 2025

Hi,

About the "headroom": if you want a higher value than the scaler returns, you can use scaling modifiers. With these you can apply a formula that adds something to the metric value you get. See also:

https://keda.sh/docs/2.17/concepts/scaling-deployments/#scaling-modifiers
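
As a rough sketch of how this could look with a single trigger, the ScaledObject below names the RabbitMQ trigger and references that name in a scaling-modifiers formula. It assumes the 5-pods-per-TPS ratio from the question plus 20% headroom. The resource names, queue name, replica limits, and the `RABBITMQ_HTTP_URL` environment variable are hypothetical placeholders, and the multipliers are illustrative rather than recommended values.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tx-worker-scaler            # hypothetical name
spec:
  scaleTargetRef:
    name: tx-worker                 # hypothetical Deployment
  minReplicaCount: 1                # assumed floor
  maxReplicaCount: 100              # assumed ceiling
  advanced:
    scalingModifiers:
      # publish rate * 5 pods per TPS * 1.2 (20% headroom)
      formula: "tx_rate * 5 * 1.2"
      # with AverageValue and target 1, replicas ~= ceil(formula result)
      target: "1"
      metricType: AverageValue
  triggers:
    - type: rabbitmq
      name: tx_rate                 # the formula refers to this name
      metadata:
        protocol: http              # MessageRate needs the HTTP management API
        mode: MessageRate
        value: "1"
        queueName: transactions     # hypothetical queue
        hostFromEnv: RABBITMQ_HTTP_URL   # hypothetical env var on the target
```

Under these assumptions, a publish rate of 1 msg/s would yield a composite metric of 6 and so about 6 replicas, and 10 msg/s about 60. Whether the multiplier should be applied to the rate or to the queue length depends on how the workload drains the queue, so treat the formula as a starting point to test.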

About flapping: you can configure scaleUp and/or scaleDown behaviors and set a stabilizationWindowSeconds of, for example, 300 seconds to prevent rapid fluctuations (flapping) in the number of replicas.

For example:

advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleDown:
        # scale down only to the highest replica count recommended in the last 300 s
        stabilizationWindowSeconds: 300
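
Since the answer mentions both scaleUp and scaleDown, a slightly fuller sketch might combine a stabilization window with rate-limiting policies. The fields are the standard HPA `behavior` fields that KEDA passes through. The specific numbers are assumptions for illustration only and would need tuning against the real 10–30 s transaction profile.

```yaml
advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleUp:
        # react to bursts immediately (0 is also the HPA default)
        stabilizationWindowSeconds: 0
        policies:
          - type: Percent
            value: 100              # assumed: at most double the replicas...
            periodSeconds: 15       # ...every 15 seconds
      scaleDown:
        # use the highest recommendation from the last 300 s
        stabilizationWindowSeconds: 300
        policies:
          - type: Pods
            value: 5                # assumed: remove at most 5 pods...
            periodSeconds: 60       # ...per minute
```

A slow, capped scale-down is one way to avoid removing capacity between bursts. The trade-off is that idle replicas linger for longer after the load drops.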

emanuelef
Jun 23, 2025
Author

Thanks, I'll take a closer look at scaling modifiers.
The example there uses two triggers, so I wasn't sure whether they can be used with only one.
