Kubernetes Blog

Tuesday, April 16, 2019

Pod Priority and Preemption in Kubernetes

Author: Bobby Salamat

Kubernetes is well known for running scalable workloads. It scales your workloads based on their resource usage. When a workload is scaled up, more instances of the application are created. When the application is critical for your product, you want to make sure that these new instances are scheduled even when your cluster is under resource pressure. One obvious solution is to overprovision your cluster so that some slack is always available for scale-up situations. This approach often works, but it costs more because you pay for resources that sit idle most of the time.

Pod priority and preemption is a scheduler feature made generally available in Kubernetes 1.14 that allows you to achieve high levels of scheduling confidence for your critical workloads without overprovisioning your clusters. It also provides a way to improve resource utilization in your clusters without sacrificing the reliability of your essential workloads.

Guaranteed scheduling with controlled cost

Kubernetes Cluster Autoscaler is an excellent tool in the ecosystem that adds more nodes to your cluster when your applications need them. However, Cluster Autoscaler has some limitations and may not work for all users:

It does not work in physical clusters.
Adding more nodes to the cluster costs more.
Adding nodes is not instantaneous and could take minutes before those nodes become available for scheduling.

An alternative is pod priority and preemption. In this approach, you combine multiple workloads in a single cluster. For example, you may run your CI/CD pipeline, ML workloads, and your critical service in the same cluster. Because several workloads share it, this cluster is larger than one you would use to run only your critical service. If you give your critical service the highest priority and your CI/CD and ML workloads lower priorities, then when your service needs more computing resources, the scheduler preempts (evicts) enough pods of your lower-priority workloads, for example the ML workload, to allow all your higher-priority pods to schedule.
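The preemption step described above can be illustrated with a toy model. The sketch below is not the Kubernetes scheduler's actual algorithm: it assumes a single node, a single resource (CPU in millicores), and a simple "evict the lowest priority first" rule. The `Pod` class, the `schedule_with_preemption` function, and the pod names are hypothetical and exist only for illustration. In a real cluster, priorities come from PriorityClass objects that pods reference through `priorityClassName`, and the scheduler weighs more factors when choosing victims.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int  # higher value means more important
    cpu: int       # requested CPU in millicores


def schedule_with_preemption(node_cpu, running, incoming):
    """Try to place `incoming` on one node, evicting lower-priority pods if needed.

    Returns (scheduled, victims). Toy model: one node, CPU only.
    """
    free = node_cpu - sum(p.cpu for p in running)
    if incoming.cpu <= free:
        return True, []  # fits without preempting anything

    # Only pods with strictly lower priority may be preempted; lowest goes first.
    candidates = sorted(
        (p for p in running if p.priority < incoming.priority),
        key=lambda p: p.priority,
    )
    victims = []
    for pod in candidates:
        if incoming.cpu <= free:
            break
        victims.append(pod)
        free += pod.cpu

    if incoming.cpu > free:
        return False, []  # even evicting every candidate would not make room
    return True, victims


running = [
    Pod("ml-train", priority=100, cpu=2000),
    Pod("ci-build", priority=500, cpu=1000),
    Pod("web-1", priority=1000, cpu=1000),
]
ok, victims = schedule_with_preemption(4000, running, Pod("web-2", 1000, 1500))
print(ok, [p.name for p in victims])
```

In this example the node is full, so the new `web-2` pod is admitted by evicting only `ml-train`, the lowest-priority pod. `ci-build` survives because one eviction already frees enough CPU, and `web-1` is never a candidate because its priority is not lower than the incoming pod's.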

With pod priority and preemption, you can set a maximum size for your cluster in the Cluster Autoscaler configuration to keep your costs under control without sacrificing the availability of your service. Moreover, preemption is much faster than adding new nodes to the cluster. Your high-priority pods are scheduled within seconds, which is critical for latency-sensitive services.

Improve cluster resource utilization

Cluster operators who run critical services learn over time roughly how many nodes they need in their clusters to achieve high service availability. Such estimates take bursts of traffic into account, and Cluster Autoscaler can be configured never to reduce the size of the cluster below this level. The only problem is that these estimates are often conservative, so cluster resources may remain underutilized most of the time. Pod priority and preemption allows you to improve resource utilization significantly by running a non-critical workload in the same cluster.

The non-critical workload may have many more pods than can fit in the cluster. If you give a negative priority to your non-critical workload, Cluster Autoscaler does not add more nodes to your cluster when the non-critical pods are pending. Therefore, you won’t incur higher expenses. When your critical workload requires more computing resources, the scheduler preempts non-critical pods and schedules critical ones.
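The scale-up rule described above can be pictured as a filter over pending pods. The sketch below is a simplification, not Cluster Autoscaler's code: the `PendingPod` class and the `pods_triggering_scale_up` function are hypothetical, and the `cutoff` of 0 is an assumption chosen to mirror the "negative priority" wording here. The real autoscaler's cutoff is a configuration setting and may differ from this value.

```python
from dataclasses import dataclass


@dataclass
class PendingPod:
    name: str
    priority: int


def pods_triggering_scale_up(pending, cutoff=0):
    """Return the pending pods that would count toward a scale-up decision.

    Pods with priority below `cutoff` are treated as expendable and ignored.
    The default of 0 is an assumption made for this illustration.
    """
    return [p for p in pending if p.priority >= cutoff]


pending = [
    PendingPod("batch-1", priority=-100),
    PendingPod("batch-2", priority=-100),
    PendingPod("web-3", priority=1000),
]
print([p.name for p in pods_triggering_scale_up(pending)])
```

Only `web-3` would count toward adding a node in this model. The two negative-priority batch pods simply wait until capacity frees up.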

The non-critical pods fill the “holes” in your cluster resources, which improves resource utilization without raising your costs.

Get Involved

If you have feedback on this feature or are interested in getting involved with its design and development, join the Scheduling Special Interest Group.
