Seamless scaling with VPA In-place Pod Resize on GKE
Learn how VPA In-place Pod Resize can help seamlessly vertically scale workloads on Google Kubernetes Engine (GKE).

Originally published on DEV Community by Olivier Bourgeois. Read on the original site
Published: 2026-05-20 20:24 +0000
Right-sizing Kubernetes workloads is a common platform engineering challenge. Set your requests too high, and you burn cloud budgets on idle capacity; set your limits too low, and your applications face throttling or dreaded OOMKills.
For years, the Vertical Pod Autoscaler (VPA) has been the standard answer to this problem, automatically adjusting CPU and memory requirements based on actual usage. However, this method of scaling came with a significant catch that prevented widespread adoption for critical workloads: applying new resource parameters required evicting and restarting the pod.
This disruption was often unacceptable for stateful applications, long-running connections, or latency-sensitive services.
Introducing In-place Pod Resize (IPPR) on GKE
In-place Pod Resize (IPPR) changes the game by allowing Kubernetes to modify resource requests and limits on live, running containers directly through the underlying container runtime, without triggering a restart.
By combining the intelligence of VPA with the non-disruptive nature of IPPR, GKE users finally have a viable path to dynamic, seamless, and automated right-sizing.
Note: At the time of writing, VPA IPPR is in Preview on GKE. While it is a massive step forward, I recommend evaluating it in staging environments before rolling it out to production workloads.
Getting started with IPPR
To use In-place Pod Resize, you need a GKE cluster running version 1.34.0-gke.2201000 or later.
- GKE Autopilot: VPA is enabled by default.
- GKE Standard: Requires the Vertical Pod Autoscaling feature to be enabled.
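If you manage several clusters, the version requirement above can be checked programmatically. The following is a minimal sketch, not an official tool: it assumes GKE version strings follow the `MAJOR.MINOR.PATCH-gke.BUILD` shape shown above and that comparing the four numbers in order is a reasonable ordering. The helper names and the sample version strings are hypothetical.

```python
import re

# Minimum GKE version quoted in this article for In-place Pod Resize.
MIN_IPPR_VERSION = "1.34.0-gke.2201000"

# Assumed shape of a GKE version string: MAJOR.MINOR.PATCH-gke.BUILD
_GKE_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version: str) -> tuple[int, ...]:
    """Split 'MAJOR.MINOR.PATCH-gke.BUILD' into a tuple of four integers."""
    match = _GKE_VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_ippr(version: str, minimum: str = MIN_IPPR_VERSION) -> bool:
    """Return True when `version` is at or above `minimum` (hypothetical helper)."""
    return parse_gke_version(version) >= parse_gke_version(minimum)


if __name__ == "__main__":
    # Illustrative version strings only; substitute your cluster's real version.
    for candidate in ("1.33.5-gke.1000000", "1.34.0-gke.2201000"):
        print(candidate, supports_ippr(candidate))
```

Tuples compare element by element, so a higher patch or build number sorts above the minimum without any extra logic.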
1. Enable the feature
If you aren't using Autopilot, make sure your cluster is created (or updated) with Vertical Pod Autoscaling enabled. The following command creates a new cluster on the Rapid release channel with the feature turned on:
gcloud container clusters create CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=us-east1 \
    --release-channel=rapid \
    --enable-vertical-pod-autoscaling
2. Define your VPA object
Create a VerticalPodAutoscaler resource targeting your Deployment or StatefulSet. The crucial element here is setting spec.updatePolicy.updateMode to InPlaceOrRecreate.
apiVersion: "autoscaling.k8s.io/v1"
kind: "VerticalPodAutoscaler"
metadata:
  name: "my-vpa"
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "my-deployment"
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
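If you need the same VPA object for many workloads, generating the manifest can be less error-prone than copying YAML by hand. Here is a minimal sketch that builds the manifest above as a Python dictionary and prints it as JSON, which `kubectl apply` also accepts. The function name `build_vpa_manifest` is hypothetical, and the sketch assumes every target lives in the `apps/v1` API group, as Deployments and StatefulSets do.

```python
import json


def build_vpa_manifest(vpa_name: str, target_kind: str, target_name: str) -> dict:
    """Build a VerticalPodAutoscaler manifest that uses InPlaceOrRecreate.

    Hypothetical helper: it mirrors the YAML in this article and assumes the
    target is an apps/v1 workload such as a Deployment or StatefulSet.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": vpa_name},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "updatePolicy": {"updateMode": "InPlaceOrRecreate"},
        },
    }


if __name__ == "__main__":
    manifest = build_vpa_manifest("my-vpa", "Deployment", "my-deployment")
    print(json.dumps(manifest, indent=2))
```

Saved as a script, the output can be piped straight into `kubectl apply -f -`.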
3. Watch it scale
Apply the resource to your cluster and monitor your application under load. Instead of watching Pods terminate and get recreated, you can watch the resource values change on the running Pod using kubectl describe.
kubectl describe pod POD_NAME
Look for the AllocatedResources field or check the events section. You will see the requests change in real time to match the VPA recommendations, while the Restart Count remains exactly the same.
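The same check can be scripted. The sketch below is one possible way to do it, not the only one: it compares two Pod snapshots, such as the parsed output of `kubectl get pod POD_NAME -o json` taken before and after a load test, and reports whether the requests changed while the Pod's UID and restart counts stayed the same. The helper names and the sample values are hypothetical; the fields it reads (`metadata.uid`, `spec.containers[].resources.requests`, `status.containerStatuses[].restartCount`) are standard Pod fields.

```python
def container_requests(pod: dict) -> dict:
    """Map container name -> resource requests taken from the Pod spec."""
    return {
        c["name"]: c.get("resources", {}).get("requests", {})
        for c in pod["spec"]["containers"]
    }


def restart_counts(pod: dict) -> dict:
    """Map container name -> restartCount taken from the Pod status."""
    return {
        s["name"]: s["restartCount"]
        for s in pod.get("status", {}).get("containerStatuses", [])
    }


def resized_in_place(before: dict, after: dict) -> bool:
    """True when requests changed but the Pod UID and restart counts did not."""
    same_pod = before["metadata"]["uid"] == after["metadata"]["uid"]
    return (
        same_pod
        and container_requests(before) != container_requests(after)
        and restart_counts(before) == restart_counts(after)
    )


def make_snapshot(uid: str, cpu: str, memory: str, restarts: int) -> dict:
    """Build a tiny, hypothetical single-container Pod snapshot for the demo."""
    return {
        "metadata": {"uid": uid},
        "spec": {"containers": [
            {"name": "app", "resources": {"requests": {"cpu": cpu, "memory": memory}}}
        ]},
        "status": {"containerStatuses": [{"name": "app", "restartCount": restarts}]},
    }


if __name__ == "__main__":
    before = make_snapshot("pod-uid-1", "250m", "256Mi", 0)
    after = make_snapshot("pod-uid-1", "500m", "512Mi", 0)
    print("resized in place:", resized_in_place(before, after))
```

A different UID in the second snapshot means the Pod was recreated rather than resized, which is the fallback path described next.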
The "Or Recreate" Fallback: Keep in mind that physics still applies. If VPA recommends a resource size that exceeds the remaining capacity of the Node your Pod is currently running on, an in-place resize is impossible. In this scenario, VPA will fall back to evicting and recreating the Pod so it can be scheduled onto a larger or emptier Node.
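To make the capacity argument concrete, here is a deliberately simplified sketch of the arithmetic for a single resource. It is an illustration under stated assumptions, not how the kubelet or VPA actually decide: real behavior involves both CPU and memory as well as other conditions. All function names and numbers are hypothetical, and values are in CPU millicores.

```python
def fits_in_place(allocatable: int, requested_on_node: int,
                  current: int, recommended: int) -> bool:
    """Check whether a Pod can grow to `recommended` on its current Node.

    Simplified model (an assumption, not the real algorithm):
    - allocatable: the Node's allocatable capacity
    - requested_on_node: sum of requests of all Pods on the Node,
      including this Pod's `current` request
    The resize fits when the increase is no larger than the free capacity.
    """
    free = allocatable - requested_on_node
    return recommended - current <= free


def plan_resize(allocatable: int, requested_on_node: int,
                current: int, recommended: int) -> str:
    """Return 'in-place' when the new size fits on the Node, else 'recreate'."""
    if fits_in_place(allocatable, requested_on_node, current, recommended):
        return "in-place"
    return "recreate"


if __name__ == "__main__":
    # Hypothetical Node: 4000m allocatable, 3500m already requested,
    # of which this Pod holds 500m. That leaves 500m free.
    print(plan_resize(4000, 3500, 500, 900))   # grows by 400m
    print(plan_resize(4000, 3500, 500, 1200))  # grows by 700m
```

In this model a scale-down always fits, because the increase is negative; only growth beyond the Node's free capacity triggers the recreate path.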
Ready to dive deeper?
While this introduction covers the basics of IPPR, right-sizing is just one part of a robust scaling strategy. Implementing VPA often goes hand-in-hand with horizontal scaling and cluster autoscaling. Check out the guide to master scaling on GKE: Run full-stack workloads at scale on GKE.