Skarp

Chapter 17 of 27

Operating Google Kubernetes Engine and Cloud Run Services

Clusters and serverless services evolve over time; learn how to scale, roll out updates, and diagnose issues in GKE and Cloud Run workloads.

27 min read

Big Picture: Operating GKE and Cloud Run

From Deployment to Operations

You now shift from just deploying resources to operating them: keeping GKE clusters and Cloud Run services healthy, scalable, and reliable over time.

Exam Context

For the Associate Cloud Engineer exam, GKE and Cloud Run operations appear mainly under two sections: "Deploying and implementing a cloud solution" and "Ensuring successful operation of a cloud solution."

GKE vs Cloud Run Roles

GKE gives you control over Kubernetes objects like Deployments and autoscalers, while Cloud Run is a fully managed serverless platform: you tune the configuration and Google handles the infrastructure and scaling.

What You Will Practice

You will learn to scale GKE workloads, tune pod resources, roll out and roll back updates, adjust Cloud Run concurrency and instance counts, and use logs and metrics to troubleshoot.

GKE Workload Basics: Deployments, Pods, and Services

Pods and Ephemerality

A Pod is the smallest deployable unit in Kubernetes. Pods are ephemeral; controllers create and destroy them to match the desired state.

ReplicaSets and Deployments

A ReplicaSet keeps a set number of pod replicas running. A Deployment manages ReplicaSets and provides rolling updates and rollbacks.

Services for Stable Access

A Service gives a stable IP and DNS name, load balancing traffic across pods selected by labels. Type `LoadBalancer` integrates with a Google Cloud load balancer.

Key kubectl Commands

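Use `kubectl get` for overviews, `kubectl describe` for detailed config and events, and `kubectl logs deployment/myapp` to see logs from a pod managed by the Deployment. By default kubectl picks a single pod, so use a label selector such as `kubectl logs -l app=myapp` (assuming the pods carry that label) to read from several.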

Scaling GKE Workloads and Tuning Resources

Manual Horizontal Scaling

Scale a Deployment with `kubectl scale deployment webapp --replicas=5` or by setting `spec.replicas` in YAML and applying it.

Requests vs Limits

`requests` reserve minimum CPU/memory for scheduling; `limits` cap how much a container can use. Both are set per container.

Sample Resource Block

Example: `requests: cpu 250m, memory 256Mi` and `limits: cpu 500m, memory 512Mi` help balance performance and cost.
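As a minimal sketch, the resource values above could sit in a Deployment manifest like the one below. The Deployment name `webapp`, the `app: webapp` label, the replica count, and the image path are illustrative assumptions rather than values from a real project.

```bash
# Write an illustrative Deployment manifest (names and image are placeholders).
cat > webapp-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: gcr.io/PROJECT/webapp:v1
        resources:
          requests:          # what the scheduler reserves when placing the pod
            cpu: 250m
            memory: 256Mi
          limits:            # upper bound enforced at runtime
            cpu: 500m
            memory: 512Mi
EOF

# With cluster credentials configured, apply it with:
#   kubectl apply -f webapp-deployment.yaml
```

Keeping requests below limits, as here, lets a container burst above its reserved share when the node has spare capacity.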

Common Pitfalls

Too‑high requests can leave pods Pending; too‑low requests can cause contention. Fix by tuning resources or scaling node pools, not only replicas.

Autoscaling in GKE: HPA and Cluster Autoscaler

Why Autoscaling?

Autoscaling reacts to real‑time load. In GKE you typically use Horizontal Pod Autoscaler (HPA) plus cluster autoscaler.

HPA Basics

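HPA changes the number of pod replicas based on metrics such as CPU utilization. CPU utilization is measured as a percentage of each container's CPU request, so the target pods need CPU requests set. Example: `kubectl autoscale deployment webapp --cpu-percent=70 --min=2 --max=10`.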
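To make the scaling decision concrete, here is a rough sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current replicas × current metric / target metric), clamped to the configured min and max. The current replica count and the observed CPU figure are made-up inputs, and the real controller also applies a tolerance band and stabilization behavior that this sketch ignores.

```bash
current_replicas=4     # hypothetical current state
current_cpu=90         # hypothetical observed average CPU utilization (% of request)
target_cpu=70          # matches --cpu-percent=70
min_replicas=2         # matches --min=2
max_replicas=10        # matches --max=10

# Integer ceiling of current_replicas * current_cpu / target_cpu
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))

# Clamp the result to the HPA bounds
if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi

echo "desired replicas: $desired"   # prints "desired replicas: 6"
```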

Cluster Autoscaler Role

Cluster autoscaler adjusts node counts in a node pool when pods are Pending due to insufficient resources, within configured min and max.

How They Work Together

HPA scales pods for load; cluster autoscaler adds nodes if pods cannot be scheduled. Exam tip: pods vs nodes is a common distinction.
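The pods-versus-nodes distinction can be seen with some back-of-the-envelope arithmetic. The sketch below considers CPU only and assumes a hypothetical allocatable CPU figure per node; real scheduling also accounts for memory, system pods, and other constraints.

```bash
replicas=10             # HPA has scaled the Deployment to its max
cpu_request_m=250       # per-pod CPU request in millicores
node_allocatable_m=940  # assumed allocatable CPU per node (varies by machine type)
current_nodes=2         # hypothetical node pool size

pods_per_node=$(( node_allocatable_m / cpu_request_m ))
schedulable=$(( current_nodes * pods_per_node ))
pending=$(( replicas - schedulable ))
if [ "$pending" -lt 0 ]; then pending=0; fi
nodes_needed=$(( (replicas + pods_per_node - 1) / pods_per_node ))

echo "pods per node: $pods_per_node"    # 3
echo "schedulable now: $schedulable"    # 6
echo "pending pods: $pending"           # 4
echo "nodes needed: $nodes_needed"      # 4

# Raising the HPA max would not help here; the node pool must grow.
# One way to allow that (CLUSTER, POOL, and ZONE are placeholders):
#   gcloud container clusters update CLUSTER --enable-autoscaling \
#     --node-pool=POOL --min-nodes=1 --max-nodes=5 --zone=ZONE
```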

Rolling Updates and Rollbacks in GKE

Goal: Zero‑Downtime Updates

Kubernetes Deployments support rolling updates, gradually swapping old pods for new ones to avoid downtime.

Updating the Image

Change the container image in YAML and `kubectl apply`, or run `kubectl set image deployment/webapp webapp=gcr.io/PROJECT/webapp:v2`.

Monitoring Rollouts

Use `kubectl rollout status deployment/webapp` to track progress, and `kubectl get rs` to see old and new ReplicaSets.

Rolling Back

If a release is bad, run `kubectl rollout undo deployment/webapp`. Avoid deleting the Deployment; that loses rollout history.
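The update, monitor, and roll back steps above can be chained into one helper. This is a minimal sketch, not a production deploy script: the function name `deploy_or_rollback` and the 120-second default timeout are assumptions chosen for illustration.

```bash
# Roll out a new image and undo automatically if the rollout does not finish.
# Usage: deploy_or_rollback <deployment> <container> <image> [timeout]
deploy_or_rollback() {
  local deployment=$1 container=$2 image=$3 timeout=${4:-120s}

  # Trigger a rolling update by changing the container image.
  kubectl set image "deployment/${deployment}" "${container}=${image}" || return 1

  # Wait for the rollout; a non-zero exit means it failed or timed out.
  if kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"; then
    echo "rollout of ${image} succeeded"
  else
    echo "rollout failed, reverting to the previous revision" >&2
    kubectl rollout undo "deployment/${deployment}"
    return 1
  fi
}

# Example call (requires kubectl pointed at your cluster):
#   deploy_or_rollback webapp webapp gcr.io/PROJECT/webapp:v2
```

Because the Deployment itself is never deleted, its rollout history stays available for `kubectl rollout undo`.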

Cloud Run Operations: Revisions, Scaling, and Concurrency

Services and Revisions

In Cloud Run, a service is the top-level resource that provides a stable URL and holds the IAM policy. Every config change creates a new revision, an immutable snapshot.

Traffic and Revisions

You can send 100% of traffic to one revision or split it, such as 90% to v1 and 10% to v2, for gradual rollouts.

Scaling by Concurrency

Cloud Run scales instances based on request volume and concurrency, the number of simultaneous requests per instance.
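A rough way to reason about this is to estimate the requests in flight (request rate × average duration) and divide by the concurrency setting. The helper below is only a sizing sketch under made-up load numbers; the actual autoscaler weighs other signals too, so treat the result as an order-of-magnitude guide.

```bash
# Estimate instances needed: ceil(requests in flight / concurrency).
# Usage: estimate_instances <requests_per_second> <avg_latency_ms> <concurrency>
estimate_instances() {
  local rps=$1 latency_ms=$2 concurrency=$3
  # Requests in flight at any moment, rounded up
  local in_flight=$(( (rps * latency_ms + 999) / 1000 ))
  echo $(( (in_flight + concurrency - 1) / concurrency ))
}

# Hypothetical load: 200 requests/s at 500 ms each, so about 100 in flight.
estimate_instances 200 500 20   # prints 5
estimate_instances 200 500 10   # prints 10
```

Halving concurrency roughly doubles the instance count for the same load, which is why concurrency and max instances are usually tuned together.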

CPU and Memory Choices

You configure CPU and memory per revision. By default, CPU is allocated only while an instance is processing requests; you can instead keep CPU always allocated so instances can do background work between requests.

Configuring Cloud Run: Concurrency, CPU, Memory, and Instances

This step shows concrete `gcloud` commands for adjusting Cloud Run operational settings.

You can configure Cloud Run either via the console or the `gcloud` CLI. For the exam, you should recognize and interpret these flags.

Deploy with specific resources and concurrency

```bash
# Deploy a Cloud Run service with tuned settings
SERVICE=webapi
REGION=us-central1
IMAGE=gcr.io/PROJECT_ID/webapi:v2   # replace PROJECT_ID with your project ID

gcloud run deploy "$SERVICE" \
  --image="$IMAGE" \
  --region="$REGION" \
  --platform=managed \
  --memory=512Mi \
  --cpu=1 \
  --concurrency=20 \
  --min-instances=1 \
  --max-instances=10 \
  --allow-unauthenticated
```

Key flags:

  • `--memory` and `--cpu`: per-instance limits.
  • `--concurrency`: simultaneous requests per instance.
  • `--min-instances`: keep at least this many instances warm.
  • `--max-instances`: cap autoscaling to protect backends and control costs.

Update configuration without changing the image

```bash
# Increase max instances and adjust concurrency
gcloud run services update "$SERVICE" \
  --region="$REGION" \
  --concurrency=10 \
  --max-instances=20
```

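This creates a new revision with the same image but new configuration. If the service routes 100% of traffic to the latest revision (the default), the new revision takes over once it is ready; if traffic is pinned to specific revisions, that split stays in place until you change it.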

Traffic Splitting and Rollbacks in Cloud Run

Reading Traffic Splits

If `REVISION-001: 80%` and `REVISION-002: 20%`, then 80% of requests hit the old version and 20% hit the new one.

Setting Traffic via gcloud

Use `gcloud run services update-traffic webapi --to-revisions webapi-00001=100` to send all traffic to a specific revision.

Rollback Strategy

To roll back, shift traffic back to a known-good revision instead of deleting the bad one. You can do this instantly or gradually.
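A gradual rollout can reuse the same `update-traffic` command with two revisions listed. The helper below is a sketch: the function name `split_traffic`, the revision name `webapi-00002`, and the region are assumptions for illustration, and it expects a percentage between 1 and 99.

```bash
# Send <percent>% of traffic to the new revision and the rest to the old one.
# Usage: split_traffic <service> <old_rev> <new_rev> <percent 1-99> <region>
split_traffic() {
  local service=$1 old_rev=$2 new_rev=$3 percent=$4 region=$5
  gcloud run services update-traffic "$service" \
    --region="$region" \
    --to-revisions="${new_rev}=${percent},${old_rev}=$(( 100 - percent ))"
}

# Canary at 10%, then widen to 50% once error rates look healthy:
#   split_traffic webapi webapi-00001 webapi-00002 10 us-central1
#   split_traffic webapi webapi-00001 webapi-00002 50 us-central1
# Full rollback to the known-good revision:
#   gcloud run services update-traffic webapi --region=us-central1 \
#     --to-revisions=webapi-00001=100
```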

GKE vs Cloud Run Rollbacks

GKE uses `kubectl rollout undo` on Deployments; Cloud Run uses traffic splitting between revisions for rollbacks.

Troubleshooting with Logs and Metrics (GKE and Cloud Run)

Cloud Logging and Monitoring

GKE and Cloud Run integrate with Cloud Logging for logs and Cloud Monitoring for metrics, dashboards, and alerts.

GKE Debugging

Use `kubectl logs` and `kubectl describe pod` for quick checks, but rely on Cloud Logging for historical and aggregated analysis.

Cloud Run Observability

Cloud Run automatically logs each request with its latency and status code. Anything your app writes to stdout or stderr is also captured in Cloud Logging, and single-line JSON output is parsed into structured log entries.

Typical Issues and Signals

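OOMKilled means a container exceeded its memory limit; CrashLoopBackOff means a container keeps crashing and restarting, which can stem from application bugs, misconfiguration, or resource limits. High first-request latency on Cloud Run after idle periods suggests cold starts, which `min-instances` can reduce.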
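As a starting point for investigating these signals, the sketch below builds a Cloud Logging filter for errors from a single Cloud Run service. The helper name `cloud_run_error_filter` and the service name `webapi` are assumptions; the commented commands need an authenticated `gcloud` and a `kubectl` context to run.

```bash
# Build a Cloud Logging filter for error-level entries from one Cloud Run service.
cloud_run_error_filter() {
  local service=$1
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s" AND severity>=ERROR' "$service"
}

FILTER=$(cloud_run_error_filter webapi)
echo "$FILTER"

# Read recent matching entries:
#   gcloud logging read "$FILTER" --limit=20 --freshness=1h
# Quick checks for a crashing GKE pod (POD_NAME is a placeholder):
#   kubectl describe pod POD_NAME      # events, restart count, last state such as OOMKilled
#   kubectl logs POD_NAME --previous   # logs from the previous, crashed container
```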

Thought Exercise: Choosing Scaling and Update Strategies

Work through this scenario mentally to connect concepts.

Scenario:

You run a web API on both GKE and Cloud Run for different teams.

  • The GKE API is experiencing CPU spikes at peak hours, and some pods are `Pending`.
  • The Cloud Run API has occasional high latency for the first request after long idle times.
  • You plan to release a risky new version for both.

Questions to think through:

  1. GKE CPU spikes and Pending pods
     • Which features would you enable or adjust: HPA, cluster autoscaler, resource requests/limits, or just replica counts?
     • How would you confirm your changes are working (which commands or metrics)?
  2. Cloud Run cold starts and latency
     • Which settings would you tune: concurrency, min instances, max instances, CPU/memory?
     • How would you check whether cold starts are the cause in Cloud Logging/Monitoring?
  3. Risky new version rollout
     • On GKE, how would you deploy and be ready to roll back quickly?
     • On Cloud Run, how could you expose only a small percentage of traffic to the new revision?

Pause and outline your answers before moving on. This mirrors the reasoning style used in Associate Cloud Engineer scenarios.

Quiz 1: GKE Operations

Answer this question to check your understanding of GKE scaling and rollouts.

Your GKE Deployment 'api' is configured with an HPA (min 2, max 10, target CPU 70%). During a traffic spike, the HPA increases replicas to 10, but several pods stay in Pending due to lack of node resources. What is the MOST appropriate action?

  A) Increase the HPA max replicas to 20 so Kubernetes can schedule more pods.
  B) Enable or tune the cluster autoscaler on the node pool so it can add nodes when pods are Pending.
  C) Manually delete the Pending pods so the HPA will recreate them on existing nodes.
  D) Disable the HPA and scale the Deployment back down to 2 replicas.

Answer: B) Enable or tune the cluster autoscaler on the node pool so it can add nodes when pods are Pending.

Pending pods indicate insufficient node capacity, not a lack of replicas. The correct operational fix is to enable or tune the cluster autoscaler on the node pool so it can add nodes and schedule the HPA-created pods. Increasing the HPA max alone does not help; deleting pods just recreates the same problem; scaling down ignores the traffic spike.

Quiz 2: Cloud Run Scaling and Rollbacks

Answer this question to reinforce Cloud Run operational concepts.

You deploy a new revision of a Cloud Run service and immediately notice increased error rates. You want to minimize impact while you investigate, without losing the ability to quickly send traffic back to this revision later. What should you do?

  A) Delete the new revision so all traffic automatically goes to the previous revision.
  B) Use 'gcloud run services update-traffic' to shift 100% of traffic back to the previous revision.
  C) Lower the concurrency value on the new revision so it handles fewer simultaneous requests.
  D) Increase the max instances on the new revision so it can scale out further.

Answer: B) Use 'gcloud run services update-traffic' to shift 100% of traffic back to the previous revision.

The safest rollback is to change traffic splitting so 100% of traffic goes to the known-good previous revision using 'gcloud run services update-traffic'. Deleting the new revision removes it entirely. Changing concurrency or max instances does not fix functional errors and continues to send traffic to the broken revision.

Key Term Review: GKE and Cloud Run Operations

Review these core operational terms and behaviors.

Deployment (GKE)
A Kubernetes controller that manages ReplicaSets and provides declarative updates to pods, including rolling updates and rollbacks. It is the main object you scale and update for stateless workloads.
Horizontal Pod Autoscaler (HPA)
A Kubernetes feature that automatically adjusts the number of pod replicas in a scalable resource (such as a Deployment) based on observed metrics like CPU utilization.
Cluster autoscaler (GKE)
A feature that automatically adjusts the number of nodes in a node pool based on pods that cannot be scheduled due to insufficient resources, within configured min and max node counts.
Cloud Run revision
An immutable snapshot of a Cloud Run service's code and configuration (image, env vars, CPU/memory, concurrency, etc.). Each deployment or config change creates a new revision, and traffic can be routed to one or more revisions.
Cloud Run concurrency
The number of simultaneous requests a single Cloud Run container instance can handle. Lower values can reduce latency but may increase the number of instances and cost.
Cloud Run min-instances
A setting that keeps at least a specified number of Cloud Run container instances warm, reducing cold-start latency at the cost of running resources even when idle.
CrashLoopBackOff
A Kubernetes pod state indicating that a container is repeatedly crashing and being restarted. Often caused by application bugs, misconfiguration, or insufficient resources (for example, OOMKilled).
OOMKilled
A container termination reason indicating it was killed by the system for exceeding its memory limit. The fix is usually to optimize memory usage or increase the container's memory limit.
Traffic splitting (Cloud Run)
The ability to distribute incoming requests across multiple revisions of a Cloud Run service by percentage, enabling canary releases, gradual rollouts, and quick rollbacks.
Service (GKE)
A Kubernetes resource that provides a stable virtual IP and DNS name and load balances traffic to a set of pods selected by labels. Types include ClusterIP, NodePort, and LoadBalancer.

Key Terms

OOMKilled
A container termination reason indicating it was killed for exceeding its memory limit.
Rolling update
A deployment strategy that gradually replaces instances of the previous version of an application with instances of the new version to minimize downtime.
CrashLoopBackOff
A Kubernetes pod state indicating a container is repeatedly crashing and being restarted.
Deployment (GKE)
A Kubernetes controller that manages ReplicaSets and provides declarative updates to pods, including rolling updates and rollbacks.
Cloud Run service
The top-level resource in Cloud Run that has a stable URL and IAM policy and points to one or more revisions.
Cloud Run revision
An immutable snapshot of a Cloud Run service's code and configuration created on each deployment or config change.
Concurrency (Cloud Run)
The number of simultaneous requests a single Cloud Run container instance can handle.
Cluster autoscaler (GKE)
A feature that automatically adjusts the number of nodes in a node pool based on unschedulable pods due to lack of resources.
Traffic splitting (Cloud Run)
The mechanism for routing percentages of traffic to different revisions of a Cloud Run service.
Horizontal Pod Autoscaler (HPA)
A Kubernetes feature that automatically adjusts the number of pod replicas in a scalable resource based on observed metrics like CPU utilization.
