Chapter 13 of 27
Deploying App Engine Applications and Choosing Runtimes
App Engine offers a managed platform for web apps; learn how to deploy, configure services, and decide when it fits better than other compute options.
Big Picture: What Is App Engine and When To Use It
What Is App Engine?
App Engine is Google Cloud's PaaS for running web apps without managing servers. You deploy your code together with an `app.yaml` configuration file, and Google handles instances, load balancing, and scaling.
Two Environments
App Engine has two environments: standard (language sandboxes, fast scaling, free tier) and flexible (Docker containers on VMs, more control, higher base cost).
Exam Context
For the Associate Cloud Engineer exam, you must know when to choose App Engine versus Compute Engine, Google Kubernetes Engine, Cloud Run, or Cloud Functions.
Workload Fit
App Engine is best for stateless, HTTP-based web apps and APIs. For long-running, stateful, or highly customized workloads, consider GKE or Compute Engine instead.
Standard vs Flexible Environments: Deep Comparison
Standard Environment
Standard uses Google-managed language runtimes in sandboxes. It is highly opinionated, very fast to scale, and offers a free tier for supported languages.
Standard: Constraints
Standard restricts local disk writes (except `/tmp`), background work, and system-level access. It expects short-lived, HTTP-driven, stateless handlers.
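To make the `/tmp` exception concrete, here is a minimal Python sketch of writing scratch data in a way the standard sandbox permits. The function name `write_scratch_file` is hypothetical. The only App Engine assumption is the one stated above: `/tmp` is the writable location. Treat such files as temporary and local to one instance. They are not shared between instances and do not survive an instance shutdown, so durable data belongs in a service such as Firestore or Cloud Storage.

```python
import os
import tempfile


def write_scratch_file(data: bytes, suffix: str = ".bin") -> str:
    """Write temporary bytes to the temp directory and return the file path.

    tempfile.gettempdir() honours TMPDIR and otherwise normally resolves to
    /tmp on Linux, which is the writable location in the standard environment.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, dir=tempfile.gettempdir())
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

Remove the file with `os.remove(path)` once you no longer need it, so a long-lived instance does not accumulate stale files.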
Flexible Environment
Flexible runs Docker containers on Compute Engine VMs. You can use built-in or custom runtimes and have more control over OS, libraries, and networking.
Flexible: Trade-offs
Flexible scales more slowly and has a higher minimum cost, because at least one VM instance is always running (it cannot scale to zero). In exchange, it supports more languages, local disk writes, and long-running or resource-heavy processes.
Choosing for the Exam
For exam questions: standard = fast, managed, free tier; flexible = custom, container-based, more control. Long-running or custom runtime needs usually point to flexible.
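The rule of thumb above can be written down as a tiny decision function. This is a deliberately simplified, hypothetical encoding of this section's heuristics (the `Workload` fields and return strings are invented for illustration). Real platform choices weigh more factors. The one extra fact it uses is that flexible always keeps at least one instance running, so a hard scale-to-zero requirement points away from it.

```python
from dataclasses import dataclass

STANDARD = "App Engine standard"
FLEXIBLE = "App Engine flexible"
ELSEWHERE = "look beyond App Engine (for example Cloud Run)"


@dataclass
class Workload:
    needs_custom_runtime_or_libs: bool = False  # non-standard OS packages or languages
    writes_disk_outside_tmp: bool = False       # standard only allows writes to /tmp
    long_running_requests: bool = False         # minutes-long or resource-heavy work
    must_scale_to_zero: bool = False            # cost must drop to zero when idle


def choose_environment(w: Workload) -> str:
    """Any 'needs more control' signal rules out standard."""
    needs_control = (w.needs_custom_runtime_or_libs
                     or w.writes_disk_outside_tmp
                     or w.long_running_requests)
    if not needs_control:
        return STANDARD
    if w.must_scale_to_zero:
        # Flexible keeps at least one VM running, so it cannot scale to zero.
        return ELSEWHERE
    return FLEXIBLE
```

For example, `choose_environment(Workload(long_running_requests=True))` returns the flexible option, matching the guidance that long-running needs usually point to flexible.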
Core Concepts: Applications, Services, Versions, and Traffic
App Engine Hierarchy
Each project can have at most one App Engine application, which contains one or more services (the first is always `default`). Each service has one or more versions, and each version runs on zero or more instances (zero when it has scaled down).
Services
Services (like `default`, `api`, `worker`) each have their own `app.yaml`, scaling settings, and traffic configuration. They behave like separate microservices.
Versions
Every deployment creates a new version. Versions are immutable and multiple versions can run at once, which enables safe testing and rollbacks.
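The hierarchy and the immutability of versions can be modeled in a few lines. This is a hypothetical in-memory sketch: the class and method names are invented for illustration and are not an App Engine API. It enforces the rules described above: at most one application per project, services created on demand, and versions that are added rather than edited.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)  # frozen: once created, a version is never modified in place
class Version:
    version_id: str


@dataclass
class Service:
    name: str
    versions: Dict[str, Version] = field(default_factory=dict)

    def deploy(self, version_id: str) -> Version:
        """Model a deployment: it adds a new version instead of editing an old one."""
        if version_id in self.versions:
            # Simplification: this sketch refuses to reuse a version ID.
            raise ValueError(f"version {version_id!r} already exists")
        version = Version(version_id)
        self.versions[version_id] = version
        return version


@dataclass
class Application:
    region: str  # chosen once at creation; cannot be changed afterwards
    services: Dict[str, Service] = field(default_factory=dict)

    def service(self, name: str = "default") -> Service:
        """Return the named service, creating it on first use."""
        return self.services.setdefault(name, Service(name))


@dataclass
class Project:
    project_id: str
    app: Optional[Application] = None

    def create_app(self, region: str) -> Application:
        """Mirror the 'at most one App Engine application per project' rule."""
        if self.app is not None:
            raise ValueError("this project already has an App Engine application")
        self.app = Application(region)
        return self.app
```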
Traffic Splitting
You can route all traffic to one version or split traffic across versions (e.g., 90/10) for gradual rollouts or A/B tests.
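App Engine performs the split for you, and it can split by IP address, by cookie, or randomly. The Python sketch below only illustrates the idea behind a sticky, weighted split: hash a stable user key, such as a cookie value, onto the weight range so the same user keeps seeing the same version. The function `pick_version` and its hashing scheme are hypothetical and are not how App Engine implements splitting. Weights are treated as relative, so `{"v1": 9, "v2": 1}` behaves like `{"v1": 0.9, "v2": 0.1}`.

```python
import hashlib
from typing import Dict


def pick_version(user_key: str, splits: Dict[str, float]) -> str:
    """Map a user to a version in proportion to the split weights.

    The same user_key always lands on the same version while the split is
    unchanged, which keeps that user's experience consistent.
    """
    if not splits or any(w < 0 for w in splits.values()):
        raise ValueError("splits need at least one version and non-negative weights")
    total = sum(splits.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Hash the key to a stable point in [0, total].
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for version, weight in sorted(splits.items()):
        cumulative += weight
        if point < cumulative:
            return version
    # Rounding can leave point at or just past the final sum: use the last
    # version (in sorted order) that has a positive weight.
    return max(v for v, w in splits.items() if w > 0)
```

Hashing 10,000 distinct user keys with `{"v1": 0.9, "v2": 0.1}` sends roughly 1,000 of them to `v2`, and each key's assignment stays fixed until the weights change.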
Exam Signal
Phrases like "gradual rollout" or "send 10% of users to a new release" are strong hints toward App Engine traffic splitting or a similar managed platform feature.
Hands-on: A Minimal app.yaml and First Deployment
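In App Engine, the `app.yaml` file is the heart of your configuration: it tells the platform which runtime to use, how to scale, and more. For a Python 3 app in the standard environment, a minimal `app.yaml` can be a single line that names the runtime, such as `runtime: python312`. To deploy it, run `gcloud app create --region=REGION` once for the project, then run `gcloud app deploy` from the directory that contains `app.yaml`.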
Study this carefully: questions often show a partial `app.yaml` and ask what behavior to expect.
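To pair with a minimal Python `app.yaml`, here is a minimal `main.py` sketch. It assumes the Python 3 runtime's default entrypoint convention, in which the platform's web server loads a WSGI callable named `app` from `main.py`; check the runtime documentation if you set a custom `entrypoint` in `app.yaml`. It uses only the standard library, and `serve_locally` is a hypothetical helper for trying the app on your own machine.

```python
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Minimal WSGI application: answers every request with plain text."""
    body = b"Hello from App Engine!"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain; charset=utf-8"),
         ("Content-Length", str(len(body)))],
    )
    return [body]


def serve_locally() -> None:
    """Run the app with the stdlib server for local testing only.

    App Engine supplies the listening port in the PORT environment variable;
    locally we fall back to 8080.
    """
    port = int(os.environ.get("PORT", "8080"))
    with make_server("", port, app) as server:
        server.serve_forever()
```

Locally, `python -c "import main; main.serve_locally()"` serves the app on port 8080 unless `PORT` is set. On App Engine, the platform imports `app` itself, so `serve_locally` is never called.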
Deploying with gcloud: Commands and Typical Workflow
Create the App
First-time setup: run `gcloud app create --region=REGION`. This creates the App Engine application in that region for your project.
Basic Deployment
From the directory that contains `app.yaml`, run `gcloud app deploy`. This deploys a new version of the service and, by default, promotes it so that it receives all traffic.
Control Versions
Use `gcloud app deploy --version=v1 --no-promote` to deploy a version without sending traffic yet. This is useful for testing and canary releases.
Set Traffic Splits
Use `gcloud app services set-traffic default --splits=v1=0.1,v2=0.9` to split traffic across versions for gradual rollouts.
Exam Reminder
Watch for `--no-promote` in questions. If you see "deploy but do not affect current users", that flag (plus later traffic changes) is usually the right answer.
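The deploy-then-shift workflow above can be scripted. Below is a minimal Python sketch, assuming the gcloud CLI is installed and authenticated, that assembles the commands from this section as argument lists and prints them (dry run by default). The helper names `splits_arg`, `canary_commands`, and `run` are hypothetical. Note that `gcloud app deploy` takes the service name from `app.yaml`, so only `set-traffic` needs it explicitly.

```python
import shlex
import subprocess
from typing import Dict, List


def splits_arg(splits: Dict[str, float]) -> str:
    """Render {'v1': 0.9, 'v2': 0.1} as '--splits=v1=0.9,v2=0.1'.

    This sketch insists on fractions that sum to 1.0 so the intent stays obvious.
    """
    if any(w < 0 for w in splits.values()):
        raise ValueError("traffic weights cannot be negative")
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("traffic weights must sum to 1.0")
    return "--splits=" + ",".join(f"{v}={w:g}" for v, w in splits.items())


def canary_commands(service: str, old: str, new: str, canary: float) -> List[List[str]]:
    """Deploy `new` without traffic, then send it a small share of `service`."""
    return [
        # Run from the directory containing app.yaml; the service comes from that file.
        ["gcloud", "app", "deploy", f"--version={new}", "--no-promote"],
        ["gcloud", "app", "services", "set-traffic", service,
         splits_arg({old: 1.0 - canary, new: canary})],
    ]


def run(argv: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)
```

`for cmd in canary_commands("default", "v1", "v2", 0.1): run(cmd)` prints the `--no-promote` deploy followed by `set-traffic default --splits=v1=0.9,v2=0.1`. Pass `dry_run=False` only after you have reviewed the output.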
Scaling Options and Quotas in App Engine
Automatic Scaling
Automatic scaling adjusts the instance count based on traffic and CPU utilization. Configure it under `automatic_scaling` in `app.yaml` with settings such as `min_instances`, `max_instances`, and `target_cpu_utilization`.
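To see how `min_instances`, `max_instances`, and `target_cpu_utilization` interact, here is a toy proportional autoscaler in Python. It is an illustration under stated assumptions, not App Engine's algorithm: the real scaler also weighs request rate and latency. The function `desired_instances` is hypothetical. It sizes the fleet so average CPU approaches the target, then clamps the result to the configured bounds.

```python
import math


def desired_instances(current_instances: int, avg_cpu: float,
                      target_cpu_utilization: float,
                      min_instances: int = 0, max_instances: int = 10) -> int:
    """Toy proportional rule: enough instances to bring average CPU to the target."""
    if not 0.0 < target_cpu_utilization <= 1.0:
        raise ValueError("target_cpu_utilization must be in (0, 1]")
    if min_instances > max_instances:
        raise ValueError("min_instances cannot exceed max_instances")
    busy = current_instances * avg_cpu  # total load, in "fully busy instances"
    # A tiny epsilon stops float noise (e.g. 6.000000000000001) adding an instance.
    needed = math.ceil(busy / target_cpu_utilization - 1e-9)
    return max(min_instances, min(max_instances, needed))
```

Three instances at 90% CPU with a 0.6 target call for five instances (2.7 / 0.6 = 4.5, rounded up). With `max_instances=2`, the answer stays at two however busy the fleet is, which is exactly the cap behind the 503 errors discussed in the scaling Quick Check.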
Basic Scaling
Basic scaling starts instances on demand and shuts them down after an `idle_timeout`. It still responds to HTTP requests but is not always-on.
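The effect of `idle_timeout` can be explored with a small simulation. This hypothetical Python sketch counts how often a single basic-scaled instance would have to start for a given list of request arrival times. Its simplifications are deliberate: one instance, zero request duration, and idle time measured from the last request's arrival, with all times in seconds.

```python
from typing import Iterable


def instance_starts(request_times: Iterable[float], idle_timeout: float) -> int:
    """Count starts of one basic-scaled instance under the simplifications above."""
    starts = 0
    last_request = None
    for t in sorted(request_times):
        if last_request is None or t - last_request > idle_timeout:
            starts += 1  # instance was stopped (or never started), so it starts now
        last_request = t
    return starts
```

With requests at 0, 60, 120, and 1000 seconds, a 300-second timeout yields two starts because the instance shuts down during the long gap, while a 1000-second timeout yields one. A longer timeout means fewer cold starts but keeps the instance running, and billed, for longer while idle.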
Manual Scaling
Manual scaling keeps a fixed number of instances running. It is useful for long-lived connections or predictable workloads.
Quotas and Limits
App Engine has quotas on requests, bandwidth, and more. Free-tier quotas reset daily. Exceeding quotas can cause 429 or 503 errors.
Exam Angle
If a scenario mentions "scale to zero and minimize cost", think automatic scaling with `min_instances: 0`. For steady capacity, think manual scaling.
Thought Exercise: Choosing Standard vs Flexible vs Other Compute
Work through these scenarios mentally and decide which platform you would choose. Then compare with the suggested answers to check your reasoning.
Scenario A
You are building a public REST API in Node.js. It is stateless, uses Firestore as a backend, and traffic is spiky but not massive. You want minimal ops and the ability to scale to zero to save cost.
- What would you choose: App Engine standard, App Engine flexible, Cloud Run, or Cloud Functions?
- Why?
Scenario B
You have a legacy Java app that requires a specific non-standard system library and writes files to local disk during processing. It must handle long-running requests (several minutes).
- Which environment fits better: App Engine standard, App Engine flexible, GKE, or Compute Engine?
- Why?
Scenario C
Your team wants to deploy a Python 3 web app with very fast cold starts, simple configuration, and a free tier for development. You do not need custom native libraries.
- Would you pick App Engine standard or flexible? Any special scaling settings you would consider?
Pause and answer in your own words before reading the suggested answers.
Suggested answers:
- Scenario A: Cloud Run or App Engine standard could both work. Cloud Run is a natural fit for stateless containers and scales to zero. App Engine standard also works if you prefer a built-in language runtime and the free tier.
- Scenario B: App Engine flexible, GKE, or Compute Engine are better because they allow custom libraries and long-running requests. Standard is too restrictive.
- Scenario C: App Engine standard is ideal: fast scaling, a managed runtime, and a free tier. Use automatic scaling with `min_instances: 0` for cost savings, or a small `min_instances` value for lower latency.
Quick Check: Services, Versions, and Traffic
Test your understanding of App Engine's deployment model.
You deploy a new version `v2` of the `default` service using `gcloud app deploy --version=v2 --no-promote`. What happens immediately after this command completes successfully?
- A) All user traffic is routed to version v2 of the default service.
- B) Version v2 is created and can be accessed via a version-specific URL, but user traffic still goes to the previously promoted version.
- C) Version v2 is created but remains stopped until you manually start it from the console.
- D) Deployment fails because you must always promote a new version during deployment.
Answer: B) Version v2 is created and can be accessed via a version-specific URL, but user traffic still goes to the previously promoted version.
The `--no-promote` flag tells App Engine to deploy the new version without directing user traffic to it. The version is serving and reachable via its version-specific URL, but the existing promoted version continues to receive default traffic until you change traffic settings.
Quick Check: Scaling and Quotas
Check your understanding of scaling behavior in App Engine standard.
Your App Engine standard service uses automatic scaling with `min_instances: 0` and `max_instances: 2`. During a traffic spike, you see requests being rejected with 503 errors. Which is the MOST likely reason?
- A) App Engine standard cannot scale automatically beyond one instance.
- B) The `min_instances` value is too low; it must be at least 1 for automatic scaling.
- C) The `max_instances` limit is too low to handle the spike, so App Engine cannot create more than two instances.
- D) You must switch to manual scaling for App Engine to handle any traffic spike.
Answer: C) The `max_instances` limit is too low to handle the spike, so App Engine cannot create more than two instances.
With automatic scaling, App Engine can scale between `min_instances` and `max_instances`. If `max_instances` is set too low for the incoming load, new instances cannot be created and some requests may be rejected with 503 errors. App Engine standard supports automatic scaling beyond one instance, and `min_instances` can be zero.
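The arithmetic behind this answer is simple enough to sketch. Assume each instance handles up to `max_concurrent_requests` requests at once. The value 10 is used here for illustration; it is the documented default for automatic scaling in the standard environment, but confirm it for your runtime. This toy Python model ignores the short queueing App Engine may do before rejecting requests, and the function names are hypothetical.

```python
import math


def unserved_requests(concurrent_requests: int, max_instances: int,
                      max_concurrent_requests: int = 10) -> int:
    """Requests beyond the capped fleet's capacity (toy model: no queueing)."""
    capacity = max_instances * max_concurrent_requests
    return max(0, concurrent_requests - capacity)


def required_max_instances(peak_concurrent_requests: int,
                           max_concurrent_requests: int = 10) -> int:
    """Smallest max_instances that covers the peak in this toy model."""
    return math.ceil(peak_concurrent_requests / max_concurrent_requests)
```

With `max_instances: 2`, a spike of 50 concurrent requests leaves 30 without an instance, which surfaces as 503 errors. Raising `max_instances` to 5 covers that peak.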
Managing Versions and Traffic: A Gradual Rollout Walkthrough
Deploy v2 Safely
Deploy a new version with `gcloud app deploy --version=v2 --no-promote`. v2 is live at its own URL, but users still see v1.
Start Canary Traffic
Use `gcloud app services set-traffic default --splits=v1=0.9,v2=0.1` to send 10% of traffic to v2 while keeping 90% on v1.
Promote or Roll Back
If v2 is healthy, move to `--splits=v2=1`. If not, revert instantly with `--splits=v1=1` without redeploying.
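The walkthrough can be expressed as a loop over canary stages. In this hypothetical Python sketch, each returned split corresponds to one `gcloud app services set-traffic default --splits=...` call, and `is_healthy` is a placeholder callback standing in for whatever monitoring you use (error rates, latency). The sketch stops and rolls back at the first unhealthy stage. Rollback is just another traffic change, so no redeployment is involved.

```python
from typing import Callable, Dict, List

Split = Dict[str, float]


def staged_rollout(old: str, new: str, stages: List[float],
                   is_healthy: Callable[[Split], bool]) -> List[Split]:
    """Return the splits applied, ending in full promotion or a rollback to `old`."""
    if not stages or stages[-1] != 1.0:
        raise ValueError("the last stage must send 100% of traffic to the new version")
    if any(not 0.0 < s <= 1.0 for s in stages) or stages != sorted(stages):
        raise ValueError("stages must be ascending fractions in (0, 1]")
    applied: List[Split] = []
    for share in stages:
        # round() hides float noise such as 1.0 - 0.7 == 0.30000000000000004
        split = {new: 1.0} if share == 1.0 else {old: round(1.0 - share, 6), new: share}
        applied.append(split)
        if not is_healthy(split):
            applied.append({old: 1.0})  # instant rollback: old version takes all traffic
            break
    return applied
```

With stages `[0.1, 0.5, 1.0]` and a healthy `v2`, the sequence is 90/10, then 50/50, then `v2=1`. If the check fails at 50/50, the final step is `v1=1`.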
Clean Up Versions
List versions with `gcloud app versions list` and delete unused ones using `gcloud app versions delete VERSION_ID` to reduce cost.
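Before deleting, it helps to apply an explicit policy. This hypothetical Python sketch picks deletion candidates from a list of versions: it never touches a version that is receiving traffic, and it keeps the newest few as rollback targets. The `VersionInfo` fields are assumptions for illustration, not the exact output format of `gcloud app versions list`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VersionInfo:
    version_id: str
    traffic_split: float  # share of the service's traffic, 0.0 to 1.0
    created: str          # ISO 8601 UTC timestamp, so string order matches time order


def versions_to_delete(versions: List[VersionInfo], keep_recent: int = 2) -> List[str]:
    """Return deletion candidates, newest first."""
    newest_first = sorted(versions, key=lambda v: v.created, reverse=True)
    keep = {v.version_id for v in newest_first[:keep_recent]}
    keep |= {v.version_id for v in versions if v.traffic_split > 0}  # serving versions stay
    return [v.version_id for v in newest_first if v.version_id not in keep]
```

Each returned ID could then be passed to `gcloud app versions delete VERSION_ID`.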
Exam Pattern
Keywords like "gradual rollout" or "easy rollback" for a web app map well to App Engine traffic splitting between versions.
Key Term Review: App Engine Essentials
Review these key terms to reinforce core App Engine concepts before moving on.
- App Engine application
- A top-level construct tied to a Google Cloud project and region. Each project can have at most one App Engine application, and its region cannot be changed after creation.
- App Engine service
- A logical component within an App Engine application, defined by its own `app.yaml`. Services (like `default`, `api`, `worker`) have independent code, configuration, scaling, and traffic settings.
- App Engine version
- An immutable deployment of a service. Each deployment creates a new version. Multiple versions can run at once, enabling testing, gradual rollouts, and quick rollbacks.
- App Engine standard environment
- An App Engine environment that runs apps in Google-managed language sandboxes, with fast automatic scaling, a free tier, and some runtime restrictions (e.g., limited local disk writes and background work).
- App Engine flexible environment
- An App Engine environment that runs apps in Docker containers on Compute Engine VMs, supporting custom runtimes and system libraries with more control but slower scaling and higher minimum cost.
- automatic_scaling (standard)
- A scaling mode where App Engine adjusts instance count based on traffic and CPU. You configure settings like `min_instances`, `max_instances`, and `target_cpu_utilization` in `app.yaml`.
- basic_scaling (standard)
- A scaling mode where instances start when requests arrive and stop after being idle for a configured `idle_timeout`. Good for intermittent workloads that do not need to be always-on.
- manual_scaling
- A scaling mode where you specify a fixed number of instances that remain running, suitable for long-lived connections or predictable capacity needs.
- Traffic splitting in App Engine
- A feature that lets you route a percentage of traffic to different versions of a service (e.g., 90% to v1, 10% to v2) to support canary releases, A/B testing, and safe rollouts.
- gcloud app deploy --no-promote
- A deployment command flag that creates a new App Engine version without automatically directing user traffic to it, allowing testing before promotion.
Key Terms
- app.yaml
- The main configuration file for an App Engine service, specifying runtime, scaling, handlers, environment variables, and other settings.
- App Engine
- Google Cloud's platform-as-a-service for building and running web applications without managing underlying servers, using configuration files like `app.yaml`.
- Basic scaling
- A scaling mode where instances are started on demand and shut down after being idle for a specified time.
- Manual scaling
- A scaling mode where a fixed number of instances is configured to stay running at all times.
- Automatic scaling
- A scaling mode where App Engine adjusts the number of instances based on demand and performance metrics.
- Traffic splitting
- An App Engine feature that routes a defined percentage of user traffic to different versions of a service for gradual rollouts or experiments.
- Service (App Engine)
- A logical component within an App Engine application, each with its own code and configuration, often representing a microservice or functional area.
- Version (App Engine)
- A specific, immutable deployment of an App Engine service. Multiple versions can run concurrently for testing, rollouts, and rollbacks.
- App Engine flexible environment
- An App Engine environment that runs applications in Docker containers on Compute Engine VMs, allowing custom runtimes and more OS-level control.
- App Engine standard environment
- An App Engine environment that runs applications in Google-managed language sandboxes with fast scaling, a free tier, and some runtime restrictions.