Skarp

Chapter 14 of 26

Global Resilience and Routing with Amazon Route 53 and Multi-Region Designs

When a single Region is not enough, Route 53 becomes your traffic director; learn how to route users intelligently and keep services reachable during failures.

27 min read

Big Picture: Why Route 53 Matters for Global Resilience

From Single Region to Global

A single-Region design spread across multiple Availability Zones can survive instance or AZ failures. It cannot survive a full Region outage, and it cannot give distant users low latency. Multi-Region designs address both gaps.

What Route 53 Does

Amazon Route 53 is AWS’s highly available DNS service. It answers queries for names like `api.example.com` with IP addresses (or other record values), and it can vary those answers according to routing rules and endpoint health.

Exam-Relevant Capabilities

Route 53 can route based on endpoint health, latency, geography, and assigned weights. On the exam, you must pick the correct routing policy and failover design for each scenario.

Tying to Well-Architected

Route 53 directly supports the Reliability pillar (staying available during failures) and the Performance Efficiency pillar (routing users to nearby endpoints).

Route 53 Core Building Blocks: Hosted Zones and Records

Hosted Zones

A hosted zone is a container for the DNS records of a domain such as `example.com`. Public hosted zones answer queries from the internet; private hosted zones answer queries only from the VPCs associated with them.

Key Record Types

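Know A, AAAA, and CNAME records, plus Route 53 alias records. An alias record points to an AWS resource (such as an ALB, a CloudFront distribution, or an S3 website endpoint) and, unlike a CNAME, can be created at the zone apex (`example.com` itself).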
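To make the apex rule concrete, here is a minimal sketch of the request body Route 53's ChangeResourceRecordSets API expects for an alias A record at `example.com`. It only builds the dictionary; the ALB DNS name and hosted zone ID are hypothetical placeholders (in practice the zone ID is the ALB's canonical hosted zone ID, not your own zone's ID).

```python
def apex_alias_change_batch(zone_apex: str, alb_dns_name: str, alb_zone_id: str) -> dict:
    """Build a ChangeBatch that upserts an alias A record at the zone apex."""
    return {
        "Comment": "Alias A record at the apex (a CNAME is not allowed here)",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": zone_apex,
                    "Type": "A",
                    # No "TTL" key: an alias to an AWS resource uses the resource's TTL.
                    "AliasTarget": {
                        "HostedZoneId": alb_zone_id,  # the ALB's canonical hosted zone ID
                        "DNSName": alb_dns_name,
                        "EvaluateTargetHealth": True,
                    },
                },
            }
        ],
    }


# Hypothetical values for illustration only.
batch = apex_alias_change_batch(
    "example.com.",
    "my-alb-1234567890.us-east-1.elb.amazonaws.com",
    "ZEXAMPLEALBZONE",
)
```

With boto3 you would pass this dictionary as `ChangeBatch=` to `route53.change_resource_record_sets(HostedZoneId=...)`; the call is left out so the sketch runs without AWS credentials.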

TTL and Caching

TTL controls how long resolvers and clients cache a DNS answer. A low TTL speeds up failover but increases DNS query volume; a high TTL reduces query volume and cost but makes changes slower to take effect.
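The caching trade-off can be illustrated with a toy resolver cache. This is a simplified model, not how any particular resolver is implemented: it assumes the resolver honors the TTL exactly and uses an arbitrary clock in seconds.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    ip: str
    expires_at: float  # seconds on an arbitrary clock


class ResolverCache:
    """Toy caching resolver that honors TTL exactly (illustrative model only)."""

    def __init__(self) -> None:
        self._entries: dict[str, CachedAnswer] = {}
        self.upstream_queries = 0  # how often the authoritative server was asked

    def resolve(self, name: str, now: float, authoritative: dict[str, tuple[str, int]]) -> str:
        entry = self._entries.get(name)
        if entry is not None and now < entry.expires_at:
            # Served from cache, even if the authoritative answer has changed.
            return entry.ip
        # Cache miss or expired entry: ask the authoritative server again.
        ip, ttl = authoritative[name]
        self.upstream_queries += 1
        self._entries[name] = CachedAnswer(ip, now + ttl)
        return ip


cache = ResolverCache()
zone = {"www.example.com": ("192.0.2.10", 60)}
first = cache.resolve("www.example.com", 0, zone)           # fetched, cached until t=60
zone["www.example.com"] = ("198.51.100.20", 60)             # the record changes at the source
during_ttl = cache.resolve("www.example.com", 30, zone)     # still the old, cached IP
after_ttl = cache.resolve("www.example.com", 60, zone)      # cache expired: new IP
```

Halving the TTL in this model halves how long the stale answer survives, but doubles how often `upstream_queries` grows for a steady stream of lookups.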

Exam Angle

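For failover designs, look for hints pointing to public hosted zones, alias records that target ALBs, and short TTLs on critical records. Note that an alias record targeting an AWS resource uses that resource’s TTL; you cannot set one yourself.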

Routing Policies Overview: How Route 53 Decides Where to Send Traffic

Why Routing Policies Matter

Routing policies define how Route 53 answers DNS queries. Many exam questions ask you to pick the correct policy for a scenario.

Simple and Weighted

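Simple routing returns the same answer for every query and cannot use health checks, so it offers no failover. Weighted routing splits traffic in proportion to the weights you assign, which is useful for A/B tests or gradual migrations.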
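Weighted routing can be modeled as a proportional random choice: each record's expected share is its weight divided by the sum of all weights. The sketch below is a simplified model (it ignores health checks and the special case where every weight is zero), and the record names "blue" and "green" are hypothetical.

```python
import random
from collections import Counter


def weighted_shares(weights: dict[str, int]) -> dict[str, float]:
    """Expected share of queries per record: its weight divided by the sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("this simplified model needs at least one positive weight")
    return {name: w / total for name, w in weights.items()}


def pick_weighted(weights: dict[str, int], rng: random.Random) -> str:
    """Choose the record returned for one DNS query, with probability proportional to weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Gradual migration: send roughly 10% of queries to the new "green" stack.
weights = {"blue": 90, "green": 10}
rng = random.Random(42)  # seeded so the simulation is repeatable
counts = Counter(pick_weighted(weights, rng) for _ in range(10_000))
```

Because weights are relative, `{"blue": 9, "green": 1}` produces the same split; shifting a migration forward is just a matter of changing the numbers.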

Latency and Failover

Latency-based routing sends users to the lowest-latency Region. Failover routing uses primary/secondary records for active-passive setups.

Geo and Multivalue

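Geolocation routing uses the user’s location; geoproximity routing uses the distance between users and your resources, optionally shifted by a bias. Multivalue answer routing returns up to eight healthy records per query, giving simple DNS-level load distribution, but it is not a substitute for a load balancer.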
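Multivalue answer routing can be sketched as "filter to healthy records, then return at most eight". This is a simplified model: the IP addresses are hypothetical documentation-range values, and the all-unhealthy edge case is deliberately not modeled.

```python
import random

MAX_ANSWERS = 8  # Route 53 returns at most eight healthy records per multivalue query


def multivalue_answer(records: dict[str, bool], rng: random.Random) -> list[str]:
    """records maps IP -> healthy? Returns up to eight healthy IPs."""
    healthy = [ip for ip, is_healthy in records.items() if is_healthy]
    # When more than eight are healthy, which eight are returned varies between queries.
    rng.shuffle(healthy)
    # Simplification: an empty list when nothing is healthy; that edge case is not modeled.
    return healthy[:MAX_ANSWERS]


# Ten healthy IPs plus two unhealthy ones (all hypothetical).
records = {f"192.0.2.{i}": True for i in range(1, 11)}
records.update({"198.51.100.1": False, "198.51.100.2": False})
answer = multivalue_answer(records, random.Random(7))
```

The client then picks from the returned list itself, which is why this is load distribution at the DNS level rather than true load balancing.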

Health Checks and DNS-Based Failover

What Health Checks Do

Route 53 health checks probe HTTP, HTTPS, or TCP endpoints (or track the state of a CloudWatch alarm) to decide whether a record is healthy and should be returned.
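A minimal sketch of an HTTPS health check, assuming a hypothetical `/health` path. The dictionary follows the shape of the `HealthCheckConfig` parameter for CreateHealthCheck; the second function is a simplified single-checker model of `FailureThreshold`, which is the number of consecutive results needed to flip status. Real Route 53 aggregates results from many health checkers in different locations, which this sketch ignores.

```python
def https_health_check_config(fqdn: str, path: str = "/health") -> dict:
    """HealthCheckConfig for an HTTPS endpoint (example values, hypothetical path)."""
    return {
        "Type": "HTTPS",
        "FullyQualifiedDomainName": fqdn,
        "Port": 443,
        "ResourcePath": path,
        "RequestInterval": 30,  # seconds between checks; 10 is the "fast" option
        "FailureThreshold": 3,  # consecutive results required to change status
    }


def status_after(results: list[bool], failure_threshold: int = 3,
                 start_healthy: bool = True) -> bool:
    """Single-checker model: status flips only after N consecutive opposite results."""
    healthy = start_healthy
    streak = 0  # consecutive results that disagree with the current status
    for ok in results:
        if ok != healthy:
            streak += 1
            if streak >= failure_threshold:
                healthy = ok
                streak = 0
        else:
            streak = 0  # one agreeing result resets the count
    return healthy


config = https_health_check_config("app.us-east-1.example.com")
```

The threshold is what keeps a single slow response from triggering a failover, at the cost of a few extra intervals before a real outage is detected.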

Records and Health

Attach a health check to a record. While the check reports the endpoint as unhealthy, policies such as failover and latency-based routing stop returning that record in DNS answers.

DNS Caching Effects

DNS failover speed is bounded by the TTL. After Route 53 changes its answer, new queries get the new target, but clients and resolvers holding a cached answer keep using the old IP until that TTL expires.
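A back-of-the-envelope way to combine detection time and caching is sketched below. It is a rough estimate, not a guarantee: it assumes detection takes about `RequestInterval × FailureThreshold` seconds and that every resolver honors the TTL, and it ignores propagation delay and health-checker aggregation.

```python
def worst_case_cutover_seconds(request_interval: int, failure_threshold: int, ttl: int) -> int:
    """Rough upper bound from failure until every client uses the new target."""
    # Time for the health check to observe enough consecutive failures.
    detection = request_interval * failure_threshold
    # After Route 53 switches answers, a cached old answer can live for up to one TTL.
    return detection + ttl


standard = worst_case_cutover_seconds(request_interval=30, failure_threshold=3, ttl=60)
fast_long_ttl = worst_case_cutover_seconds(request_interval=10, failure_threshold=3, ttl=300)
```

The second case shows that a fast health check does little if the TTL dominates: 30 seconds of detection is dwarfed by 300 seconds of caching.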

Exam Trap

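If a scenario requires near-instant failover, including for clients that already hold a cached answer or an open connection, DNS alone is insufficient. Look for answers that add load balancers or other mechanisms that do not depend on DNS caching.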

Active-Passive Multi-Region with Failover Routing

Scenario Setup

You host `www.example.com` with a primary stack in us-east-1 and a DR stack in eu-west-1. Requirement: stay reachable if us-east-1 fails.

Route 53 Configuration

Create two failover alias A records for `www.example.com`: the primary targets the us-east-1 ALB and the secondary targets the eu-west-1 ALB, each with a health check.
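The two-record configuration can be sketched as a ChangeResourceRecordSets request body. This only builds the dictionary; every DNS name, zone ID, and health check ID below is a hypothetical placeholder (real health check IDs are generated by Route 53).

```python
from typing import Optional


def failover_alias_change(name: str, role: str, set_id: str, alb_dns_name: str,
                          alb_zone_id: str, health_check_id: Optional[str] = None) -> dict:
    """One UPSERT for a failover alias A record; role is "PRIMARY" or "SECONDARY"."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,  # distinguishes records that share a name and type
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alb_zone_id,
            "DNSName": alb_dns_name,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# All DNS names, zone IDs, and health check IDs below are hypothetical placeholders.
change_batch = {
    "Comment": "Active-passive failover for www.example.com",
    "Changes": [
        failover_alias_change("www.example.com.", "PRIMARY", "primary-us-east-1",
                              "primary-alb.us-east-1.elb.amazonaws.com", "ZPRIMARYALBZONE",
                              "hc-primary-placeholder"),
        failover_alias_change("www.example.com.", "SECONDARY", "secondary-eu-west-1",
                              "dr-alb.eu-west-1.elb.amazonaws.com", "ZDRALBZONE",
                              "hc-secondary-placeholder"),
    ],
}
```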

Failover Behavior

When the us-east-1 health check fails, Route 53 stops returning it and responds with the eu-west-1 ALB instead.
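The answer-selection logic reduces to a single condition. This simplified model assumes the secondary is always healthy and does not cover the case where both records are unhealthy; the target names are hypothetical labels.

```python
def failover_answer(primary_healthy: bool, primary: str, secondary: str) -> str:
    """Target returned for a failover pair (assumes the secondary is healthy)."""
    return primary if primary_healthy else secondary


# Primary health over five consecutive evaluations: an outage, then recovery.
timeline = [True, True, False, False, True]
answers = [failover_answer(h, "us-east-1-alb", "eu-west-1-alb") for h in timeline]
```

The last entry reflects failback: once the primary is healthy again, Route 53 resumes returning it, so plan for traffic (and any data written in the DR Region) to move back.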

Exam Signals

Phrases like active-passive, DR Region, DNS-based failover, and cost-sensitive standby often point to failover routing.

Active-Active Multi-Region with Latency-Based Routing

Global API Scenario

You run `api.example.com` in us-east-1, eu-west-1, and ap-southeast-1. Goal: users hit the lowest-latency Region and survive Region failures.

Latency-Based Setup

Create three alias A records for `api.example.com`, each latency-based and tied to one Region’s ALB, all with health checks.
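A minimal sketch of the three latency records as a ChangeResourceRecordSets body. The `Region` field is what makes a record latency-based; here health is evaluated through `EvaluateTargetHealth` on each ALB (you could also attach a `HealthCheckId`). The ALB DNS names and zone IDs are hypothetical placeholders.

```python
# Hypothetical ALB DNS names and canonical hosted zone IDs, one per Region.
REGIONAL_ALBS = {
    "us-east-1": ("api-alb.us-east-1.elb.amazonaws.com", "ZUSEAST1ALBZONE"),
    "eu-west-1": ("api-alb.eu-west-1.elb.amazonaws.com", "ZEUWEST1ALBZONE"),
    "ap-southeast-1": ("api-alb.ap-southeast-1.elb.amazonaws.com", "ZAPSE1ALBZONE"),
}


def latency_alias_changes(name: str, albs: dict[str, tuple[str, str]]) -> dict:
    """Build one latency-based alias A record per Region."""
    changes = []
    for region, (dns_name, zone_id) in albs.items():
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "SetIdentifier": f"api-{region}",  # unique per record in the set
                "Region": region,                  # marks this as a latency record
                "AliasTarget": {
                    "HostedZoneId": zone_id,
                    "DNSName": dns_name,
                    "EvaluateTargetHealth": True,  # drop this Region if its ALB is unhealthy
                },
            },
        })
    return {"Comment": f"Latency-based routing for {name}", "Changes": changes}


batch = latency_alias_changes("api.example.com.", REGIONAL_ALBS)
```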

How Traffic Flows

Route 53 sends each user to the Region with the lowest measured latency. If a Region fails its health check, its record is no longer returned and users go to the next-best healthy Region.
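The selection rule can be modeled as "lowest latency among healthy Regions". This is a simplified sketch: real Route 53 uses its own latency data between networks and AWS Regions, while the millisecond values here are made up for a hypothetical user in Europe.

```python
from typing import Optional


def latency_answer(latency_ms: dict[str, float], healthy: dict[str, bool]) -> Optional[str]:
    """Return the healthy Region with the lowest latency for this user's resolver."""
    candidates = {region: ms for region, ms in latency_ms.items() if healthy.get(region, False)}
    if not candidates:
        return None  # the all-unhealthy edge case is not modeled here
    return min(candidates, key=candidates.__getitem__)


# Hypothetical latency measurements for a user in Europe.
latency = {"us-east-1": 85.0, "eu-west-1": 20.0, "ap-southeast-1": 170.0}
all_up = {"us-east-1": True, "eu-west-1": True, "ap-southeast-1": True}
eu_down = {**all_up, "eu-west-1": False}

normal = latency_answer(latency, all_up)       # nearest Region
during_outage = latency_answer(latency, eu_down)  # next-best healthy Region
```

Note the capacity implication for active-active: during the outage, European users land in us-east-1, so that Region must be able to absorb the extra load.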

Recognizing Active-Active

Phrases like global performance, use all Regions, and no idle standby usually imply active-active with latency-based or weighted routing.

Design Choice: Active-Active or Active-Passive?

Work through this thought exercise to solidify when to choose active-active vs active-passive. There is no single correct answer, but aim for exam-style reasoning.

Scenario A

A video streaming service has users worldwide. The business wants the lowest possible latency and is willing to pay for multiple fully sized Regions. They also want resilience against a full Region outage.

  • Would you choose active-active or active-passive?
  • Which Route 53 routing policy best fits?
  • What are the trade-offs in cost and complexity?

Pause and answer in your own words, then compare:

  • Likely answer: active-active with latency-based routing.
  • Justification: all Regions serve traffic; users get the closest Region; if one Region fails, others keep serving.
  • Trade-offs: higher cost (all Regions scaled for normal load), more complex data replication.

Scenario B

An internal HR application is used mostly in one country. Regulations require a DR Region in another country, but the RTO is 4 hours and the business is highly cost-sensitive. Only a small team uses it.

  • Would you choose active-active or active-passive?
  • Which Route 53 routing policy best fits?

Reflect first, then compare:

  • Likely answer: active-passive with failover routing.
  • The primary Region handles all traffic; the DR Region can run at reduced capacity and scale up only during a disaster.

Use these patterns when reading exam scenarios: map business goals (latency, cost, RTO/RPO) to routing policies and multi-Region patterns.

Quiz 1: Routing Policy Selection

Test your understanding of which Route 53 routing policy to use.

Your company runs identical stacks in us-east-1 and ap-southeast-1. You want to send 70% of traffic to us-east-1 and 30% to ap-southeast-1, and you want Route 53 to stop sending traffic to a Region if its health checks fail. Which routing policy should you use?

  1. Simple routing
  2. Weighted routing
  3. Latency-based routing
  4. Failover routing

Answer: 2. Weighted routing

**Weighted routing** is correct because you want a specific percentage split (70/30) and automatic health-based removal of unhealthy endpoints. Simple routing does not support this; latency-based routes by latency, not percentage; failover is for primary/secondary, not proportional traffic sharing.
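For the quiz scenario, a minimal sketch of the two weighted records might look like this. Weights are relative, so 70 and 30 yield a 70/30 split, and each record carries a health check so an unhealthy Region drops out. The ALB names, zone IDs, and health check IDs are hypothetical placeholders.

```python
def weighted_alias_change(name: str, set_id: str, weight: int, alb_dns_name: str,
                          alb_zone_id: str, health_check_id: str) -> dict:
    """One UPSERT for a weighted alias A record with an attached health check."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,
            "Weight": weight,                  # share = weight / sum of weights in the set
            "HealthCheckId": health_check_id,  # unhealthy records stop being returned
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,
                "DNSName": alb_dns_name,
                "EvaluateTargetHealth": True,
            },
        },
    }


# Hypothetical placeholders throughout.
changes = [
    weighted_alias_change("app.example.com.", "us-east-1", 70,
                          "app-alb.us-east-1.elb.amazonaws.com", "ZUSEAST1ALBZONE", "hc-use1"),
    weighted_alias_change("app.example.com.", "ap-southeast-1", 30,
                          "app-alb.ap-southeast-1.elb.amazonaws.com", "ZAPSE1ALBZONE", "hc-apse1"),
]
weights = {c["ResourceRecordSet"]["SetIdentifier"]: c["ResourceRecordSet"]["Weight"] for c in changes}
total = sum(weights.values())
shares = {region: w / total for region, w in weights.items()}
```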

Quiz 2: DNS-Based Failover Behavior

Check your understanding of how TTL and failover interact.

You configure failover routing for `app.example.com` with a TTL of 300 seconds. The primary Region fails at 12:00:00. Health checks detect the failure and Route 53 starts answering with the secondary Region at 12:00:30. When will *all* users definitely be using the secondary Region?

  1. Immediately at 12:00:30
  2. Within 30 seconds of 12:00:30
  3. Within 300 seconds of 12:00:30
  4. It is impossible to predict due to DNS caching behavior

Answer: 3. Within 300 seconds of 12:00:30

With a TTL of 300 seconds, resolvers and clients are allowed to cache the old primary IP for up to 300 seconds. After Route 53 changes its answers at 12:00:30, it can take up to 300 more seconds for every cache to expire, so **within 300 seconds of 12:00:30** is the best answer in an exam context.
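The bound can be written out explicitly. Here $t_{\text{switch}}$ is the moment Route 53 starts returning the secondary, and the estimate assumes every cache honors the TTL:

```latex
\begin{aligned}
t_{\text{switch}} &= \text{12:00:30} \\
t_{\text{all}} &\le t_{\text{switch}} + \mathrm{TTL} = \text{12:00:30} + 300\ \text{s} = \text{12:05:30} \\
t_{\text{all}} - t_{\text{failure}} &\le 30\ \text{s} + 300\ \text{s} = 330\ \text{s}
\end{aligned}
```

The last line measures from the actual failure at 12:00:00, which adds the 30 seconds of detection time.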

Geolocation vs Latency Routing and Common Exam Traps

Latency vs Geolocation

Latency-based routing picks the lowest-latency Region. Geolocation routing picks based on the user’s geographic location.

When to Use Geolocation

Use geolocation for legal, compliance, or localization needs, such as data residency or country-specific content.
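Geolocation routing picks the most specific matching record: a US state first, then a country, then a continent, then the default record (country code `*`). The sketch below is a simplified model of that precedence; the `GeoLocation` dictionaries follow Route 53's field names, while the endpoint labels are hypothetical.

```python
from typing import Callable, Optional

GeoRecord = tuple[dict, str]  # (GeoLocation dict, endpoint label)


def geolocation_answer(records: list[GeoRecord], continent: str, country: str,
                       subdivision: Optional[str] = None) -> Optional[str]:
    """Return the endpoint of the most specific matching geolocation record."""
    def find(match: Callable[[dict], bool]) -> Optional[str]:
        for geo, endpoint in records:
            if match(geo):
                return endpoint
        return None

    checks = [
        # 1. State/subdivision match (e.g. US + CA).
        lambda g: subdivision is not None and g.get("CountryCode") == country
        and g.get("SubdivisionCode") == subdivision,
        # 2. Whole-country match.
        lambda g: g.get("CountryCode") == country and "SubdivisionCode" not in g,
        # 3. Continent match.
        lambda g: g.get("ContinentCode") == continent,
        # 4. Default record.
        lambda g: g.get("CountryCode") == "*",
    ]
    for check in checks:
        endpoint = find(check)
        if endpoint is not None:
            return endpoint
    return None  # no match and no default: Route 53 returns a "no answer" response


records = [
    ({"CountryCode": "US", "SubdivisionCode": "CA"}, "us-west-2-alb"),
    ({"CountryCode": "DE"}, "eu-central-1-alb"),
    ({"ContinentCode": "EU"}, "eu-west-1-alb"),
    ({"CountryCode": "*"}, "us-east-1-alb"),
]
```

The default record matters for compliance designs: without it, users from unmatched locations get no answer at all.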

Exam Gotchas

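Do not confuse geolocation (based on the user’s location) with geoproximity (based on the distance between users and resources, adjustable with a bias). Also remember that Route 53 only directs traffic; it does not replicate data between Regions.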

Reading Requirements

Words like 'data residency' or 'country-based access' imply geolocation; 'lowest latency' implies latency-based routing.

Key Term Review: Route 53 and Multi-Region Patterns

Review these terms to reinforce core concepts before moving on.

Hosted zone
A container for DNS records for a specific domain, such as example.com. Public hosted zones are visible on the internet; private hosted zones are visible only within one or more VPCs.
Alias record
An AWS-specific record type that lets you map a DNS name to certain AWS resources (like ALBs, CloudFront, S3 websites) and can be used at the zone apex. Alias targets automatically track IP changes and incur no extra DNS query charge.
Failover routing policy
A Route 53 routing policy that uses primary and secondary records. Combined with health checks, it routes traffic to the secondary when the primary becomes unhealthy, enabling active-passive architectures.
Latency-based routing (LBR)
A routing policy that directs users to the AWS Region with the lowest network latency, based on Route 53 measurements. Commonly used for active-active multi-Region architectures.
Geolocation routing
A routing policy that directs traffic based on the geographic location of the user’s DNS resolver IP (continent, country, or state). Useful for data residency, compliance, and localization.
Active-active multi-Region
An architecture where multiple Regions actively serve production traffic at the same time, often using latency-based or weighted routing. Improves global performance and resilience but increases cost and complexity.
Active-passive multi-Region
An architecture with a primary Region serving traffic and a secondary Region on standby for disaster recovery, typically using failover routing. Reduces cost but may have higher RTO.
Route 53 health check
A mechanism for Route 53 to monitor the health of endpoints (HTTP/HTTPS/TCP or CloudWatch alarms) and stop returning DNS records that are considered unhealthy.
TTL (Time To Live)
A DNS setting that defines how long resolvers and clients can cache a DNS response. Shorter TTLs enable faster failover but increase DNS query volume.
Multivalue answer routing
A routing policy where Route 53 returns up to eight healthy records (for example, IP addresses) for a query. Each record can have its own health check, providing simple load distribution and resilience without a dedicated load balancer.

Pulling It Together and Next Steps in Your Study Path

Route 53 as Traffic Director

Route 53 sits at your global front door, using routing policies and health checks to direct users to healthy, appropriate endpoints.

Architectural Patterns

Combine Route 53 with ALBs, Auto Scaling, and multi-Region data to build active-active or active-passive architectures.

Exam Blueprint Fit

These topics map to designing resilient and high-performing architectures in the Solutions Architect Associate exam.

Your Next Moves

Use the diagnostic, mock exams, and spaced review in this course to reinforce Route 53 and multi-Region design decisions.

Key Terms

Hosted zone
A container for DNS records for a domain, such as example.com. Public hosted zones are accessible on the internet; private hosted zones are accessible only within associated VPCs.
Health check
A Route 53 feature that monitors the health of specified endpoints (HTTP/HTTPS/TCP or CloudWatch alarms) and marks records unhealthy when checks fail, enabling DNS-based failover.
Active-active
A multi-Region architecture where multiple Regions simultaneously serve production traffic, often using latency-based or weighted routing for load sharing and resilience.
Active-passive
A multi-Region architecture where one Region is primary and serves traffic, while another Region is on standby for disaster recovery, typically using failover routing.
Routing policy
A configuration in Route 53 that determines how DNS queries are answered, such as simple, weighted, latency-based, failover, geolocation, geoproximity, or multivalue answer routing.
Amazon Route 53
AWS’s scalable, highly available Domain Name System (DNS) web service that translates domain names into IP addresses and supports advanced routing policies and health checks.
Failover routing
A Route 53 routing policy that uses primary and secondary records with health checks to route traffic to a secondary endpoint when the primary becomes unhealthy.
TTL (Time To Live)
A value in DNS records that specifies how long resolvers and clients may cache a DNS response before requesting it again, affecting both performance and failover speed.
Geolocation routing
A Route 53 routing policy that directs traffic based on the geographic location of the DNS resolver’s IP address, used for localization and data residency requirements.
Latency-based routing
A Route 53 routing policy that directs traffic to the Region with the lowest network latency for the user, improving global performance.
