Chapter 14 of 26
Global Resilience and Routing with Amazon Route 53 and Multi-Region Designs
When a single Region is not enough, Route 53 becomes your traffic director; learn how to route users intelligently and keep services reachable during failures.
Big Picture: Why Route 53 Matters for Global Resilience
From Single Region to Global
Single-Region designs can survive instance or AZ failures, but they cannot survive a full Region outage, and they cannot give distant users low latency. Multi-Region designs address both problems.
What Route 53 Does
Amazon Route 53 is AWS’s highly available DNS service. It resolves names like `api.example.com` to IP addresses and can vary its answers according to routing policies and health checks.
Exam-Relevant Capabilities
Route 53 can route using health, latency, and geography. On the exam, you must pick the correct routing policy and failover design.
Tying to Well-Architected
Route 53 directly supports the Reliability pillar (stay available during failures) and Performance efficiency (route closer to users).
Route 53 Core Building Blocks: Hosted Zones and Records
Hosted Zones
A hosted zone is a container for DNS records for a domain like `example.com`. Public zones are internet-facing; private zones are VPC-internal.
Key Record Types
Know A, AAAA, CNAME, and AWS Alias records. Alias records point to AWS resources and can be used at the zone apex, unlike CNAME.
TTL and Caching
TTL controls how long DNS answers are cached. Low TTL speeds failover but increases DNS traffic; high TTL is cheaper but slower to change.
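To make the TTL trade-off concrete, here is a minimal sketch of a caching resolver. `CachingResolver` and its `lookup` callback are hypothetical names invented for this illustration, not part of any Route 53 API, and time is passed in explicitly so the behavior is deterministic.

```python
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy resolver that caches answers for `ttl` seconds (illustrative only)."""

    def __init__(self, lookup: Callable[[str], str], ttl: int) -> None:
        self.lookup = lookup  # stands in for the authoritative DNS answer
        self.ttl = ttl
        self.cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0

    def resolve(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: serve the cached answer
        self.upstream_queries += 1
        answer = self.lookup(name)
        self.cache[name] = (answer, now + self.ttl)
        return answer


# The authoritative answer changes after the first lookup (for example, a failover).
state = {"ip": "192.0.2.10"}
resolver = CachingResolver(lambda name: state["ip"], ttl=60)

first = resolver.resolve("api.example.com", now=90)   # cached until t=150
state["ip"] = "198.51.100.20"
stale = resolver.resolve("api.example.com", now=120)  # old answer, still cached
fresh = resolver.resolve("api.example.com", now=150)  # cache expired, new answer
```

With a 60-second TTL the client keeps the old address until the cache entry expires. A shorter TTL would shrink that window at the cost of more upstream queries.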
Exam Angle
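For failover designs, look for hints to use public hosted zones, alias records to ALBs, and shorter TTLs on critical non-alias records. Alias records that point to AWS resources use the TTL of the target resource, which you cannot set yourself.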
Routing Policies Overview: How Route 53 Decides Where to Send Traffic
Why Routing Policies Matter
Routing policies define how Route 53 answers DNS queries. Many exam questions ask you to pick the correct policy for a scenario.
Simple and Weighted
Simple routing returns a record with no health-based failover. Weighted routing splits traffic in proportion to the weights you assign, which is useful for A/B tests or gradual migrations.
Latency and Failover
Latency-based routing sends users to the lowest-latency Region. Failover routing uses primary/secondary records for active-passive setups.
Geo and Multivalue
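Geolocation routes on where the user is located; geoproximity routes on the distance between users and resources, which you can shift with a bias. Multivalue answer routing returns up to eight healthy records and can act like a simple load balancer, though it does not replace one.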
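The multivalue behavior can be sketched as a filter plus a random sample. Route 53 returns at most eight healthy records for a multivalue answer query. The function name `multivalue_answer` and the record layout are assumptions made for this example, and the sketch does not model what happens when every record is unhealthy.

```python
import random
from typing import Dict, List, Union

MAX_ANSWERS = 8  # multivalue answer routing returns up to eight healthy records

Record = Dict[str, Union[str, bool]]


def multivalue_answer(records: List[Record], rng: random.Random) -> List[str]:
    """Return a random selection of healthy IPs, capped at MAX_ANSWERS."""
    healthy = [str(r["ip"]) for r in records if r["healthy"]]
    return rng.sample(healthy, k=min(MAX_ANSWERS, len(healthy)))


# Twelve hypothetical records; every fifth one is marked unhealthy.
records: List[Record] = [
    {"ip": f"192.0.2.{i}", "healthy": i % 5 != 0} for i in range(1, 13)
]
answer = multivalue_answer(records, random.Random(0))
```

Clients then pick one of the returned addresses, which spreads load without a dedicated load balancer.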
Health Checks and DNS-Based Failover
What Health Checks Do
Route 53 health checks monitor endpoints over HTTP, HTTPS, or TCP, or track the state of a CloudWatch alarm, to decide whether a record is healthy and should be returned.
Records and Health
Attach a health check to a record. If it fails, policies like failover or latency stop returning that record in DNS answers.
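How quickly a failing endpoint is marked unhealthy depends on the health check's request interval (10 or 30 seconds) and its failure threshold (1 to 10 consecutive failures, default 3). The helper below is a rough estimate under that simple model; `estimated_detection_seconds` is a hypothetical name, and real detection time also depends on how many checker locations agree.

```python
def estimated_detection_seconds(request_interval: int, failure_threshold: int = 3) -> int:
    """Rough time for an endpoint to be marked unhealthy once it starts failing."""
    if request_interval not in (10, 30):
        raise ValueError("request interval must be 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure threshold must be between 1 and 10")
    # Approximation: one failed probe per interval, `failure_threshold` in a row.
    return request_interval * failure_threshold


standard = estimated_detection_seconds(30)  # standard interval, default threshold
fast = estimated_detection_seconds(10)      # fast interval, default threshold
```

Under this model the fast interval cuts detection from about 90 seconds to about 30 seconds.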
DNS Caching Effects
Failover speed is bounded by TTL. New queries receive the new target, but clients and resolvers with cached answers keep using the old IP until the TTL expires.
Exam Trap
If a scenario needs instant failover for existing connections, DNS alone is insufficient. Look for load balancers or other techniques.
Active-Passive Multi-Region with Failover Routing
Scenario Setup
You host `www.example.com` with a primary stack in us-east-1 and a DR stack in eu-west-1. Requirement: stay reachable if us-east-1 fails.
Route 53 Configuration
Create two alias A records for `www.example.com`: primary to us-east-1 ALB, secondary to eu-west-1 ALB, both with health checks.
Failover Behavior
When the us-east-1 health check fails, Route 53 stops returning it and responds with the eu-west-1 ALB instead.
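The configuration above could be expressed as a change batch in the shape that boto3's `change_resource_record_sets` call accepts. This is a sketch only: the helper name `failover_alias_record`, the ALB DNS names, and the zone ID strings are placeholders, not real values.

```python
from typing import Any, Dict


def failover_alias_record(
    name: str, role: str, alb_dns_name: str, alb_zone_id: str
) -> Dict[str, Any]:
    """Build one UPSERT change for a failover alias A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-{alb_dns_name}",
            "Failover": role,
            # Alias records carry no TTL field of their own.
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,   # the ALB's zone ID, not your hosted zone's
                "DNSName": alb_dns_name,
                "EvaluateTargetHealth": True,  # use the ALB's own health as the signal
            },
        },
    }


change_batch = {
    "Comment": "Active-passive failover for www.example.com",
    "Changes": [
        failover_alias_record(
            "www.example.com", "PRIMARY",
            "primary-alb.us-east-1.elb.amazonaws.com", "PLACEHOLDER_ZONE_ID_USE1",
        ),
        failover_alias_record(
            "www.example.com", "SECONDARY",
            "dr-alb.eu-west-1.elb.amazonaws.com", "PLACEHOLDER_ZONE_ID_EUW1",
        ),
    ],
}
```

In a real account you would pass this dictionary as `ChangeBatch` to `boto3.client("route53").change_resource_record_sets(...)` together with your hosted zone ID. A separate Route 53 health check can also be attached to a record through the `HealthCheckId` field.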
Exam Signals
Phrases like 'active-passive', 'DR Region', 'DNS-based failover', and 'cost-sensitive standby' often point to failover routing.
Active-Active Multi-Region with Latency-Based Routing
Global API Scenario
You run `api.example.com` in us-east-1, eu-west-1, and ap-southeast-1. Goal: users hit the lowest-latency Region and survive Region failures.
Latency-Based Setup
Create three alias A records for `api.example.com`, each latency-based and tied to one Region’s ALB, all with health checks.
How Traffic Flows
Route 53 sends users to the Region with the lowest measured latency. If a Region fails its health check, Route 53 stops returning that record and users go to the next-best Region.
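The selection logic can be sketched as "lowest latency among healthy Regions". The latency figures below are invented for a user in Europe, and `pick_region` is a hypothetical helper; Route 53 uses its own latency measurements, which this sketch does not reproduce.

```python
from typing import Dict, Optional


def pick_region(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the healthy Region with the lowest latency, or None if none is healthy."""
    candidates = {
        region: ms for region, ms in latency_ms.items() if healthy.get(region, False)
    }
    if not candidates:
        return None  # the all-unhealthy case is not modelled here
    return min(candidates, key=lambda region: candidates[region])


# Hypothetical latencies as seen from a user in Europe.
latency = {"us-east-1": 95.0, "eu-west-1": 20.0, "ap-southeast-1": 180.0}

normal = pick_region(
    latency, {"us-east-1": True, "eu-west-1": True, "ap-southeast-1": True}
)
during_outage = pick_region(
    latency, {"us-east-1": True, "eu-west-1": False, "ap-southeast-1": True}
)
```

When eu-west-1 is unhealthy, the same user is sent to us-east-1, the next-lowest latency among the Regions that remain.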
Recognizing Active-Active
Phrases like 'global performance', 'use all Regions', and 'no idle standby' usually imply active-active with latency-based or weighted routing.
Design Choice: Active-Active or Active-Passive?
Work through this thought exercise to solidify when to choose active-active vs active-passive. There is no single correct answer, but aim for exam-style reasoning.
Scenario A
A video streaming service has users worldwide. The business wants the lowest possible latency and is willing to pay for multiple fully sized Regions. They also want resilience against a full Region outage.
- Would you choose active-active or active-passive?
- Which Route 53 routing policy best fits?
- What are the trade-offs in cost and complexity?
Pause and answer in your own words, then compare:
- Likely answer: active-active with latency-based routing.
- Justification: all Regions serve traffic; users get the closest Region; if one Region fails, others keep serving.
- Trade-offs: higher cost (all Regions scaled for normal load), more complex data replication.
Scenario B
An internal HR application is used mostly in one country. Regulatory requirements demand a DR Region in another country, but the RTO is 4 hours and cost sensitivity is high. Only a small team uses it.
- Would you choose active-active or active-passive?
- Which Route 53 routing policy best fits?
Reflect first, then compare:
- Likely answer: active-passive with failover routing.
- Justification: the primary Region handles all traffic; the DR Region can run at reduced capacity and be scaled up only during a disaster.
Use these patterns when reading exam scenarios: map business goals (latency, cost, RTO/RPO) to routing policies and multi-Region patterns.
Quiz 1: Routing Policy Selection
Test your understanding of which Route 53 routing policy to use.
Your company runs identical stacks in us-east-1 and ap-southeast-1. You want to send 70% of traffic to us-east-1 and 30% to ap-southeast-1, and you want Route 53 to stop sending traffic to a Region if its health checks fail. Which routing policy should you use?
- A) Simple routing
- B) Weighted routing
- C) Latency-based routing
- D) Failover routing
Answer: B) Weighted routing
**Weighted routing** is correct because you want a specific percentage split (70/30) and automatic health-based removal of unhealthy endpoints. Simple routing does not support this; latency-based routes by latency, not percentage; failover is for primary/secondary, not proportional traffic sharing.
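The 70/30 split with health-based removal can be worked through numerically. In this sketch each healthy record's share is its weight divided by the sum of the healthy weights; `healthy_shares` is a hypothetical helper, and edge cases such as all-zero weights are left out.

```python
from typing import Dict


def healthy_shares(weights: Dict[str, int], healthy: Dict[str, bool]) -> Dict[str, float]:
    """Expected share of answers per record, counting only healthy records."""
    active = {name: w for name, w in weights.items() if healthy.get(name, False)}
    total = sum(active.values())
    if total == 0:
        return {}  # no healthy weighted records: not modelled in this sketch
    return {name: w / total for name, w in active.items()}


weights = {"us-east-1": 70, "ap-southeast-1": 30}

normal = healthy_shares(weights, {"us-east-1": True, "ap-southeast-1": True})
degraded = healthy_shares(weights, {"us-east-1": False, "ap-southeast-1": True})
```

With both Regions healthy the shares are 0.7 and 0.3. When us-east-1 fails its health check, ap-southeast-1 receives all of the traffic.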
Quiz 2: DNS-Based Failover Behavior
Check your understanding of how TTL and failover interact.
You configure failover routing for `app.example.com` with a TTL of 300 seconds. The primary Region fails at 12:00:00. Health checks detect the failure and Route 53 starts answering with the secondary Region at 12:00:30. When will *all* users definitely be using the secondary Region?
- A) Immediately at 12:00:30
- B) Within 30 seconds of 12:00:30
- C) Within 300 seconds of 12:00:30
- D) It is impossible to predict due to DNS caching behavior
Answer: C) Within 300 seconds of 12:00:30
With a TTL of 300 seconds, resolvers and clients may cache the old primary IP for up to 300 seconds. After Route 53 changes its answers at 12:00:30, it can take up to 300 seconds for all compliant caches to expire, so **within 300 seconds of 12:00:30** is the best answer in an exam context.
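The arithmetic behind this answer is simply the time the answer changed plus the TTL. A short sketch, assuming every cache honors the TTL (the calendar date is arbitrary and `latest_cutover` is a hypothetical helper):

```python
from datetime import datetime, timedelta


def latest_cutover(answer_changed_at: datetime, ttl_seconds: int) -> datetime:
    """Latest moment a TTL-honoring cache can still hold the old answer."""
    return answer_changed_at + timedelta(seconds=ttl_seconds)


changed_at = datetime(2024, 1, 1, 12, 0, 30)  # Route 53 starts answering with the secondary
deadline = latest_cutover(changed_at, ttl_seconds=300)
deadline_text = deadline.strftime("%H:%M:%S")
```

Here `deadline_text` is `12:05:30`, which is 5 minutes 30 seconds after the original failure at 12:00:00.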
Geolocation vs Latency Routing and Common Exam Traps
Latency vs Geolocation
Latency-based routing picks the lowest-latency Region. Geolocation routing picks based on the user’s geographic location.
When to Use Geolocation
Use geolocation for legal, compliance, or localization needs, such as data residency or country-specific content.
Exam Gotchas
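Do not confuse geolocation (routes on where the user is) with geoproximity (routes on distance to resources, adjustable with a bias). Remember that Route 53 only directs traffic; it does not replicate data between Regions, so data replication must be designed separately.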
Reading Requirements
Words like 'data residency' or 'country-based access' imply geolocation; 'lowest latency' implies latency-based routing.
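Geolocation routing picks the most specific matching record and falls back to a default record when nothing else matches. The sketch below models that precedence; the key format (`country:DE`), the endpoint names, and the `geolocation_match` helper are assumptions made for illustration.

```python
from typing import Dict, Optional

# Most specific level first.
PRECEDENCE = ("subdivision", "country", "continent")


def geolocation_match(records: Dict[str, str], location: Dict[str, str]) -> Optional[str]:
    """Return the endpoint of the most specific record matching the location."""
    for level in PRECEDENCE:
        value = location.get(level)
        if value is not None and f"{level}:{value}" in records:
            return records[f"{level}:{value}"]
    # No geographic match: use the default record if one exists, else no answer.
    return records.get("default")


records = {
    "country:DE": "de-endpoint",   # e.g. a stack kept in-country for data residency
    "continent:EU": "eu-endpoint",
    "default": "default-endpoint",
}

german_user = geolocation_match(records, {"continent": "EU", "country": "DE"})
french_user = geolocation_match(records, {"continent": "EU", "country": "FR"})
japanese_user = geolocation_match(records, {"continent": "AS", "country": "JP"})
```

Without the `default` entry, the user in Japan would receive no answer at all, which is why a default record is usually part of a geolocation design.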
Key Term Review: Route 53 and Multi-Region Patterns
Review these terms to reinforce core concepts before moving on.
- **Hosted zone**: A container for DNS records for a specific domain, such as example.com. Public hosted zones are visible on the internet; private hosted zones are visible only within one or more VPCs.
- **Alias record**: An AWS-specific record type that lets you map a DNS name to certain AWS resources (like ALBs, CloudFront, S3 websites) and can be used at the zone apex. Alias targets automatically track IP changes, and queries for alias records that point to AWS resources incur no DNS query charge.
- **Failover routing policy**: A Route 53 routing policy that uses primary and secondary records. Combined with health checks, it routes traffic to the secondary when the primary becomes unhealthy, enabling active-passive architectures.
- **Latency-based routing (LBR)**: A routing policy that directs users to the AWS Region with the lowest network latency, based on Route 53 measurements. Commonly used for active-active multi-Region architectures.
- **Geolocation routing**: A routing policy that directs traffic based on the geographic location of the user’s DNS resolver IP (continent, country, or US state). Useful for data residency, compliance, and localization.
- **Active-active multi-Region**: An architecture where multiple Regions actively serve production traffic at the same time, often using latency-based or weighted routing. Improves global performance and resilience but increases cost and complexity.
- **Active-passive multi-Region**: An architecture with a primary Region serving traffic and a secondary Region on standby for disaster recovery, typically using failover routing. Reduces cost but may have higher RTO.
- **Route 53 health check**: A mechanism for Route 53 to monitor the health of endpoints (over HTTP, HTTPS, or TCP, or through the state of a CloudWatch alarm) and stop returning DNS records that are considered unhealthy.
- **TTL (Time To Live)**: A DNS setting that defines how long resolvers and clients can cache a DNS response. Shorter TTLs enable faster failover but increase DNS query volume.
- **Multivalue answer routing**: A routing policy where Route 53 returns multiple healthy IP addresses (up to eight) for a record. Each record can have its own health check, providing simple load balancing and resilience without a dedicated load balancer.
Pulling It Together and Next Steps in Your Study Path
Route 53 as Traffic Director
Route 53 sits at your global front door, using routing policies and health checks to direct users to healthy, appropriate endpoints.
Architectural Patterns
Combine Route 53 with ALBs, Auto Scaling, and multi-Region data to build active-active or active-passive architectures.
Exam Blueprint Fit
These topics map to designing resilient and high-performing architectures in the Solutions Architect Associate exam.
Your Next Moves
Use the diagnostic, mock exams, and spaced review in this course to reinforce Route 53 and multi-Region design decisions.
Key Terms
- **Hosted zone**: A container for DNS records for a domain, such as example.com. Public hosted zones are accessible on the internet; private hosted zones are accessible only within associated VPCs.
- **Health check**: A Route 53 feature that monitors the health of specified endpoints (HTTP/HTTPS/TCP or CloudWatch alarms) and marks records unhealthy when checks fail, enabling DNS-based failover.
- **Active-active**: A multi-Region architecture where multiple Regions simultaneously serve production traffic, often using latency-based or weighted routing for load sharing and resilience.
- **Active-passive**: A multi-Region architecture where one Region is primary and serves traffic, while another Region is on standby for disaster recovery, typically using failover routing.
- **Routing policy**: A configuration in Route 53 that determines how DNS queries are answered, such as simple, weighted, latency-based, failover, geolocation, geoproximity, or multivalue answer routing.
- **Amazon Route 53**: AWS’s scalable, highly available Domain Name System (DNS) web service that translates domain names into IP addresses and supports advanced routing policies and health checks.
- **Failover routing**: A Route 53 routing policy that uses primary and secondary records with health checks to route traffic to a secondary endpoint when the primary becomes unhealthy.
- **TTL (Time To Live)**: A value in DNS records that specifies how long resolvers and clients may cache a DNS response before requesting it again, affecting both performance and failover speed.
- **Geolocation routing**: A Route 53 routing policy that directs traffic based on the geographic location of the DNS resolver’s IP address, used for localization and data residency requirements.
- **Latency-based routing**: A Route 53 routing policy that directs traffic to the Region with the lowest network latency for the user, improving global performance.