Chapter 14 of 26
High-Performing Databases with Amazon RDS and Caching Strategies
Database bottlenecks are a common theme in performance questions. This module covers Amazon RDS performance features and how to offload work from the database with read replicas and caching layers.
Module Overview: RDS Performance in the Bigger Picture
Where We Are in the Course
You have already learned how to scale compute (EC2) and storage (S3, EBS). Now we focus on the data tier: designing high‑performing relational databases on Amazon RDS and using caching.
Module Outcomes
You will pick RDS instance classes and storage for performance, separate Multi‑AZ (availability) from read replicas (scalability), design caching with ElastiCache, and choose when RDS is the right service.
Performance Efficiency Pillar
Keep the Performance efficiency pillar in mind: "The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve."
Roadmap
We will cover: RDS engines and instances, storage options, read replicas vs Multi‑AZ, connection management, ElastiCache basics, a design walkthrough, and how to choose RDS vs other data stores.
Exam Mindset
Visualize exam stems like: "RDS MySQL is slow for reads at peak." You must quickly match patterns: scale up, add read replicas, enable caching, or switch engines.
RDS Engines and Instance Classes: Performance Basics
RDS Engine Options
RDS supports Aurora, MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server. For the exam, most performance questions compare Aurora vs standard RDS engines like MySQL or PostgreSQL.
Aurora for Performance
Aurora uses a distributed storage layer and supports up to 15 low‑latency read replicas per cluster. It typically delivers higher throughput than MySQL/PostgreSQL on the same hardware.
Standard RDS Engines
RDS MySQL and PostgreSQL are common for general workloads. Oracle and SQL Server appear mainly when licensing or existing enterprise systems drive the choice.
Instance Class Families
Burstable classes (T3/T4g) are inexpensive but depend on CPU credits. General purpose classes (M5/M6g) balance CPU and RAM. Memory optimized classes (R5/R6g) provide more RAM for large in‑memory datasets.
Typical Exam Patterns
If CPU is always high or credits are exhausted on a T instance, move to M or R. If reads are heavy, consider scaling up or adding read replicas, especially with Aurora.
RDS Storage Performance: gp3, io1/io2, and Throughput
RDS Uses EBS Underneath
For the standard RDS engines, data is stored on EBS volumes (Aurora uses its own distributed storage layer instead). Your choice of storage type and configuration directly affects latency, IOPS, and throughput.
gp3 General Purpose SSD
gp3 is a common default. IOPS and throughput can be provisioned separately from volume size (on RDS, only once the volume is above an engine‑specific size threshold), giving a flexible balance of performance and cost.
Provisioned IOPS (io1/io2)
io1/io2 are for I/O‑intensive, latency‑sensitive databases. You explicitly provision high IOPS and pay more for predictable performance.
Recognizing I/O Bottlenecks
If CloudWatch shows IOPS or throughput at the limit while CPU is fine, storage is the bottleneck. Scale gp3 IOPS/throughput or move to io1/io2.
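The triage rule above can be sketched as a small function. This is only an illustration: the `RdsSnapshot` structure and the 80% and 90% thresholds are assumptions chosen for the example, not AWS‑defined values. In practice the inputs would come from CloudWatch metrics such as CPUUtilization, ReadIOPS, and WriteIOPS.

```python
from dataclasses import dataclass


@dataclass
class RdsSnapshot:
    """One observation of instance metrics (hypothetical structure)."""
    cpu_percent: float        # e.g. CloudWatch CPUUtilization
    iops_used: float          # e.g. ReadIOPS + WriteIOPS
    iops_provisioned: float   # IOPS limit configured for the volume


def likely_bottleneck(m: RdsSnapshot,
                      cpu_high: float = 80.0,
                      io_ratio_high: float = 0.9) -> str:
    """Return a rough label for where the pressure is (thresholds are assumed)."""
    io_saturated = m.iops_used >= io_ratio_high * m.iops_provisioned
    cpu_saturated = m.cpu_percent >= cpu_high
    if io_saturated and not cpu_saturated:
        return "storage"   # raise gp3 IOPS/throughput or move to io1/io2
    if cpu_saturated and not io_saturated:
        return "compute"   # consider a larger or different instance class
    if cpu_saturated and io_saturated:
        return "both"
    return "neither"
```

For example, an instance at 35% CPU using 2,950 of 3,000 provisioned IOPS would be labeled "storage", which matches the pattern described above.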
Multi‑AZ and Storage
Multi‑AZ replicates storage synchronously to a standby, doubling storage usage and slightly increasing write latency. It improves availability, not raw performance.
Multi‑AZ vs Read Replicas: Availability vs Read Scalability
Multi‑AZ Purpose
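Multi‑AZ is about availability. A Multi‑AZ DB instance deployment replicates synchronously to a standby in another AZ and fails over automatically, but that standby does not serve reads. (The separate Multi‑AZ DB cluster deployment option does add readable standbys.)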
Read Replicas Purpose
Read replicas are about scalability. They use asynchronous replication and offload read‑only traffic. On standard RDS engines they do not fail over automatically; a replica has to be promoted manually.
Consistency Trade‑off
Because replication to read replicas is asynchronous, they can lag behind the primary. For strongly consistent reads, you must read from the primary.
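One way an application might act on this trade‑off is to pick an endpoint per statement. The sketch below is a minimal illustration, not an AWS API: the `EndpointRouter` class, its parameter names, and the hostnames in the usage note are all hypothetical.

```python
import itertools
from typing import Iterator, List


class EndpointRouter:
    """Chooses a database endpoint per statement (hypothetical helper)."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas: Iterator[str] = itertools.cycle(replicas or [primary])

    def endpoint_for(self, is_write: bool, needs_fresh_read: bool = False) -> str:
        # Writes, and reads that must see the latest committed data, use the primary.
        if is_write or needs_fresh_read:
            return self.primary
        # Other reads tolerate replication lag, so spread them over the replicas.
        return next(self._replicas)
```

With a router built from `"primary.example.internal"` and two replica hostnames, a product‑listing read would go to a replica, while a "read your own order right after checkout" query would pass `needs_fresh_read=True` and go to the primary.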
Aurora Nuance
Aurora replicas share distributed storage and can be used for reads and failover. Still, the core idea remains: Multi‑AZ/cluster for availability, replicas for scaling reads.
Exam Shortcut
If the stem says "improve read performance" choose read replicas or Aurora replicas. If it says "automatic failover" or "AZ failure" choose Multi‑AZ or an Aurora cluster.
Connection Management and Scaling Patterns
Why Connections Matter
Each database connection consumes memory and CPU. Rapidly opening and closing many connections can overload an RDS instance even if queries are simple.
Connection Pools
Connection pooling keeps a small set of persistent connections and reuses them. This reduces connection overhead and helps you stay within `max_connections` limits.
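The idea can be illustrated with a very small fixed‑size pool. This is a sketch under assumptions: `ConnectionPool` is a hypothetical class written for this example, and an in‑memory SQLite database stands in for an RDS endpoint so the code runs anywhere. Real applications would normally rely on the pooling built into their database driver or framework.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Tiny fixed-size pool; `connect` is any zero-argument connection factory."""

    def __init__(self, connect, size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())   # open all connections once, up front

    @contextmanager
    def connection(self, timeout: float = 5.0):
        # Block until a connection is free (raises queue.Empty after the timeout).
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)        # return it to the pool instead of closing it


# Illustration: SQLite in memory stands in for the real database.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
with pool.connection() as conn:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]
```

Because the pool never opens more than `size` connections, the database sees a bounded, stable connection count no matter how many requests the application handles.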
Amazon RDS Proxy
RDS Proxy is a managed connection pool for RDS and Aurora. It multiplexes many client connections onto fewer DB connections, ideal for Lambda, ECS, and spiky loads.
Scaling Patterns
Scale up (bigger instance), scale out reads (read replicas), add caching (ElastiCache), and in advanced cases, shard or decompose the data model.
Exam Hint
If a stem says "many Lambda functions are exhausting DB connections," think RDS Proxy, not just changing parameters or blindly scaling up.
Caching with Amazon ElastiCache: Redis vs Memcached
Why Cache?
Caching moves hot data from disk‑backed databases into memory. In‑memory lookups typically complete in well under a millisecond, so caching can sharply reduce latency and database load.
ElastiCache Engines
ElastiCache supports Redis and Memcached. Both are in‑memory key‑value stores, but Redis offers richer features, while Memcached is simpler and easy to shard.
Redis Highlights
Redis supports replication, Multi‑AZ with failover, persistence, and rich data types. It suits caching, sessions, leaderboards, pub/sub, and rate limiting.
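To make the leaderboard use case concrete, the sketch below imitates the behavior of a Redis sorted set in plain Python. It is a stand‑in for illustration only: the `Leaderboard` class is hypothetical, it sorts on every read (Redis keeps the set ordered as members are added), and its tie‑breaking by member name is a simplification that does not match Redis exactly.

```python
from typing import Dict, List, Tuple


class Leaderboard:
    """In-process stand-in for a Redis sorted set (member -> score)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def add(self, member: str, score: float) -> None:
        # Comparable to ZADD: insert the member or overwrite its score.
        self._scores[member] = score

    def top(self, n: int) -> List[Tuple[str, float]]:
        # Comparable to ZREVRANGE 0 n-1 WITHSCORES: highest scores first.
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]
```

The point of the pattern is that "top N" becomes a single in‑memory range read rather than an ORDER BY query against the relational database.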
Memcached Highlights
Memcached is a simple, non‑persistent, multi‑node cache. It is good when you only need a straightforward, horizontally scalable key‑value cache.
Typical Exam Cues
If a stem says "frequently accessed, read‑heavy data" and "reduce load on RDS," selecting ElastiCache is usually better than only adding more read replicas.
Design Walkthrough: Scaling a Read‑Heavy Web App
Scenario Setup
An e‑commerce site on EC2 uses RDS MySQL. During sales, users see slow product browsing. CPU and read IOPS on RDS are high, with many connections from web servers.
Step 1: Availability
First, enable Multi‑AZ for RDS so an AZ failure does not take down the database. This improves availability but does not by itself solve performance.
Step 2: Read Replicas
Create RDS read replicas and route read‑only product queries to them. The primary still handles writes. This offloads some read traffic from the primary.
Step 3: Add Redis Cache
Deploy ElastiCache Redis. Use a cache‑aside pattern to store popular product details in Redis with a TTL, so most reads are served from memory instead of RDS.
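The cache‑aside flow in this step can be sketched as follows. The code is a minimal illustration under assumptions: `TtlCache` is a hypothetical in‑process stand‑in for Redis get/set with expiry, `load_from_db` represents whatever query the application runs against RDS, and the `product:<id>` key format and 300‑second TTL are example choices.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """In-process stand-in for a cache with per-key expiry (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]            # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_product(product_id: str, cache: TtlCache,
                load_from_db: Callable[[str], str],
                ttl_seconds: float = 300.0) -> str:
    """Cache-aside read: check the cache, fall back to the database, then populate."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                      # hit: the database is not touched
    value = load_from_db(product_id)       # miss: read from the database
    cache.set(key, value, ttl_seconds)     # populate so later reads hit the cache
    return value
```

The TTL bounds how stale a product page can be. A shorter TTL means fresher data but more database reads, so it is usually tuned per data type.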
Step 4–5: Connections and Storage
Add RDS Proxy to pool connections and reduce overhead. If IOPS are still a bottleneck, increase gp3 IOPS/throughput or move to io2 for predictable I/O.
Choosing RDS vs Aurora, DynamoDB, and Other Stores
When to Use RDS
Choose RDS when you need a relational model, SQL, joins, ACID transactions, and an easy migration path from existing relational databases.
When to Use Aurora
Choose Aurora when you still need relational features but also higher performance, more replicas, and faster failover than a single RDS instance can provide.
When to Use DynamoDB
Choose DynamoDB for fully managed, massively scalable NoSQL workloads with key‑value or document access patterns and huge, unpredictable scale.
Other Data Stores
Use S3 for object storage and data lakes, and Redshift for analytical data warehousing, not for transactional OLTP workloads.
Exam Traps
Massive key‑value scale usually points to DynamoDB. Complex joins and ACID transactions point to RDS or Aurora. MySQL needing better scaling often points to Aurora.
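As a study aid, the rules of thumb in this section can be written as a small decision function. It encodes only this module's heuristics; the function name and parameters are hypothetical, and real designs weigh many more factors.

```python
def suggest_data_store(needs_joins_or_acid: bool,
                       key_value_at_massive_scale: bool,
                       analytics_warehouse: bool = False,
                       outgrowing_single_instance: bool = False) -> str:
    """Map the module's exam cues to a service name (heuristic only)."""
    if analytics_warehouse:
        return "Redshift"                  # analytical warehousing, not OLTP
    if needs_joins_or_acid:
        # Relational needs: Aurora when a single RDS instance is not enough.
        return "Aurora" if outgrowing_single_instance else "RDS"
    if key_value_at_massive_scale:
        return "DynamoDB"
    return "no clear signal"               # the stem needs a closer read
```

For instance, a stem describing complex joins on a MySQL database that needs better scaling maps to "Aurora", while massive key‑value access with no relational needs maps to "DynamoDB".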
Design Exercise: Which Feature Would You Use?
Scenario A: Read Spikes
A news site on RDS PostgreSQL sees 10x read spikes during breaking news. Writes are low. Users see slow article pages. Which two features would you use to reduce read load and improve performance?
Scenario B: Lambda Overload
Lambda functions hitting RDS MySQL cause connection errors when traffic spikes. CPU is moderate but `max_connections` is exceeded. Which managed AWS service best solves this?
Scenario C: Profiles Store
User profiles need flexible queries and joins plus strong consistency. Should the startup choose RDS MySQL or DynamoDB, and what is your reasoning?
Reflect on Your Answers
Check if you used ElastiCache and read replicas for A, RDS Proxy for B, and RDS MySQL for C. If not, revisit earlier steps and adjust your mental patterns.
Quiz 1: Multi‑AZ vs Read Replicas
Test your understanding of availability vs scalability.
An application runs on a single‑AZ RDS PostgreSQL instance. Management wants automatic failover to another AZ and minimal data loss if the primary AZ fails. Read performance is currently acceptable. What is the MOST appropriate change?
- A) Add two read replicas in different AZs and direct some reads to them
- B) Enable Multi‑AZ on the RDS instance
- C) Migrate the database to DynamoDB with global tables
- D) Add an ElastiCache Redis cluster in front of the database
Answer: B) Enable Multi‑AZ on the RDS instance
The requirement is automatic failover and minimal data loss, not additional read capacity. Enabling Multi‑AZ provides synchronous replication to a standby in another AZ and automatic failover. Read replicas are asynchronous and do not provide automatic failover. DynamoDB and ElastiCache do not directly address the relational failover requirement.
Quiz 2: Caching and Scaling Patterns
Check how well you can match patterns to services.
A gaming leaderboard service stores scores in RDS MySQL. Reads are extremely frequent and must be very fast. Scores update every few seconds. Which option BEST improves performance while minimizing load on RDS?
- A) Scale the RDS instance to a larger class and enable Multi‑AZ
- B) Create multiple cross‑Region read replicas and direct reads to them
- C) Store and serve leaderboard data from ElastiCache for Redis using sorted sets
- D) Migrate the database to Amazon S3 and query it with Amazon Athena
Answer: C) Store and serve leaderboard data from ElastiCache for Redis using sorted sets
Leaderboards are a classic fit for Redis sorted sets, which can quickly fetch top scores and ranges from memory. This offloads reads from RDS and provides very low latency. Scaling RDS or adding read replicas still sends every read through the database engine, and cross‑Region replicas add replication lag. S3 and Athena are not appropriate for low‑latency transactional reads.
Key Term Flashcards: RDS and Caching
Use these flashcards to reinforce core concepts before moving on.
- Amazon RDS
- A managed relational database service that supports engines like Aurora, MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server, handling backups, patching, and basic availability features for you.
- Multi‑AZ deployment (RDS)
- An RDS configuration that synchronously replicates data to a standby instance in another Availability Zone to provide high availability and automatic failover, but does not provide additional read capacity.
- Read replica (RDS)
- An asynchronously replicated copy of a primary RDS database used to offload read‑only traffic and improve read scalability; it does not provide automatic failover by default.
- Amazon Aurora
- A MySQL‑ and PostgreSQL‑compatible relational database built for the cloud, using a distributed storage layer to provide higher performance, up to 15 replicas, and fast failover compared to standard RDS engines.
- Amazon ElastiCache
- A managed in‑memory caching service that supports Redis and Memcached, used to reduce database load and improve application response times by serving hot data from memory.
- RDS Proxy
- A fully managed database proxy for RDS and Aurora that pools and shares connections, helping applications like Lambda and ECS scale without overwhelming the database with connections.
- gp3 vs io2 for RDS
- gp3 is general‑purpose SSD storage where you can provision IOPS and throughput independently of size; io2 is provisioned IOPS SSD designed for I/O‑intensive, latency‑sensitive workloads requiring consistent high performance.
- Cache‑aside pattern
- A caching strategy where the application first checks the cache, and on a miss reads from the database and then populates the cache, controlling when to invalidate or refresh entries.
- Performance efficiency pillar
- The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
- When to choose DynamoDB over RDS
- Choose DynamoDB when you need a fully managed, massively scalable NoSQL key‑value or document store with predictable low latency and do not require complex joins or traditional relational features.
Key Terms
- gp3
- General Purpose SSD EBS volume type used by RDS that allows independent provisioning of IOPS and throughput from volume size, suitable for many database workloads.
- io1/io2
- Provisioned IOPS SSD EBS volume types designed for I/O-intensive, latency-sensitive workloads that require consistent high IOPS and low latency.
- DynamoDB
- A fully managed, serverless NoSQL key-value and document database that provides single-digit millisecond performance at virtually any scale, often used instead of RDS when relational features are not required.