Chapter 12 of 20
Storage and Databases on AWS: Foundations of Persistent Data
Look under the hood of how AWS stores data long-term, from object storage to managed databases, and when to choose each option.
Big Picture: Persistent Data on AWS
Why Persistent Data Matters
Compute is temporary; memory is wiped when processes stop. Anything you want to keep long-term must live in storage or databases.
Two Big Categories
AWS splits persistent data into storage services (S3, EBS, EFS, FSx) and database services (RDS, Aurora, DynamoDB, and others).
What You Must Be Able To Do
For the exam, you must describe object vs block vs file storage, explain relational vs non-relational databases, and relate choices to durability, availability, performance, and cost.
Link to Global Infrastructure
Storage and databases live in Regions and AZs, so you can design for resilience and regulatory needs using the same building blocks as compute.
Core Storage Concepts: Durability, Availability, Performance, Cost
Durability
Durability is the probability that your data remains intact over time. AWS achieves high durability by storing multiple copies across devices and, for many services, across multiple AZs.
Availability
Availability is how often the service is up and reachable. Higher availability generally requires more redundancy, which raises cost.
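To make "number of nines" concrete, the sketch below converts an availability percentage into allowed downtime per year, and a durability percentage into the average number of objects lost per year. The function names are illustrative, and the 99.999999999% (11 nines) value is the durability S3 is designed for. Treat the output as back-of-the-envelope estimates, not guarantees.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years; fine for a rough estimate


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be unreachable at the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def expected_objects_lost_per_year(object_count: int, durability_percent: float) -> float:
    """Average number of objects lost per year at the given annual durability."""
    return object_count * (1 - durability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% available: {downtime_hours_per_year(pct):.2f} hours down per year")
    lost = expected_objects_lost_per_year(10_000_000, 99.999999999)
    print(f"10 million objects at 11 nines: {lost:.6f} objects lost per year")
```

Each extra nine of availability cuts the allowed downtime by a factor of ten, which is why it usually costs more. At 11 nines of durability, storing 10 million objects means losing about one object every 10,000 years on average.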
Performance
Performance includes latency (speed of one operation) and throughput (volume per second). Different workloads emphasize different aspects.
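A rough way to see why workloads emphasize different aspects is to model one operation as a fixed latency plus the time to move the data. The function and the numbers below (10 ms latency, 100 MB/s throughput) are assumptions chosen for illustration, not measurements of any AWS service.

```python
def operation_time_seconds(size_bytes: int, latency_s: float, throughput_bytes_per_s: float) -> float:
    """Rough time for one transfer: fixed latency plus size divided by throughput."""
    return latency_s + size_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    latency = 0.010             # hypothetical: 10 ms per operation
    throughput = 100_000_000    # hypothetical: 100 MB/s

    small = operation_time_seconds(4_096, latency, throughput)          # 4 KB record
    large = operation_time_seconds(1_000_000_000, latency, throughput)  # 1 GB file

    print(f"4 KB read: {small:.5f} s (latency is {latency / small:.0%} of the total)")
    print(f"1 GB read: {large:.2f} s (latency is {latency / large:.1%} of the total)")
```

Small, frequent operations such as database reads are dominated by latency, while large sequential transfers such as backups are dominated by throughput.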
Cost Trade-offs
Cost includes $/GB-month, request pricing, and data transfer. Cheaper storage classes usually trade off access speed or availability.
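The trade-off can be sketched with a small cost model. The prices below are hypothetical round numbers, not real AWS prices, and the model ignores request and data transfer charges to keep the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClassPricing:
    """Hypothetical prices for illustration only; check current AWS pricing."""
    name: str
    usd_per_gb_month: float
    usd_per_gb_retrieved: float


def monthly_cost(pricing: StorageClassPricing, stored_gb: float, retrieved_gb: float) -> float:
    """Storage charge plus retrieval charge for one month."""
    return stored_gb * pricing.usd_per_gb_month + retrieved_gb * pricing.usd_per_gb_retrieved


# Assumed numbers: the cheaper class halves the storage price but charges per GB read.
FREQUENT = StorageClassPricing("frequent-access", 0.020, 0.00)
INFREQUENT = StorageClassPricing("infrequent-access", 0.010, 0.01)

if __name__ == "__main__":
    for retrieved_gb in (100, 2_000):
        for pricing in (FREQUENT, INFREQUENT):
            cost = monthly_cost(pricing, stored_gb=1_000, retrieved_gb=retrieved_gb)
            print(f"{pricing.name:18s} 1000 GB stored, {retrieved_gb:5d} GB read: ${cost:.2f}")
```

With these assumed prices, the infrequent-access class wins when you read 100 GB a month and loses when you read 2,000 GB, which is the pattern behind "infrequent access" clues in exam scenarios.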
Reading Scenarios
On the exam, clues like "infrequent access" or "millisecond latency" hint at which dimension is most important and which service family to choose.
Object Storage: Amazon S3
S3 as Object Storage
S3 stores data as objects in buckets. Each object has data and metadata, and is identified by a key. You access it over HTTP/HTTPS, not as a disk.
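The object model can be pictured as a flat map from keys to objects. The class below is a toy in-memory stand-in written for this lesson, not the S3 API; `ToyBucket`, its method names, and the sample keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyBucket:
    """In-memory stand-in for a bucket, for illustration only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, StoredObject] = {}

    def put_object(self, key: str, data: bytes, metadata: Optional[Dict[str, str]] = None) -> None:
        # Writing to an existing key replaces the whole object.
        self._objects[key] = StoredObject(data, dict(metadata or {}))

    def get_object(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # "Folders" are only a naming convention: keys that share a prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket("example-photos")
bucket.put_object("users/42/avatar.jpg", b"jpeg-bytes", {"content-type": "image/jpeg"})
bucket.put_object("users/42/cover.jpg", b"jpeg-bytes")
bucket.put_object("users/7/avatar.jpg", b"jpeg-bytes")
keys_for_user_42 = bucket.list_keys("users/42/")
```

Note what is missing compared with a disk: there is no seek, no partial in-place update, and no real directory tree, only whole objects addressed by key.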
Durability and Scale
S3 is designed for extremely high durability (99.999999999%, or 11 nines) by keeping multiple copies across many devices and, for most storage classes, multiple AZs. Capacity is virtually unlimited.
Typical Use Cases
Use S3 for static websites, user uploads, data lakes, and backups/archives. It excels at write-once, read-many workloads.
Storage Classes Overview
S3 offers Standard for frequent access, Infrequent Access classes (Standard-IA and One Zone-IA) for data read less often, and Glacier classes for archival. Cheaper storage usually means slower retrieval, a retrieval fee, or both.
Common Exam Trap
S3 is not a file system or disk. You do not run databases on S3; you store assets and backups in S3.
Block Storage: Amazon EBS and Instance Store
What Is Block Storage?
Block storage looks like a raw disk to an operating system. You format it with a file system and mount it to an EC2 instance.
Amazon EBS
EBS provides network-attached, persistent block volumes in a single AZ. You can snapshot them to S3 and choose SSD or HDD types for different workloads.
Instance Store
Instance store is physically attached to the EC2 host. It is very fast but ephemeral: data is lost when the instance stops or is terminated.
When to Use Which
Use EBS for boot volumes and databases; use instance store for temporary data like caches or scratch space where loss is acceptable.
Block vs Object
Block storage sits underneath operating systems and databases that need a disk. Object storage like S3 is for files accessed through an API and for large, unstructured datasets.
File Storage: Amazon EFS and Amazon FSx
What Is File Storage?
File storage provides a shared file system with directories and file paths, accessed via NFS or SMB by multiple servers.
Amazon EFS
EFS is a managed NFS file system for Linux. It is elastic, shared across many EC2 instances, and spans multiple AZs in a Region.
Amazon FSx Family
FSx offers managed file systems tuned for specific needs, like Windows file shares (SMB) or high-performance Lustre for HPC and ML.
Choosing File Storage
Use file storage when apps expect a file system and multiple instances must share files. It avoids running your own NFS/SMB servers.
Exam Clues
Phrases like "shared file system" or "Windows file shares" strongly hint toward EFS or FSx rather than S3 or EBS.
Thought Exercise: Match Workloads to Storage Types
Use this thought exercise to solidify object vs block vs file storage.
For each scenario, decide which storage type (object, block, or file) is the best conceptual fit, then compare your choice with the model answers below.
- Photo-sharing app uploads
  - Millions of user photos, accessed via a web/mobile app.
  - Needs high durability, global distribution via CDN, and simple HTTP access.
  - Which storage type? (Hint: object, block, or file?)
- Relational database for an e-commerce site
  - Requires low-latency random reads/writes, strict consistency, and frequent small updates.
  - Runs on an EC2 instance or a managed database service that expects a disk.
  - Which storage type under the hood?
- Shared content directory for a cluster of web servers
  - Multiple EC2 instances need to read and write the same configuration and media files.
  - Application expects a normal file system with paths like `/var/www/html`.
  - Which storage type?
Model answers (do not peek until you have tried):
- Photo-sharing uploads: Object storage (Amazon S3)
- E-commerce database: Block storage (Amazon EBS under RDS/EC2)
- Shared content directory: File storage (Amazon EFS or FSx)
As you practice more questions in this course, watch for these patterns. The exam rarely asks "what is block storage" outright; it describes a workload and expects you to infer the right family.
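The inference pattern in the three model answers can be written down as a simple rule of thumb. The function below is a hypothetical study aid, not an AWS tool, and real designs weigh more factors than these two questions.

```python
def pick_storage_family(*, shared_file_paths: bool, needs_raw_disk: bool) -> str:
    """Rule of thumb: map two workload clues to object, block, or file storage."""
    if shared_file_paths:
        # Several servers expect the same file system paths: file storage (EFS, FSx).
        return "file"
    if needs_raw_disk:
        # An OS or database engine expects a disk to format and mount: block storage (EBS).
        return "block"
    # Whole files fetched and stored over HTTP(S) by key: object storage (S3).
    return "object"


if __name__ == "__main__":
    scenarios = {
        "Photo-sharing uploads": dict(shared_file_paths=False, needs_raw_disk=False),
        "E-commerce database": dict(shared_file_paths=False, needs_raw_disk=True),
        "Shared content directory": dict(shared_file_paths=True, needs_raw_disk=False),
    }
    for name, clues in scenarios.items():
        print(f"{name}: {pick_storage_family(**clues)}")
```

The order of the checks matters: a shared file system is the strongest clue, so it is tested first.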
Relational Databases on AWS: RDS and Aurora
Relational Basics
Relational databases store data in tables with rows and columns, use SQL, and excel at strong consistency and complex joins.
Amazon RDS
RDS manages relational engines like MySQL and PostgreSQL. AWS handles provisioning, patching, and backups; you handle schema and queries.
Availability Features
RDS supports automated backups, Multi-AZ deployments for high availability, and read replicas to scale reads for some engines.
Amazon Aurora
Aurora is a high-performance, MySQL/PostgreSQL-compatible engine whose storage auto-scales and replicates across multiple AZs.
When to Use Relational
Choose relational databases for transactions, strong integrity, and structured data like orders, customers, and inventory.
Non-Relational Databases: DynamoDB and Beyond
What Is Non-Relational?
Non-relational databases trade some relational features for scalability, flexibility, or speed, focusing on specific data models.
Amazon DynamoDB
DynamoDB is a fully managed key-value and document database. It is serverless and designed for millisecond latency at massive scale.
DynamoDB Concepts
Data lives in tables, items, and attributes. The partition key portion of the primary key determines how items are distributed across partitions; you pick on-demand or provisioned capacity.
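How a key drives distribution can be sketched by hashing the key and taking the result modulo the number of partitions. DynamoDB's real hash function and partition management are internal to the service; the MD5-based function and the partition count of 4 below are assumptions for illustration only.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition number (illustrative hashing, not DynamoDB's)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


PARTITIONS = 4  # hypothetical; the service chooses this for you

# Many distinct keys (for example, one per user) spread across all partitions.
spread = Counter(partition_for(f"user#{i}", PARTITIONS) for i in range(10_000))

# One key shared by every item (for example, a constant "status") hits one partition.
hot = Counter(partition_for("status#active", PARTITIONS) for _ in range(10_000))

if __name__ == "__main__":
    print("distinct keys:", dict(sorted(spread.items())))
    print("single hot key:", dict(hot))
```

The takeaway for key design is that a key with many distinct values spreads load evenly, while a key with few values concentrates traffic on a single partition.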
DynamoDB Use Cases
Use DynamoDB for high-traffic backends, user profiles, sessions, IoT data, and workloads needing consistent low latency at scale.
Relational vs Non-Relational
Relational emphasizes schema and joins; non-relational emphasizes flexible schema and scale. DynamoDB does not support SQL joins.
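The contrast can be sketched locally. The example uses Python's built-in `sqlite3` as a stand-in for a relational engine and a plain dictionary as a stand-in for a key-value store; neither is RDS or DynamoDB, and the table and key names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 2500), (11, 1, 4000), (12, 2, 1500);
""")

# Relational: data is normalized into tables and combined at query time with a join.
rows = conn.execute("""
    SELECT c.name, SUM(o.total_cents)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# Key-value/document: data is stored already shaped for the lookup you expect,
# so one read by key returns everything and no join is needed (or available).
profiles = {
    "customer#1": {"name": "Ana", "order_totals_cents": [2500, 4000]},
    "customer#2": {"name": "Ben", "order_totals_cents": [1500]},
}
ana = profiles["customer#1"]

if __name__ == "__main__":
    print(rows)
    print(ana["name"], sum(ana["order_totals_cents"]))
```

In the relational version you can ask new questions later by writing new queries. In the key-value version you decide the access pattern up front and shape the items to match it.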
Quick Check: Storage Types
Test your understanding of object, block, and file storage.
A startup runs a Linux web application on multiple EC2 instances across two AZs. All instances must read and write to the same set of configuration files using normal file paths. Which AWS storage option is the best fit?
- A) Amazon S3 Standard
- B) Amazon Elastic Block Store (EBS) General Purpose SSD
- C) Amazon Elastic File System (EFS)
- D) Instance store volumes
Answer: C) Amazon Elastic File System (EFS)
Amazon EFS provides a managed, shared NFS file system that multiple EC2 instances across AZs can mount and use with normal file paths. S3 is object storage (not a mounted file system), an EBS volume normally attaches to a single instance in one AZ, and instance store is ephemeral and not shared.
Quick Check: Relational vs Non-Relational
Test how well you can choose between RDS and DynamoDB.
A financial application needs ACID transactions and complex SQL joins across multiple tables (accounts, customers, transactions). Which AWS database service family is the best conceptual fit?
- A) Amazon DynamoDB
- B) Amazon RDS / Amazon Aurora
- C) Amazon ElastiCache
- D) Amazon S3 with CSV files
Answer: B) Amazon RDS / Amazon Aurora
Relational databases like those managed by Amazon RDS and Amazon Aurora are designed for ACID transactions and complex SQL joins. DynamoDB is non-relational and does not support joins, ElastiCache is for in-memory caching, and S3 is object storage, not a transactional database.
Putting It Together: Designing a Simple Web App Stack
Scenario Overview
Design an online store: users browse products, upload photos, and place orders. You need strong consistency and AZ-level resilience.
Compute Layer
Run the app on multiple EC2 instances or managed compute across two AZs to survive a single AZ failure.
Object Storage with S3
Store product and profile images in S3, possibly fronted by CloudFront for global, low-latency delivery.
Relational Data with RDS
Use Amazon RDS or Aurora for catalog, orders, and users. RDS handles backups and Multi-AZ failover. Standard RDS engines store data on EBS volumes under the hood, while Aurora uses its own distributed storage layer.
Optional Acceleration
Add DynamoDB or ElastiCache for fast session storage or key-value lookups if the workload demands it.
Durability and Lifecycle
Leverage RDS backups to S3 and S3 lifecycle rules to move old data to Glacier classes for lower-cost archiving.
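A lifecycle rule is just configuration. The dictionary below follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` parameter; the rule ID, the `backups/` prefix, and the day thresholds are assumptions chosen for illustration. The `storage_class_at` helper is hypothetical and only simulates the rule locally; S3 applies real transitions itself, on its own schedule.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-backups",       # hypothetical rule name
            "Filter": {"Prefix": "backups/"},  # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Simulate which storage class an object of this age falls under the rule."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


if __name__ == "__main__":
    rule = lifecycle_configuration["Rules"][0]
    for age in (10, 45, 120):
        print(f"{age:3d} days old: {storage_class_at(age, rule)}")
```

With these thresholds, objects stay in Standard while they are new, move to an infrequent-access class after a month, and move to an archival class after three months.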
Key Term Review: Storage and Databases
Review these terms to reinforce core concepts before moving on.
- Object storage (on AWS)
- A storage model where data is stored as objects (data + metadata) inside buckets and accessed via APIs or HTTP/HTTPS. On AWS, Amazon S3 is the primary object storage service.
- Block storage (on AWS)
- Storage that presents itself as raw disk volumes to an operating system. You format and mount it like a drive. On AWS, Amazon EBS and instance store provide block storage for EC2.
- File storage (on AWS)
- Shared storage that exposes a file system interface over protocols like NFS or SMB, allowing multiple servers to access the same files. On AWS, Amazon EFS and Amazon FSx provide managed file storage.
- Amazon S3 bucket
- A logical container for objects in Amazon S3. Each bucket holds many objects, each identified by a unique key within that bucket.
- Amazon EBS volume
- A persistent block storage volume for use with Amazon EC2 instances in a single Availability Zone, supporting low-latency read/write operations.
- Instance store
- Ephemeral block storage physically attached to the host running an EC2 instance. It offers very high performance, but data is lost when the instance stops or is terminated.
- Amazon EFS
- A fully managed, elastic NFS file system for Linux-based workloads that can be mounted concurrently by multiple EC2 instances across multiple AZs in a Region.
- Amazon RDS
- A managed relational database service that supports engines like MySQL, PostgreSQL, MariaDB, Oracle Database, SQL Server, and Amazon Aurora, handling provisioning, patching, and backups.
- Amazon Aurora
- A high-performance, MySQL- and PostgreSQL-compatible relational database engine in the RDS family, with storage that automatically scales and replicates across multiple AZs.
- Amazon DynamoDB
- A fully managed, serverless key-value and document database service designed for single-digit millisecond latency at any scale, without managing servers or storage.
- Durability vs Availability
- Durability is the likelihood that data remains intact and uncorrupted over time. Availability is how often the service is up and reachable. A service can be highly durable even if temporarily unavailable.
- Relational vs Non-relational database
- Relational databases store structured data in tables, use SQL, and support joins and ACID transactions. Non-relational databases use models like key-value or document, offering flexible schemas and horizontal scalability.
Key Terms
- Amazon S3
- Amazon Simple Storage Service, AWS's highly durable, scalable object storage service used for backups, static content, data lakes, and more.
- Amazon EBS
- Amazon Elastic Block Store, a service that provides persistent block storage volumes for use with Amazon EC2 instances in a single Availability Zone.
- Amazon EFS
- Amazon Elastic File System, a fully managed, elastic NFS file system for Linux workloads that multiple EC2 instances can mount concurrently.
- Amazon FSx
- A family of managed file system services on AWS, including FSx for Windows File Server, FSx for Lustre, and FSx for NetApp ONTAP, each optimized for specific workloads.
- Amazon RDS
- Amazon Relational Database Service, a managed service for setting up, operating, and scaling relational databases in the cloud.
- Durability
- A measure of how likely it is that stored data will remain intact and uncorrupted over time, often expressed as a number of nines.
- Availability
- The proportion of time that a system or service is operational and accessible, also often expressed as a number of nines.
- File storage
- Storage that organizes data into files and directories and exposes them via network file system protocols such as NFS or SMB.
- Amazon Aurora
- A MySQL- and PostgreSQL-compatible relational database engine built for the cloud, offering high performance and availability as part of Amazon RDS.
- Block storage
- Low-level storage that presents itself as raw fixed-size blocks, which an operating system formats into a file system and uses like a disk.
- Instance store
- Ephemeral block storage physically attached to the host server for an EC2 instance, providing high performance but no persistence after instance stop or termination.
- Object storage
- A storage model where data is stored as discrete objects (data plus metadata) in a flat address space and accessed via APIs, rather than as blocks or files.
- Amazon DynamoDB
- A fully managed, serverless NoSQL database service on AWS that provides key-value and document data models with single-digit millisecond latency.
- Relational database
- A database that stores data in structured tables with rows and columns, supports SQL, and enforces relationships and constraints between tables.
- Non-relational database
- A database that uses data models such as key-value, document, graph, or wide-column, typically offering flexible schemas and easy horizontal scaling.