Skarp

Chapter 12 of 20

Storage and Databases on AWS: Foundations of Persistent Data

Look under the hood of how AWS stores data long-term, from object storage to managed databases, and when to choose each option.

Big Picture: Persistent Data on AWS

Why Persistent Data Matters

Compute is temporary; memory is wiped when processes stop. Anything you want to keep long-term must live in storage or databases.

Two Big Categories

AWS splits persistent data into storage services (S3, EBS, EFS, FSx) and database services (RDS, Aurora, DynamoDB, and others).

What You Must Be Able To Do

For the exam, you must describe object vs block vs file storage, explain relational vs non-relational databases, and relate choices to durability, availability, performance, and cost.

Link to Global Infrastructure

Storage and databases live in Regions and AZs, so you can design for resilience and regulatory needs using the same building blocks as compute.

Core Storage Concepts: Durability, Availability, Performance, Cost

Durability

Durability is the probability that your data remains intact over time. AWS achieves high durability by storing multiple copies across devices and, for many services, across multiple AZs.
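To make the idea of a durability probability concrete, here is a minimal sketch that turns a durability figure into an expected number of objects lost per year. The function name and the object count are illustrative assumptions. The eleven-nines figure is the durability S3 is designed for; it is used here only as a sample input.

```python
def expected_annual_losses(object_count: int, annual_durability: float) -> float:
    """Expected number of objects lost in one year.

    annual_durability is the probability that a single object survives a year.
    """
    return object_count * (1.0 - annual_durability)


# Hypothetical example: ten million objects at eleven nines of durability.
losses = expected_annual_losses(10_000_000, 0.99999999999)
print(f"Expected losses: about {losses:.4f} objects per year")
```

At that level, a store of ten million objects would expect to lose roughly one object every ten thousand years, which is why durability is quoted in "nines" rather than percentages.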

Availability

Availability is how often the service is up and reachable. Higher availability generally requires more redundancy, which raises cost.
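A quick way to read an availability figure is to convert it into allowed downtime. The sketch below does that arithmetic; the function name and the sample percentages are assumptions chosen for illustration, not figures for any specific AWS service.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years for simplicity


def annual_downtime_hours(availability_percent: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


for pct in (99.0, 99.9, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% available -> up to {hours:.2f} hours down per year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why each step up in availability tends to cost more.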

Performance

Performance includes latency (speed of one operation) and throughput (volume per second). Different workloads emphasize different aspects.
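The difference between latency and throughput shows up in a simple model: the time for one operation is roughly a fixed latency plus the data size divided by throughput. The numbers below are hypothetical and the function is only a rough sketch, not a model of any particular AWS service.

```python
def transfer_seconds(size_mb: float, latency_ms: float, throughput_mb_s: float) -> float:
    """Rough time for one operation: fixed latency plus size divided by throughput."""
    return latency_ms / 1000.0 + size_mb / throughput_mb_s


# Hypothetical storage device: 1 ms latency, 250 MB/s throughput.
small_read = transfer_seconds(size_mb=0.004, latency_ms=1.0, throughput_mb_s=250.0)
large_read = transfer_seconds(size_mb=10_000.0, latency_ms=1.0, throughput_mb_s=250.0)

print(f"4 KB read:  {small_read * 1000:.3f} ms (dominated by latency)")
print(f"10 GB read: {large_read:.3f} s (dominated by throughput)")
```

A database doing many small random reads cares mostly about the latency term, while a batch job scanning large files cares mostly about the throughput term.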

Cost Trade-offs

Cost includes $/GB-month, request pricing, and data transfer. Cheaper storage classes usually trade off access speed or availability.
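The three cost components can be combined into a small calculator. This is a sketch under stated assumptions: the `PriceSheet` class, the `monthly_cost` function, and every price below are made-up placeholders, not real AWS pricing. The point is only to show how a cheaper $/GB-month can be outweighed by request charges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceSheet:
    """Placeholder prices in dollars; NOT real AWS pricing."""
    per_gb_month: float
    per_1k_requests: float
    per_gb_transfer_out: float


def monthly_cost(prices: PriceSheet, stored_gb: float, requests: int,
                 transfer_out_gb: float) -> float:
    """Sum of storage, request, and data transfer charges for one month."""
    storage = stored_gb * prices.per_gb_month
    request = (requests / 1000) * prices.per_1k_requests
    transfer = transfer_out_gb * prices.per_gb_transfer_out
    return storage + request + transfer


# Two hypothetical classes: one priced for frequent access, one for infrequent.
frequent = PriceSheet(per_gb_month=0.02, per_1k_requests=0.0004, per_gb_transfer_out=0.09)
infrequent = PriceSheet(per_gb_month=0.01, per_1k_requests=0.01, per_gb_transfer_out=0.09)

for label, requests in (("rarely read", 100_000), ("heavily read", 50_000_000)):
    cost_frequent = monthly_cost(frequent, 1000, requests, 0)
    cost_infrequent = monthly_cost(infrequent, 1000, requests, 0)
    print(f"{label}: frequent=${cost_frequent:.2f}, infrequent=${cost_infrequent:.2f}")
```

With these placeholder prices, the infrequent-access class wins for rarely read data and loses badly once the same data is read heavily.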

Reading Scenarios

On the exam, clues like "infrequent access" or "millisecond latency" hint at which dimension is most important and which service family to choose.

Object Storage: Amazon S3

S3 as Object Storage

S3 stores data as objects in buckets. Each object has data and metadata, and is identified by a key. You access it over HTTP/HTTPS, not as a disk.
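The object model can be pictured as a flat mapping from key to object. The sketch below is a toy in-memory stand-in, not the S3 API: `ToyBucket`, `StoredObject`, and the sample keys are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredObject:
    """An object is its data plus a set of metadata key-value pairs."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyBucket:
    """In-memory stand-in for a bucket: a flat mapping from key to object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, StoredObject] = {}

    def put_object(self, key: str, data: bytes,
                   metadata: Optional[Dict[str, str]] = None) -> None:
        # Writing to an existing key replaces the whole object.
        self._objects[key] = StoredObject(data, dict(metadata or {}))

    def get_object(self, key: str) -> StoredObject:
        # Raises KeyError when the key does not exist.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # There are no real directories; "folders" are just shared key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket("example-photos")
bucket.put_object("users/42/profile.jpg", b"<jpeg bytes>", {"content-type": "image/jpeg"})
bucket.put_object("users/42/cover.jpg", b"<jpeg bytes>")
bucket.put_object("products/7/main.png", b"<png bytes>")

print(bucket.list_keys("users/42/"))
```

Notice that the whole object is written or read at once by key. There is no seek, append, or in-place block update, which is one reason object storage does not behave like a disk.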

Durability and Scale

S3 is designed for extremely high durability by keeping multiple copies across many devices and usually multiple AZs, with virtually unlimited capacity.

Typical Use Cases

Use S3 for static websites, user uploads, data lakes, and backups/archives. It excels at write-once, read-many workloads.

Storage Classes Overview

Standard for frequent access, *-IA for infrequent access, and Glacier classes for archival. Cheaper storage usually means slower or priced retrieval.

Common Exam Trap

S3 is not a file system or disk. You do not run databases on S3; you store assets and backups in S3.

Block Storage: Amazon EBS and Instance Store

What Is Block Storage?

Block storage looks like a raw disk to an operating system. You format it with a file system and mount it to an EC2 instance.

Amazon EBS

EBS provides network-attached, persistent block volumes in a single AZ. You can snapshot them to S3 and choose SSD or HDD types for different workloads.

Instance Store

Instance store is physically attached to the EC2 host. It is very fast but ephemeral: data is lost when the instance stops or is terminated.

When to Use Which

Use EBS for boot volumes and databases; use instance store for temporary data like caches or scratch space where loss is acceptable.

Block vs Object

Block storage underlies OS and databases needing a disk; object storage like S3 is for API-accessed files and large, unstructured datasets.

File Storage: Amazon EFS and Amazon FSx

What Is File Storage?

File storage provides a shared file system with directories and file paths, accessed via NFS or SMB by multiple servers.

Amazon EFS

EFS is a managed NFS file system for Linux. It is elastic, shared across many EC2 instances, and spans multiple AZs in a Region.

Amazon FSx Family

FSx offers managed file systems tuned for specific needs, like Windows file shares (SMB) or high-performance Lustre for HPC and ML.

Choosing File Storage

Use file storage when apps expect a file system and multiple instances must share files. It avoids running your own NFS/SMB servers.

Exam Clues

Phrases like "shared file system" or "Windows file shares" strongly hint toward EFS or FSx rather than S3 or EBS.

Thought Exercise: Match Workloads to Storage Types

Use this thought exercise to solidify object vs block vs file storage.

For each scenario, decide which storage type (object, block, or file) is the best conceptual fit, then compare your choice with the model answers below.

  1. Photo-sharing app uploads
  • Millions of user photos, accessed via a web/mobile app.
  • Needs high durability, global distribution via CDN, and simple HTTP access.
  • Which storage type? (Hint: object, block, or file?)
  2. Relational database for an e-commerce site
  • Requires low-latency random reads/writes, strict consistency, and frequent small updates.
  • Runs on an EC2 instance or a managed database service that expects a disk.
  • Which storage type under the hood?
  3. Shared content directory for a cluster of web servers
  • Multiple EC2 instances need to read and write the same configuration and media files.
  • Application expects a normal file system with paths like `/var/www/html`.
  • Which storage type?

Model answers (do not peek until you have tried):

  1. Photo-sharing uploads: Object storage (Amazon S3)
  2. E-commerce database: Block storage (Amazon EBS under RDS/EC2)
  3. Shared content directory: File storage (Amazon EFS or FSx)

As you practice more questions in this course, watch for these patterns. The exam rarely asks "what is block storage" outright; it describes a workload and expects you to infer the right family.
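The habit of spotting clue phrases can be mimicked with a small keyword matcher. This is only a study aid under loud assumptions: the clue lists are a hand-picked, incomplete selection based on the phrases used in this chapter, and `suggest_storage_type` is a hypothetical helper, not an official decision procedure.

```python
# Hand-picked clue phrases per storage family (illustrative, not exhaustive).
CLUES = {
    "object": ("http access", "static website", "backup", "archive",
               "data lake", "user uploads"),
    "block": ("boot volume", "database", "low-latency random", "raw disk"),
    "file": ("shared file system", "nfs", "smb", "windows file share",
             "file paths"),
}


def suggest_storage_type(scenario: str) -> str:
    """Return the storage family whose clue phrases appear most often."""
    text = scenario.lower()
    scores = {kind: sum(clue in text for clue in clues)
              for kind, clues in CLUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(suggest_storage_type("User uploads served over simple HTTP access"))
print(suggest_storage_type("Relational database needing low-latency random reads"))
print(suggest_storage_type("Web servers need a shared file system over NFS"))
```

Real exam questions need judgment rather than string matching, but writing out your own clue lists like this is a useful way to check which phrases you associate with each family.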

Relational Databases on AWS: RDS and Aurora

Relational Basics

Relational databases store data in tables with rows and columns, use SQL, and excel at strong consistency and complex joins.
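A small example makes tables, SQL, and joins concrete. The sketch below uses Python's built-in `sqlite3` module as a local stand-in for a relational engine; SQLite is not one of the engines discussed in this chapter, and the table names and sample rows are made up for illustration.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# Join the two tables and aggregate order totals per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, order_count, total in rows:
    print(f"{name}: {order_count} orders, total {total:.2f}")

conn.close()
```

The join is resolved by the database itself, which is the capability to look for when a scenario mentions relationships across several tables.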

Amazon RDS

RDS manages relational engines like MySQL and PostgreSQL. AWS handles provisioning, patching, and backups; you handle schema and queries.

Availability Features

RDS supports automated backups, Multi-AZ deployments for high availability, and read replicas to scale reads for some engines.

Amazon Aurora

Aurora is a high-performance, MySQL/PostgreSQL-compatible engine whose storage auto-scales and replicates across multiple AZs.

When to Use Relational

Choose relational databases for transactions, strong integrity, and structured data like orders, customers, and inventory.

Non-Relational Databases: DynamoDB and Beyond

What Is Non-Relational?

Non-relational databases trade some relational features for scalability, flexibility, or speed, focusing on specific data models.

Amazon DynamoDB

DynamoDB is a fully managed key-value and document database. It is serverless and designed for millisecond latency at massive scale.

DynamoDB Concepts

Data lives in tables, items, and attributes. The primary key controls distribution; you pick on-demand or provisioned capacity.
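To see why the primary key controls distribution, consider a conceptual sketch in which a hash of the partition key picks a partition. DynamoDB's real hashing and partition management are internal to the service; the `partition_for` function, the use of SHA-256, and the partition count of four are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition number using a stable hash (conceptual only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


PARTITIONS = 4  # hypothetical number of partitions

# High-cardinality key: one value per user spreads items across partitions.
by_user = Counter(partition_for(f"user-{i}", PARTITIONS) for i in range(1000))

# Low-cardinality key: only two distinct values, so at most two partitions are used.
statuses = ["active" if i % 2 == 0 else "inactive" for i in range(1000)]
by_status = Counter(partition_for(status, PARTITIONS) for status in statuses)

print("items per partition, keyed by user id:", dict(by_user))
print("items per partition, keyed by status: ", dict(by_status))
```

The same key always lands on the same partition, so a key with few distinct values concentrates traffic on a few partitions, while a key with many distinct values spreads it out.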

DynamoDB Use Cases

Use DynamoDB for high-traffic backends, user profiles, sessions, IoT data, and workloads needing consistent low latency at scale.

Relational vs Non-Relational

Relational emphasizes schema and joins; non-relational emphasizes flexible schema and scale. DynamoDB does not support SQL joins.

Quick Check: Storage Types

Test your understanding of object, block, and file storage.

A startup runs a Linux web application on multiple EC2 instances across two AZs. All instances must read and write to the same set of configuration files using normal file paths. Which AWS storage option is the best fit?

  A) Amazon S3 Standard
  B) Amazon Elastic Block Store (EBS) General Purpose SSD
  C) Amazon Elastic File System (EFS)
  D) Instance store volumes

Answer: C) Amazon Elastic File System (EFS)

Amazon EFS provides a managed, shared NFS file system that multiple EC2 instances across AZs can mount and use with normal file paths. S3 is object storage (not a mounted file system), EBS volumes attach to a single instance in one AZ, and instance store is ephemeral and not shared.

Quick Check: Relational vs Non-Relational

Test how well you can choose between RDS and DynamoDB.

A financial application needs ACID transactions and complex SQL joins across multiple tables (accounts, customers, transactions). Which AWS database service family is the best conceptual fit?

  A) Amazon DynamoDB
  B) Amazon RDS / Amazon Aurora
  C) Amazon ElastiCache
  D) Amazon S3 with CSV files

Answer: B) Amazon RDS / Amazon Aurora

Relational databases like those managed by Amazon RDS and Amazon Aurora are designed for ACID transactions and complex SQL joins. DynamoDB is non-relational and does not support joins, ElastiCache is for in-memory caching, and S3 is object storage, not a transactional database.

Putting It Together: Designing a Simple Web App Stack

Scenario Overview

Design an online store: users browse products, upload photos, and place orders. You need strong consistency and AZ-level resilience.

Compute Layer

Run the app on multiple EC2 instances or managed compute across two AZs to survive a single AZ failure.

Object Storage with S3

Store product and profile images in S3, possibly fronted by CloudFront for global, low-latency delivery.

Relational Data with RDS

Use Amazon RDS or Aurora for catalog, orders, and users. RDS handles backups and Multi-AZ. Standard RDS engines use EBS under the hood, while Aurora uses its own distributed storage layer.

Optional Acceleration

Add DynamoDB or ElastiCache for fast session storage or key-value lookups if the workload demands it.

Durability and Lifecycle

Leverage RDS backups to S3 and S3 lifecycle rules to move old data to Glacier classes for lower-cost archiving.
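A lifecycle rule is essentially a list of age thresholds and target storage classes. The dictionary below follows the general shape of an S3 lifecycle configuration, but the rule ID, prefix, and day counts are hypothetical, and nothing here calls AWS. The `storage_class_at` helper is an illustrative assumption that simply evaluates the rule locally.

```python
# Shaped like an S3 lifecycle rule; ID, prefix, and day counts are hypothetical.
lifecycle_rule = {
    "ID": "archive-old-order-exports",
    "Status": "Enabled",
    "Filter": {"Prefix": "exports/orders/"},
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Return the storage class an object of the given age would be in."""
    current = "STANDARD"  # assumed starting class for newly written objects
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (10, 45, 365):
    print(f"{age:>3} days old -> {storage_class_at(age, lifecycle_rule)}")
```

Objects start in the frequent-access class and move to cheaper classes as they age, which matches the trade-off described earlier: lower storage cost in exchange for slower or priced retrieval.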

Key Term Review: Storage and Databases

Flip through these cards to reinforce core concepts before moving on.

Object storage (on AWS)
A storage model where data is stored as objects (data + metadata) inside buckets and accessed via APIs or HTTP/HTTPS. On AWS, Amazon S3 is the primary object storage service.
Block storage (on AWS)
Storage that presents itself as raw disk volumes to an operating system. You format and mount it like a drive. On AWS, Amazon EBS and instance store provide block storage for EC2.
File storage (on AWS)
Shared storage that exposes a file system interface over protocols like NFS or SMB, allowing multiple servers to access the same files. On AWS, Amazon EFS and Amazon FSx provide managed file storage.
Amazon S3 bucket
A logical container for objects in Amazon S3. Each bucket holds many objects, each identified by a unique key within that bucket.
Amazon EBS volume
A persistent block storage volume for use with Amazon EC2 instances in a single Availability Zone, supporting low-latency read/write operations.
Instance store
Ephemeral block storage physically attached to the host running an EC2 instance. It offers very high performance but data persists only for the life of the instance.
Amazon EFS
A fully managed, elastic NFS file system for Linux-based workloads that can be mounted concurrently by multiple EC2 instances across multiple AZs in a Region.
Amazon RDS
A managed relational database service that supports engines like MySQL, PostgreSQL, MariaDB, Oracle Database, SQL Server, and Amazon Aurora, handling provisioning, patching, and backups.
Amazon Aurora
A high-performance, MySQL- and PostgreSQL-compatible relational database engine in the RDS family, with storage that automatically scales and replicates across multiple AZs.
Amazon DynamoDB
A fully managed, serverless key-value and document database service designed for single-digit millisecond latency at any scale, without managing servers or storage.
Durability vs Availability
Durability is the likelihood that data remains intact and uncorrupted over time. Availability is how often the service is up and reachable. A service can be highly durable even if temporarily unavailable.
Relational vs Non-relational database
Relational databases store structured data in tables, use SQL, and support joins and ACID transactions. Non-relational databases use models like key-value or document, offering flexible schemas and horizontal scalability.

Key Terms

Amazon S3
Amazon Simple Storage Service, AWS's highly durable, scalable object storage service used for backups, static content, data lakes, and more.
Amazon EBS
Amazon Elastic Block Store, a service that provides persistent block storage volumes for use with Amazon EC2 instances in a single Availability Zone.
Amazon EFS
Amazon Elastic File System, a fully managed, elastic NFS file system for Linux workloads that multiple EC2 instances can mount concurrently.
Amazon FSx
A family of managed file system services on AWS, including FSx for Windows File Server, FSx for Lustre, and FSx for NetApp ONTAP, each optimized for specific workloads.
Amazon RDS
Amazon Relational Database Service, a managed service for setting up, operating, and scaling relational databases in the cloud.
Durability
A measure of how likely it is that stored data will remain intact and uncorrupted over time, often expressed as a number of nines.
Availability
The proportion of time that a system or service is operational and accessible, also often expressed as a number of nines.
File storage
Storage that organizes data into files and directories and exposes them via network file system protocols such as NFS or SMB.
Amazon Aurora
A MySQL- and PostgreSQL-compatible relational database engine built for the cloud, offering high performance and availability as part of Amazon RDS.
Block storage
Low-level storage that presents itself as raw fixed-size blocks, which an operating system formats into a file system and uses like a disk.
Instance store
Ephemeral block storage physically attached to the host server for an EC2 instance, providing high performance but no persistence after instance stop or termination.
Object storage
A storage model where data is stored as discrete objects (data plus metadata) in a flat address space and accessed via APIs, rather than as blocks or files.
Amazon DynamoDB
A fully managed, serverless NoSQL database service on AWS that provides key-value and document data models with single-digit millisecond latency.
Relational database
A database that stores data in structured tables with rows and columns, supports SQL, and enforces relationships and constraints between tables.
Non-relational database
A database that uses data models such as key-value, document, graph, or wide-column, typically offering flexible schemas and easy horizontal scaling.
