Skarp

Chapter 13 of 26

High-Performing Storage Architectures with Amazon S3 and EBS

Storage choices can make or break performance. This module dives into Amazon S3 and EBS performance characteristics and shows how to align them with throughput, latency, and durability requirements.

Big Picture: Storage, Performance, and the Exam

Why Storage Performance Matters

Storage choices affect latency, throughput, durability, and cost. On the exam, many scenarios are really testing whether you understand how S3 and EBS behave under load.

S3 vs EBS at a Glance

  • Amazon S3: object storage, massive scale, very high durability.
  • Amazon EBS: block storage for EC2, low-latency consistent I/O for disks and databases.

Link to Well-Architected

This module maps to the performance efficiency, reliability, and cost optimization pillars. You must balance speed, consistency, and price in your design choices.

Your Outcomes

You will learn to pick S3 classes, choose EBS volume types, design upload patterns, and combine S3, EBS, and EC2 in exam-style scenarios.

Amazon S3 Performance Basics: Throughput, Latency, and Request Rates

S3 Scales Horizontally

S3 is built for massive scale. It can handle very high request rates and large throughput when you use parallelism and good access patterns.

Latency Expectations

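S3 latency from the same Region is measured in milliseconds: typically tens of milliseconds to first byte, with AWS guidance citing roughly 100–200 ms as a consistent figure for small objects. It is slower than EBS or local disks for per-record transactional I/O.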

Throughput Techniques

To increase S3 throughput, use multipart uploads for large objects, open multiple parallel connections, and keep compute in the same Region as the bucket.

Consistency and Durability

S3 provides strong read-after-write consistency for object PUT, DELETE, and LIST operations, and its storage classes are designed for 99.999999999% (11 nines) durability. It is great for durable, large-scale object storage.
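
To make the 11 nines figure concrete, the arithmetic can be sketched as below. This is a minimal illustration that treats the design target as an annual per-object loss probability; the object count of 10 million is an illustrative assumption, not a figure from this chapter.

```python
from fractions import Fraction

# Design target quoted in the text: 99.999999999% annual durability per object.
# Kept as an exact fraction to avoid floating-point rounding on 1 - 0.99999999999.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the design target."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    # 10 million objects is an illustrative assumption.
    print(years_per_expected_loss(10_000_000))  # 10000
```

Under these assumptions, storing 10 million objects corresponds to an expected loss of one object roughly every 10,000 years.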

Exam Reminder

If the scenario needs sub-millisecond, transactional reads and writes (for example, databases), S3 is not the primary storage. Think EBS or database services.

S3 Storage Classes and Performance-Related Behavior

Hot Storage: S3 Standard

S3 Standard is for frequently accessed, performance-sensitive data. It offers high availability and low latency for most general-purpose workloads.

S3 Intelligent-Tiering

Intelligent-Tiering delivers Standard-like performance in its frequent and infrequent access tiers. If you opt in to its archive tiers, rarely accessed data moves to cheaper storage with slower retrieval.

Infrequent Access Classes

S3 Standard-IA and One Zone-IA give similar latency to Standard but are cheaper to store and more expensive to retrieve. One Zone-IA uses a single AZ.

Glacier Family

Glacier Instant Retrieval gives millisecond access to archived data. Flexible Retrieval and Deep Archive require restore jobs, with retrieval delays ranging from minutes to hours.

Lifecycle Policies and Performance

Lifecycle rules move objects to colder storage to cut cost, not to improve speed. Do not archive data that must be accessed quickly and frequently.
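
A lifecycle rule of this kind might look like the sketch below. The dictionary follows the lifecycle configuration shape used by the AWS SDKs, but the rule ID, the `logs/` prefix, and the day counts are hypothetical, and `storage_class_at_age` is an illustrative helper that assumes objects are uploaded to S3 Standard.

```python
# Hypothetical rule: ID, prefix, and day counts are illustrative assumptions.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "age-out-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at_age(rule: dict, age_days: int) -> str:
    """Return the class an object of the given age would be in under the rule.

    Assumes the object was uploaded to S3 Standard.
    """
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


if __name__ == "__main__":
    rule = LIFECYCLE_CONFIGURATION["Rules"][0]
    for age in (7, 45, 365):
        print(age, storage_class_at_age(rule, age))
```

With boto3, a dictionary of this shape is what `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. Note that every step in the rule makes data cheaper to hold and slower or costlier to read, never faster.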

Designing S3 Access Patterns: Parallelism, Prefixes, and Multipart Uploads

Object Keys and Prefixes

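Keys like `logs/2026/05/28/app1/...` act as prefixes. S3 supports at least 3,500 write (PUT/COPY/POST/DELETE) and 5,500 read (GET/HEAD) requests per second per prefix, so randomized key names are no longer needed, but well-structured keys still help you parallelize processing.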
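
Because request-rate limits apply per prefix, spreading work across prefixes raises the aggregate ceiling. The sketch below estimates how many prefixes a target request rate would need, assuming the per-prefix figures AWS documents (3,500 writes and 5,500 reads per second); treat those constants as assumptions to verify against current S3 guidance.

```python
import math

# Per-prefix request rates as documented by AWS; verify before relying on them.
WRITE_RPS_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500   # GET/HEAD


def prefixes_needed(target_read_rps: int, target_write_rps: int = 0) -> int:
    """Smallest number of prefixes whose combined limits cover both targets."""
    for_reads = math.ceil(target_read_rps / READ_RPS_PER_PREFIX)
    for_writes = math.ceil(target_write_rps / WRITE_RPS_PER_PREFIX)
    return max(1, for_reads, for_writes)


if __name__ == "__main__":
    print(prefixes_needed(55_000))        # 10
    print(prefixes_needed(1_000, 7_001))  # 3
```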

Why Multipart Uploads

Multipart uploads split large files into parts that upload in parallel. This boosts throughput and lets you retry only failed parts.
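
The splitting step can be sketched as a small planner that turns an object size into part boundaries. This is an illustrative helper, not SDK code: the 64 MiB default part size is an assumption, while the 5 MiB minimum part size and 10,000-part maximum are S3 limits.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 64 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) for each part of an upload."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the object would otherwise need too many parts.
    while -(-object_size // part_size) > MAX_PARTS:
        part_size *= 2
    parts = []
    offset = 0
    part_number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


if __name__ == "__main__":
    for part in plan_parts(200 * MIB):
        print(part)  # four parts: three of 64 MiB and a final 8 MiB part
```

Each tuple describes an independent unit of work, which is why parts can be uploaded in parallel and a failed part can be retried without resending the rest.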

Parallel Access

Use multiple threads or processes to read and write objects concurrently. AWS CLI and SDKs provide transfer managers that handle parallelism.
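
The pattern a transfer manager applies can be sketched with a thread pool that fetches byte ranges concurrently and reassembles them in order. Everything here is a stand-in: `FAKE_OBJECT` simulates a stored object in memory, and `fetch_range` is a hypothetical helper where a real client would issue a ranged GET.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an object stored in S3 (1 MiB of sample data).
FAKE_OBJECT = bytes(range(256)) * 4096


def fetch_range(start: int, end: int) -> bytes:
    """Hypothetical helper: return bytes [start, end) of the object."""
    return FAKE_OBJECT[start:end]


def parallel_download(size: int, chunk_size: int, workers: int = 8) -> bytes:
    """Fetch fixed-size ranges concurrently and reassemble them in order."""
    ranges = [(s, min(s + chunk_size, size)) for s in range(0, size, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so chunks stay ordered.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


if __name__ == "__main__":
    data = parallel_download(len(FAKE_OBJECT), chunk_size=100_000)
    print(data == FAKE_OBJECT)  # True
```

Threads suit this workload because each request spends most of its time waiting on the network rather than on the CPU.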

Network and Region

Keep compute and S3 in the same Region. For distant clients needing fast uploads, consider S3 Transfer Acceleration to use edge locations.

Exam Signal

Phrases like "very large files" and "maximize upload speed" hint at multipart uploads plus parallel connections as the correct design.

Amazon EBS Volume Types: Performance Profiles (gp3, io1/io2, st1, sc1)

EBS Volume Families

EBS offers SSD-based (gp3, io1, io2) and HDD-based (st1, sc1) volumes. Each is tuned for different performance patterns and workloads.

gp3: General Purpose SSD

gp3 is the default choice for many workloads. It balances cost and performance, includes a baseline of 3,000 IOPS and 125 MiB/s at any size, and lets you provision additional IOPS and throughput independently of size.

io1/io2: Provisioned IOPS SSD

io1 and io2 deliver high, consistent IOPS for large, latency-sensitive databases. io2 adds higher durability (designed for 99.999%, versus 99.8–99.9% for io1) and stronger SLAs.

st1 and sc1: HDD Options

st1 is for high-throughput sequential workloads like big data. sc1 is the cheapest for cold, infrequently accessed data with sequential access.

Exam Reminder

HDD volumes (st1, sc1) cannot be used as boot volumes and are a poor fit for transactional databases. Choose gp3 or io1/io2 for those workloads, depending on required IOPS and consistency.

IOPS vs Throughput: Matching EBS to Workload Patterns

Key Metrics: IOPS vs Throughput

IOPS is the number of operations per second. Throughput is MB/s. Latency is how long each operation takes. Different workloads stress different metrics.
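
The two metrics are linked by I/O size: throughput is IOPS multiplied by the size of each operation. A minimal sketch of that relationship follows, using MiB/s (the unit EBS limits are quoted in); the example workloads are illustrative assumptions.

```python
KIB = 1024
MIB = 1024 * KIB


def throughput_mib_s(iops: float, io_size_bytes: int) -> float:
    """Throughput implied by an IOPS rate at a given I/O size."""
    return iops * io_size_bytes / MIB


def iops_for_throughput(target_mib_s: float, io_size_bytes: int) -> float:
    """IOPS needed to sustain a target throughput at a given I/O size."""
    return target_mib_s * MIB / io_size_bytes


if __name__ == "__main__":
    # Small random I/O: many operations, modest throughput.
    print(throughput_mib_s(3_000, 16 * KIB))  # 46.875
    # Large sequential I/O: few operations, high throughput.
    print(throughput_mib_s(500, 1 * MIB))     # 500.0
```

This is why a database doing 16 KiB reads can exhaust its IOPS budget while moving under 50 MiB/s, and a log scanner doing 1 MiB reads can saturate throughput with only a few hundred IOPS.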

Small, Random I/O

OLTP and key-value databases perform many small random reads and writes. They need high IOPS and low latency, so prefer SSD volumes (gp3, io1, io2).

Large, Sequential I/O

Analytics and log processing read or write big, sequential chunks. They are limited by throughput, so st1 can be a good, low-cost option.

Mixed Workloads and Tuning

Many systems mix patterns. Start with gp3, monitor performance, then adjust provisioned IOPS/throughput or move to io1/io2 if needed.

Exam Clues

Phrases like "thousands of small transactions" hint at IOPS-focused SSDs. "Scan terabytes of logs" hints at throughput-focused options like st1.
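
These clues can be condensed into a rough decision helper. The sketch below is a study aid rather than AWS guidance: the `Workload` fields are hypothetical, and the 16,000 IOPS cut-over between gp3 and io2 is an assumption that should be checked against current EBS limits.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    random_io: bool                   # True: small random I/O; False: large sequential scans
    boot_volume: bool = False
    required_iops: int = 0
    frequently_accessed: bool = True


# Assumed cut-over point between gp3 and io2; check current EBS limits.
GP3_IOPS_CEILING = 16_000


def choose_ebs_type(w: Workload) -> str:
    """Pick an EBS volume type from the workload's I/O pattern."""
    if w.boot_volume or w.random_io:
        # SSD only: HDD types suit neither boot volumes nor random I/O.
        return "io2" if w.required_iops > GP3_IOPS_CEILING else "gp3"
    # Sequential, throughput-bound work can use the HDD types.
    return "st1" if w.frequently_accessed else "sc1"


if __name__ == "__main__":
    print(choose_ebs_type(Workload(random_io=True, required_iops=5_000)))   # gp3
    print(choose_ebs_type(Workload(random_io=True, required_iops=40_000)))  # io2
    print(choose_ebs_type(Workload(random_io=False)))                       # st1
```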

Worked Scenarios: Choosing Between S3 and EBS for Performance

Scenario 1: Video Streaming

Petabytes of video, high durability requirements, and a global audience all point to S3. Use S3 Standard for hot videos and IA or Intelligent-Tiering for older content.

Why Not EBS for Video

An EBS volume has a fixed maximum size, lives in a single Availability Zone, and must be attached to an EC2 instance to be read. That makes EBS a poor fit for petabyte-scale, multi-consumer content distribution workloads.

Scenario 2: E-commerce Database

Thousands of small, low-latency transactions per second indicate a relational or NoSQL database backed by SSD EBS volumes, not S3.

Choosing EBS Type

For mission-critical, heavy OLTP, io2 is a strong choice. For moderate loads, gp3 with appropriate IOPS and throughput is often sufficient.

Blending S3 and EBS

Use EBS for hot, transactional data, and S3 for backups, exports, and archival logs. This balances performance, durability, and cost.

Design Exercise: Picking Storage for Three Workloads

Work through this thought exercise. For each workload, decide:

  1. S3 or EBS as the primary storage.
  2. If S3: which storage class?
  3. If EBS: which volume type?

Then compare your reasoning with the suggested answers.

Workload A: Log analytics cluster

  • Hundreds of GB of application logs generated daily.
  • Logs are written once, then scanned in large batches every night for analytics.
  • Cost is important; occasional extra seconds of query time are acceptable.

Pause and decide.

Suggested design:

  • Primary storage: S3.
  • Class: S3 Standard for recent logs, then transition older logs to Standard-IA or Glacier Instant Retrieval via lifecycle. Lifecycle transitions to Standard-IA require objects to have spent at least 30 days in Standard.
  • Rationale: Write-once, read-many, large sequential scans. S3 gives durability and cheap storage. Analytics tools (Athena, EMR) read directly from S3.

Workload B: Shared file system for a legacy app

  • Multiple EC2 instances must mount the same file system.
  • Mix of small and medium files, frequent reads/writes, low latency required.

Pause and decide.

Suggested design:

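  • Primary: A standard EBS volume attaches to one instance at a time (io1/io2 Multi-Attach exists but is limited to a single AZ and needs a cluster-aware file system), so use EFS for a shared file system (remember this for the broader exam). However, if the question forces a choice between S3 and EBS for low-latency, POSIX-like access, EBS is closer.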
  • Volume type: If each instance needs its own disk, gp3 for balanced performance.

Workload C: Cold image archive

  • Tens of TB of medical images.
  • Accessed a few times per year for audits.
  • Retrieval can wait hours; cost must be minimized.

Suggested design:

  • Primary: S3 Glacier Deep Archive.
  • Use lifecycle from S3 Standard for the first 30–90 days, then move to Deep Archive.
  • Rationale: Very infrequent access, tolerant to long retrieval times, extremely low storage cost.
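
The reasoning behind Workloads A and C can be sketched as a small rule-based chooser. All thresholds below are hypothetical illustrations for study purposes, not AWS guidance; the return values use the storage class names the S3 API accepts (`GLACIER` is Flexible Retrieval).

```python
def choose_s3_class(reads_per_month: float, max_wait_hours: float) -> str:
    """Map access frequency and retrieval tolerance to an S3 storage class.

    All thresholds are hypothetical illustrations, not AWS guidance.
    """
    if reads_per_month < 1 and max_wait_hours >= 12:
        return "DEEP_ARCHIVE"  # rare access, long waits acceptable
    if reads_per_month < 1 and max_wait_hours >= 5:
        return "GLACIER"       # Flexible Retrieval; restore job required
    if reads_per_month < 1:
        return "STANDARD_IA"   # rare access, millisecond reads needed
    return "STANDARD"          # frequent access


if __name__ == "__main__":
    # Workload A: scanned nightly, no tolerance for restore delays.
    print(choose_s3_class(reads_per_month=30, max_wait_hours=0))     # STANDARD
    # Workload C: a few reads per year, retrieval can wait.
    print(choose_s3_class(reads_per_month=0.25, max_wait_hours=24))  # DEEP_ARCHIVE
```

Workload B does not appear because its deciding factor is shared, low-latency file access, which neither S3 class selection nor a single EBS volume addresses.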

Quiz 1: S3 Performance and Storage Classes

Check your understanding of S3 performance and storage classes.

A company stores 500 TB of log files in S3. Logs are written continuously and queried several times per day using Athena. They want to reduce storage cost but keep millisecond access for queries. Which is the BEST choice?

  1. Move all logs immediately to S3 Glacier Deep Archive using a lifecycle rule.
  2. Store recent logs in S3 Standard and transition logs older than 30 days to S3 Standard-IA.
  3. Store all logs in S3 Intelligent-Tiering Archive tier.
  4. Move logs to S3 One Zone-IA in another Region.

Answer: 2. Store recent logs in S3 Standard and transition logs older than 30 days to S3 Standard-IA.

They need millisecond access several times per day, so Glacier Deep Archive and archive tiers are too slow. S3 Standard-IA keeps millisecond latency but lowers storage cost for older, less frequently accessed logs. One Zone-IA reduces resilience and cross-Region adds latency without clear benefit.

Quiz 2: EBS Volume Selection and I/O Patterns

Test your understanding of EBS volume types and I/O characteristics.

An EC2-based MySQL database handles thousands of small, random read/write operations per second and must keep latency as low and consistent as possible. Which EBS volume type is MOST appropriate?

  1. Cold HDD (sc1)
  2. Throughput Optimized HDD (st1)
  3. General Purpose SSD (gp3) or Provisioned IOPS SSD (io2), depending on required IOPS
  4. S3 Standard with lifecycle rules to S3 Glacier

Answer: 3. General Purpose SSD (gp3) or Provisioned IOPS SSD (io2), depending on required IOPS

Databases with many small random I/Os need SSD-based volumes. gp3 works for many workloads; for very high, consistent IOPS and mission-critical use, io2 is preferred. st1 and sc1 are HDD types optimized for sequential throughput, not low-latency random I/O. S3 is not suitable as primary storage for a transactional database.

Key Term Flashcards: S3 and EBS Performance

Flip through these cards to reinforce key concepts and exam-ready phrases.

When should you use multipart uploads in S3?
Use multipart uploads for large objects (recommended for 100 MB+, required above 5 GB because a single PUT cannot exceed 5 GB) to increase throughput and allow retrying individual parts.
S3 Standard vs S3 Standard-IA: performance difference?
Both provide millisecond access latency and similar per-request performance. Standard-IA has lower storage cost but higher retrieval and early deletion costs, and is intended for infrequently accessed data.
Best EBS volume type for high IOPS, low-latency databases?
Provisioned IOPS SSD volumes (io1 or io2), with io2 offering higher durability and better SLA for mission-critical databases.
What does gp3 let you configure independently?
gp3 lets you provision IOPS and throughput independently of volume size, giving flexible performance tuning without overprovisioning storage.
IOPS vs Throughput in storage performance
IOPS is the number of I/O operations per second and matters for small, random I/O. Throughput is MB/s and matters for large, sequential reads and writes.
Primary reason to choose st1 over gp3
Choose st1 when you need high, cost-effective throughput for large, sequential workloads (for example, big data, log processing) and can tolerate higher latency than SSD.
Is S3 suitable as primary storage for OLTP databases?
No. S3 has higher latency and object semantics. Use EBS SSD volumes or managed database storage for OLTP; S3 is better for backups, exports, and logs.
Performance impact of S3 lifecycle policies
Lifecycle policies are for cost and data placement over time. They do not improve performance; moving data to colder classes usually reduces immediate accessibility.
How to increase S3 read/write throughput from EC2?
Use multiple parallel connections or threads, multipart uploads/downloads for large objects, and ensure EC2 and the S3 bucket are in the same Region.
Durability of standard S3 storage classes
Standard S3 storage classes are designed for 99.999999999% (11 nines) durability across multiple Availability Zones in a Region.

Putting It All Together: Balancing Performance, Cost, and Availability

S3: Where It Shines

Use S3 for large, durable objects, logs, backups, and data lakes. Combine multipart uploads and parallelism to unlock very high throughput.

EBS: Where It Shines

Use EBS for low-latency block storage on EC2: boot volumes, app disks, and databases. Match volume type to IOPS or throughput needs.

Balancing AWS Pillars

Balance performance efficiency, reliability, and cost optimization. Choose storage classes and volume types that meet requirements without overspending.

Exam Strategy

In scenarios, look for clues about access frequency, latency, I/O pattern, and cost sensitivity. Map those directly to S3 vs EBS and specific classes or volume types.

Key Terms

gp3
General Purpose SSD EBS volume type that offers a balance of price and performance and lets you configure IOPS and throughput independently of volume size.
sc1
Cold HDD EBS volume type designed for infrequently accessed, large, sequential workloads where the lowest storage cost is required and performance is less critical.
st1
Throughput Optimized HDD EBS volume type designed for frequently accessed, throughput-intensive, large, sequential workloads like big data and log processing.
IOPS
Input/Output Operations Per Second; a measure of how many read or write operations a storage system can perform each second.
Latency
The time delay between initiating an I/O request and its completion; low latency is critical for transactional workloads.
io1/io2
Provisioned IOPS SSD EBS volume types that let you specify a high, consistent number of IOPS for I/O-intensive, latency-sensitive workloads such as large databases.
Amazon S3
An object storage service that offers industry-leading scalability, data availability, security, and performance for storing and retrieving any amount of data from anywhere.
Amazon EBS
A block storage service designed for use with Amazon EC2 that provides persistent, low-latency storage volumes for workloads such as file systems, databases, and applications.
Throughput
The amount of data transferred per unit time, typically measured in megabytes per second (MB/s), indicating how quickly large amounts of data can be read or written.
Multipart upload
An S3 feature that allows you to upload a single object as a set of parts in parallel, improving throughput and reliability for large objects.
S3 storage class
A configuration for S3 objects that defines durability, availability, retrieval characteristics, and cost (for example, S3 Standard, Standard-IA, Intelligent-Tiering, Glacier classes).
Reliability pillar
The reliability pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to. This includes the ability to operate and test the workload through its total lifecycle.
S3 lifecycle policy
A set of rules that define how S3 automatically transitions objects between storage classes or expires them over time to optimize cost.
S3 Intelligent-Tiering
An S3 storage class that automatically moves objects between frequent and infrequent access tiers, and optionally archive tiers, based on changing access patterns.
S3 Glacier Deep Archive
An S3 storage class optimized for long-term data archiving with the lowest storage cost and retrieval times of hours.
Cost optimization pillar
The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
Performance efficiency pillar
The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
