Skarp

Chapter 15 of 26

High-Performing Storage Solutions with Amazon S3 and Block/File Storage

Storage choices can make or break performance; compare S3 storage classes, EBS volume types, and file services so you can tune for throughput, IOPS, and latency.


Performance Efficiency Pillar and Storage Overview

Why Storage Performance Matters

On the Solutions Architect – Associate exam, you often choose between S3, EBS, and file services based on throughput, IOPS, and latency. Picking the wrong storage type can bottleneck an otherwise well-designed system.

Performance Efficiency Pillar

Memorize this: "The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve." Storage is a key part of this.

Storage Categories

  • S3 (object): massive scale, high durability, high aggregate throughput, but higher per-request latency than block storage.
  • EBS (block): low-latency, consistent IOPS for EC2.
  • File (EFS/FSx): shared file system interface for apps expecting NFS or SMB.

Link to Earlier Modules

Earlier you optimized for resilience: backups, multi-AZ, multi-Region. Here you keep those patterns but tune for speed and efficient resource use, guided by access patterns and performance needs.

Amazon S3 Performance Fundamentals and Storage Classes

S3 Performance Basics

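S3 is object storage with latency in the tens of milliseconds for small objects, very high aggregate throughput, and automatic scaling of request rates. You no longer need to randomize key prefixes for performance, although request-rate limits still apply per prefix, so very hot workloads benefit from spreading objects across several prefixes.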
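To make the request-rate scaling concrete, here is a minimal sketch. The per-prefix figures (3,500 writes and 5,500 reads per second) come from AWS's published S3 performance guidance rather than from this chapter, so treat them as an assumption that may change; the function name is hypothetical.

```python
# Per-prefix request rates taken from AWS's published S3 performance guidance.
# Assumption: these numbers are not quoted in the chapter and may change.
S3_READS_PER_SEC_PER_PREFIX = 5_500   # GET/HEAD
S3_WRITES_PER_SEC_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE


def aggregate_request_rate(prefix_count: int) -> dict:
    """Approximate request rates available when load is spread over prefixes."""
    if prefix_count < 1:
        raise ValueError("prefix_count must be at least 1")
    return {
        "reads_per_sec": prefix_count * S3_READS_PER_SEC_PER_PREFIX,
        "writes_per_sec": prefix_count * S3_WRITES_PER_SEC_PER_PREFIX,
    }


# Ten prefixes give roughly ten times the single-prefix rate.
print(aggregate_request_rate(10))
```

The point of the sketch is only that request capacity grows with the number of prefixes in use; S3 handles the partitioning itself.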

Frequent Access Classes

  • S3 Standard: Default for hot data.
  • S3 Intelligent-Tiering: Same performance as Standard for active tiers, but auto-moves data based on access, charging a small monitoring fee.

Infrequent and Archive Classes

  • Standard-IA / One Zone-IA: Similar latency to Standard, cheaper storage, higher retrieval cost.
  • Glacier Instant Retrieval: Millisecond access for archives.
  • Glacier Flexible Retrieval / Glacier Deep Archive: retrieval takes minutes to hours; lowest storage cost.

Exam Angle

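On the exam, focus on access frequency and retrieval time tolerance. The classes that serve objects immediately (Standard, Intelligent-Tiering's non-archive tiers, the IA classes, and Glacier Instant Retrieval) have similar latency; the main tradeoffs are cost and how quickly you can restore archived data.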

Choosing S3 Storage Classes for Performance and Cost

Scenario 1: Profile Photos

Frequent reads, small objects, must load quickly. Choose S3 Standard for low latency and high availability across multiple AZs. Intelligent-Tiering may add cost without much benefit.

Scenario 2: Compliance Logs

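High write rate, rarely read after the first few days. Use Intelligent-Tiering, or a lifecycle rule that transitions objects from Standard to Standard-IA or Glacier Instant Retrieval, to cut cost while keeping reads reasonably fast. Note that lifecycle transitions to Standard-IA require objects to be at least 30 days old.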
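The lifecycle option in this scenario could be expressed roughly as follows. This is a sketch, not a recommended policy: the rule ID, the `logs/` prefix, the bucket name in the comment, and the 30- and 90-day thresholds are all illustrative assumptions. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
def compliance_log_lifecycle(prefix: str = "logs/") -> dict:
    """Build a lifecycle configuration that moves aging logs to cheaper classes.

    All names and day counts here are hypothetical examples.
    """
    return {
        "Rules": [
            {
                "ID": "cool-down-compliance-logs",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Standard -> Standard-IA once objects are 30 days old.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # Then on to Glacier Instant Retrieval at 90 days.
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                ],
            }
        ]
    }


config = compliance_log_lifecycle()
print(config["Rules"][0]["Transitions"])

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-log-bucket", LifecycleConfiguration=config)
```

Building the configuration as plain data keeps it easy to review and test before anything touches a real bucket.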

Scenario 3: Long-Term Backups

Write once, almost never read; hours of restore time is acceptable. S3 Glacier Deep Archive gives the lowest cost, with retrieval taking hours, which is fine for rare audits.

Exam Trap: Archive but Fast

If the question stresses "archive" plus millisecond retrieval, the best match is S3 Glacier Instant Retrieval: cheaper than Standard but still fast to access.
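The scenarios above reduce to two questions: how often is the data read, and how long can a retrieval take? The sketch below encodes that reasoning. The function name and the string labels are hypothetical, and mapping "unknown" access to Intelligent-Tiering is an assumption based on how that class moves data automatically.

```python
def choose_s3_storage_class(access: str, retrieval_wait: str = "milliseconds") -> str:
    """Pick an S3 storage class from access frequency and retrieval tolerance.

    access: "frequent", "unknown", "infrequent", or "archive"
    retrieval_wait: "milliseconds", "minutes", or "hours" (used for archives)
    """
    if access == "frequent":
        return "S3 Standard"
    if access == "unknown":
        # Assumption: let S3 tier the data when the pattern is unpredictable.
        return "S3 Intelligent-Tiering"
    if access == "infrequent":
        return "S3 Standard-IA"
    if access == "archive":
        if retrieval_wait == "milliseconds":
            return "S3 Glacier Instant Retrieval"
        if retrieval_wait == "minutes":
            return "S3 Glacier Flexible Retrieval"
        return "S3 Glacier Deep Archive"
    raise ValueError(f"unrecognized access pattern: {access!r}")


print(choose_s3_storage_class("frequent"))                 # profile photos
print(choose_s3_storage_class("archive", "hours"))         # long-term backups
print(choose_s3_storage_class("archive", "milliseconds"))  # archive but fast
```

Real decisions also weigh object size, minimum storage duration, and retrieval fees, which this sketch deliberately leaves out.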

Amazon EBS Volume Types and Performance Characteristics

What is EBS?

EBS is network-attached block storage for EC2 with low-latency access. You format it with a file system and mount it like a disk. It is ideal for single-instance workloads that need consistent IOPS.

General Purpose SSD (gp3/gp2)

Balanced price/performance for most workloads. gp3 lets you provision IOPS and throughput independently of volume size. Use for boot volumes, app servers, and many small to medium databases.
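A quick calculation shows why decoupling performance from size matters. The figures below (gp2 at 3 IOPS per GiB with a floor of 100 and a ceiling of 16,000; gp3 with a 3,000 IOPS baseline) come from AWS's EBS documentation rather than this chapter, so treat them as assumptions.

```python
# Assumed figures from AWS's EBS documentation (not stated in this chapter).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP3_BASELINE_IOPS = 3_000  # included at any volume size


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline IOPS grow with volume size, within fixed bounds."""
    return min(max(GP2_IOPS_PER_GIB * size_gib, GP2_MIN_IOPS), GP2_MAX_IOPS)


# A small gp2 volume gets few IOPS; gp3 starts at 3,000 regardless of size.
for size in (100, 500, 1000):
    print(f"{size} GiB: gp2={gp2_baseline_iops(size)} IOPS, gp3={GP3_BASELINE_IOPS} IOPS")
```

With gp2, reaching 3,000 baseline IOPS means paying for a 1,000 GiB volume; with gp3 the same baseline comes with a volume of any size.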

Provisioned IOPS SSD (io2)

For mission-critical, I/O-intensive databases needing very high IOPS and consistent low latency. You explicitly provision IOPS; cost is higher but performance is predictable.

HDD Volumes (st1, sc1)

st1: Throughput Optimized HDD for large, sequential workloads.

sc1: Cold HDD for infrequently accessed data.

Neither type can be used as a boot volume, and both perform poorly under random I/O.

Mapping EC2 Workloads to EBS Volume Types

Scenario: OLTP Database

Many small, random reads/writes, very performance-sensitive. Choose io2 for provisioned IOPS and low latency. This is the go-to for mission-critical transactional databases on EC2.

Scenario: Web/App Servers

Boot volume plus modest data and logs. Choose gp3 for balanced performance and cost. You can tune IOPS and throughput without growing the volume size.

Scenario: Big Data Analytics

Large, sequential scans and writes. Choose st1 (Throughput Optimized HDD) for high MB/s throughput at lower cost than SSD. Not suitable for random I/O or boot volumes.

Scenario: Cold Data on EC2

Large, infrequently accessed datasets where cost dominates. Choose sc1 (Cold HDD). Remember: for boot or random I/O, do not pick st1 or sc1 on the exam.
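The four scenarios above follow one pattern: boot and random I/O go to SSD, sequential I/O goes to HDD, and the subtype depends on how critical or how cold the workload is. A minimal sketch of that mapping, with a hypothetical function name and parameters:

```python
def choose_ebs_volume_type(
    io_pattern: str,
    is_boot_volume: bool = False,
    mission_critical: bool = False,
    infrequent_access: bool = False,
) -> str:
    """Map a workload description to an EBS volume type.

    io_pattern: "random" or "sequential"
    """
    if is_boot_volume:
        return "gp3"  # HDD types are ruled out for boot volumes
    if io_pattern == "random":
        return "io2" if mission_critical else "gp3"
    if io_pattern == "sequential":
        return "sc1" if infrequent_access else "st1"
    raise ValueError(f"unrecognized I/O pattern: {io_pattern!r}")


print(choose_ebs_volume_type("random", mission_critical=True))       # OLTP database
print(choose_ebs_volume_type("random", is_boot_volume=True))         # web/app server
print(choose_ebs_volume_type("sequential"))                          # big data analytics
print(choose_ebs_volume_type("sequential", infrequent_access=True))  # cold data
```

This mirrors the exam heuristics only; a real sizing exercise would also look at required IOPS, throughput, and budget.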

File Storage Options: Amazon EFS and Amazon FSx

Why File Storage?

Some apps need a shared file system with directories, file permissions, and file locks. They expect NFS or SMB, not S3 APIs. AWS offers EFS and FSx for these cases.

Amazon EFS Overview

EFS is a managed, elastic NFS file system for Linux. Many EC2 instances and containers can mount it at once. It has performance modes (General Purpose, Max I/O) and throughput modes (Bursting, Provisioned, and the newer Elastic).
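In Bursting mode, baseline throughput grows with the amount of data stored. The sketch below assumes the rate AWS documents for EFS Standard storage, 50 KiB/s per GiB stored; that number is not from this chapter, and the function name is hypothetical.

```python
# Assumed baseline rate from AWS's EFS documentation (Bursting mode,
# Standard storage class); not stated in this chapter.
EFS_BASELINE_KIB_PER_SEC_PER_GIB = 50


def efs_bursting_baseline_mib_per_sec(stored_gib: float) -> float:
    """Baseline throughput in MiB/s for a given amount of stored data."""
    if stored_gib < 0:
        raise ValueError("stored_gib must not be negative")
    return stored_gib * EFS_BASELINE_KIB_PER_SEC_PER_GIB / 1024


for size in (100, 1024, 10240):
    rate = efs_bursting_baseline_mib_per_sec(size)
    print(f"{size} GiB stored -> {rate:.1f} MiB/s baseline")
```

A small file system therefore has a low baseline and leans on burst credits, which is the usual reason to consider Provisioned throughput instead.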

Amazon FSx Family

FSx offers managed file systems: Lustre (HPC), NetApp ONTAP, Windows File Server (SMB), and OpenZFS. These target specialized performance or protocol needs beyond EFS.

Exam Comparisons

Common comparisons: EFS vs FSx for Lustre (general vs HPC), EFS vs EBS (shared file system vs single-instance block), and FSx for Windows vs EFS (SMB/Windows vs NFS/Linux).
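Those three comparisons can be read as a short decision chain. The sketch below is one way to write it down; the function and parameter names are hypothetical, and it covers only the services compared in this section.

```python
def choose_instance_storage(shared: bool, needs_smb: bool = False, hpc: bool = False) -> str:
    """Pick between EBS, EFS, and the FSx variants compared above."""
    if not shared:
        return "Amazon EBS"  # single-instance block storage
    if needs_smb:
        return "Amazon FSx for Windows File Server"  # SMB/Windows
    if hpc:
        return "Amazon FSx for Lustre"  # HPC-scale throughput
    return "Amazon EFS"  # general-purpose shared NFS for Linux


print(choose_instance_storage(shared=False))
print(choose_instance_storage(shared=True))
print(choose_instance_storage(shared=True, hpc=True))
print(choose_instance_storage(shared=True, needs_smb=True))
```

The order of the checks reflects the comparisons in the text: sharing first, then protocol, then performance class.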

Thought Exercise: Mapping Workloads to S3, EBS, and File Storage

Work through these scenarios and decide which storage option (and subtype) you would choose. Think it through before you peek at the suggested answers.

  1. Media streaming platform
  • Stores video files; many users stream them concurrently.
  • Requirements: High throughput to many clients, global distribution via CloudFront, cost-effective at petabyte scale.
  • Question: Would you choose S3, EBS, or a file system (EFS/FSx)? Which S3 storage class?
  2. Shared home directory for Linux developers
  • Many EC2 instances need to share user home directories (`/home/users`), with POSIX permissions.
  • Requirements: Shared access, simple mount on many instances, reasonable latency.
  • Question: EBS, EFS, or FSx? Why?
  3. High-performance machine learning training
  • Large training datasets (TBs) must be read quickly by GPU instances.
  • Requirements: Very high throughput and low latency, often reading the same data repeatedly.
  • Question: EBS, EFS, or FSx? If FSx, which flavor?
  4. Windows-based line-of-business app
  • Multiple Windows servers need a shared file share with SMB, integrated with Active Directory.
  • Requirements: Familiar Windows file semantics, ACLs, and good performance.
  • Question: EFS or FSx? Which variant?

Pause and write down your answers. Then compare:

  • 1: S3 (object) with S3 Standard for hot media, fronted by CloudFront.
  • 2: EFS (NFS shared file system for Linux home dirs).
  • 3: FSx for Lustre (HPC/ML training with S3 integration), or high-performance io2 EBS if data is local to a single instance.
  • 4: FSx for Windows File Server (native SMB, AD integration).

Quiz: S3 and EBS Performance Choices

Check your understanding of S3 storage classes and EBS volume types.

An analytics team runs a Hadoop cluster on EC2 that scans multi-GB log files in large, sequential reads. They want good throughput at low cost. Which EBS volume type is the BEST fit for the data volumes attached to the worker nodes?

  A. gp3 General Purpose SSD
  B. io2 Provisioned IOPS SSD
  C. st1 Throughput Optimized HDD
  D. sc1 Cold HDD

Answer: C) st1 Throughput Optimized HDD

**st1 Throughput Optimized HDD** is designed for large, sequential I/O with high throughput (MB/s) at lower cost, which matches big log scans in Hadoop. gp3 and io2 are SSDs optimized for random I/O and cost more than needed here. sc1 is for infrequently accessed cold data and offers lower performance than st1, making it a weaker choice for active analytics.

Quiz: File Storage and Performance Efficiency

Test your understanding of file storage options and the performance efficiency pillar.

A team needs a shared file system for Linux-based microservices. They expect thousands of concurrent connections and want AWS to automatically scale throughput as the dataset grows. Which service and configuration is MOST appropriate?

  A. Single large gp3 EBS volume attached to one EC2 instance and shared via NFS
  B. Amazon EFS with General Purpose performance mode and Bursting throughput mode
  C. Amazon FSx for Windows File Server
  D. Amazon S3 Standard with an S3 bucket mounted via a third-party FUSE driver

Answer: B) Amazon EFS with General Purpose performance mode and Bursting throughput mode

**Amazon EFS with General Purpose performance mode and Bursting throughput mode** fits because EFS is a managed, elastic NFS file system for Linux, and in Bursting mode its throughput scales with the amount of data stored. Exporting an EBS volume over NFS from one instance creates a single point of failure and a scaling bottleneck. FSx for Windows File Server targets SMB/Windows, not NFS/Linux. An S3 bucket mounted via FUSE is not an AWS-managed, POSIX-compliant file system and has different semantics and performance characteristics.

Flashcards: Key Performance and Storage Terms

Flip these cards to reinforce core definitions and mappings you will need on the exam.

Performance efficiency pillar (definition)
The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
Best S3 class for frequently accessed, latency-sensitive data
S3 Standard – optimized for frequent access with low latency and high throughput across multiple Availability Zones.
Best EBS type for critical OLTP databases needing very high IOPS
Provisioned IOPS SSD (io2) – offers high, consistent IOPS and low latency suitable for mission-critical transactional databases.
Best EBS type for big, sequential analytics workloads
Throughput Optimized HDD (st1) – designed for large, sequential I/O with high throughput at lower cost than SSD.
Service: Managed NFS file system for Linux with elastic scaling
Amazon EFS – a managed, elastic NFS file system that can be mounted concurrently by many Linux-based clients in the same Region.
Service: High-performance file system for HPC and ML, integrated with S3
Amazon FSx for Lustre – provides very high throughput and low latency for compute-intensive workloads and can import/export data to S3.
When to prefer S3 Glacier Deep Archive
For long-term backups and compliance archives that are rarely accessed and can tolerate hours of retrieval time in exchange for the lowest storage cost.
Key difference: EBS vs EFS
EBS is block storage attached to a single EC2 instance (low-latency, per-instance disk), while EFS is a shared, managed NFS file system mountable by many instances concurrently.

Bringing It Together: Performance Efficiency Decisions and Next Steps

Storage and Performance Efficiency

Storage is a key lever for the performance efficiency pillar: "The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve."

Step 1: Interface Choice

Ask: object, block, or file? HTTP API and massive scale → S3. Single-instance disk → EBS. Shared file paths (NFS/SMB) → EFS or FSx.

Step 2: Pattern and Performance

Consider access frequency, random vs sequential I/O, latency sensitivity, and whether you need IOPS or throughput. Then map to S3 classes, EBS types, or EFS/FSx modes accordingly.
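IOPS and throughput are linked by the I/O size: throughput is roughly IOPS multiplied by the size of each operation. The short calculation below uses made-up workload numbers to show why small random I/O is bounded by IOPS while large sequential I/O is bounded by throughput.

```python
def throughput_mib_per_sec(iops: int, io_size_kib: int) -> float:
    """Throughput implied by an operation rate and a fixed I/O size."""
    return iops * io_size_kib / 1024


# Hypothetical database workload: many small 16 KiB operations.
print(throughput_mib_per_sec(3000, 16))   # 46.875 MiB/s

# Hypothetical analytics scan: far fewer operations, each 1 MiB.
print(throughput_mib_per_sec(500, 1024))  # 500.0 MiB/s
```

The first workload needs a volume rated for IOPS even though it moves little data; the second needs MB/s and only a modest operation rate, which is the distinction behind choosing SSD versus throughput-optimized HDD.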

Step 3: Cost vs Performance

Start with general-purpose options (S3 Standard, gp3, EFS). Move to specialized options (io2, FSx for Lustre) or to archive and cold tiers (the Glacier classes, sc1) only when the scenario clearly calls for it.

Key Terms

IOPS
Input/Output Operations Per Second, a measure of how many read or write operations a storage device can perform each second.
Latency
The time delay between a request for data and the start of its delivery, usually measured in milliseconds.
Amazon S3
A highly durable, scalable object storage service used for storing and retrieving any amount of data over the internet.
Amazon EBS
Amazon Elastic Block Store, a service that provides persistent block storage volumes for use with Amazon EC2 instances.
Amazon EFS
Amazon Elastic File System, a managed, elastic NFS file system for Linux-based workloads that can be mounted concurrently by many clients.
Amazon FSx
A family of managed file storage services providing fully featured file systems such as Lustre, NetApp ONTAP, Windows File Server, and OpenZFS.
Throughput
The amount of data that can be transferred per unit time, typically measured in MB/s for storage systems.
gp3 volume
A General Purpose SSD EBS volume type that lets you provision IOPS and throughput independent of storage size.
io2 volume
A Provisioned IOPS SSD EBS volume type designed for critical, I/O-intensive workloads needing high, consistent IOPS and low latency.
sc1 volume
A Cold HDD EBS volume type for infrequently accessed data where low storage cost is more important than performance.
st1 volume
A Throughput Optimized HDD EBS volume type for frequently accessed, large, sequential workloads requiring high throughput.
S3 storage class
A configuration option in Amazon S3 that defines cost, availability, and retrieval characteristics for objects (e.g., Standard, Intelligent-Tiering, Standard-IA, Glacier).
Amazon FSx for Lustre
A high-performance file system optimized for compute-intensive workloads like HPC and ML, with tight integration to Amazon S3.
Bursting throughput (EFS)
A mode where EFS throughput scales with the size of the file system and can temporarily exceed the baseline using burst credits.
Performance efficiency pillar
The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
