Chapter 17 of 26
Cost-Optimized Storage: Amazon S3, EBS, and Lifecycle Management
Storage can quietly dominate your AWS bill if you’re not intentional. This module shows how to use Amazon S3 storage classes, lifecycle policies, and EBS options to minimize cost without sacrificing required performance or durability.
Big Picture: Storage Cost Optimization on AWS
Why Storage Costs Matter
On AWS, storage is easy to spin up and hard to keep under control. Old logs, snapshots, and unused volumes quietly accumulate, and your monthly bill grows even when traffic is flat.
Well-Architected Context
This module mainly targets the cost optimization pillar: "The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."
Connected Pillars
Storage choices also affect the performance efficiency pillar and reliability pillar, because the storage type you pick changes latency, throughput, and durability characteristics.
Your Exam Task
Exam questions often hide cost decisions inside performance or durability scenarios. You must map different data types to the right mix of S3 storage classes, lifecycle rules, and EBS volume types.
What You Will Do
You will learn to pick S3 storage classes, design lifecycle policies, choose EBS volumes and snapshots, and decide whether S3 or EBS is the more economical choice for a workload.
Amazon S3 Storage Classes: Cost, Durability, and Access
S3 Basics
Amazon S3 is object storage with very high durability. Storage classes differ in availability, minimum storage duration, retrieval cost, and per-GB price, not in basic API.
S3 Standard
S3 Standard offers high availability, no minimum storage duration, and low latency. It is the most expensive per GB but has no retrieval fees.
S3 Standard-IA
Standard-IA is cheaper per GB but adds per-GB retrieval fees and a minimum storage duration. It suits data accessed less often but still needing millisecond access.
S3 One Zone-IA
One Zone-IA stores data in a single AZ, making it cheaper but less resilient. Use only if you can tolerate AZ loss or have backups elsewhere.
S3 Intelligent-Tiering
Intelligent-Tiering automatically moves objects between frequent and infrequent tiers based on access patterns, charging a small monitoring fee per object.
Durability vs Availability Trap
Exam trap: most S3 classes share the same high durability. Lower price typically comes from lower availability, retrieval fees, and minimum storage durations, not weaker durability.
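The trade-off between a lower per-GB price and retrieval fees can be sketched with a little arithmetic. The prices below are illustrative assumptions, not quoted AWS pricing (real prices vary by Region and change over time), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    storage_per_gb_month: float  # assumed price in USD, for illustration only
    retrieval_per_gb: float      # assumed retrieval fee in USD, for illustration only


STANDARD = StorageClass("Standard", 0.023, 0.0)
STANDARD_IA = StorageClass("Standard-IA", 0.0125, 0.01)


def monthly_cost(storage_class: StorageClass, stored_gb: float, retrieved_gb: float) -> float:
    """Storage charge plus retrieval charge for one month (request fees ignored)."""
    return (storage_class.storage_per_gb_month * stored_gb
            + storage_class.retrieval_per_gb * retrieved_gb)


def cheaper_class(stored_gb: float, retrieved_gb: float) -> str:
    """Name of the cheaper of the two classes for this access pattern."""
    best = min((STANDARD, STANDARD_IA),
               key=lambda c: monthly_cost(c, stored_gb, retrieved_gb))
    return best.name


print(cheaper_class(stored_gb=1000, retrieved_gb=100))   # Standard-IA
print(cheaper_class(stored_gb=1000, retrieved_gb=2000))  # Standard
```

Under these assumed prices, Standard-IA wins only while monthly retrievals stay low relative to the stored volume. That is why it suits infrequently accessed data.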
Archival with S3 Glacier Classes (High-Level View)
Why Glacier?
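S3 Glacier classes are for long-term, rarely accessed data. They keep S3 durability and offer very low storage cost; in exchange they add minimum storage durations and retrieval charges, and all but Glacier Instant Retrieval give up instant access.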
Glacier Instant Retrieval
Glacier Instant Retrieval offers millisecond access at a lower per-GB storage price than Standard-IA, in exchange for a longer minimum storage duration and higher retrieval fees.
Glacier Flexible Retrieval
Glacier Flexible Retrieval targets archives with retrieval times of minutes to hours. It is cheaper than instant classes but has more complex retrieval pricing.
Glacier Deep Archive
Glacier Deep Archive is the lowest-cost S3 class with retrieval in hours. It suits data you almost never need but must retain for years.
Exam Pattern
On exams, Glacier appears when retention is in years, access is very rare, and retrieval can take minutes to hours. Do not use Deep Archive for data you query weekly.
Designing S3 Lifecycle Policies for Cost Savings
What Lifecycle Policies Do
S3 lifecycle policies automatically transition objects between storage classes and delete them over time, keeping long-lived data from inflating your bill.
Scope of Rules
Rules can target a whole bucket, a prefix like logs/app1/, or objects with specific tags such as env=prod or data_type=logs.
Lifecycle Actions
Key actions: transition to cheaper storage after N days, expire objects after N days, and delete noncurrent versions in versioned buckets.
Sample Log Policy
Example: 0–30 days in Standard, then to Standard-IA, then to Glacier, then delete after 365 days. This follows a warm → cold → archive → delete pattern.
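As a rough sketch, the sample log policy can be written as a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `logs/` prefix, the bucket name in the comment, and the day-90 move to Glacier are assumptions made for illustration; the text above only fixes the 30-day and 365-day points.

```python
import json

# Warm -> cold -> archive -> delete, expressed as one lifecycle rule.
sample_log_lifecycle = {
    "Rules": [
        {
            "ID": "logs-warm-cold-archive-delete",   # hypothetical rule name
            "Filter": {"Prefix": "logs/"},           # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # day 90 is an assumption
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(sample_log_lifecycle, indent=2))

# With boto3 installed and credentials configured, the rule could be applied with:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",                      # hypothetical bucket
#       LifecycleConfiguration=sample_log_lifecycle,
#   )
```

In this API, the `GLACIER` storage class value corresponds to Glacier Flexible Retrieval.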
Minimum Duration Trap
Respect minimum storage durations. If a class has a 30-day minimum, deleting or moving objects earlier still incurs 30 days of charges.
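A small worked example of the minimum-duration charge, assuming a 30-day billing month and an illustrative (not quoted) price of $0.0125 per GB-month. The function names are hypothetical.

```python
def billable_days(days_stored: int, minimum_days: int) -> int:
    """Days you are charged for: never fewer than the class minimum."""
    return max(days_stored, minimum_days)


def storage_charge(size_gb: float, price_per_gb_month: float,
                   days_stored: int, minimum_days: int) -> float:
    """Prorated storage charge, assuming a 30-day month for simplicity."""
    days = billable_days(days_stored, minimum_days)
    return size_gb * price_per_gb_month * days / 30


# 100 GB deleted after only 10 days from a class with a 30-day minimum.
with_minimum = storage_charge(100, 0.0125, days_stored=10, minimum_days=30)
without_minimum = storage_charge(100, 0.0125, days_stored=10, minimum_days=0)
print(f"{with_minimum:.2f}")     # 1.25
print(f"{without_minimum:.2f}")  # 0.42
```

The object is billed as if it had stayed the full 30 days, so short-lived data can end up costing more in an infrequent-access class than it would have in Standard.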
Evaluation Frequency
Lifecycle rules are evaluated once per day, which is usually fine for cost control and does not affect exam answers.
Hands-On Thought Exercise: Build a Lifecycle Policy
Scenario Overview
You run a web app with two data types: high-volume access logs and user-uploaded images. Each has different access and retention needs.
Log Requirements
Logs are heavily analyzed for 7 days, then rarely read but must be retained for 1 year. After 1 year they can be deleted.
Image Requirements
Images are frequently accessed for 30 days, then access becomes unpredictable. They must be kept for at least 3 years.
Sample Log Strategy
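Keep logs in Standard for the first 30 days, because lifecycle rules cannot transition objects to Standard-IA until they are at least 30 days old. Move them to Standard-IA at day 30, then to Glacier Flexible Retrieval at day 60 or later so the Standard-IA 30-day minimum is met, and finally delete after 365 days.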
Sample Image Strategy
Store images in Standard for 30 days, then move to Intelligent-Tiering for the rest of the 3-year retention period.
Exam Traps
Avoid pushing user images to Glacier after 30 days or using One Zone-IA for compliance logs if losing an AZ is not acceptable.
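One more constraint is worth checking in any plan like these: S3 Lifecycle does not transition objects to Standard-IA or One Zone-IA until they are at least 30 days old. The following is a minimal sketch of a plan checker, not an AWS API; the function name and the `(days, storage_class)` tuple format are hypothetical.

```python
MIN_AGE_BEFORE_IA_DAYS = 30
IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}


def check_plan(transitions, tolerates_az_loss=True):
    """Return a list of problems found in a list of (days, storage_class) tuples."""
    problems = []
    for days, storage_class in transitions:
        # Lifecycle transitions into the IA classes need objects aged 30+ days.
        if storage_class in IA_CLASSES and days < MIN_AGE_BEFORE_IA_DAYS:
            problems.append(
                f"{storage_class} at day {days}: objects must be at least "
                f"{MIN_AGE_BEFORE_IA_DAYS} days old"
            )
        # One Zone-IA keeps data in a single AZ.
        if storage_class == "ONEZONE_IA" and not tolerates_az_loss:
            problems.append("ONEZONE_IA: data is stored in a single AZ")
    return problems


print(check_plan([(7, "STANDARD_IA"), (30, "GLACIER")]))
print(check_plan([(30, "INTELLIGENT_TIERING")]))  # []
print(check_plan([(30, "ONEZONE_IA")], tolerates_az_loss=False))
```

The first and third plans each report one problem; the image-style plan passes cleanly.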
Amazon EBS: Pricing Dimensions and Volume Types
What EBS Is For
EBS is block storage for EC2, ideal for OS disks and databases needing low-latency, consistent I/O. It is AZ-scoped and normally attached to one instance at a time (Multi-Attach on io1/io2 volumes is the exception).
EBS Pricing Dimensions
EBS cost depends on volume type, provisioned size, provisioned IOPS and throughput (for some types), and snapshots stored in S3.
gp3 vs gp2
gp3 is the current general-purpose SSD: it is cheaper per GB than gp2 and lets you provision IOPS and throughput separately from size. gp2 ties IOPS to volume size, so getting more IOPS means paying for more capacity.
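A rough sketch of why decoupling IOPS from size saves money. It uses the gp2 baseline of 3 IOPS per GiB and the 3,000 IOPS included with every gp3 volume. The three prices are illustrative assumptions rather than quoted AWS pricing, and the sketch ignores gp2 burst credits and gp3 throughput charges.

```python
import math

GP2_IOPS_PER_GIB = 3        # gp2 baseline performance scales with size
GP3_BASELINE_IOPS = 3000    # included with every gp3 volume

GP2_PRICE_PER_GIB = 0.10    # assumed USD per GiB-month
GP3_PRICE_PER_GIB = 0.08    # assumed USD per GiB-month
GP3_PRICE_PER_IOPS = 0.005  # assumed USD per extra provisioned IOPS-month


def gp2_size_for_iops(target_iops: int, data_gib: int) -> int:
    """Smallest gp2 size that holds the data and reaches the IOPS baseline."""
    return max(data_gib, math.ceil(target_iops / GP2_IOPS_PER_GIB))


def gp2_monthly_cost(target_iops: int, data_gib: int) -> float:
    return gp2_size_for_iops(target_iops, data_gib) * GP2_PRICE_PER_GIB


def gp3_monthly_cost(target_iops: int, data_gib: int) -> float:
    extra_iops = max(0, target_iops - GP3_BASELINE_IOPS)
    return data_gib * GP3_PRICE_PER_GIB + extra_iops * GP3_PRICE_PER_IOPS


# 500 GiB of data that needs 6,000 IOPS.
print(gp2_size_for_iops(6000, 500))          # 2000
print(f"{gp2_monthly_cost(6000, 500):.2f}")  # 200.00
print(f"{gp3_monthly_cost(6000, 500):.2f}")  # 55.00
```

Under these assumptions the gp2 volume must be four times larger than the data just to reach the IOPS target, while gp3 pays only for the 500 GiB plus the extra 3,000 IOPS.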
Provisioned IOPS SSD
io1/io2 volumes are for high-performance databases. You explicitly provision and pay for IOPS. io2 offers higher durability and IOPS density than io1.
HDD Volume Types
st1 is throughput-optimized HDD for large, sequential workloads. sc1 is cold HDD for infrequently accessed data with the lowest cost per GB.
Exam Mapping
Relational DB with heavy transactions → gp3 or io2. Large sequential analytics scans → st1. For cost optimization, prefer gp3 over gp2 when you need extra performance, because you can add IOPS and throughput without adding capacity.
EBS Snapshots and Cost-Aware Backup Patterns
What EBS Snapshots Are
EBS snapshots are incremental, point-in-time backups of volumes stored in S3. They are used for backup, recovery, and cloning volumes.
Incremental Storage
After the first snapshot, only changed blocks are saved. This incremental model keeps snapshot storage costs lower than repeated full copies.
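The saving from the incremental model can be estimated with simple arithmetic. This sketch assumes the first snapshot stores the whole volume and each later snapshot stores a fixed fraction of changed blocks; real snapshot sizes depend on which blocks were actually written. The function names and figures are hypothetical.

```python
def full_copy_storage_gb(volume_gb: float, snapshots: int) -> float:
    """Storage used if every snapshot were a complete copy."""
    return volume_gb * snapshots


def incremental_storage_gb(volume_gb: float, snapshots: int,
                           change_fraction: float) -> float:
    """First snapshot is full; each later one adds only the changed blocks."""
    if snapshots == 0:
        return 0.0
    return volume_gb + volume_gb * change_fraction * (snapshots - 1)


# A 500 GB volume, 7 daily snapshots, about 2% of blocks changing per day.
print(full_copy_storage_gb(500, 7))                # 3500
print(round(incremental_storage_gb(500, 7, 0.02)))  # 560
```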
Cost Risks
You pay per GB-month of snapshot data. Old snapshots that are never deleted will quietly add to your monthly bill.
Backup Pattern
Define RPO/RTO, schedule snapshots to meet RPO, and set retention like daily for 7 days, weekly for 3 months, and monthly for 1 year.
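The retention tiers above can be sketched as a keep-or-delete decision per snapshot. This illustrates the policy logic only, not the Data Lifecycle Manager or AWS Backup API. It assumes that weekly snapshots are the ones taken on Sundays, monthly snapshots are the ones taken on the 1st, and "3 months" means 90 days.

```python
from datetime import date


def should_keep(snapshot_day: date, today: date) -> bool:
    """Apply daily (7 days), weekly (90 days), and monthly (365 days) retention."""
    age_days = (today - snapshot_day).days
    if age_days <= 7:
        return True                       # daily tier
    if snapshot_day.weekday() == 6 and age_days <= 90:
        return True                       # weekly tier (Sunday snapshots)
    if snapshot_day.day == 1 and age_days <= 365:
        return True                       # monthly tier (1st of the month)
    return False


today = date(2024, 6, 30)
print(should_keep(date(2024, 6, 25), today))  # True: within 7 days
print(should_keep(date(2024, 6, 3), today))   # False: a Monday, 27 days old
print(should_keep(date(2024, 1, 1), today))   # True: monthly, under a year old
```

Anything for which `should_keep` returns `False` is a candidate for deletion, which is the step that stops old snapshots from accumulating charges.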
Automation Tools
Use EBS Data Lifecycle Manager or AWS Backup to automate snapshot creation and deletion instead of managing them manually.
Exam Trap
S3 lifecycle rules do not manage EBS snapshots. Snapshots are stored in S3 internally, but you manage them only through EBS and backup services; they never appear as objects in your buckets.
S3 vs EBS: Cost and Performance Trade-Offs
Different Storage Models
EBS is block storage for a single EC2 instance in one AZ. S3 is object storage accessed via HTTP APIs, designed for massive scale and multi-AZ durability.
EBS Cost Model
EBS charges per provisioned GB and, for some types, provisioned IOPS and throughput, regardless of how much you actually use.
S3 Cost Model
S3 charges per GB stored plus request and retrieval fees, with multiple storage classes for different access patterns and price points.
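The difference between paying for provisioned capacity and paying for stored data can be sketched as follows. The prices and the request-fee figure are illustrative assumptions, not quoted AWS pricing, and the function names are hypothetical.

```python
def ebs_monthly_cost(provisioned_gb: float, price_per_gb_month: float) -> float:
    """EBS bills the provisioned size, whether or not the space is used."""
    return provisioned_gb * price_per_gb_month


def s3_monthly_cost(stored_gb: float, price_per_gb_month: float,
                    request_fees: float = 0.0, retrieval_fees: float = 0.0) -> float:
    """S3 bills the data actually stored, plus request and retrieval fees."""
    return stored_gb * price_per_gb_month + request_fees + retrieval_fees


# 200 GB of logs: on a 1,000 GB volume versus stored as S3 objects.
print(f"{ebs_monthly_cost(1000, 0.08):.2f}")                    # 80.00
print(f"{s3_monthly_cost(200, 0.023, request_fees=0.50):.2f}")  # 5.10
```

With these assumed numbers, the unused 800 GB on the volume drives most of the EBS cost.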
Workload Mapping
Databases and OS disks → EBS. Static website assets and media → S3. Data lakes and analytics → S3. Backups and archives → S3 or Glacier.
Common Exam Traps
You cannot run RDS directly on S3. Using EBS for long-term log storage is usually a cost-inefficient design compared to S3.
Mental Shortcut
Think: EBS for live, low-latency block workloads; S3 for shared, scalable, tiered-cost object storage, especially logs, backups, and static content.
Quiz 1: S3 Storage Classes and Lifecycle
Test your understanding of S3 storage classes and lifecycle design.
A company stores user activity logs that are heavily queried for 14 days, then rarely accessed but must be retained for 2 years. They want to minimize storage cost while keeping retrieval possible within minutes. Which lifecycle policy is MOST appropriate?
- Keep logs in S3 Standard for 2 years, then delete.
- Store logs in S3 Standard for 30 days, then transition to S3 Standard-IA, then to S3 Glacier Flexible Retrieval after 60 days, and delete after 2 years.
- Store logs immediately in S3 Glacier Deep Archive and keep for 2 years.
- Store logs in S3 One Zone-IA for 2 years to minimize cost.
Answer: B) Store logs in S3 Standard for 30 days, then transition to S3 Standard-IA, then to S3 Glacier Flexible Retrieval after 60 days, and delete after 2 years.
Option B keeps logs in S3 Standard through the heavy-query period (lifecycle rules cannot move objects to Standard-IA until they are at least 30 days old), then uses Standard-IA for warm but less frequent access, then Glacier Flexible Retrieval for long-term archives, where expedited retrievals typically complete in minutes, and finally deletes after 2 years. Deep Archive (option C) would make retrieval take hours, and One Zone-IA (option D) keeps data in a single AZ, which is risky for logs that must be retained. Keeping everything in Standard (option A) is simple but not cost-optimized.
Quiz 2: EBS Volume Types and Cost
Check your understanding of EBS cost and performance trade-offs.
You run a MySQL database on EC2. Performance profiling shows it needs higher IOPS, but the current gp2 volume is already much larger than the data set because you kept increasing size to get more IOPS. You want to reduce cost while meeting IOPS needs. What is the BEST change?
- Switch to sc1 and double the volume size to increase throughput.
- Stay on gp2 and continue increasing volume size until IOPS requirements are met.
- Migrate the database to a smaller gp3 volume and provision the required IOPS and throughput separately.
- Move the database files to S3 Standard and mount S3 as a filesystem.
Answer: C) Migrate the database to a smaller gp3 volume and provision the required IOPS and throughput separately.
gp3 allows you to provision IOPS and throughput independently from volume size, so you can shrink the volume to match actual data size and still meet performance needs. sc1 is a cold HDD type, not suitable for a transactional database. Continuing to grow gp2 wastes storage, and S3 cannot be used as a block device for a MySQL database.
Design Challenge: S3 vs EBS for a Data Pipeline
Consider this end-to-end analytics pipeline (building on your previous modules about ingestion and transformation):
- Raw clickstream data is ingested in real time from clients.
- A streaming application aggregates and enriches events.
- A nightly batch job runs heavy analytical queries over the last 90 days of data.
- Analysts occasionally run historical queries over the last 3 years.
Your task:
- Decide where each stage stores its data (S3 vs EBS), and which S3 storage classes you would use.
- Describe a lifecycle policy for the historical data.
- Identify where EBS snapshots make sense, if at all.
Pause and think through your design before reading a sample approach.
Sample approach (for self-check):
- Ingested raw data: land in S3 Standard (or Intelligent-Tiering if access patterns are unclear), partitioned by date/hour.
- Streaming aggregation state: kept on EBS gp3 attached to the streaming application instances (low-latency stateful processing).
- Analytical queries: run using EMR, Athena, or similar directly on S3 data; no need for EBS to store the bulk dataset.
- Lifecycle for analytics data:
  - 0–90 days: S3 Standard (frequent queries).
  - 90 days–1 year: Standard-IA.
  - 1–3 years: Glacier Instant Retrieval or Glacier Flexible Retrieval depending on how fast analysts need results.
  - Delete after 3 years if allowed by policy.
- EBS snapshots: used for backing up the streaming application's stateful EBS volumes and any EC2-based databases, not for the S3-based data lake itself.
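The lifecycle in the sample approach can be sketched as a small tier table plus a helper that reports which class an object of a given age would be in. Several things here are assumptions for illustration: the choice of Glacier Instant Retrieval for years 1 to 3, treating 3 years as 1,095 days, the `analytics/` prefix, and the rule ID. The generated rule uses the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
# (first day in this tier, storage class), in ascending order of age.
ANALYTICS_TIERS = [
    (0, "STANDARD"),
    (90, "STANDARD_IA"),
    (365, "GLACIER_IR"),  # assumes analysts need fast results
]
EXPIRE_AFTER_DAYS = 3 * 365


def class_at_age(age_days: int):
    """Storage class for an object of this age, or None once it has expired."""
    if age_days >= EXPIRE_AFTER_DAYS:
        return None
    current = ANALYTICS_TIERS[0][1]
    for first_day, storage_class in ANALYTICS_TIERS:
        if age_days >= first_day:
            current = storage_class
    return current


def to_lifecycle_rule(prefix: str) -> dict:
    """Build one lifecycle rule from the tier table."""
    return {
        "ID": "analytics-tiering",       # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": first_day, "StorageClass": storage_class}
            for first_day, storage_class in ANALYTICS_TIERS
            if first_day > 0
        ],
        "Expiration": {"Days": EXPIRE_AFTER_DAYS},
    }


print(class_at_age(10))    # STANDARD
print(class_at_age(400))   # GLACIER_IR
print(class_at_age(1095))  # None
print(to_lifecycle_rule("analytics/"))
```

If analysts can wait for results, swapping `GLACIER_IR` for `GLACIER` (Glacier Flexible Retrieval) in the tier table is the only change needed.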
Reflect: Would you change any classes if queries became more or less frequent? How would that affect cost?
Key Term Review: S3, EBS, and Lifecycle
Review these core terms and exam-ready distinctions.
- S3 Standard: General-purpose S3 storage class with high availability, low latency, no minimum storage duration, and no retrieval fees. Best for frequently accessed, performance-sensitive data.
- S3 Standard-IA: Infrequent Access storage class with lower cost per GB than Standard but with per-GB retrieval fees and a minimum storage duration. Good for data accessed less often but needing fast retrieval.
- S3 One Zone-IA: Infrequent Access class that stores data in a single AZ, reducing cost but also resilience. Suitable only when you can tolerate AZ loss or have backups elsewhere.
- S3 Intelligent-Tiering: S3 storage class that automatically moves objects between frequent and infrequent access tiers based on usage, charging a small monitoring fee per object. Useful when access patterns are unpredictable.
- S3 Glacier Flexible Retrieval: Archival S3 storage class with very low storage cost and retrieval times from minutes to hours. Ideal for long-term archives that are accessed rarely but must remain retrievable.
- S3 Lifecycle Policy: Configuration on an S3 bucket that defines rules to automatically transition objects between storage classes and expire (delete) them based on age or version status.
- EBS gp3: General-purpose SSD EBS volume type that is cheaper per GB than gp2 and allows provisioning of IOPS and throughput independently from volume size.
- EBS gp2: Older general-purpose SSD EBS volume type where performance (IOPS) scales with volume size. To increase IOPS you must increase the volume size.
- EBS io2: Provisioned IOPS SSD volume type designed for high-performance, latency-sensitive workloads like large databases. Offers high durability and configurable IOPS independent of size.
- EBS Snapshot: Incremental, point-in-time backup of an EBS volume stored in S3. Charged per GB-month of data stored and used for backup, restore, and cloning volumes.
- S3 vs EBS (core distinction): S3 is region-level object storage with tiered cost options and HTTP access. EBS is AZ-scoped block storage attached to EC2, optimized for low-latency, consistent I/O.
Key Terms
- Amazon S3: AWS object storage service designed for high durability and scalability, offering multiple storage classes with different cost and access characteristics.
- Amazon EBS: Amazon Elastic Block Store, a block storage service for EC2 instances that provides low-latency, consistent I/O and supports various SSD and HDD volume types.
- Durability: The probability that data will remain intact and not be lost over a given year; S3 targets 99.999999999% durability for objects stored in a single Region.
- S3 Glacier: A family of S3 archival storage classes (including Glacier Instant Retrieval, Glacier Flexible Retrieval, and Glacier Deep Archive) optimized for long-term, infrequently accessed data.
- Availability: The percentage of time that a service or storage class is accessible; S3 storage classes have different availability targets, often traded off against cost.
- EBS snapshot: An incremental, point-in-time backup of an EBS volume stored in S3, used for backup, restore, and cloning volumes.
- Data lifecycle: The stages data goes through from creation and active use to archival and eventual deletion, often managed via policies to optimize cost and compliance.
- EBS volume type: A specific configuration of EBS storage (such as gp3, gp2, io2, st1, sc1) with defined performance and pricing characteristics.
- S3 storage class: A configuration for S3 objects that defines cost, availability, retrieval characteristics, and minimum storage duration (e.g., Standard, Standard-IA, Glacier).
- S3 lifecycle policy: A set of rules on an S3 bucket that automatically transitions objects between storage classes and expires them based on age or version status.