Chapter 20 of 26

Cost Optimization Foundations and the Cost Optimization Pillar

Cost questions are about more than picking the cheapest option. This chapter connects the cost optimization pillar to concrete patterns that keep bills low without sacrificing business outcomes.

Cost Optimization Pillar: What It Is and Why It Matters

Cost Is Not Just 'Cheap'

In AWS, cost optimization is about designing cost-aware architectures that still meet performance, reliability, and security goals, not just picking the cheapest-looking service.

Canonical Definition

Memorize this: "The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."

Key Ideas in the Definition

Exam hooks: continual process (ongoing reviews), entire lifecycle (from design to retirement), cost-aware systems (you understand cost drivers), and business outcomes first (do not break SLAs to cut cost).

Position in Well-Architected

The six Well-Architected pillars are: Operational excellence, Security, Reliability, Performance efficiency, Cost optimization, Sustainability. Cost interacts with all of them in scenario questions.

Cost Visibility and Measurement Basics on AWS

Why Visibility First?

You cannot optimize what you cannot see. Cost visibility is the first step: understand which services, Regions, and workloads drive your bill.

Pricing Dimensions

Most AWS services charge by usage: compute hours, GB-months of storage, number of requests, and data transfer. Know which dimension your workload stresses most.
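To make the idea of a dominant pricing dimension concrete, here is a minimal Python sketch. The unit rates and usage figures are invented placeholders, not actual AWS prices, and the function names are hypothetical.

```python
# All unit rates below are made-up placeholders, NOT real AWS prices.
HYPOTHETICAL_RATES = {
    "compute_hours": 0.10,      # USD per instance-hour
    "storage_gb_months": 0.02,  # USD per GB-month
    "requests_millions": 0.40,  # USD per million requests
    "transfer_out_gb": 0.09,    # USD per GB transferred out
}


def cost_by_dimension(usage, rates=HYPOTHETICAL_RATES):
    """Multiply each usage quantity by its unit rate."""
    return {dim: qty * rates[dim] for dim, qty in usage.items()}


def dominant_dimension(usage, rates=HYPOTHETICAL_RATES):
    """Return the pricing dimension that contributes the most cost."""
    costs = cost_by_dimension(usage, rates)
    return max(costs, key=costs.get)


# Illustrative monthly usage for a small web workload (assumed numbers).
monthly_usage = {
    "compute_hours": 1460,      # two instances running all month
    "storage_gb_months": 500,
    "requests_millions": 30,
    "transfer_out_gb": 2000,
}

if __name__ == "__main__":
    print(cost_by_dimension(monthly_usage))
    print(dominant_dimension(monthly_usage))
```

With these placeholder numbers, data transfer out is the largest line, so that is the dimension to investigate first.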

Cost Explorer and CUR

Cost Explorer shows visual breakdowns by service, tag, and account. Cost & Usage Reports give detailed line items you can query with Athena or Redshift.
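The kind of breakdown these tools produce can be sketched in a few lines of Python. The record shape below is a simplified, hypothetical stand-in for billing line items, not the real Cost & Usage Report schema, and the costs are invented.

```python
from collections import defaultdict


def spend_by_tag(line_items, tag_key):
    """Sum cost per value of one tag; resources without the tag go to 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        tag_value = item.get("tags", {}).get(tag_key, "untagged")
        totals[tag_value] += item["cost"]
    return dict(totals)


# Hypothetical line items (service names and costs are illustrative only).
line_items = [
    {"service": "AmazonEC2", "cost": 120.0,
     "tags": {"Project": "checkout", "Environment": "prod"}},
    {"service": "AmazonS3", "cost": 15.5,
     "tags": {"Project": "checkout", "Environment": "dev"}},
    {"service": "AmazonRDS", "cost": 80.0,
     "tags": {"Project": "search", "Environment": "prod"}},
    {"service": "AmazonEC2", "cost": 42.0, "tags": {}},
]

if __name__ == "__main__":
    print(spend_by_tag(line_items, "Project"))
    print(spend_by_tag(line_items, "Environment"))
```

The "untagged" bucket is worth watching: a large untagged total usually means the tagging policy is not being followed, which limits how far spend can be attributed.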

Budgets and Tags

AWS Budgets lets you set cost or usage thresholds with alerts. Cost allocation tags (like Project or Environment) let you attribute spend to teams or apps.
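The threshold logic behind a budget alert is simple arithmetic. The sketch below models only that arithmetic under assumed threshold values; it does not call the AWS Budgets service, and the function name is hypothetical.

```python
def budget_alerts(actual_spend, budget_limit, thresholds=(0.5, 0.8, 1.0)):
    """Return the thresholds (as fractions of the budget) that spend has reached."""
    if budget_limit <= 0:
        raise ValueError("budget_limit must be positive")
    used_fraction = actual_spend / budget_limit
    return [t for t in thresholds if used_fraction >= t]


if __name__ == "__main__":
    # 850 USD spent against a 1,000 USD budget crosses the 50% and 80% alerts.
    print(budget_alerts(850, 1000))
```

Alerting at several fractions rather than only at 100% gives a team time to react before the budget is actually exceeded.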

Core Cost Drivers in Typical AWS Architectures

Four Big Cost Buckets

Most AWS bills are dominated by compute, storage, data transfer, and managed service features. Learn to spot which bucket an architecture will stress.

Compute Cost Drivers

EC2: instance type, purchase option, hours. Fargate: vCPU, memory, time. Lambda: invocations, memory size, duration. Always-on compute quickly adds up.
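The drivers listed above translate into three rough cost formulas. This is a sketch under simplifying assumptions: every rate is a parameter you would look up on the current pricing pages, the function names are hypothetical, and details such as free tiers, rounding, and minimum billing increments are ignored.

```python
HOURS_PER_MONTH = 730  # common approximation of the average hours in a month


def ec2_monthly_cost(hourly_rate, instance_count, hours=HOURS_PER_MONTH):
    """Always-on EC2: rate x instances x hours, regardless of utilization."""
    return hourly_rate * instance_count * hours


def fargate_monthly_cost(vcpu, memory_gb, task_hours, vcpu_hour_rate, gb_hour_rate):
    """Fargate: billed on requested vCPU and memory for the time tasks run."""
    return task_hours * (vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate)


def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        rate_per_million_requests, rate_per_gb_second):
    """Lambda: a per-request charge plus a charge for GB-seconds of execution."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * rate_per_million_requests
    return request_cost + gb_seconds * rate_per_gb_second


if __name__ == "__main__":
    # Placeholder rates, not real AWS prices.
    print(ec2_monthly_cost(hourly_rate=0.10, instance_count=2))
    print(lambda_monthly_cost(1_000_000, 1000, 1024, 0.20, 0.00001))
```

Note that only the EC2 formula is independent of how much work is done, which is why idle always-on capacity is the first thing to look for.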

Storage Cost Drivers

EBS: provisioned GB, volume type, and any provisioned IOPS or throughput. S3: GB stored, storage class, requests, and data transfer out. EFS: GB and throughput mode. Snapshots and backups also contribute to storage cost.

Network and Database Costs

Data transfer out to the internet and cross-Region links can spike costs. RDS, Aurora, DynamoDB, and analytics services add instance, capacity, and feature-based charges.

Reading an AWS Bill from an Architecture Diagram

Sample Web App Architecture

Picture: ALB -> EC2 Auto Scaling across 2 AZs -> Multi-AZ RDS. Static assets in S3, fronted by CloudFront, serving global users.

Compute and ALB Costs

ALBs charge per hour and LCU. EC2 costs come from instance type, purchase model, and hours. Over-provisioned instances or poor Auto Scaling drive up cost.

Database Layer Costs

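A Multi-AZ RDS deployment runs a primary and a standby instance, each with its own storage, so it costs roughly twice as much as a Single-AZ deployment. Read replicas add further instance and storage charges. Caching and right-sizing can reduce the database capacity you need.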

S3, CloudFront, and Transfer

S3 charges for storage, requests, and transfer out. CloudFront caches content closer to users, often lowering total transfer cost and improving performance.

Designing Cost-Aware Compute Architectures

Three 'Rights' of Compute

Design compute for cost by right-sizing instances, right-pricing with the correct purchase model, and right-architecting using managed or serverless services.

Right-Sizing and Auto Scaling

Avoid giant instances running at 10% CPU. Use CloudWatch metrics and Auto Scaling groups to match capacity to real demand.
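One way to turn utilization metrics into right-sizing candidates is sketched below. It assumes you have already exported CPU utilization samples (for example from CloudWatch) into plain lists; the thresholds, instance IDs, and function name are illustrative assumptions, not AWS recommendations.

```python
from statistics import mean


def rightsizing_candidates(cpu_samples_by_instance,
                           avg_threshold=20.0, peak_threshold=40.0):
    """Flag instances whose average AND peak CPU both stay below the thresholds."""
    candidates = []
    for instance_id, samples in cpu_samples_by_instance.items():
        if not samples:
            continue  # no data: do not guess
        if mean(samples) < avg_threshold and max(samples) < peak_threshold:
            candidates.append(instance_id)
    return sorted(candidates)


# Hypothetical CPU utilization percentages per instance.
cpu_samples = {
    "i-aaa": [8, 12, 10, 9],     # consistently idle
    "i-bbb": [55, 70, 62, 48],   # busy
    "i-ccc": [5, 7, 65, 6],      # mostly idle but with a real peak
}

if __name__ == "__main__":
    print(rightsizing_candidates(cpu_samples))
```

Checking the peak as well as the average matters: an instance like the third one looks idle on average but would be undersized during its spike, so it is a better fit for Auto Scaling than for a permanent downsize.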

Right-Pricing Models

Use On-Demand for unpredictable load, Savings Plans or Reserved Instances for steady-state, and Spot for interruptible, fault-tolerant workloads.
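The reasoning behind committing only to the steady-state baseline can be checked with a small model. In this sketch the committed capacity is paid for every hour whether or not it is used, and anything above it runs On-Demand; the rates and the demand series are invented, and real Savings Plans are expressed as a dollar-per-hour commitment rather than an instance count.

```python
def blended_cost(hourly_demand, committed, on_demand_rate, committed_rate):
    """Total cost when `committed` instances are prepaid and the rest is On-Demand."""
    cost = 0.0
    for demand in hourly_demand:
        cost += committed * committed_rate                   # paid even when idle
        cost += max(0, demand - committed) * on_demand_rate  # overflow capacity
    return cost


def best_commitment(hourly_demand, on_demand_rate, committed_rate):
    """Try every commitment level from 0 to peak demand and keep the cheapest."""
    options = range(0, max(hourly_demand) + 1)
    return min(
        options,
        key=lambda c: blended_cost(hourly_demand, c, on_demand_rate, committed_rate),
    )


# Instances needed per hour: a steady baseline of 4 with one spike to 10.
demand = [4, 4, 4, 4, 10, 4, 4, 4]

if __name__ == "__main__":
    level = best_commitment(demand, on_demand_rate=1.0, committed_rate=0.6)
    print(level, blended_cost(demand, level, 1.0, 0.6))
```

With these assumed numbers the cheapest choice is to commit to the baseline of 4 and leave the spike to On-Demand; committing to the peak would mean paying for idle capacity most of the time.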

Serverless Tradeoffs

Lambda and Fargate shine for intermittent or spiky workloads. For always-on, high-throughput systems, EC2 covered by Savings Plans or Reserved Instances may be cheaper overall.
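The tradeoff comes down to a break-even volume: below it, paying per request is cheaper; above it, a fixed always-on fleet wins. The following sketch assumes a single flat cost per request and a fixed monthly fleet cost, both of which are placeholders you would derive from your own pricing research.

```python
def breakeven_requests(fixed_monthly_cost, cost_per_request):
    """Monthly request volume at which pay-per-request equals the fixed cost."""
    if cost_per_request <= 0:
        raise ValueError("cost_per_request must be positive")
    return fixed_monthly_cost / cost_per_request


def cheaper_option(monthly_requests, fixed_monthly_cost, cost_per_request):
    """Compare a pay-per-request bill with a fixed always-on bill."""
    pay_per_use = monthly_requests * cost_per_request
    return "pay-per-request" if pay_per_use < fixed_monthly_cost else "always-on"


if __name__ == "__main__":
    # Placeholder figures: 100 USD/month fleet vs 0.00001 USD per request.
    print(breakeven_requests(100.0, 0.00001))
    print(cheaper_option(1_000_000, 100.0, 0.00001))
    print(cheaper_option(50_000_000, 100.0, 0.00001))
```

The model deliberately ignores operational effort, which often tips borderline cases toward the managed option.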

Cost-Aware Storage and Database Design

S3 Classes and Lifecycle

Use S3 Standard for hot data, IA classes for infrequent access, and Glacier tiers for archives. Lifecycle policies automatically transition or expire objects.
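A lifecycle policy is ultimately a small configuration document. The sketch below builds one as a Python dictionary; the rule name, prefix, and day counts are hypothetical choices for illustration, and the helper function is not part of any AWS library.

```python
def build_lifecycle_rule(rule_id, prefix, transitions, expire_after_days=None):
    """Build one lifecycle rule.

    transitions: list of (days, storage_class) pairs, e.g. (30, "STANDARD_IA").
    """
    ordered = sorted(transitions)  # earliest transition first
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in ordered
        ],
    }
    if expire_after_days is not None:
        if ordered and expire_after_days <= ordered[-1][0]:
            raise ValueError("expiration must come after the last transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


lifecycle_configuration = {
    "Rules": [
        build_lifecycle_rule(
            rule_id="archive-logs",   # hypothetical rule name
            prefix="logs/",           # hypothetical key prefix
            transitions=[(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")],
            expire_after_days=2555,   # roughly 7 years
        )
    ]
}

if __name__ == "__main__":
    import json
    print(json.dumps(lifecycle_configuration, indent=2))
```

A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. Check the current S3 documentation for minimum storage durations and retrieval fees before applying a policy like this, since early transitions or deletions can cost more than they save.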

EBS and EFS Choices

gp3 volumes often balance cost and performance. io1/io2 are expensive but high performance. EFS is simple but can be pricey; use EFS IA for colder data.

Relational DB Cost Levers

RDS and Aurora costs come from instance size, storage, backups, and replicas. Use caching and archiving to avoid over-scaling the database.

DynamoDB Capacity Models

Choose provisioned capacity (with auto scaling) for predictable workloads, or on-demand for unpredictable or low-traffic tables. Good key design avoids hot partitions.
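Why traffic shape decides the capacity mode can be shown with a deliberately simplified model. It assumes one capacity unit serves one request per second, sizes provisioned capacity for the peak, and ignores auto scaling, item size, and the difference between reads and writes; the rates are invented placeholders.

```python
def provisioned_cost(capacity_units, unit_hour_rate, hours=730):
    """Provisioned mode: pay for capacity every hour, used or not."""
    return capacity_units * unit_hour_rate * hours


def on_demand_cost(total_requests, rate_per_million):
    """On-demand mode: pay only for the requests actually made."""
    return total_requests / 1_000_000 * rate_per_million


def cheaper_mode(avg_rps, peak_rps, unit_hour_rate, rate_per_million, hours=730):
    """Compare both modes for a table with the given average and peak traffic."""
    prov = provisioned_cost(peak_rps, unit_hour_rate, hours)       # sized for peak
    od = on_demand_cost(avg_rps * 3600 * hours, rate_per_million)  # billed on actual use
    mode = "provisioned" if prov <= od else "on-demand"
    return mode, prov, od


if __name__ == "__main__":
    # Steady table: peak close to average.
    print(cheaper_mode(avg_rps=100, peak_rps=120, unit_hour_rate=0.001, rate_per_million=1.0))
    # Spiky, mostly idle table: peak far above average.
    print(cheaper_mode(avg_rps=2, peak_rps=200, unit_hour_rate=0.001, rate_per_million=1.0))
```

Under these assumptions the steady table favors provisioned capacity and the spiky, low-traffic table favors on-demand, which matches the rule of thumb in the text. Provisioned capacity with auto scaling narrows the gap for tables between those extremes.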

Tradeoffs Between Cost and Other Well-Architected Pillars

Pillars Interact

Cost optimization interacts with Operational excellence, Security, Reliability, Performance efficiency, and Sustainability. Tradeoffs are central to exam scenarios.

Cost vs Reliability & Performance

Multi-AZ and multi-Region designs cost more but improve availability. Oversizing for performance raises cost; right-sizing and caching can improve both.

Cost vs Security

Extra security controls can add cost, but you should not remove essential protections just to save money. Security requirements typically override cost savings.

Cost and Sustainability

Right-sizing and high utilization usually reduce both cost and environmental impact, aligning the Cost optimization and Sustainability pillars.

Thought Exercise: Picking the Best Cost Optimization Strategy

Work through these scenarios and decide on a cost-aware approach. There is no single "correct" answer here, but compare your reasoning with the guidance.

  1. Scenario A: Steady, predictable web traffic
  • A SaaS company has a web app with very stable daily traffic. It runs on an Auto Scaling group of m6i.large EC2 instances behind an ALB. Utilization is around 50% most of the time.
  • Question: What cost strategies make sense?
  • Think about: right-sizing, purchase models, and whether Spot is appropriate.
  2. Scenario B: Monthly batch processing
  • A data team runs a big ETL job once per month that processes terabytes of data using EMR. Jobs run for 6–8 hours and can tolerate retries.
  • Question: How would you optimize cost here?
  • Think about: Spot Instances, instance families, and cluster sizing.
  3. Scenario C: Unpredictable API traffic
  • A mobile app experiences unpredictable spikes in API calls when marketing campaigns run. The API is currently on EC2.
  • Question: What architectural changes could reduce cost while handling spikes?
  • Think about: serverless, Auto Scaling, and managed services.
  4. Scenario D: Growing log archive
  • Compliance requires logs to be kept for 7 years, but they are rarely accessed after the first month.
  • Question: Which storage classes and policies would you use?
  • Think about: S3 lifecycle rules and Glacier tiers.

After you have your answers, compare them with this guidance:

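  • A: Cover the stable baseline with Savings Plans or Reserved Instances, and right-size to raise utilization above 50%, possibly by using smaller instance types in larger numbers so capacity tracks demand more closely. Spot can supplement a stateless fleet but should not carry the whole baseline, because instances can be interrupted.
  • B: Use Spot Instances for the EMR task nodes (and for core nodes only if losing HDFS data and retrying is acceptable), and right-size the cluster based on actual job duration. Because the job runs once a month, shut the cluster down when the job finishes.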
  • C: Consider moving the API to Lambda + API Gateway or Fargate, with Auto Scaling and possibly on-demand DynamoDB.
  • D: Use S3 lifecycle policies to move logs from Standard to IA, then Glacier tiers, and eventually delete when allowed.

Quiz 1: Core Concepts of the Cost Optimization Pillar

Check your understanding of the core definition and basic tools.

Which option best aligns with the **cost optimization pillar** as defined in the AWS Well-Architected Framework?

  A) Choosing the cheapest AWS services available and avoiding managed services where possible.
  B) The continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
  C) Designing architectures that maximize performance regardless of cost, then adding budgets later.
  D) Performing a one-time cost review during initial deployment and locking in all instance sizes.

Answer: B) The continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.

The canonical definition you must know is: "The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs." The other options either ignore business outcomes, ignore the continual nature, or treat cost as a one-time exercise.

Quiz 2: Identifying Cost Drivers and Tools

Apply what you learned about cost drivers and visibility.

A company notices a sudden increase in their AWS bill. They want to understand which projects and environments are responsible. Which combination is the **most appropriate first step**?

  A) Enable AWS Budgets and immediately terminate any EC2 instances over a certain size.
  B) Use AWS Cost Explorer with properly configured cost allocation tags such as Project and Environment.
  C) Switch all EC2 instances to Spot Instances and move all data to S3 Glacier Deep Archive.
  D) Create a new AWS account for each project and migrate all resources before investigating costs.

Answer: B) Use AWS Cost Explorer with properly configured cost allocation tags such as Project and Environment.

Cost Explorer combined with cost allocation tags is the best way to attribute costs to projects and environments. Setting budgets and terminating instances aggressively is risky without visibility. Switching all instances to Spot and moving all data to S3 Glacier Deep Archive is unrealistic and ignores workload needs. Creating new accounts before understanding the current spend adds complexity without solving the immediate visibility problem.

Key Terms and Definitions Review

Use these flashcards to reinforce the most exam-relevant definitions and concepts from this module.

Cost optimization pillar (definition)
The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
AWS Well-Architected Framework (definition)
The AWS Well-Architected Framework provides a consistent set of best practices for customers and partners to evaluate architectures, and a set of questions you can use to evaluate how well an architecture is aligned to AWS best practices.
Main AWS cost drivers
Compute (instance hours, vCPU/memory), storage (GB-months, storage class), requests and I/O, data transfer (especially out to the internet and cross-Region), and managed service features (e.g., RDS replicas, DynamoDB capacity).
AWS Cost Explorer
A tool that provides visualizations and reports of your AWS costs and usage, allowing you to break down spend by service, account, Region, tag, and more.
AWS Budgets
A service that lets you set custom cost and usage budgets and receive alerts via email or SNS when your usage approaches or exceeds those thresholds.
Cost allocation tags
User-defined tags (for example, Project, Environment, CostCenter) that you activate for cost allocation so you can attribute AWS costs to specific projects, teams, or applications.
Right-sizing
The practice of matching resource types and sizes (such as EC2 instance types) to the actual workload needs, avoiding both over-provisioning and under-provisioning.
Savings Plans / Reserved Instances use case
Best suited for steady-state, predictable workloads where you can commit to a certain level of compute usage for 1 or 3 years in exchange for a lower effective hourly rate.
Spot Instances typical use case
Workloads that are fault-tolerant and can handle interruptions, such as batch processing, big data analytics, CI/CD workloads, and stateless, horizontally scalable services.
S3 lifecycle policy
Rules that automatically transition objects between S3 storage classes (for example, from Standard to IA to Glacier) or expire them after a specified period, helping control long-term storage costs.

Key Terms

AWS Budgets
An AWS service that lets you set custom budgets for cost and usage and receive alerts when thresholds are approached or exceeded.
Right-sizing
Adjusting resource types and sizes (such as EC2 instance types) to better match actual workload demands, reducing waste and cost.
Cost Explorer
An AWS service that provides visual tools and reports to analyze your AWS costs and usage over time, broken down by various dimensions such as service, Region, tag, and account.
Spot Instances
Spare AWS compute capacity available at discounts of up to 90% compared to On-Demand, with the risk of interruption when capacity is needed elsewhere.
S3 lifecycle policy
A set of rules that automatically transition S3 objects between storage classes or expire them after a defined period to optimize storage costs.
Cost allocation tags
User-defined tags enabled for cost allocation so that AWS can attribute resource costs to specific projects, teams, or environments.
DynamoDB capacity modes
Provisioned capacity mode, where you specify read/write capacity units, and on-demand capacity mode, where you pay per request and do not manage capacity directly.
Cost optimization pillar
The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
AWS Well-Architected Framework
The AWS Well-Architected Framework provides a consistent set of best practices for customers and partners to evaluate architectures, and a set of questions you can use to evaluate how well an architecture is aligned to AWS best practices.
Savings Plans / Reserved Instances
Pricing models that offer lower compute costs in exchange for a commitment to consistent usage over a 1- or 3-year term.
