Applying the Cost Optimization Pillar of the AWS Well-Architected Framework
Cost optimization isn’t just about cutting spend; it’s about aligning cost with value over the workload lifecycle. This module ties concrete AWS savings tactics back to the Cost optimization pillar of the AWS Well-Architected Framework.
Cost Optimization in the Well-Architected Framework
The Well-Architected Context
The AWS Well-Architected Framework provides a consistent set of best practices, plus a set of questions for evaluating how well an architecture aligns with them.
Cost Optimization Pillar: Canonical Definition
The cost optimization pillar is defined as: "The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."
Three Embedded Ideas
Key ideas: 1) Continual process over the workload lifecycle, 2) Cost-aware systems that surface cost to teams, 3) Focus on business outcomes, not just the lowest bill.
Link to Prior Modules
You already saw concrete tactics: EC2 purchase options, right-sizing, Auto Scaling, RDS and caching, and network design. Here we connect those to Well-Architected language and exam-style trade-offs.
Workload Lifecycle and Continuous Optimization
Why Lifecycle Matters
Cost optimization is not a one-time event. The pillar explicitly covers the entire lifecycle: from prototype to production and eventual retirement of a workload.
1. Experiment / Prototype
Early stage: prioritize speed and learning. Use On-Demand Instances, managed services, and simple architectures. Avoid heavy long-term commitments for workloads that might change or be discarded.
2. Pilot / Pre-Production
As patterns emerge, add tags, budgets, and monitoring. Use CloudWatch, Cost Explorer, and Compute Optimizer to start measuring utilization and identifying right-sizing options.
3. Production / Scale-Out
Once usage stabilizes, introduce Savings Plans, RIs, refined Auto Scaling, and storage class tuning. This is where big savings are typically realized safely.
4. Optimization and Retirement
Continuously remove waste: unattached EBS volumes, unassociated Elastic IP addresses, idle load balancers, and old snapshots. Exam scenarios often reward answers that automate this cleanup.
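The cleanup step above can be sketched as a simple rule-based filter. This is a minimal illustration, not an AWS API: the `INVENTORY` records, their field names, and the 180-day snapshot age limit are all hypothetical. In practice the records would come from resource listings rather than hard-coded data.

```python
from datetime import date, timedelta

# Hypothetical inventory records (field names are assumptions for this sketch).
INVENTORY = [
    {"id": "vol-1", "kind": "ebs_volume", "attached": False},
    {"id": "vol-2", "kind": "ebs_volume", "attached": True},
    {"id": "eip-1", "kind": "elastic_ip", "associated": False},
    {"id": "snap-1", "kind": "snapshot", "created": date(2023, 1, 10)},
    {"id": "snap-2", "kind": "snapshot", "created": date(2024, 5, 20)},
]


def cleanup_candidates(inventory, today, max_snapshot_age_days=180):
    """Return IDs of resources that look like waste under simple rules."""
    cutoff = today - timedelta(days=max_snapshot_age_days)
    flagged = []
    for item in inventory:
        kind = item["kind"]
        if kind == "ebs_volume" and not item["attached"]:
            flagged.append(item["id"])  # volume billed but not attached
        elif kind == "elastic_ip" and not item["associated"]:
            flagged.append(item["id"])  # address reserved but not in use
        elif kind == "snapshot" and item["created"] < cutoff:
            flagged.append(item["id"])  # snapshot older than the age limit
    return flagged


candidates = cleanup_candidates(INVENTORY, today=date(2024, 6, 1))
print(candidates)
```

A filter like this only produces candidates for review. Whether to delete them automatically is a separate decision that depends on retention and recovery requirements.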
Monitoring, Visibility, and Cost-Aware Culture
Visibility Enables Optimization
You cannot optimize what you cannot see. Cost optimization assumes you have visibility into usage and ownership of resources across your AWS accounts.
Cost Allocation Tags
Use tags like `Environment`, `Owner`, `Application`. Activate them as cost allocation tags so Cost Explorer can group and report spend by these dimensions.
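To make the grouping idea concrete, here is a small sketch of what "report spend by tag" means. The `LINE_ITEMS` data and the `spend_by_tag` helper are hypothetical and stand in for what Cost Explorer does once tags are activated; this is not the Cost Explorer API.

```python
from collections import defaultdict

# Hypothetical billing line items with resource tags (illustrative data only).
LINE_ITEMS = [
    {"cost": 120.0, "tags": {"Environment": "prod", "Owner": "team-a"}},
    {"cost": 30.0, "tags": {"Environment": "dev", "Owner": "team-a"}},
    {"cost": 75.5, "tags": {"Environment": "prod", "Owner": "team-b"}},
    {"cost": 12.0, "tags": {}},  # untagged resource
]


def spend_by_tag(items, tag_key):
    """Sum cost per value of one tag key; untagged spend gets its own bucket."""
    totals = defaultdict(float)
    for item in items:
        value = item["tags"].get(tag_key, "(untagged)")
        totals[value] += item["cost"]
    return dict(totals)


by_env = spend_by_tag(LINE_ITEMS, "Environment")
by_owner = spend_by_tag(LINE_ITEMS, "Owner")
print(by_env)
print(by_owner)
```

The "(untagged)" bucket is worth watching: spend that carries no tag cannot be attributed to an owner, which is why consistent tagging comes before meaningful reporting.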
Budgets vs Cost Explorer
AWS Budgets: define thresholds and send alerts when cost/usage exceeds them. AWS Cost Explorer: explore and visualize historical cost and usage trends.
Anomaly Detection and Optimizer
Cost Anomaly Detection finds unusual spend spikes. AWS Compute Optimizer uses metrics to recommend right-sizing EC2, EBS, Lambda, and some other resources.
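The idea of "unusual spend spikes" can be illustrated with a deliberately simple statistical check. This is not how AWS Cost Anomaly Detection works internally; it is a toy sketch that flags a day when spend sits far above the trailing average. The `flag_spikes` function, the 7-day window, and the threshold of 3 standard deviations are all assumptions.

```python
from statistics import mean, stdev


def flag_spikes(daily_spend, window=7, threshold=3.0):
    """Return indexes of days whose spend exceeds the trailing mean
    by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (daily_spend[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily spend in dollars; day index 8 contains a spike.
spend = [100, 102, 98, 101, 99, 103, 100, 101, 240, 102]
spikes = flag_spikes(spend)
print(spikes)
```

The contrast with a budget is the point: a budget compares spend against a fixed threshold, while anomaly detection compares spend against its own recent pattern.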
Cost-Aware Culture
Tagging, dashboards, and alerts create a cost-aware culture, where teams see the impact of their designs and can iterate toward better cost-performance trade-offs.
Right-Sizing and Auto Scaling: A Concrete Scenario
Scenario Setup
You run a web app on 4 `m5.large` instances behind an ALB. Auto Scaling desired capacity is fixed at 4. CPU is ~12% by day and ~3% at night. Costs are higher than expected.
Step 1: Measure
Use CloudWatch to confirm low utilization. Check AWS Compute Optimizer; in this scenario it recommends smaller `t3.medium` instances. The gap between daytime and nighttime CPU also suggests scaling in at night.
Step 2: Apply Pillars
Cost optimization: match capacity to demand. Performance efficiency: ensure new instance type still meets needs. Reliability: keep at least 2 instances across AZs.
Step 3: Implement
Update the launch template to `t3.medium`. Configure an Auto Scaling group with min=2, max=6, and dynamic or scheduled scaling policies to reduce capacity overnight.
Exam Pattern
When you see "low utilization and fixed capacity," think right-sizing plus Auto Scaling, not just buying long-term commitments for over-sized instances.
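The scenario above can be quantified with rough arithmetic. The hourly rates below are illustrative assumptions (real prices vary by Region and change over time), and the schedule of 4 instances for 14 hours and 2 instances for 10 hours is a hypothetical outcome of the scaling policy, not something stated in the scenario.

```python
# Illustrative hourly On-Demand rates in dollars (assumptions, not quoted prices).
HOURLY_RATE = {"m5.large": 0.096, "t3.medium": 0.0416}


def daily_cost(instance_type, schedule):
    """Cost of one day given a list of (hours, instance_count) pairs
    that together must cover 24 hours."""
    if sum(hours for hours, _ in schedule) != 24:
        raise ValueError("schedule must cover exactly 24 hours")
    instance_hours = sum(hours * count for hours, count in schedule)
    return instance_hours * HOURLY_RATE[instance_type]


# Before: 4 x m5.large around the clock.
before = daily_cost("m5.large", [(24, 4)])
# After: t3.medium, 4 instances by day and 2 overnight (assumed schedule).
after = daily_cost("t3.medium", [(14, 4), (10, 2)])
savings_pct = round(100 * (1 - after / before), 1)
print(before, after, savings_pct)
```

Two cautions apply before acting on numbers like these. A `t3.medium` has the same vCPU count as an `m5.large` but half the memory (4 GiB versus 8 GiB), and T3 instances are burstable, so memory use and sustained CPU should be checked as part of the performance efficiency review in Step 2.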
Design Trade-Offs: Cost vs Performance, Reliability, and Security
Pillars in Tension
Cost optimization must be balanced with performance, reliability, and security. The goal is not "cheapest" but "best value" for stated requirements.
Performance Efficiency and Reliability
Performance efficiency focuses on using computing resources efficiently as demand and tech evolve. Reliability is about the workload performing correctly and consistently when expected.
Trade-Off: Single-AZ vs Multi-AZ
Single-AZ RDS is cheaper. But when high availability and low RTO are requirements, Multi-AZ is the correct choice even at higher cost.
Trade-Off: Caching and Storage Classes
You might add ElastiCache to reduce database load (and possibly allow a smaller database instance), or move rarely accessed S3 data to cheaper storage classes such as the S3 Glacier classes. Each change trades cost against latency and complexity.
Security is Non-Negotiable
Under the shared responsibility model, you must configure security in the cloud correctly, even if it costs more. Do not remove encryption or logging just to save money in exam scenarios.
Sustainability and Cost: Aligning Environmental and Financial Efficiency
Sustainability Pillar: Definition
The sustainability pillar "focuses on minimizing the environmental impacts of running cloud workloads by maximizing utilization and minimizing the resources required, and by reducing the energy required to deliver business value."
Overlap with Cost Optimization
High utilization and minimal resource waste usually mean lower cost and lower environmental impact. Underutilized instances waste both money and energy.
Examples that Help Both
Serverless (Lambda, Fargate), right-sizing, Auto Scaling, and S3 lifecycle policies reduce idle capacity and unnecessary storage, saving cost and energy.
Lifecycle Perspective
Refactoring to more efficient architectures may cost effort now but often pays off over the workload lifecycle in both sustainability and cost terms.
Exam Signal
Options that remove idle resources, reduce data transfer, or move to efficient managed services usually support both cost optimization and sustainability.
Thought Exercise: Picking the Right Cost Strategy by Stage
Work through these short scenarios mentally. The goal is to map lifecycle stage to the most appropriate cost strategy, not just the cheapest.
Scenario A: New Analytics Prototype
- A data science team is experimenting with a new recommendation model. They are unsure if it will go to production. They need to run irregular, compute-heavy jobs on EC2 and use a temporary RDS database.
Questions to consider:
- Would you recommend 3-year Reserved Instances, 1-year Savings Plans, or On-Demand for the EC2 jobs? Why?
- How aggressively would you right-size RDS now?
Pause and answer before reading the guidance.
Guidance:
- On-Demand is usually best for highly uncertain, short-lived prototypes. Committing to 3-year RIs conflicts with the lifecycle principle.
- Basic right-sizing is fine, but do not over-invest in optimization. Simplicity and speed of change matter more at this stage.
Scenario B: Stable Production API
- A customer-facing API has run for 18 months. Traffic patterns are very predictable: weekday peaks, quiet weekends. It uses EC2 Auto Scaling and RDS Multi-AZ.
Questions to consider:
- What cost mechanisms make sense now (RIs, Savings Plans, Spot Instances, right-sizing)?
- How might you use monitoring data to refine the design?
Guidance:
- Now that usage is stable, long-term Savings Plans or RIs for baseline capacity make sense. You can also consider Spot Instances for non-critical background jobs.
- Use CloudWatch and Cost Explorer to fine-tune instance sizes, Auto Scaling thresholds, and storage classes, aligning capacity closely with observed demand.
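The difference between Scenario A and Scenario B comes down to a break-even calculation. The sketch below is a simplified model under stated assumptions: a commitment is paid for every hour whether or not the capacity is used, while On-Demand is paid only for hours actually used. The rates are hypothetical and the function names are illustrative.

```python
def break_even_utilization(on_demand_rate, committed_rate):
    """Fraction of hours the capacity must be in use before a
    commitment becomes cheaper than On-Demand."""
    return committed_rate / on_demand_rate


def commitment_saves_money(on_demand_rate, committed_rate, utilization):
    """Compare paying On-Demand for used hours only against paying
    the committed rate for every hour."""
    on_demand_cost = on_demand_rate * utilization
    committed_cost = committed_rate  # paid even when capacity sits idle
    return committed_cost < on_demand_cost


# Hypothetical rates: $0.10/hour On-Demand, $0.065/hour committed.
threshold = break_even_utilization(0.10, 0.065)
prototype = commitment_saves_money(0.10, 0.065, utilization=0.30)  # Scenario A
baseline = commitment_saves_money(0.10, 0.065, utilization=1.00)   # Scenario B
print(threshold, prototype, baseline)
```

Under these assumed rates, the irregular prototype jobs fall well below the break-even point, while the always-on baseline of the stable API sits above it. That is the lifecycle principle expressed as arithmetic.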
Quick Check: Lifecycle and Commitments
Test your understanding of how lifecycle stage affects cost decisions.
A startup is launching a brand-new mobile app on AWS. They expect traffic to change rapidly as they iterate on features and do not yet know typical usage patterns. Which cost strategy best aligns with the cost optimization pillar definition?
- A) Purchase 3-year All Upfront Reserved Instances for all EC2 capacity to minimize hourly rates.
- B) Use On-Demand instances initially, monitor usage with CloudWatch and Cost Explorer, and consider Savings Plans once patterns stabilize.
- C) Use only Spot Instances for all workloads, including the production API, to minimize total spend.
- D) Immediately migrate all compute to AWS Lambda and Fargate, regardless of architecture fit, because serverless is always cheaper.
Answer: B) Use On-Demand instances initially, monitor usage with CloudWatch and Cost Explorer, and consider Savings Plans once patterns stabilize.
The cost optimization pillar emphasizes a continual process over the workload lifecycle. For a brand-new, uncertain workload, it is better to use On-Demand, gather data, and then commit (for example, with Savings Plans) once usage stabilizes. Long-term RIs on day one (A) are risky, Spot for all production workloads (C) can undermine reliability because Spot Instances can be interrupted, and serverless is not always the right fit (D).
Quick Check: Tools for Cost Awareness
Identify the right AWS tool for a given cost optimization need.
Your finance team wants to receive an email if monthly AWS spending for the "Marketing" tagged resources exceeds $10,000. Which AWS feature is the BEST fit?
- A) Create a report in AWS Cost Explorer and download it monthly.
- B) Configure an AWS Budget with a cost filter on the "Marketing" cost allocation tag and set an email alert at $10,000.
- C) Use AWS Cost Anomaly Detection to create a detector for all services.
- D) Enable AWS Compute Optimizer for the account and review right-sizing recommendations.
Answer: B) Configure an AWS Budget with a cost filter on the "Marketing" cost allocation tag and set an email alert at $10,000.
AWS Budgets is designed for setting cost or usage thresholds and sending alerts when they are exceeded. You can filter by cost allocation tag, like "Marketing". Cost Explorer is for analysis, not alerts (A). Cost Anomaly Detection looks for unusual patterns, not fixed thresholds (C). Compute Optimizer is for right-sizing recommendations, not budget alerts (D).
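The threshold logic behind this answer can be sketched in a few lines. This is not the AWS Budgets API; it is a toy model of the two alert types a budget can raise, one on actual spend and one on forecasted spend. The linear forecast and the `budget_alerts` function are simplifying assumptions.

```python
def budget_alerts(month_to_date, day_of_month, days_in_month, limit):
    """Return which alert types would fire for a monthly cost budget.

    The forecast is a naive linear extrapolation of month-to-date spend,
    used here only for illustration.
    """
    alerts = []
    if month_to_date > limit:
        alerts.append("ACTUAL")
    forecast = month_to_date / day_of_month * days_in_month
    if forecast > limit:
        alerts.append("FORECASTED")
    return alerts


LIMIT = 10_000  # the $10,000 monthly threshold from the question

on_track = budget_alerts(3_000, day_of_month=15, days_in_month=30, limit=LIMIT)
trending_over = budget_alerts(6_200, day_of_month=15, days_in_month=30, limit=LIMIT)
already_over = budget_alerts(10_500, day_of_month=28, days_in_month=30, limit=LIMIT)
print(on_track, trending_over, already_over)
```

A forecast-based alert is useful because it gives the finance team time to react before the limit is actually crossed.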
Key Term Review: Cost and Related Pillars
Use these flashcards to reinforce the canonical pillar definitions and a few core concepts.
- AWS Well-Architected Framework
- The AWS Well-Architected Framework provides a consistent set of best practices for customers and partners to evaluate architectures, and a set of questions you can use to evaluate how well an architecture is aligned to AWS best practices.
- Cost optimization pillar (definition)
- The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
- Performance efficiency pillar (definition)
- The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
- Reliability pillar (definition)
- The reliability pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to. This includes the ability to operate and test the workload through its total lifecycle.
- Sustainability pillar (definition)
- The sustainability pillar focuses on minimizing the environmental impacts of running cloud workloads by maximizing utilization and minimizing the resources required, and by reducing the energy required to deliver business value.
- Shared responsibility model (definition)
- The AWS shared responsibility model describes how AWS is responsible for security of the cloud, while customers are responsible for security in the cloud, including the configuration of their services and data.
- AWS Budgets vs AWS Cost Explorer
- AWS Budgets: define cost/usage thresholds and send alerts when exceeded. AWS Cost Explorer: analyze and visualize historical cost and usage; no threshold-based alerts.
- AWS Compute Optimizer
- A service that analyzes usage metrics (for example, from CloudWatch) to recommend right-sizing for EC2, EBS, Lambda, and some other resources, helping to reduce cost while maintaining performance.
- Cost allocation tags
- Tags that you activate in the billing console so that AWS can use them to organize and report cost data by tag key and value (for example, Environment, Owner, Application).
Mini Case Study: Balancing Cost, Reliability, and Sustainability
Consider this mini case and choose an approach you would defend in an exam scenario.
Case:
A company runs an internal reporting application. Requirements:
- Must be available during business hours in one region; occasional short outages are acceptable.
- Reports are generated in batch overnight and interactively during the day.
- Data is stored in Amazon S3 and queried using Amazon Athena.
- There is a small API layer currently running on always-on EC2 instances in one AZ.
- Management wants to reduce cost and environmental impact without hurting the user experience.
Think through these questions:
- Would you move the API layer to AWS Lambda behind Amazon API Gateway, or just right-size the EC2 instances and add Auto Scaling?
- How could you further reduce cost and environmental impact in S3 and Athena?
Reflect, then compare to the guidance below.
Possible answer path:
- Moving the low-traffic API to Lambda + API Gateway can eliminate idle EC2 time, aligning with both cost optimization and sustainability (scale to zero when idle). If latency and cold starts are acceptable for this internal tool, this is a strong choice.
- For S3, enable lifecycle rules (for example, to Intelligent-Tiering or infrequent access classes) for older reports, and compress data to reduce scanned bytes in Athena. Partition data by date or department so Athena queries scan less data, lowering cost and energy used per query.
On the exam, the best answer would clearly state how the chosen design meets availability, cost, and sustainability requirements together, not just one of them.
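The Athena guidance in this case can be made concrete with a rough cost model. Athena bills by data scanned, so partition pruning and compression both reduce cost. The price per TB, the dataset size, the partition counts, and the 4x compression ratio below are all assumptions for illustration, and the model ignores per-query minimums and uneven partition sizes.

```python
PRICE_PER_TB = 5.00  # assumed price per TB scanned; check current pricing
TB = 1024 ** 4       # bytes in one tebibyte


def query_cost(total_bytes, partitions_total=1, partitions_read=1,
               compression_ratio=1.0):
    """Estimate query cost from the bytes actually scanned.

    Assumes data is spread evenly across partitions and that compression
    shrinks every partition by the same ratio.
    """
    fraction_read = partitions_read / partitions_total
    scanned = total_bytes * fraction_read / compression_ratio
    return scanned / TB * PRICE_PER_TB


dataset = 2 * TB  # hypothetical 2 TB of report data

# Unpartitioned, uncompressed: every query scans everything.
full_scan = query_cost(dataset)
# Partitioned by day (365 partitions), query reads one week, data compressed 4x.
pruned = query_cost(dataset, partitions_total=365, partitions_read=7,
                    compression_ratio=4.0)
print(full_scan, round(pruned, 4))
```

The same reduction in scanned bytes is what links this change to the sustainability pillar: less data read per query means less work done to deliver the same report.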
Key Terms
- AWS Budgets
- Service to set custom cost and usage budgets and receive alerts when thresholds are exceeded.
- Auto Scaling
- AWS capability that automatically adjusts compute capacity (for example, EC2 instances) based on demand according to defined policies.
- Right-sizing
- Adjusting resource types and sizes to better match actual usage, reducing waste while maintaining performance.
- AWS Cost Explorer
- Tool to visualize and analyze historical AWS cost and usage data.
- Reliability pillar
- Encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to, including throughout its lifecycle.
- Cost allocation tags
- Tags activated in billing so that AWS can use them to categorize and report costs by tag key and value.
- AWS Compute Optimizer
- Service that recommends optimal AWS resources for workloads to reduce costs and improve performance based on usage metrics.
- Sustainability pillar
- Focuses on minimizing the environmental impacts of running cloud workloads by maximizing utilization, minimizing resources, and reducing energy required for business value.
- Cost optimization pillar
- Includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
- Shared responsibility model
- Describes how AWS is responsible for security of the cloud, while customers are responsible for security in the cloud, including service and data configuration.
- Performance efficiency pillar
- Focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
- AWS Well-Architected Framework
- Provides a consistent set of best practices and questions to evaluate how well an architecture aligns to AWS best practices.