Chapter 22 of 26
Cost-Optimized Compute: EC2 Instance Types, Purchasing Options, and Auto Scaling
Compute is often the most visible line item; combine the right EC2 instance types, pricing models, and Auto Scaling strategies to keep workloads efficient and affordable.
Big Picture: Compute Costs and the Cost Optimization Pillar
Why Compute Costs Matter
Compute (mainly Amazon EC2) is often the most visible and controllable part of your AWS bill. This module links the cost optimization pillar directly to EC2 instance, pricing, and Auto Scaling choices.
Cost Optimization Pillar
AWS defines the pillar this way: "The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."
Well-Architected Context
The AWS Well-Architected Framework has six pillars: Operational excellence, Security, Reliability, Performance efficiency, Cost optimization, Sustainability. Compute decisions must support all of them, not just cost.
Exam-Relevant Skill
You must be able to read a workload description and choose the right EC2 family, size, pricing mix, and Auto Scaling strategy to hit performance targets without overprovisioning.
EC2 Instance Families and Cost Characteristics
Main EC2 Families
Main families: General purpose (t, m), Compute optimized (c), Memory optimized (r, x), Storage optimized (i, d), Accelerated computing (p, g, inf, trn). Each targets a different resource profile.
Cost Implications
More specialized or larger instances usually cost more. Start with general purpose unless you have a clear CPU, memory, storage, or GPU bottleneck that justifies a specialized family.
Graviton for Price/Performance
Graviton (Arm-based) instances like t4g, m7g, c7g, r7g often give significantly better price/performance than x86 if your software supports Arm. This is a common cost-optimization recommendation.
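Comparing price/performance means dividing the hourly price by the throughput your own workload achieves, not comparing list prices alone. The sketch below assumes you have load-tested each candidate. The instance prices and requests-per-second figures are hypothetical placeholders, not published AWS numbers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    instance_type: str
    hourly_price_usd: float  # hypothetical; look up current pricing for your Region
    measured_rps: float      # throughput from your own load test (hypothetical here)


def cost_per_million_requests(c: Candidate) -> float:
    """USD to serve one million requests at the measured throughput."""
    requests_per_hour = c.measured_rps * 3600
    return c.hourly_price_usd / requests_per_hour * 1_000_000


def best_price_performance(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate with the lowest cost per unit of work."""
    return min(candidates, key=cost_per_million_requests)


candidates = [
    Candidate("m7i.large", hourly_price_usd=0.10, measured_rps=500),  # hypothetical x86 figures
    Candidate("m7g.large", hourly_price_usd=0.08, measured_rps=480),  # hypothetical Graviton figures
]
for c in candidates:
    print(f"{c.instance_type}: ${cost_per_million_requests(c):.4f} per 1M requests")
print("Best:", best_price_performance(candidates).instance_type)
```

In this made-up example the Graviton candidate handles slightly fewer requests per second but still wins on cost per request, which is the comparison that matters.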
Burstable vs Non-Burstable
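T-family instances are burstable and use CPU credits: they earn credits while running below a baseline and spend them to burst above it. They are cost-effective for workloads with low-to-moderate average CPU and occasional spikes. Under sustained high CPU, a T instance either throttles to its baseline (standard mode) or incurs extra charges for surplus credits (unlimited mode), so prefer non-burstable M or C families.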
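A simplified CPU credit model shows why T instances suit spiky but not sustained load. It assumes standard mode, uses the rule that one CPU credit equals one vCPU at 100% for one minute, caps the balance at 24 hours of earned credits, and ignores launch credits and unlimited-mode charges. The 2 vCPU, 10% baseline example matches a t3.micro; check the EC2 documentation for the figures of the type you actually use.

```python
def simulate_cpu_credits(vcpus, baseline_pct, hourly_cpu_pct, start_balance=0.0):
    """Return (final_credit_balance, hours_that_would_be_throttled).

    Simplified standard-mode model: one CPU credit = one vCPU at 100% for one minute.
    """
    earned_per_hour = vcpus * baseline_pct * 60 / 100  # e.g. 2 vCPU * 10% * 60 min = 12
    max_balance = earned_per_hour * 24                   # accrual cap: 24 hours of earnings
    balance = start_balance
    throttled_hours = 0
    for cpu_pct in hourly_cpu_pct:
        spent = vcpus * cpu_pct * 60 / 100
        balance += earned_per_hour - spent
        if balance < 0:
            # Credits ran out: in standard mode the instance drops to its baseline
            balance = 0.0
            throttled_hours += 1
        balance = min(balance, max_balance)
    return balance, throttled_hours


spiky = [5] * 22 + [50] * 2   # quiet day with a two-hour burst
sustained = [60] * 24         # busy all day
print(simulate_cpu_credits(2, 10, spiky))
print(simulate_cpu_credits(2, 10, sustained))
```

The spiky profile ends the day with credits to spare, while the sustained profile runs out every hour, which is the signal to move to an M or C instance.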
EC2 Pricing Models: On-Demand, Savings Plans/Reserved, and Spot
On-Demand Instances
On-Demand: pay per second or hour, no commitment. Highest flexibility and highest unit price. Ideal for unpredictable or short-lived workloads that cannot be interrupted.
Savings Plans and RIs
Savings Plans and Reserved Instances trade commitment (1 or 3 years) for discounts. Savings Plans are more flexible and are usually preferred for steady-state workloads.
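The trade-off between commitment size and savings is easier to see with numbers. This is a simplified Savings Plans model, not AWS's billing logic: it assumes a single flat discount rate (real rates vary by instance family, Region, term, and payment option), treats the hourly commitment as owed whether or not it is used, and bills any usage beyond the commitment at On-Demand rates. All dollar figures are hypothetical.

```python
def hourly_bill(od_usage_usd, commitment_usd, discount):
    """Cost of one hour under a simplified Savings Plan.

    od_usage_usd: what the hour's usage would cost at On-Demand rates.
    commitment_usd: hourly commitment, paid even if unused.
    discount: ASSUMED single discount versus On-Demand (0.25 means 25% off).
    """
    covered_od_value = commitment_usd / (1 - discount)    # On-Demand usage the commitment absorbs
    overflow = max(0.0, od_usage_usd - covered_od_value)  # billed at On-Demand rates
    return commitment_usd + overflow


def daily_bill(hourly_od_usage, commitment_usd, discount):
    return sum(hourly_bill(u, commitment_usd, discount) for u in hourly_od_usage)


# Hypothetical usage: $6/hour baseline, $12/hour during four peak hours
usage = [6.0] * 20 + [12.0] * 4
for commitment in (0.0, 4.5, 9.0):
    print(f"commit ${commitment}/h -> ${daily_bill(usage, commitment, discount=0.25):.2f}/day")
```

With these numbers, committing to the steady baseline ($4.50/hour, $132/day) beats both no commitment ($168/day) and committing to the peak ($216/day), because the extra commitment sits unused for 20 hours a day.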
Spot Instances
Spot Instances use spare capacity at deep discounts but can be interrupted with a 2-minute warning. Best for fault-tolerant, flexible, or batch workloads.
Exam Clues
Steady 24/7 usage → Savings Plans/RIs. Flexible and interruptible → Spot. Unpredictable or new workload → On-Demand. Watch for these phrases in questions.
Matching Pricing Models to Workload Patterns
Steady-State Workloads
Steady-state workloads have predictable, flat usage (for example, core APIs, databases). Cover these with Savings Plans or RIs and minimize long-term On-Demand usage.
Spiky or Seasonal Workloads
Spiky workloads have a low baseline and sharp peaks. Use Savings Plans/RIs for the baseline and Auto Scaling with On-Demand (and possibly Spot) to handle spikes.
Fault-Tolerant or Batch
Batch and fault-tolerant jobs (big data, CI, ML training) can be interrupted and retried. Use Spot as primary capacity and optionally On-Demand for guaranteed minimum throughput.
Blended Strategies
The most cost-effective designs blend models: discounted baseline + Auto Scaling + Spot where safe. Exam answers that mix these thoughtfully are often correct.
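A rough comparison of an all-On-Demand fleet with a blended one makes the point concrete. The sketch assumes hypothetical per-instance hourly prices (in cents, to keep the arithmetic exact), a committed baseline paid every hour, and a fixed share of burst capacity on Spot. Real Spot prices fluctuate, and discounted rates depend on your commitment.

```python
# Hypothetical per-instance hourly prices in cents
ON_DEMAND, COMMITTED, SPOT = 10, 7, 4


def all_on_demand_cost(hourly_instances):
    return sum(n * ON_DEMAND for n in hourly_instances)


def blended_cost(hourly_instances, baseline, spot_pct):
    """Committed baseline every hour; burst split between On-Demand and Spot."""
    total = 0
    for n in hourly_instances:
        burst = max(0, n - baseline)
        spot = burst * spot_pct // 100  # Spot share of the burst (rounded down)
        on_demand = burst - spot
        total += baseline * COMMITTED + on_demand * ON_DEMAND + spot * SPOT
    return total


demand = [4] * 18 + [10] * 6  # 4 instances most of the day, 10 at peak
print(all_on_demand_cost(demand))                      # everything On-Demand
print(blended_cost(demand, baseline=4, spot_pct=50))   # commit to baseline, half the burst on Spot
print(blended_cost(demand, baseline=10, spot_pct=50))  # over-committed to the peak
```

With these made-up prices the blend costs 30% less than all On-Demand, while committing to peak capacity costs more than not committing at all.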
Right-Sizing: Using Metrics to Pick the Correct Instance Size
What is Right-Sizing?
Right-sizing is choosing the smallest, most appropriate instance type and size that still meets performance and reliability needs, instead of overprovisioning.
Key Metrics
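Use CPUUtilization, memory usage, and network/disk I/O. EC2 does not publish memory metrics to CloudWatch by default, so collect them with the CloudWatch agent. Low utilization over time suggests you can downsize; persistent high utilization suggests scaling up or out.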
Right-Sizing Process
Start with a reasonable instance, collect metrics over real traffic, identify the main bottleneck, then adjust instance size or family and re-measure.
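The measure, identify, adjust loop can be expressed as a small rule. This sketch assumes size steps within a family such as m6i (large, xlarge, 2xlarge, 4xlarge), where each step doubles vCPU and memory. The thresholds are hypothetical: downsize when the 95th-percentile utilization of the busiest resource is under 40% (so it should stay under roughly 80% after halving), upsize when it is over 80%. In practice the memory samples would come from the CloudWatch agent.

```python
import math

# Sizes within one family (e.g. m6i); each step doubles vCPU and memory
SIZES = ["large", "xlarge", "2xlarge", "4xlarge"]


def p95(samples):
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def recommend_size(size, cpu_samples, mem_samples, low=40.0, high=80.0):
    """Suggest one size step down or up based on the busiest resource (hypothetical thresholds)."""
    bottleneck = max(p95(cpu_samples), p95(mem_samples))
    i = SIZES.index(size)
    if bottleneck < low and i > 0:
        return SIZES[i - 1]  # half the resources: utilization roughly doubles, still under `high`
    if bottleneck > high and i < len(SIZES) - 1:
        return SIZES[i + 1]
    return size


print(recommend_size("xlarge", cpu_samples=[12] * 20, mem_samples=[25] * 20))
```

After applying a recommendation, collect metrics again before taking the next step, as the process above describes.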
Exam Clues
Phrases like "instances are underutilized" or "CPU is below 10%" hint that the best answer is to move to smaller or more appropriate instances or use Auto Scaling.
Auto Scaling Basics: Avoid Overprovisioning While Protecting Performance
What is an ASG?
An Auto Scaling group (ASG) is a logical group of EC2 instances with defined minimum, desired, and maximum capacity. AWS automatically adjusts instance count within these bounds.
Launch Templates and Policies
Launch templates define instance details (AMI, instance type, security groups, and so on). Scaling policies (such as target tracking and step scaling) and scheduled actions decide when and how the ASG adds or removes instances.
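As a sketch of how a policy and a schedule attach to an existing group, the functions below build keyword arguments for the boto3 Auto Scaling client's `put_scaling_policy` and `put_scheduled_update_group_action` calls. The group name, action name, cron expression, and sizes are hypothetical. Only `apply` talks to AWS, and it needs credentials and an existing ASG.

```python
def target_tracking_policy(asg_name, target_cpu=50.0):
    """Keyword arguments for put_scaling_policy: keep average CPU near target_cpu."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    }


def scheduled_action(asg_name, name, cron, min_size, max_size, desired):
    """Keyword arguments for put_scheduled_update_group_action (cron runs in UTC unless TimeZone is set)."""
    return {
        "AutoScalingGroupName": asg_name,
        "ScheduledActionName": name,
        "Recurrence": cron,
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }


def apply(policy_kwargs, action_kwargs):
    import boto3  # imported lazily: only needed when actually calling AWS

    client = boto3.client("autoscaling")
    client.put_scaling_policy(**policy_kwargs)
    client.put_scheduled_update_group_action(**action_kwargs)


policy = target_tracking_policy("web-asg")  # hypothetical group name
action = scheduled_action("web-asg", "weekday-morning", "0 8 * * 1-5", 4, 20, 6)
# apply(policy, action)  # uncomment on a machine with AWS credentials
```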
Target Tracking for Cost and Performance
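Target tracking policies keep a metric (like average CPU at 50%) near a target value, adding instances when the metric is above the target and removing them when it is below. This is the simplest and most commonly recommended way to balance performance and cost, and it is usually the exam-friendly choice.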
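Target tracking behaves roughly like a proportional controller: if the metric is above target, capacity grows by about the ratio metric/target. The function below is a simplified approximation for intuition only. The real service also accounts for instance warmup and removes capacity more conservatively than it adds it.

```python
import math


def desired_capacity(current, metric, target, min_size, max_size):
    """Approximate target tracking: scale in proportion to metric/target, within ASG bounds."""
    if current == 0:
        return min_size
    proposed = math.ceil(current * metric / target)  # round up to protect performance
    return max(min_size, min(max_size, proposed))


print(desired_capacity(4, metric=80, target=50, min_size=2, max_size=10))   # scale out
print(desired_capacity(4, metric=20, target=50, min_size=2, max_size=10))   # scale in, floored at min
print(desired_capacity(10, metric=90, target=50, min_size=2, max_size=12))  # capped at max
```

The min and max bounds are where cost control lives: min sets the always-on baseline you pay for, and max caps spend during runaway spikes.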
Cost Effect
With ASGs, you run only the capacity you need. Set min capacity to baseline, let ASG scale for peaks, and consider mixing instance types and purchase options inside the group.
Advanced Auto Scaling for Cost: Mixed Instances and Spot Integration
Mixed Instances Policy
A mixed instances policy in an ASG lets you use multiple instance types and both On-Demand and Spot capacity. You configure an On-Demand base capacity plus the percentage split between On-Demand and Spot for capacity above that base.
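To see how the base capacity and percentage interact, this sketch splits a desired capacity into On-Demand and Spot counts. The parameter names mirror the ASG settings `OnDemandBaseCapacity` and `OnDemandPercentageAboveBaseCapacity`. Rounding the On-Demand share up is an assumption of this sketch, so confirm the exact rounding behavior in the Auto Scaling documentation.

```python
import math


def split_capacity(desired, on_demand_base, on_demand_pct_above_base):
    """Return (on_demand_count, spot_count) for a desired capacity.

    ASSUMPTION: the On-Demand share above the base is rounded up.
    """
    base = min(desired, on_demand_base)  # the base is filled with On-Demand first
    above = desired - base
    on_demand_above = math.ceil(above * on_demand_pct_above_base / 100)
    return base + on_demand_above, above - on_demand_above


print(split_capacity(10, on_demand_base=2, on_demand_pct_above_base=25))
print(split_capacity(12, on_demand_base=2, on_demand_pct_above_base=25))
```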
Benefits for Cost
Allowing several instance types across multiple AZs gives the ASG more Spot capacity pools to draw from. This increases the chance of finding cheap capacity and avoids dependence on a single instance type in a single AZ.
Cost-Optimized Web Tier Pattern
Typical pattern: ALB → ASG with mixed instances. Run baseline on On-Demand, burst on Spot, and use target tracking on CPU or request count.
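The pattern can be written down as keyword arguments for boto3's `create_auto_scaling_group`. This is a sketch under stated assumptions: the group name, launch template name, subnet IDs, and target group ARN are hypothetical placeholders. The instance types are all x86, so one AMI in the launch template works for every override. The split keeps the baseline On-Demand while everything above it runs on Spot with the `price-capacity-optimized` allocation strategy.

```python
def web_tier_asg(name, launch_template, subnet_ids, target_group_arn,
                 min_size=2, max_size=20, on_demand_base=2):
    """Keyword arguments for create_auto_scaling_group (all identifiers are placeholders)."""
    return {
        "AutoScalingGroupName": name,
        "MinSize": min_size,
        "MaxSize": max_size,
        "VPCZoneIdentifier": ",".join(subnet_ids),  # subnets in several AZs
        "TargetGroupARNs": [target_group_arn],      # register instances with the ALB
        "HealthCheckType": "ELB",                   # replace instances the ALB marks unhealthy
        "HealthCheckGracePeriod": 120,
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template,
                    "Version": "$Latest",
                },
                # Several interchangeable x86 types give more Spot capacity pools
                "Overrides": [{"InstanceType": t}
                              for t in ("m6i.large", "m5.large", "m5a.large")],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": on_demand_base,    # baseline stays On-Demand
                "OnDemandPercentageAboveBaseCapacity": 0,  # burst capacity is all Spot
                "SpotAllocationStrategy": "price-capacity-optimized",
            },
        },
    }


kwargs = web_tier_asg(
    "web-asg", "web-lt", ["subnet-aaaa", "subnet-bbbb"],
    "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/web/PLACEHOLDER",
)
```

With real values filled in, `boto3.client("autoscaling").create_auto_scaling_group(**kwargs)` would create the group. A target tracking policy, on CPU or on the `ALBRequestCountPerTarget` metric, is then attached separately with `put_scaling_policy`.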
Designing for Spot Interruptions
Keep instances stateless, store session/state externally, and rely on ASG health checks so interrupted Spot instances are quickly replaced.
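One way for an instance to react to the two-minute warning is to poll the instance metadata service (IMDSv2) for the `spot/instance-action` document, which returns 404 until an interruption is scheduled. This is a minimal standard-library sketch. `on_notice` is a hypothetical callback where your application would stop accepting work and flush any state to external storage. The polling loop only works on an EC2 instance; the parsing function can be tested anywhere.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=1.0):
    """Return the spot/instance-action JSON text, or None if no interruption is scheduled."""
    token_request = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_request, timeout=timeout).read().decode()
    request = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        return urllib.request.urlopen(request, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise


def seconds_until_interruption(document, now=None):
    """Seconds left before the instance is reclaimed, or None if there is no notice."""
    if document is None:
        return None
    notice = json.loads(document)  # e.g. {"action": "terminate", "time": "...Z"}
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def watch_for_interruption(on_notice, poll_seconds=5):
    """Poll until a notice appears, then hand the remaining seconds to the callback."""
    while True:
        remaining = seconds_until_interruption(fetch_instance_action())
        if remaining is not None:
            on_notice(remaining)
            return
        time.sleep(poll_seconds)
```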
Putting It Together: Three Scenario Walkthroughs
Scenario 1: Steady SaaS API
24/7 stable API with 30–40% CPU. Use ALB + ASG, m6i or m7g, and cover baseline with Savings Plans or RIs. Target tracking around 50% CPU.
Scenario 2: Spiky Retail Site
Low weekday, heavy weekend traffic. Use ALB + ASG, discounted baseline, and scale out on request count. Add Spot via mixed instances for cheaper peaks.
Scenario 3: Nightly Batch
Nightly analytics must finish before morning and can retry. Run mainly on Spot (possibly via ASG or containers) with compute-optimized instances.
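The ability to retry is what makes Spot safe here. A common way to exploit it is to checkpoint progress to durable storage so a replacement instance resumes where the interrupted one stopped. This simulation uses a plain dict as a stand-in for that storage (in practice something like S3 or DynamoDB) and a hypothetical `interrupted_before` argument to fake Spot reclaiming an instance.

```python
def run_batch(chunks, process, checkpoint, interrupted_before=frozenset()):
    """Process chunks in order, skipping any already recorded in `checkpoint`.

    Returns True when every chunk is done, False if this attempt was interrupted.
    """
    for i, chunk in enumerate(chunks):
        if i in checkpoint:
            continue                    # finished by an earlier attempt
        if i in interrupted_before:
            return False                # simulated Spot interruption
        checkpoint[i] = process(chunk)  # persist each result as soon as it is ready
    return True


chunks = list(range(6))
store = {}                    # stand-in for durable external storage
interruptions = [{3}, set()]  # first instance is reclaimed before chunk 3
attempts = 0
finished = False
while not finished:
    finished = run_batch(chunks, lambda c: c * c, store, interruptions[attempts])
    attempts += 1
print(attempts, [store[i] for i in chunks])
```

The second attempt skips the three chunks the first instance already saved, so an interruption costs only the in-flight chunk rather than the whole night's run.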
Exam Takeaway
Always choose the cheapest combination that still meets uptime and performance needs. Extreme "Spot everywhere" or "tiny instances only" answers are usually wrong.
Design Challenge: Choose the Right Mix
Try this thought exercise. There is no single correct answer, but compare your reasoning to the guidance.
Workload A: University Portal
- Traffic: Very heavy at semester start, moderate during exam weeks, low otherwise.
- Requirements: Must be available; mostly read-heavy; app tier is stateless.
Questions to consider:
- Which EC2 family would you start with for the app tier (general, compute, memory, storage)? Why?
- How would you combine On-Demand, Savings Plans/RIs, and Spot for cost optimization?
- What Auto Scaling strategy would you use (target tracking vs scheduled vs step)?
Pause and write down a 3–4 sentence design.
Suggested reasoning (peek after you think):
- Family: General purpose (for example, m or t) is a good default; the workload is not clearly CPU- or memory-bound from the description.
- Pricing mix: Use Savings Plans/RIs for the baseline needed outside peak periods. Add Spot capacity for peaks because the app tier is stateless and can tolerate some instance loss. Keep some On-Demand to guarantee a minimum level of capacity during critical times.
- Auto Scaling: Combine scheduled scaling (for known semester start and exam weeks) with target tracking on CPU or request count. Scheduled scaling ensures capacity is ready before known surges; target tracking handles unexpected spikes.
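To see how the two mechanisms combine, this sketch treats scheduled actions as raising the group's minimum on known dates, while a proportional approximation of target tracking proposes a count for everything else. The dates, sizes, and 50% CPU target are hypothetical. In a real ASG you would express the same idea with scheduled actions that change `MinSize` plus a target tracking policy.

```python
import math
from datetime import date

# Hypothetical academic calendar: (start, end, minimum instances)
SCHEDULE = [
    (date(2025, 9, 1), date(2025, 9, 14), 12),   # semester start: pre-warm capacity
    (date(2025, 12, 8), date(2025, 12, 19), 6),  # exam weeks
]
DEFAULT_MIN, MAX_SIZE, TARGET_CPU = 2, 30, 50.0


def scheduled_min(day):
    for start, end, min_size in SCHEDULE:
        if start <= day <= end:
            return min_size
    return DEFAULT_MIN


def effective_capacity(day, current, avg_cpu):
    """Target tracking proposes a count; the scheduled minimum acts as a floor."""
    proposed = math.ceil(current * avg_cpu / TARGET_CPU)
    return max(scheduled_min(day), min(MAX_SIZE, proposed))


print(effective_capacity(date(2025, 9, 2), current=12, avg_cpu=20))  # quiet hour, semester start
print(effective_capacity(date(2025, 10, 1), current=4, avg_cpu=90))  # unexpected spike
```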
As you practice, always explicitly tie each design choice back to cost, performance, and reliability.
Quick Check 1: Pricing Models
Test your understanding of EC2 pricing options.
A startup runs a stateless image-processing service that can retry jobs if an instance fails. The workload runs 12 hours per day, and minimizing cost is the top priority. Which compute purchasing strategy is MOST appropriate for the processing fleet?
- A) Run all instances as On-Demand in an Auto Scaling group.
- B) Use Spot Instances in an Auto Scaling group, optionally with a small On-Demand baseline.
- C) Purchase 3-year Standard Reserved Instances for all required capacity.
- D) Use only burstable T instances with no Auto Scaling.
Answer: B) Use Spot Instances in an Auto Scaling group, optionally with a small On-Demand baseline.
The workload is stateless, can retry, and cost is the top priority. This is ideal for Spot Instances, managed by an Auto Scaling group to replace interruptions. A small On-Demand baseline can be added if needed for minimum throughput. RIs/Savings Plans are better for steady 24/7 workloads, and running everything On-Demand or only on T instances is more expensive and less flexible.
Quick Check 2: Right-Sizing and Auto Scaling
Test how you think about right-sizing and elasticity.
You have an EC2 Auto Scaling group of m6i.xlarge instances behind an ALB. CloudWatch shows that during business hours, average CPU is 12% and memory is 25%. The application response time is well within the SLA. How can you MOST effectively reduce cost while maintaining performance?
- A) Replace the m6i.xlarge instances with larger m6i.2xlarge instances to reduce CPU utilization further.
- B) Disable Auto Scaling and keep only one m6i.xlarge instance running at all times.
- C) Switch to smaller instances (for example, m6i.large) and adjust the Auto Scaling group to use more instances as needed.
- D) Move the workload entirely to a single large memory-optimized instance (for example, r6i.4xlarge).
Answer: C) Switch to smaller instances (for example, m6i.large) and adjust the Auto Scaling group to use more instances as needed.
Low CPU and memory utilization indicate overprovisioning. The best option is to right-size to smaller instances and rely on Auto Scaling to add more instances during peaks, preserving performance while reducing cost. Upsizing or moving to a huge memory-optimized instance increases cost. Disabling Auto Scaling removes elasticity and can harm reliability.
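A back-of-the-envelope check supports answer C. An m6i.xlarge has 4 vCPUs and 16 GiB of memory and an m6i.large has 2 vCPUs and 8 GiB. If each instance keeps the same absolute load (an assumption; the ASG can add instances if it does not), utilization roughly doubles.

```python
def utilization_after_resize(util_pct, old_capacity, new_capacity):
    """Same absolute load on a differently sized instance (assumes load scales linearly)."""
    return util_pct * old_capacity / new_capacity


cpu = utilization_after_resize(12, old_capacity=4, new_capacity=2)      # vCPUs
memory = utilization_after_resize(25, old_capacity=16, new_capacity=8)  # GiB
print(f"m6i.large: CPU ~{cpu:.0f}%, memory ~{memory:.0f}%")
```

Roughly 24% CPU and 50% memory still leave comfortable headroom. Within a family such as m6i, each size step's On-Demand price is about double the one below, so the per-instance cost roughly halves.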
Key Term Review: Compute Cost Optimization
Review these terms to reinforce core concepts and exam language.
- Cost optimization pillar: "The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."
- Steady-state workload: A workload with relatively constant, predictable resource usage over time (for example, an always-on production API). Best served by discounted capacity such as Savings Plans or Reserved Instances.
- Spiky workload: A workload with low or moderate baseline usage and occasional large peaks (for example, retail during holidays). Typically uses a combination of discounted baseline capacity and Auto Scaling for bursts.
- Spot Instance: An EC2 instance that uses spare AWS capacity at a steep discount but can be interrupted by AWS with a 2-minute warning. Best for fault-tolerant, flexible workloads.
- Target tracking scaling policy: An Auto Scaling policy type that automatically adjusts the number of instances to keep a specified metric (such as average CPU utilization) near a target value.
- Right-sizing: The process of selecting the most appropriate instance family and size based on actual utilization metrics so that resources are neither significantly underused nor overloaded.
- Mixed instances policy: An Auto Scaling group configuration that uses multiple instance types and purchase options (On-Demand and Spot) to improve availability and reduce cost.
- General purpose instance family: EC2 instance family (such as t, m, or Graviton equivalents) that provides a balance of compute, memory, and networking resources suitable for many common applications.
- Compute optimized instance family: EC2 family (such as c7g, c7i, c6i) designed for compute-bound workloads, offering a higher ratio of vCPU to memory at a given price point.
- Memory optimized instance family: EC2 family (such as r7g, r6i, x2idn) designed for memory-intensive workloads, providing more RAM per vCPU.
Key Terms
- Right-sizing: The practice of adjusting instance families and sizes based on utilization metrics to avoid overprovisioning or underprovisioning compute resources.
- Savings Plan: A flexible pricing model where you commit to a consistent amount of compute usage (measured in USD/hour) for a 1- or 3-year term in exchange for lower rates on eligible compute usage.
- Spot Instance: An EC2 instance that uses spare AWS capacity at a significant discount but can be interrupted by AWS with a short warning when capacity is needed elsewhere.
- Graviton instance: An EC2 instance powered by AWS-designed Arm-based processors, often providing better price/performance than comparable x86-based instances when applications support Arm.
- EC2 On-Demand Instance: An EC2 pricing option where you pay for compute capacity by the second or hour with no long-term commitment, offering maximum flexibility but higher unit cost.
- Mixed instances policy: An Auto Scaling configuration that allows an ASG to use multiple EC2 instance types and purchase options (On-Demand and Spot) to improve availability and reduce cost.
- Reserved Instance (RI): An EC2 pricing option that provides a discount compared to On-Demand in exchange for a 1- or 3-year commitment to a specific instance configuration or family.
- Auto Scaling group (ASG): A logical group of EC2 instances that AWS manages together, automatically increasing or decreasing the number of instances based on scaling policies and health checks.
- Compute optimized instance: An EC2 instance type designed for compute-bound workloads, offering a higher ratio of vCPU to memory and typically lower cost per vCPU.
- Burstable instance (T family): An EC2 instance type that provides a baseline level of CPU performance with the ability to burst above the baseline using CPU credits.
- Target tracking scaling policy: An Auto Scaling policy that adjusts capacity to keep a specific CloudWatch metric (such as average CPU utilization) near a user-defined target value.