Skarp

Chapter 22 of 26

Cost-Optimized Compute: EC2 Instance Types, Purchasing Options, and Auto Scaling

Compute is often the most visible line item; combine the right EC2 instance types, pricing models, and Auto Scaling strategies to keep workloads efficient and affordable.

27 min read

Big Picture: Compute Costs and the Cost Optimization Pillar

Why Compute Costs Matter

Compute (mainly Amazon EC2) is often the most visible and controllable part of your AWS bill. This module links the cost optimization pillar directly to EC2 instance, pricing, and Auto Scaling choices.

Cost Optimization Pillar

AWS defines the pillar as follows: "The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."

Well-Architected Context

The AWS Well-Architected Framework has six pillars: Operational excellence, Security, Reliability, Performance efficiency, Cost optimization, Sustainability. Compute decisions must support all of them, not just cost.

Exam-Relevant Skill

You must be able to read a workload description and choose the right EC2 family, size, pricing mix, and Auto Scaling strategy to hit performance targets without overprovisioning.

EC2 Instance Families and Cost Characteristics

Main EC2 Families

Main families: General purpose (t, m), Compute optimized (c), Memory optimized (r, x), Storage optimized (i, d), Accelerated computing (p, g, inf, trn). Each targets a different resource profile.

Cost Implications

More specialized or larger instances usually cost more. Start with general purpose unless you have a clear CPU, memory, storage, or GPU bottleneck that justifies a specialized family.

Graviton for Price/Performance

Graviton (Arm-based) instances like t4g, m7g, c7g, r7g often give significantly better price/performance than x86 if your software supports Arm. This is a common cost-optimization recommendation.

Burstable vs Non-Burstable

T-family instances are burstable and use CPU credits. They are cost-effective for workloads with low to moderate average CPU and occasional spikes. For sustained high CPU, prefer non-burstable M or C families.
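The credit mechanism can be sketched with a little arithmetic. One CPU credit equals one vCPU running at 100% for one minute, so the earn rate fixes the baseline utilization an instance can sustain. The snippet below is a simplified model, not AWS's accounting: the earn rate of 12 credits per hour on 2 vCPUs is an illustrative assumption, and the model ignores the maximum credit balance and unlimited mode.

```python
def baseline_utilization(vcpus, credits_per_hour):
    """Per-vCPU utilization (0..1) that exactly consumes the credits earned."""
    # One credit = one vCPU at 100% for one minute, so a full hour on all
    # vCPUs at 100% would cost vcpus * 60 credits.
    return credits_per_hour / (vcpus * 60)


def simulate_credits(vcpus, credits_per_hour, hourly_utilization, start_balance=0.0):
    """Return the credit balance after each hour (never below zero)."""
    balance = start_balance
    history = []
    for u in hourly_utilization:
        spent = vcpus * u * 60          # credits burned during this hour
        balance = max(balance + credits_per_hour - spent, 0.0)
        history.append(round(balance, 1))
    return history


# Illustrative numbers (assumptions): 2 vCPUs earning 12 credits per hour.
print(baseline_utilization(2, 12))
# Two quiet hours at 5% CPU bank credits; one hour at 50% drains them.
print(simulate_credits(2, 12, [0.05, 0.05, 0.5]))
```

When the modeled balance stays at zero for long stretches, the workload is running above baseline most of the time, which is the signal to consider an M or C instance instead.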

EC2 Pricing Models: On-Demand, Savings Plans/Reserved, and Spot

On-Demand Instances

On-Demand: pay per second or hour, no commitment. Highest flexibility and highest unit price. Ideal for unpredictable or short-lived workloads that cannot be interrupted.

Savings Plans and RIs

Savings Plans and Reserved Instances trade commitment (1 or 3 years) for discounts. Savings Plans are more flexible and are usually preferred for steady-state workloads.

Spot Instances

Spot Instances use spare capacity at deep discounts but can be interrupted with a 2-minute warning. Best for fault-tolerant, flexible, or batch workloads.

Exam Clues

Steady 24/7 usage → Savings Plans/RIs. Flexible and interruptible → Spot. Unpredictable or new workload → On-Demand. Watch for these phrases in questions.
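These clues can be condensed into a small lookup for study purposes. The function name and the priority order (interruptibility checked first) are assumptions made for illustration; real exam questions and real workloads often call for a blend rather than a single answer.

```python
def recommend_pricing(steady_24x7: bool, interruptible: bool) -> str:
    """Map two workload traits to the pricing model the exam clues point to."""
    if interruptible:
        # Fault-tolerant, flexible work can absorb Spot interruptions.
        return "Spot"
    if steady_24x7:
        # Predictable always-on usage justifies a 1- or 3-year commitment.
        return "Savings Plans / Reserved Instances"
    # Unpredictable or new workloads that must not be interrupted.
    return "On-Demand"


print(recommend_pricing(steady_24x7=True, interruptible=False))
print(recommend_pricing(steady_24x7=False, interruptible=True))
print(recommend_pricing(steady_24x7=False, interruptible=False))
```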

Matching Pricing Models to Workload Patterns

Steady-State Workloads

Steady-state workloads have predictable, flat usage (for example, core APIs, databases). Cover these with Savings Plans or RIs and minimize long-term On-Demand usage.

Spiky or Seasonal Workloads

Spiky workloads have a low baseline and sharp peaks. Use Savings Plans/RIs for the baseline and Auto Scaling with On-Demand (and possibly Spot) to handle spikes.

Fault-Tolerant or Batch

Batch and fault-tolerant jobs (big data, CI, ML training) can be interrupted and retried. Use Spot as primary capacity and optionally On-Demand for guaranteed minimum throughput.

Blended Strategies

The most cost-effective designs blend models: discounted baseline + Auto Scaling + Spot where safe. Exam answers that mix these thoughtfully are often correct.
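A rough way to see why blending helps is to price one day of a spiky workload under different mixes. The sketch below uses made-up hourly rates (0.10 On-Demand, 0.06 committed, 0.03 Spot) and a made-up demand curve; none of these figures are real AWS prices. It also assumes the committed baseline is paid for every hour whether or not it is used.

```python
def blended_daily_cost(hourly_demand, baseline, od_rate, committed_rate,
                       spot_rate, spot_share):
    """Cost of one day given instances needed per hour.

    baseline   -- instances covered by a commitment (billed every hour)
    spot_share -- fraction (0..1) of capacity above baseline run on Spot
    """
    cost = 0.0
    for needed in hourly_demand:
        cost += baseline * committed_rate       # paid even if idle
        burst = max(needed - baseline, 0)       # capacity above the baseline
        spot = burst * spot_share
        cost += spot * spot_rate + (burst - spot) * od_rate
    return round(cost, 2)


# Hypothetical day: 4 instances for 18 hours, 10 instances for 6 peak hours.
demand = [4] * 18 + [10] * 6

all_on_demand = blended_daily_cost(demand, 0, 0.10, 0.06, 0.03, 0.0)
blended = blended_daily_cost(demand, 4, 0.10, 0.06, 0.03, 0.5)
print(all_on_demand, blended)
```

With these assumed rates the blended design costs noticeably less than running everything On-Demand, but the gap depends entirely on the discounts and on how well the committed baseline matches real usage.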

Right-Sizing: Using Metrics to Pick the Correct Instance Size

What is Right-Sizing?

Right-sizing is choosing the smallest, most appropriate instance type and size that still meets performance and reliability needs, instead of overprovisioning.

Key Metrics

Use CPUUtilization, memory usage (which requires the CloudWatch agent, because EC2 does not publish memory metrics by default), and network/disk I/O. Low utilization over time suggests you can downsize; persistent high utilization suggests scaling up or out.

Right-Sizing Process

Start with a reasonable instance, collect metrics over real traffic, identify the main bottleneck, then adjust instance size or family and re-measure.
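The "identify, adjust, re-measure" loop can be approximated with a simple rule over collected CPU samples. This is a minimal sketch: the 20% and 80% thresholds and the function name are assumptions chosen for illustration, and a real review would also look at memory, network, and disk before changing anything.

```python
from statistics import mean


def rightsizing_hint(cpu_samples, low=20.0, high=80.0):
    """Suggest a direction based on CPUUtilization samples (percent)."""
    avg = mean(cpu_samples)
    peak = max(cpu_samples)
    if peak < low:
        # Even the busiest sample is low: the instance is likely oversized.
        return "downsize"
    if avg > high:
        # Persistently high average: add capacity (bigger or more instances).
        return "scale up or out"
    return "keep"


print(rightsizing_hint([6, 9, 12, 8]))      # consistently idle
print(rightsizing_hint([85, 92, 88, 90]))   # consistently saturated
print(rightsizing_hint([30, 55, 70, 40]))   # reasonable headroom
```

Using the peak rather than the average for the downsize check is a deliberately cautious choice, so that a short but important spike is not averaged away.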

Exam Clues

Phrases like "instances are underutilized" or "CPU is below 10%" hint that the best answer is to move to smaller or more appropriate instances or use Auto Scaling.

Auto Scaling Basics: Avoid Overprovisioning While Protecting Performance

What is an ASG?

An Auto Scaling group (ASG) is a logical group of EC2 instances with defined minimum, desired, and maximum capacity. AWS automatically adjusts instance count within these bounds.

Launch Templates and Policies

Launch templates define instance details. Scaling policies (target tracking, step, scheduled) decide when and how the ASG adds or removes instances.

Target Tracking for Cost and Performance

Target tracking policies keep a metric (like average CPU at 50%) near a target. This is usually the simplest policy type to configure and a common exam answer for balancing performance and cost.
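The intuition behind target tracking is proportional: if the metric is 1.5 times the target, roughly 1.5 times the capacity is needed. The sketch below models only that intuition, clamped to the group's minimum and maximum. It is an approximation for study purposes; the actual service works through CloudWatch alarms, warm-up periods, and conservative scale-in behavior that are not modeled here.

```python
import math


def target_tracking_capacity(current, metric, target, min_size, max_size):
    """Approximate desired capacity for a metric that scales with load per instance."""
    # Proportional estimate, rounded up so the target is not exceeded.
    desired = math.ceil(current * metric / target)
    # The ASG never goes outside its configured bounds.
    return max(min_size, min(max_size, desired))


# 4 instances at 75% average CPU with a 50% target: scale out.
print(target_tracking_capacity(4, 75, 50, min_size=2, max_size=10))
# 4 instances at 20% CPU: scale in, but not below the minimum.
print(target_tracking_capacity(4, 20, 50, min_size=2, max_size=10))
# A large spike is capped by the maximum.
print(target_tracking_capacity(4, 200, 50, min_size=2, max_size=10))
```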

Cost Effect

With ASGs, you run only the capacity you need. Set min capacity to baseline, let ASG scale for peaks, and consider mixing instance types and purchase options inside the group.

Advanced Auto Scaling for Cost: Mixed Instances and Spot Integration

Mixed Instances Policy

A mixed instances policy in an ASG lets you use multiple instance types and both On-Demand and Spot capacity, with a configurable percentage split.
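To make the percentage split concrete, the sketch below computes how a desired capacity divides between On-Demand and Spot. The dictionary keys mirror the InstancesDistribution fields of the EC2 Auto Scaling API (OnDemandBaseCapacity, OnDemandPercentageAboveBaseCapacity, SpotAllocationStrategy), but the values are illustrative, the launch template and instance type overrides are omitted, and the rounding rule in the helper is an assumption rather than documented service behavior.

```python
import math

# Illustrative fragment of a mixed instances policy (values are assumptions).
instances_distribution = {
    "OnDemandBaseCapacity": 2,                  # always On-Demand
    "OnDemandPercentageAboveBaseCapacity": 25,  # share of the rest on On-Demand
    "SpotAllocationStrategy": "price-capacity-optimized",
}


def split_capacity(desired, on_demand_base, on_demand_pct_above_base):
    """Return (on_demand, spot) instance counts for a desired capacity."""
    base = min(desired, on_demand_base)
    above = desired - base
    # Assumption: round the On-Demand share up to favor stable capacity.
    od_above = math.ceil(above * on_demand_pct_above_base / 100)
    on_demand = base + od_above
    return on_demand, desired - on_demand


print(split_capacity(
    10,
    instances_distribution["OnDemandBaseCapacity"],
    instances_distribution["OnDemandPercentageAboveBaseCapacity"],
))
```

Setting the percentage to 100 keeps everything On-Demand, while 0 puts all capacity above the base on Spot.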

Benefits for Cost

Using multiple types and Spot increases the chance of available cheap capacity and avoids dependence on a single instance type or AZ.

Cost-Optimized Web Tier Pattern

Typical pattern: ALB → ASG with mixed instances. Run baseline on On-Demand, burst on Spot, and use target tracking on CPU or request count.

Designing for Spot Interruptions

Keep instances stateless, store session/state externally, and rely on ASG health checks so interrupted Spot instances are quickly replaced.
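An application can also react to the interruption notice itself. The notice is exposed through instance metadata (the spot/instance-action path) as a small JSON document with an action and a time. The sketch below only parses a sample document of that shape and reports how much time is left; the sample payload is made up, and fetching the document from the metadata service, plus the actual draining logic, is left out.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json, now):
    """Parse an interruption notice and return (action, seconds remaining)."""
    notice = json.loads(notice_json)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)    # notice times are in UTC
    return notice["action"], (when - now).total_seconds()


# Hypothetical notice, as it might look two minutes before termination.
sample = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

action, remaining = seconds_until_interruption(sample, now)
print(action, remaining)
```

A worker that polls for this notice can use the remaining seconds to stop accepting new jobs and checkpoint or hand back work in progress.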

Putting It Together: Three Scenario Walkthroughs

Scenario 1: Steady SaaS API

24/7 stable API with 30–40% CPU. Use ALB + ASG, m6i or m7g, and cover baseline with Savings Plans or RIs. Target tracking around 50% CPU.

Scenario 2: Spiky Retail Site

Low weekday, heavy weekend traffic. Use ALB + ASG, discounted baseline, and scale out on request count. Add Spot via mixed instances for cheaper peaks.

Scenario 3: Nightly Batch

Nightly analytics must finish before morning and can retry. Run mainly on Spot (possibly via ASG or containers) with compute-optimized instances.

Exam Takeaway

Always choose the cheapest combination that still meets uptime and performance needs. Extreme "Spot everywhere" or "tiny instances only" answers are usually wrong.

Design Challenge: Choose the Right Mix

Try this thought exercise. There is no single correct answer, but compare your reasoning to the guidance.

Workload A: University Portal

  • Traffic: Very heavy at semester start, moderate during exam weeks, low otherwise.
  • Requirements: Must be available; mostly read-heavy; app tier is stateless.

Questions to consider:

  1. Which EC2 family would you start with for the app tier (general, compute, memory, storage)? Why?
  2. How would you combine On-Demand, Savings Plans/RIs, and Spot for cost optimization?
  3. What Auto Scaling strategy would you use (target tracking vs scheduled vs step)?

Pause and write down a 3–4 sentence design.

Suggested reasoning (peek after you think):

  1. Family: General purpose (for example, m or t) is a good default; the workload is not clearly CPU- or memory-bound from the description.
  2. Pricing mix: Use Savings Plans/RIs for the baseline needed outside peak periods. Add Spot capacity for peaks because the app tier is stateless and can tolerate some instance loss. Keep some On-Demand to guarantee a minimum level of capacity during critical times.
  3. Auto Scaling: Combine scheduled scaling (for known semester start and exam weeks) with target tracking on CPU or request count. Scheduled scaling ensures capacity is ready before known surges; target tracking handles unexpected spikes.

As you practice, always explicitly tie each design choice back to cost, performance, and reliability.

Quick Check 1: Pricing Models

Test your understanding of EC2 pricing options.

A startup runs a stateless image-processing service that can retry jobs if an instance fails. The workload runs 12 hours per day, and minimizing cost is the top priority. Which compute purchasing strategy is MOST appropriate for the processing fleet?

  1. Run all instances as On-Demand in an Auto Scaling group.
  2. Use Spot Instances in an Auto Scaling group, optionally with a small On-Demand baseline.
  3. Purchase 3-year Standard Reserved Instances for all required capacity.
  4. Use only burstable T instances with no Auto Scaling.

Answer: 2) Use Spot Instances in an Auto Scaling group, optionally with a small On-Demand baseline.

The workload is stateless, can retry, and cost is the top priority. This is ideal for Spot Instances, managed by an Auto Scaling group that replaces interrupted instances. A small On-Demand baseline can be added if needed for minimum throughput. RIs/Savings Plans are better suited to steady 24/7 workloads, running everything On-Demand costs more, and using only T instances without Auto Scaling removes elasticity.
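The 12-hours-per-day detail matters because a reservation is billed for every hour of the term, used or not. A quick comparison for one instance-day illustrates this, using invented numbers: an On-Demand rate of 0.10 per hour, a 60% reservation discount, and a 70% Spot discount. Real discounts vary by instance type, Region, and time.

```python
def daily_cost(rate_per_hour, billed_hours, instances=1):
    """Cost of one day for a given hourly rate and number of billed hours."""
    return round(rate_per_hour * billed_hours * instances, 2)


ON_DEMAND = 0.10  # hypothetical hourly rate

options = {
    # Pay only for the 12 hours the fleet runs.
    "on_demand_12h": daily_cost(ON_DEMAND, 12),
    # Assumed 60% discount, but billed for all 24 hours of every day.
    "reserved_24h": daily_cost(ON_DEMAND * 0.4, 24),
    # Assumed 70% discount, billed only while running.
    "spot_12h": daily_cost(ON_DEMAND * 0.3, 12),
}

cheapest = min(options, key=options.get)
print(options)
print(cheapest)
```

Under these assumptions the reservation still beats On-Demand, but Spot is far cheaper because it combines a deep discount with paying only for hours actually used.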

Quick Check 2: Right-Sizing and Auto Scaling

Test how you think about right-sizing and elasticity.

You have an EC2 Auto Scaling group of m6i.xlarge instances behind an ALB. CloudWatch shows that during business hours, average CPU is 12% and memory is 25%. The application response time is well within the SLA. How can you MOST effectively reduce cost while maintaining performance?

  1. Replace the m6i.xlarge instances with larger m6i.2xlarge instances to reduce CPU utilization further.
  2. Disable Auto Scaling and keep only one m6i.xlarge instance running at all times.
  3. Switch to smaller instances (for example, m6i.large) and adjust the Auto Scaling group to use more instances as needed.
  4. Move the workload entirely to a single large memory-optimized instance (for example, r6i.4xlarge).

Answer: 3) Switch to smaller instances (for example, m6i.large) and adjust the Auto Scaling group to use more instances as needed.

Low CPU and memory utilization indicate overprovisioning. The best option is to right-size to smaller instances and rely on Auto Scaling to add more instances during peaks, preserving performance while reducing cost. Upsizing or moving to a huge memory-optimized instance increases cost. Disabling Auto Scaling removes elasticity and can harm reliability.
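A quick projection shows why halving the instance size is safe here. An m6i.xlarge has 4 vCPUs and 16 GiB of memory, and an m6i.large has 2 vCPUs and 8 GiB. The sketch assumes the instance count stays the same and that utilization scales linearly with available resources, which is a simplification that should be verified by re-measuring after the change.

```python
def projected_utilization(current_pct, current_units, new_units):
    """Estimate utilization after moving the same load to a different size."""
    return current_pct * current_units / new_units


# m6i.xlarge (4 vCPU, 16 GiB) -> m6i.large (2 vCPU, 8 GiB)
cpu_after = projected_utilization(12, current_units=4, new_units=2)
mem_after = projected_utilization(25, current_units=16, new_units=8)
print(cpu_after, mem_after)
```

Both projected figures leave substantial headroom, and Auto Scaling can add instances if a peak pushes utilization toward the scaling target.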

Key Term Review: Compute Cost Optimization

Review these terms to reinforce core concepts and exam language.

Cost optimization pillar
"The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."
Steady-state workload
A workload with relatively constant, predictable resource usage over time (for example, an always-on production API). Best served by discounted capacity such as Savings Plans or Reserved Instances.
Spiky workload
A workload with low or moderate baseline usage and occasional large peaks (for example, retail during holidays). Typically uses a combination of discounted baseline capacity and Auto Scaling for bursts.
Spot Instance
An EC2 instance that uses spare AWS capacity at a steep discount but can be interrupted by AWS with a 2-minute warning. Best for fault-tolerant, flexible workloads.
Target tracking scaling policy
An Auto Scaling policy type that automatically adjusts the number of instances to keep a specified metric (such as average CPU utilization) near a target value.
Right-sizing
The process of selecting the most appropriate instance family and size based on actual utilization metrics so that resources are neither significantly underused nor overloaded.
Mixed instances policy
An Auto Scaling group configuration that uses multiple instance types and purchase options (On-Demand and Spot) to improve availability and reduce cost.
General purpose instance family
EC2 instance family (such as t, m, or Graviton equivalents) that provides a balance of compute, memory, and networking resources suitable for many common applications.
Compute optimized instance family
EC2 family (such as c7g, c7i, c6i) designed for compute-bound workloads, offering a higher ratio of vCPU to memory at a given price point.
Memory optimized instance family
EC2 family (such as r7g, r6i, x2idn) designed for memory-intensive workloads, providing more RAM per vCPU.

Key Terms

Right-sizing
The practice of adjusting instance families and sizes based on utilization metrics to avoid overprovisioning or underprovisioning compute resources.
Savings Plan
A flexible pricing model where you commit to a consistent amount of compute usage (measured in USD/hour) for a 1- or 3-year term in exchange for lower rates on eligible compute usage.
Spot Instance
An EC2 instance that uses spare AWS capacity at a significant discount but can be interrupted by AWS with a short warning when capacity is needed elsewhere.
Graviton instance
An EC2 instance powered by AWS-designed Arm-based processors, often providing better price/performance than comparable x86-based instances when applications support Arm.
EC2 On-Demand Instance
An EC2 pricing option where you pay for compute capacity by the second or hour with no long-term commitment, offering maximum flexibility but higher unit cost.
Mixed instances policy
An Auto Scaling configuration that allows an ASG to use multiple EC2 instance types and purchase options (On-Demand and Spot) to improve availability and reduce cost.
Reserved Instance (RI)
An EC2 pricing option that provides a discount compared to On-Demand in exchange for a 1- or 3-year commitment to a specific instance configuration or family.
Auto Scaling group (ASG)
A logical group of EC2 instances that AWS manages together, automatically increasing or decreasing the number of instances based on scaling policies and health checks.
Compute optimized instance
An EC2 instance type designed for compute-bound workloads, offering a higher ratio of vCPU to memory and typically lower cost per vCPU.
Burstable instance (T family)
An EC2 instance type that provides a baseline level of CPU performance with the ability to burst above the baseline using CPU credits.
Target tracking scaling policy
An Auto Scaling policy that adjusts capacity to keep a specific CloudWatch metric (such as average CPU utilization) near a user-defined target value.
