Chapter 18 of 26

Cost-Optimized Compute: EC2 Purchasing Options, Right-Sizing, and AWS Auto Scaling

Compute is often the largest variable cost in AWS. This module dives into EC2 pricing models, right-sizing using EC2 instance types, and cost-aware use of AWS Auto Scaling.

Big Picture: Compute Cost and Exam Context

Why Compute Cost Matters

Compute is usually the largest variable part of an AWS bill. The exam often asks for the most cost‑effective compute design that still meets performance and availability needs.

Link to Well‑Architected

This module sits in the cost optimization pillar: "The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."

Right-Sizing and Pricing

You will learn to pick the right instance family and size, choose the right purchasing model (On-Demand, Reserved, Savings Plans, Spot), and use Auto Scaling so you do not pay for idle compute.

Connect to Other Modules

Spiky ingestion and streaming pipelines often need bursty compute, and analytics ETL jobs can be interruption-tolerant. Both are great candidates for Auto Scaling and Spot Instances.

Exam Outcomes

You should be able to map a workload to instance type and size, pick the best pricing model, configure Auto Scaling to avoid overprovisioning, and recognize when serverless or containers are cheaper than always-on EC2.

EC2 Instance Families and Right-Sizing Basics

What is Right-Sizing?

Right‑sizing means choosing the smallest instance type that still meets performance requirements, avoiding both overpowered and underpowered EC2 instances.

General and Compute Families

General purpose (t, m) like t4g, m7g, m7i are balanced defaults. Compute optimized (c) like c7g, c7i suit CPU-bound tasks such as batch processing or high-performance web servers.

Memory and Storage Families

Memory optimized (r, x, z) like r7g, r7i fit in-memory DBs and analytics. Storage optimized (i, d families) like i4i offer high IOPS and low-latency local storage.

Accelerated Computing

Accelerated computing instances (p, g, trn families) provide GPUs or custom chips for ML training, inference, and graphics-heavy workloads.
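As a quick memory aid, the family letters above can be turned into a small lookup. This is only a sketch: the mapping covers just the families named in this chapter, and the function name `categorize` is made up for illustration.

```python
import re

# Leading letters of the instance family -> category, limited to the
# families named in this chapter. Any other family returns "unknown".
CATEGORY_BY_PREFIX = {
    "t": "general purpose",
    "m": "general purpose",
    "c": "compute optimized",
    "r": "memory optimized",
    "x": "memory optimized",
    "z": "memory optimized",
    "i": "storage optimized",
    "d": "storage optimized",
    "p": "accelerated computing",
    "g": "accelerated computing",
    "trn": "accelerated computing",
}


def categorize(instance_type: str) -> str:
    """Return the chapter's category for a type such as 'c7g.2xlarge'."""
    # The family prefix is the run of letters before the generation digit.
    match = re.match(r"([a-z]+)\d", instance_type.lower())
    if match is None:
        return "unknown"
    return CATEGORY_BY_PREFIX.get(match.group(1), "unknown")


for name in ("m7g.large", "c7i.xlarge", "r7g.2xlarge", "i4i.large", "trn1.2xlarge"):
    print(name, "->", categorize(name))
```

Matching on the full letter prefix (rather than the first character) keeps `trn1` from being read as a `t` instance.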

Right-Sizing Process

1) Measure CPU, memory, and I/O with CloudWatch and Compute Optimizer (memory metrics require the CloudWatch agent, because EC2 does not publish them by default). 2) Find the main bottleneck. 3) Switch family if needed. 4) Resize up or down within that family.
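The four steps can be sketched as a small decision function. The thresholds (`low=40`, `high=90`) are illustrative assumptions, not AWS guidance, and the `Utilization` type and `recommend` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class Utilization:
    """Observed peak utilization, in percent (step 1: measure)."""
    cpu_pct: float
    memory_pct: float


def recommend(u: Utilization, low: float = 40.0, high: float = 90.0) -> str:
    """Map measurements to a right-sizing action.

    `low` and `high` are assumed thresholds chosen for illustration.
    """
    cpu_hot = u.cpu_pct >= high
    mem_hot = u.memory_pct >= high
    # Step 2: find the main bottleneck. Step 3: switch family if needed.
    if cpu_hot and not mem_hot:
        return "cpu-bound: consider a compute optimized family"
    if mem_hot and not cpu_hot:
        return "memory-bound: consider a memory optimized family"
    # Step 4: resize up or down within the family.
    if cpu_hot and mem_hot:
        return "undersized: size up within the family"
    if u.cpu_pct < low and u.memory_pct < low:
        return "oversized: size down within the family"
    return "keep current size"


print(recommend(Utilization(cpu_pct=15, memory_pct=25)))
print(recommend(Utilization(cpu_pct=100, memory_pct=80)))
```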

Reliability and Cost

Overprovisioning wastes money; underprovisioning can hurt SLAs and runs against the Reliability pillar: "The reliability pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to."

Right-Sizing Walkthrough: Web App and Analytics Job

Scenario 1: Steady Web App

A web app runs on m5.4xlarge. CloudWatch shows CPU 10–15% and memory 25%. Performance is fine, so the instance is clearly oversized.

Fixing the Web App

Downsize to m7g.xlarge or m7g.2xlarge (or m7i if you need x86) and place multiple smaller instances in an Auto Scaling group across AZs for resilience and lower cost.

Scenario 2: Nightly Analytics Job

A 4-hour nightly Spark ETL job runs on m5.2xlarge at 100% CPU and 80% memory. It is CPU-bound and runs on a predictable schedule.

Optimizing the Analytics Job

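Move to a compute optimized family such as c7g to get more CPU per dollar, but size by memory as well: the job uses about 80% of the m5.2xlarge's 32 GiB, which will not fit in a c7g.2xlarge (16 GiB), so a c7g.4xlarge (32 GiB) or several smaller nodes is a safer starting point. Use Spot Instances plus some On-Demand in EMR or an Auto Scaling group.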
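One check worth making before switching families is whether the job's memory footprint still fits, because compute optimized sizes carry less memory per vCPU than general purpose ones. The sketch below hard-codes the vCPU and memory figures for three instance types; the `fits` helper is a hypothetical name, and the 80% figure comes from the scenario above.

```python
# (vCPU, memory in GiB) for the instance types discussed in this scenario.
SPECS = {
    "m5.2xlarge": (8, 32),
    "c7g.2xlarge": (8, 16),
    "c7g.4xlarge": (16, 32),
}


def fits(used_gib: float, candidate: str) -> bool:
    """True if the observed memory footprint fits on the candidate type."""
    _, memory_gib = SPECS[candidate]
    return used_gib <= memory_gib


# The nightly job uses about 80% of the memory on an m5.2xlarge.
used = 0.80 * SPECS["m5.2xlarge"][1]

for candidate in ("c7g.2xlarge", "c7g.4xlarge"):
    print(candidate, "fits" if fits(used, candidate) else "does not fit")
```

A real check would also leave headroom for the operating system and for Spark overhead, which this sketch ignores.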

Exam Clues

Phrases like "CPU-bound", "memory-bound", "scheduled batch", or "steady 24/7 load" hint at which instance family and pricing model (On-Demand, Reserved Instances, Savings Plans, Spot) you should choose.

EC2 Purchasing Models: On-Demand, RIs, and Savings Plans

On-Demand Instances

On-Demand means pay per second or hour with no commitment. It is the most flexible but most expensive per unit, ideal for short-term or unpredictable workloads.

Reserved Instances Basics

Reserved Instances require a 1- or 3-year commitment to specific instance attributes, such as instance family and Region. Standard RIs give the biggest discount but are less flexible; Convertible RIs can be exchanged for different attributes but carry a smaller discount.

Savings Plans Concept

Savings Plans are commitment-based discounts on a dollar-per-hour spend, not on a specific instance. You commit to a minimum $/hour for 1 or 3 years.
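The arithmetic behind an hourly commitment can be sketched as follows. This is a simplified model that assumes one blended discount rate across all usage; real Savings Plans rates vary by instance type, Region, and term. The function name and the 25% discount are assumptions for illustration only.

```python
def hourly_bill(commitment: float, discount: float, on_demand_usage: float) -> float:
    """Estimate one hour's bill under a Savings Plan (simplified model).

    commitment       -- committed spend in $/hour, paid whether used or not
    discount         -- assumed blended discount vs On-Demand (0.25 = 25%)
    on_demand_usage  -- the hour's usage priced at On-Demand rates, in $
    """
    # A commitment of C dollars at discounted rates covers C / (1 - discount)
    # dollars of On-Demand-priced usage.
    covered = commitment / (1.0 - discount)
    # Anything beyond the covered amount is billed at On-Demand rates.
    overflow = max(0.0, on_demand_usage - covered)
    return commitment + overflow


# Hypothetical numbers: $7.50/hour commitment, 25% discount.
for usage in (4.0, 10.0, 12.0):
    print(f"usage ${usage:.2f}/h -> bill ${hourly_bill(7.5, 0.25, usage):.2f}/h")
```

The first case shows the downside of over-committing: the full commitment is still paid when usage drops below it.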

Savings Plans Types

Compute Savings Plans apply to EC2 usage regardless of instance family, size, Region, OS, or tenancy, and also to Fargate and Lambda. EC2 Instance Savings Plans apply to a specific instance family in a Region but allow size, OS, and tenancy flexibility.

Exam Pattern: Steady vs Variable

For long-running predictable workloads, pick Savings Plans or RIs. For variable or unknown workloads, use On-Demand until usage patterns are clear.
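The steady-versus-variable rule follows from a simple break-even: a commitment is paid for every hour of the term, so it only wins if the instance runs for more than `1 - discount` of those hours. The sketch below assumes a single flat discount; the function names are hypothetical.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of hours an instance must run for a commitment to pay off.

    Committed cost per hour of term:   rate * (1 - discount)
    On-Demand cost per hour of term:   rate * utilization
    The two are equal when utilization == 1 - discount.
    """
    return 1.0 - discount


def cheaper_option(utilization: float, discount: float) -> str:
    """Pick the cheaper model for a given fraction of hours in use."""
    if utilization > break_even_utilization(discount):
        return "commitment"
    return "on-demand"


# Assumed 25% discount: a 24/7 service vs a job that runs 10 hours a month.
print(cheaper_option(1.0, 0.25))
print(cheaper_option(10 / 730, 0.25))
```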

Spot Instances and Interruption-Tolerant Workloads

What Are Spot Instances?

Spot Instances use spare EC2 capacity at big discounts, but AWS can interrupt them with 2 minutes of warning when it needs the capacity back.
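On the instance itself, the warning appears as a small JSON document in instance metadata (under `spot/instance-action`). The sketch below only parses such a document and computes the time remaining; it does not call the metadata service, and the sample payload and function name are illustrative assumptions.

```python
import json
from datetime import datetime, timezone


def seconds_until_action(notice_json: str, now: datetime) -> tuple[str, float]:
    """Parse an interruption notice and return (action, seconds remaining)."""
    notice = json.loads(notice_json)
    # The time field is a UTC timestamp such as 2017-09-18T08:22:00Z.
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return notice["action"], (when - now).total_seconds()


# Sample payload in the shape of an interruption notice.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)

action, remaining = seconds_until_action(sample, now)
print(f"{action} in {remaining:.0f} seconds: checkpoint and drain now")
```

A worker would typically poll for this notice and use the remaining time to checkpoint state or stop accepting new work.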

How Spot Is Priced

You no longer bid; you request capacity and pay the current Spot price, which changes with supply and demand. Discounts can be up to about 90% off On-Demand.

Ideal Workloads for Spot

Best for stateless or fault-tolerant services, batch processing, CI/CD, and ML training where start and finish times are flexible.

When to Avoid Spot

Do not rely solely on Spot for strict SLA workloads or single-instance stateful databases. Use On-Demand or Reserved for baseline, with Spot as optional extra capacity.

Exam Clues for Spot

Phrases like "cost is highest priority" and "interruption-tolerant" hint at Spot. Phrases like "must not be interrupted" mean Spot-only designs are wrong.

AWS Auto Scaling: Core Concepts and Policies

What Is an Auto Scaling Group?

An Auto Scaling group is a logical group of EC2 instances with min, max, and desired capacity, tied to subnets and often a load balancer.

Launch Templates and Policies

Launch templates define AMI, instance type, and networking. Scaling policies decide how desired capacity changes as metrics change.

Target Tracking Scaling

You pick a target metric value, like 50% CPU. Auto Scaling adds or removes instances to keep the metric near that target. This is the most exam-relevant policy.
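A rough mental model for target tracking is proportional scaling: if the metric is 50% above target, capacity grows by about 50%. The sketch below implements that model only; the real service adds cooldowns, instance warm-up, and conservative scale-in, and `desired_capacity` is a hypothetical name.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional estimate of the capacity needed to hit the target.

    Simplified model, not the service's actual algorithm.
    """
    # Scale capacity by how far the metric is from the target, rounding up
    # so the group errs on the side of availability.
    raw = math.ceil(current * metric / target)
    # The result is always clamped to the group's min and max.
    return max(min_size, min(max_size, raw))


# 4 instances at 75% CPU with a 50% target -> scale out to 6.
print(desired_capacity(4, 75.0, 50.0, min_size=2, max_size=10))
# 4 instances at 10% CPU -> scale in, but never below the minimum of 2.
print(desired_capacity(4, 10.0, 50.0, min_size=2, max_size=10))
```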

Step and Scheduled Scaling

Step scaling uses thresholds and step changes; scheduled scaling changes capacity at specific times, useful for known daily or weekly patterns.
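The difference between the two policies can be sketched in a few lines: step scaling reacts to how far a metric has crossed a threshold, while scheduled scaling ignores metrics and follows the clock. The thresholds, hours, and function names below are made-up examples, not defaults from AWS.

```python
def step_adjustment(metric: float, steps: list[tuple[float, int]]) -> int:
    """Return the instance change for the highest threshold that is breached.

    steps -- (lower_bound, change) pairs, e.g. (70, 1) means add 1 at >= 70.
    """
    adjustment = 0
    for lower_bound, change in sorted(steps):
        if metric >= lower_bound:
            adjustment = change
    return adjustment


def scheduled_capacity(hour: int, schedule: list[tuple[int, int]], default: int) -> int:
    """Return the desired capacity set by the latest schedule entry so far today.

    schedule -- (start_hour, desired) pairs within a single day.
    """
    capacity = default
    for start_hour, desired in sorted(schedule):
        if hour >= start_hour:
            capacity = desired
    return capacity


steps = [(70.0, 1), (85.0, 3)]      # add 1 at 70% CPU, add 3 at 85% CPU
schedule = [(8, 10), (20, 4)]       # 10 instances from 08:00, back to 4 at 20:00

print(step_adjustment(90.0, steps))
print(scheduled_capacity(9, schedule, default=4))
```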

Cost-Aware Tips

Set a realistic minimum capacity, use metrics tied to user experience, and consider mixed-instance policies with Spot for cheap burst capacity.

Exam Hint: Spiky Traffic

When you see spiky or unpredictable traffic and a need to avoid overprovisioning, Auto Scaling with target tracking is usually the right design.

Designing a Cost-Efficient Auto Scaling Group

E-Commerce Scenario

An e-commerce site has spiky traffic, must span multiple AZs, and wants cost to scale with demand so idle capacity is minimized.

Instance and Purchase Choices

Use m7g.large general-purpose instances. Cover the 24/7 baseline with Savings Plans or RIs, and use Spot Instances for bursts via a mixed instances policy.

ASG Capacity Settings

Set min=4, max=40, desired=4. Attach the Auto Scaling group to an Application Load Balancer across at least two Availability Zones.

Target Tracking Policy

Configure target tracking on ALB request count per target, such as 1,000 requests per instance, so scaling follows real user demand.

Cost and Reliability Outcome

Baseline traffic runs on discounted capacity; bursts use cheap Spot. If Spot is reclaimed, On-Demand can cover, preserving both cost efficiency and availability.
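The design above could be expressed as request parameters for the EC2 Auto Scaling API, for example via boto3's `create_auto_scaling_group` and `put_scaling_policy`. The sketch below only builds the parameter dictionaries and does not call AWS. Every name, subnet ID, and resource label is a hypothetical placeholder, and the choice to keep the 4-instance baseline On-Demand with all extra capacity on Spot is one reading of the scenario.

```python
# Parameters for the Auto Scaling group (all identifiers are placeholders).
asg_params = {
    "AutoScalingGroupName": "ecommerce-web",             # hypothetical name
    "MinSize": 4,
    "MaxSize": 40,
    "DesiredCapacity": 4,
    # Subnets in at least two Availability Zones, comma-separated.
    "VPCZoneIdentifier": "subnet-aaaa1111,subnet-bbbb2222",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "ecommerce-web-lt",  # hypothetical name
                "Version": "$Latest",
            },
            "Overrides": [{"InstanceType": "m7g.large"}],
        },
        "InstancesDistribution": {
            # First 4 instances are On-Demand (covered by Savings Plans or RIs).
            "OnDemandBaseCapacity": 4,
            # Everything above the baseline runs on Spot.
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    },
}

# Target tracking on ALB requests per target.
policy_params = {
    "AutoScalingGroupName": asg_params["AutoScalingGroupName"],
    "PolicyName": "requests-per-target",                   # hypothetical name
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # Placeholder; identifies the load balancer and target group.
            "ResourceLabel": "app/my-alb/0123456789abcdef/targetgroup/my-tg/0123456789abcdef",
        },
        "TargetValue": 1000.0,
    },
}

# With boto3 installed and credentials configured, these would be passed as:
#   client = boto3.client("autoscaling")
#   client.create_auto_scaling_group(**asg_params)
#   client.put_scaling_policy(**policy_params)
print(asg_params["MixedInstancesPolicy"]["InstancesDistribution"])
```

Listing more than one instance type under `Overrides` generally gives Spot more capacity pools to draw from, at the cost of a less uniform fleet.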

Serverless and Containers as Cost-Optimization Tools

Serverless for Cost

Serverless compute like Lambda and Fargate bills for what you use (Lambda per request and duration, Fargate per vCPU and memory second while tasks run) rather than for idle servers, which is ideal for event-driven or spiky workloads.

When Serverless Wins

If average utilization on EC2 would be low, or jobs are short and intermittent, serverless often provides lower cost and less operational overhead.
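A back-of-the-envelope comparison makes the utilization argument concrete. The prices below are round placeholder numbers chosen for easy arithmetic, not current AWS prices, and the function names are hypothetical; check the pricing pages before relying on any result.

```python
HOURS_PER_MONTH = 730


def lambda_monthly_cost(requests: int, avg_duration_s: float, memory_gb: float,
                        price_per_million_requests: float,
                        price_per_gb_second: float) -> float:
    """Monthly cost for a request-driven function: requests plus GB-seconds."""
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = requests * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def ec2_monthly_cost(hourly_rate: float, instances: int = 1) -> float:
    """Monthly cost for always-on instances, busy or idle."""
    return hourly_rate * HOURS_PER_MONTH * instances


# Placeholder prices (assumptions): $0.20 per million requests,
# $0.00002 per GB-second, and $0.10 per instance-hour.
low_traffic = lambda_monthly_cost(1_000_000, 0.1, 0.5, 0.20, 0.00002)
high_traffic = lambda_monthly_cost(500_000_000, 0.1, 0.5, 0.20, 0.00002)
always_on = ec2_monthly_cost(0.10)

print(f"Lambda, 1M requests/month:   ${low_traffic:.2f}")
print(f"Lambda, 500M requests/month: ${high_traffic:.2f}")
print(f"One always-on instance:      ${always_on:.2f}")
```

Under these assumed prices the low-traffic case strongly favors serverless, while the high-traffic case costs several times one instance, which is the pattern the paragraph above describes.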

Containers Overview

ECS and EKS let you run containers. On EC2 you manage nodes but can right-size and use Spot; with Fargate you avoid managing EC2 entirely.

Exam Clues for Containers

Microservices, containerized workloads, and needs for cost and portability suggest ECS or EKS with Auto Scaling and possibly Spot Instances.

Long-Running vs Spiky

Long-running, predictable CPU-heavy workloads may be cheaper on EC2 with Savings Plans, while spiky event-driven tasks fit Lambda or Fargate.

Tie-In to Data Pipelines

In ingestion and analytics pipelines, Lambda and Fargate let you pay only when data is flowing, aligning compute cost with actual processing.

Thought Exercise: Match Workloads to Cost Models

Work through these scenarios and decide how you would design compute for cost optimization. There are no single "correct" answers here, but your reasoning should match exam-style thinking.

  1. Real-time chat application
  • Traffic is 24/7 with predictable peaks in the evening.
  • Needs low latency and cannot drop connections.

Questions to ask yourself:

  • Which instance family? General-purpose or memory-optimized?
  • Would you use On-Demand, Savings Plans, or RIs?
  • What kind of Auto Scaling policy, if any?
  2. Image processing for a social app
  • When users upload photos, they are resized into multiple versions.
  • Uploads are bursty (e.g., after events) and unpredictable.

Think about:

  • EC2 vs Lambda vs Fargate: which is most cost-effective and simplest?
  • If you chose EC2, would Spot be safe here?
  3. Monthly financial report generation
  • A heavy ETL job runs once a month and can take up to 10 hours.
  • It can be restarted if it fails.

Consider:

  • Is this a good candidate for Spot Instances? Why or why not?
  • Would you pre-purchase RIs for this, or keep it On-Demand/Spot?

Take a minute to jot down your answers. Then, compare your reasoning with these hints:

  • Predictable, always-on baseline usually favors Savings Plans or RIs.
  • Bursty, event-driven tasks often favor serverless.
  • Interruption-tolerant, restartable batch jobs are great for Spot.
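The three hints can be written down as a toy rule set and applied to the scenarios. This is a study aid rather than a decision engine: real designs weigh more factors, and the trait names and `suggest` function are invented for this example.

```python
def suggest(always_on: bool, event_driven: bool, interruption_tolerant: bool) -> str:
    """Apply the three hints in order and return a starting point."""
    if interruption_tolerant and not always_on:
        # Restartable batch work with flexible timing.
        return "Spot Instances"
    if event_driven:
        # Bursty work triggered by events such as uploads.
        return "serverless (Lambda or Fargate)"
    if always_on:
        # Predictable 24/7 baseline.
        return "Savings Plans or RIs for the baseline"
    return "On-Demand until the usage pattern is clear"


scenarios = {
    "real-time chat": dict(always_on=True, event_driven=False, interruption_tolerant=False),
    "image processing": dict(always_on=False, event_driven=True, interruption_tolerant=False),
    "monthly report": dict(always_on=False, event_driven=False, interruption_tolerant=True),
}

for name, traits in scenarios.items():
    print(f"{name}: {suggest(**traits)}")
```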

Quiz 1: EC2 Pricing and Right-Sizing

Test your understanding of EC2 pricing models and right‑sizing.

A startup runs a web API on a single m5.4xlarge instance at 15% CPU and 20% RAM, 24/7. Usage is steady and predictable. They want to reduce cost without changing performance or availability. Which approach is MOST cost-optimized and aligned with AWS best practices?

  A) Purchase a 3-year All Upfront Standard RI for the existing m5.4xlarge instance.
  B) Migrate to multiple smaller m7g instances in an Auto Scaling group behind an ALB and cover baseline usage with a 1- or 3-year Compute Savings Plan.
  C) Move the API to Lambda immediately, regardless of code changes required, because Lambda is always cheaper than EC2.
  D) Keep the current instance but enable step scaling policies to add more m5.4xlarge instances on high CPU.

Answer: B) Migrate to multiple smaller m7g instances in an Auto Scaling group behind an ALB and cover baseline usage with a 1- or 3-year Compute Savings Plan.

Option B both right-sizes (smaller m7g instances) and improves availability using an Auto Scaling group and ALB, while using a flexible Savings Plan for steady 24/7 usage. Option A locks in an oversized instance. Option C is incorrect because Lambda is not always cheaper and may require significant changes. Option D adds more large instances without addressing overprovisioning.

Quiz 2: Spot and Auto Scaling Design

Check your understanding of Spot Instances and Auto Scaling policies.

You design a big data processing system that runs MapReduce jobs on large datasets. Jobs are queued and can be restarted if interrupted, and there is no strict deadline. Cost minimization is the top priority. Which design is MOST appropriate?

  A) Run all worker nodes as On-Demand instances in an Auto Scaling group with scheduled scaling.
  B) Run all worker nodes as Spot Instances in an Auto Scaling group with target tracking on CPU utilization.
  C) Use a mixed instances Auto Scaling group with a baseline of On-Demand instances and additional capacity on Spot Instances.
  D) Run the jobs on a single very large Reserved Instance to maximize discount and avoid interruptions.

Answer: C) Use a mixed instances Auto Scaling group with a baseline of On-Demand instances and additional capacity on Spot Instances.

Option C balances cost and availability: Spot provides cheap capacity for batch jobs, while a small On-Demand baseline ensures some workers remain even if Spot is interrupted. Option B (all Spot) is risky if capacity disappears. Option A ignores the big savings from Spot. Option D centralizes work on one large instance and sacrifices scalability and resilience.

Key Term Review: Compute Cost Optimization

Flip through these cards to reinforce key concepts before moving on.

Right-sizing
The process of choosing the smallest EC2 instance type and size that still meets performance requirements, often by analyzing CPU, memory, and I/O usage and adjusting families and sizes accordingly.
On-Demand Instances
EC2 pricing model where you pay per second or hour with no long-term commitment; offers maximum flexibility but the highest unit cost, suitable for short-term or unpredictable workloads.
Reserved Instances (RIs)
Discounted EC2 capacity in exchange for a 1- or 3-year commitment to specific attributes (such as instance family and region). Standard RIs give higher discounts with less flexibility; Convertible RIs allow exchanging for different instance attributes.
Savings Plans
Commitment-based discount model where you commit to a certain $/hour of compute usage for 1 or 3 years. Compute Savings Plans apply broadly to EC2, Fargate, and Lambda; EC2 Instance Savings Plans apply to a specific instance family in a region.
Spot Instances
EC2 instances that use spare AWS capacity at steep discounts, but can be interrupted with 2 minutes of warning. Best for interruption-tolerant, flexible workloads like batch processing and CI/CD.
Auto Scaling group (ASG)
A logical grouping of EC2 instances with a defined min, max, and desired capacity, plus scaling policies that automatically add or remove instances based on demand.
Target tracking scaling policy
An Auto Scaling policy where you specify a target value for a metric (such as average CPU or requests per target), and AWS automatically adjusts capacity to keep that metric near the target.
Serverless compute
Compute model (such as AWS Lambda or AWS Fargate) where you do not manage servers and pay based on requests and duration or vCPU/memory seconds, ideal for event-driven and spiky workloads.
Cost optimization pillar
"The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs."
Performance efficiency pillar
"The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve."
