Skarp

Chapter 16 of 26

High-Performing and Elastic Compute: EC2 Instance Types and AWS Auto Scaling

Not all EC2 instances are created equal; master instance families, purchasing options, and Auto Scaling policies to hit performance targets without overprovisioning.

27 min read

Module Map: EC2 Performance and Elasticity

Where This Module Fits

You now connect three ideas: choosing EC2 instance types, right-sizing and purchasing options, and using AWS Auto Scaling so compute expands and contracts automatically as load changes.

Well-Architected Context

This module lives mainly in the Performance efficiency and Cost optimization pillars of the AWS Well-Architected Framework, which guide how you design fast, cost-aware architectures.

Your Outcomes

By the end, you should be able to map workloads to EC2 families, right-size instances, choose purchasing options, and reason about target tracking, step, and scheduled scaling in exam scenarios.

EC2 Instance Families: Mental Map for the Exam

Why Families Matter

EC2 instance families are optimized for different resource balances. Many exam questions boil down to: which family best fits this workload’s CPU, memory, storage, and networking profile?

General and Compute Optimized

General purpose (A, T, M) balance compute, memory, and networking; T is burstable. Compute optimized (C) suits CPU-bound tasks like batch processing and high-performance web servers.

Memory, Storage, Accelerated

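Memory optimized (R, X) suit in-memory databases. Storage optimized (I, D, H) rely on local instance storage: I offers very high random IOPS on NVMe SSD, while D and H offer high sequential throughput on dense HDD. Accelerated (P, G, Trn, Inf, F) add GPUs, FPGAs, or custom chips for ML and graphics.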

Mapping Workloads to Instance Families

Scenario: Spiky Web App

Small REST API, low average CPU, occasional spikes, cost sensitive. Best fit: T family (burstable general purpose), such as t3.medium. Compute-optimized C family would be overkill.

Scenario: OLTP Database

Latency-sensitive DB with large working set and high I/O. Best fit: memory optimized R family plus io2 EBS. The local instance store on storage-optimized instances is ephemeral (lost when the instance stops or terminates), so relying on it for primary DB data is risky.

Scenario: ML Training

Deep learning training jobs need GPUs and can tolerate interruption. Best fit: accelerated computing (P or G family) on Spot Instances with checkpointing to resume if interrupted.
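The three scenarios apply the same rules of thumb in a fixed order. The minimal sketch below encodes them as a small decision function. It is a hypothetical study aid, not an AWS tool: the `WorkloadProfile` fields and the 30% "low average CPU" cutoff are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Rough workload traits (hypothetical fields, not an AWS schema)."""
    avg_cpu_pct: float        # long-run average CPU utilization
    spiky: bool               # occasional bursts well above the average
    memory_bound: bool        # performance drops sharply when RAM is constrained
    needs_accelerator: bool   # GPUs or ML chips required
    high_local_io: bool       # needs very high local disk IOPS or throughput
    cpu_bound: bool = False   # sustained high CPU is the bottleneck


def suggest_family(p: WorkloadProfile) -> str:
    """Apply the chapter's rules of thumb in priority order (a heuristic only)."""
    if p.needs_accelerator:
        return "P/G (accelerated computing)"
    if p.memory_bound:
        # Large working sets (e.g. OLTP) favor RAM; keep durable data on EBS such as io2.
        return "R/X (memory optimized)"
    if p.high_local_io:
        return "I/D/H (storage optimized)"
    if p.cpu_bound:
        return "C (compute optimized)"
    if p.spiky and p.avg_cpu_pct < 30:  # 30% cutoff is an illustrative assumption
        return "T (burstable general purpose)"
    return "M (general purpose)"


if __name__ == "__main__":
    spiky_api = WorkloadProfile(avg_cpu_pct=12, spiky=True, memory_bound=False,
                                needs_accelerator=False, high_local_io=False)
    oltp_db = WorkloadProfile(avg_cpu_pct=45, spiky=False, memory_bound=True,
                              needs_accelerator=False, high_local_io=True)
    training = WorkloadProfile(avg_cpu_pct=70, spiky=False, memory_bound=False,
                               needs_accelerator=True, high_local_io=False)
    for name, profile in [("spiky API", spiky_api), ("OLTP DB", oltp_db),
                          ("ML training", training)]:
        print(f"{name}: {suggest_family(profile)}")
```

The order of the checks is the real design choice. An accelerator requirement overrides everything else. Memory pressure overrides local I/O, which matches the OLTP scenario's preference for R plus EBS over storage-optimized instance store.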

Instance Sizes, Right-Sizing, and Burstable T Instances

What is Right-Sizing?

Right-sizing means choosing the smallest EC2 instance that still meets performance requirements, based on CPU, memory, network, and disk utilization over time.

Reading Utilization

If CPU averages 10–20% and memory use is low, you are likely over-provisioned. Before downsizing, review CloudWatch metrics over a representative period that includes peaks. Check CPU, memory (which requires the CloudWatch agent), and network and disk saturation.
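The downsizing check above can be sketched as a small function over exported utilization samples. The input format (plain lists of hourly percentages) and both thresholds (40% average, 60% at p95) are assumptions to tune against your own requirements. The p95 check is there so that short peaks are not averaged away.

```python
from __future__ import annotations

import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of utilization samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def looks_overprovisioned(cpu_pct: list[float], mem_pct: list[float],
                          avg_limit: float = 40.0, p95_limit: float = 60.0) -> bool:
    """Flag a downsizing candidate when CPU and memory both stay low.

    Both limits are illustrative assumptions, not AWS recommendations.
    """
    def is_low(samples: list[float]) -> bool:
        return mean(samples) < avg_limit and percentile(samples, 95) < p95_limit

    return is_low(cpu_pct) and is_low(mem_pct)


if __name__ == "__main__":
    # Hypothetical hourly averages exported from CloudWatch (memory via the CloudWatch agent).
    cpu = [12, 15, 18, 20, 22, 14, 16, 19]
    mem = [28, 30, 31, 29, 32, 30, 30, 31]
    print(looks_overprovisioned(cpu, mem))  # -> True
```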

Burstable T Instances

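T3/T4g instances accrue CPU credits while running below their baseline and spend them when bursting above it. They are great for low-average, spiky workloads. For sustained high CPU, move to non-burstable families like M or C. In unlimited mode (the default for T3/T4g), sustained bursting is not throttled; it is billed as extra charges instead.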
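Credit accounting is simple enough to simulate hour by hour. This sketch models standard mode, where an instance that runs out of credits is held to its baseline. One CPU credit equals one vCPU at 100% for one minute. The defaults are the published t3.medium figures (2 vCPUs, 20% baseline per vCPU, so 24 credits earned per hour, with a 576-credit cap). Confirm them in current EC2 documentation before relying on them. Launch credits and other details are ignored.

```python
from __future__ import annotations


def simulate_credits(hourly_cpu_pct: list[float], vcpus: int = 2,
                     earn_per_hour: float = 24.0, max_balance: float = 576.0,
                     start_balance: float = 0.0) -> tuple[float, int]:
    """Track a burstable instance's CPU credit balance hour by hour (standard mode).

    Returns (final_balance, hours_throttled_to_baseline).
    """
    balance = start_balance
    throttled_hours = 0
    for pct in hourly_cpu_pct:
        spent = pct * vcpus * 60 / 100          # credits used this hour
        balance += earn_per_hour - spent
        if balance < 0:
            # Standard mode: with no credits left, CPU is held at the baseline.
            throttled_hours += 1
            balance = 0.0
        balance = min(balance, max_balance)     # accrued credits are capped
    return balance, throttled_hours


if __name__ == "__main__":
    spiky_day = [10] * 20 + [60] * 4            # quiet hours, then a short burst
    print(simulate_credits(spiky_day))          # -> (48.0, 0)
    print(simulate_credits([80] * 24))          # -> (0.0, 24)
```

The spiky day banks credits during quiet hours and spends them on the burst. The sustained-80% workload never gets ahead, which is the signal to move to M or C.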

Purchasing Options: Performance vs Cost

On-Demand and Commitments

On-Demand is pay-as-you-go for unpredictable or new workloads. Savings Plans and Reserved Instances give discounts in exchange for 1- or 3-year usage commitments, which makes them ideal for steady-state production.
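The trade-off can be checked with simple arithmetic. This sketch compares running everything On-Demand against committing for the always-on baseline and paying On-Demand only for peak instances. All prices are hypothetical. Treating a Savings Plan or RI as a flat discounted hourly rate is a simplification, because Savings Plans actually commit you to a dollar amount of spend per hour. The 730-hour month is a common averaging convention.

```python
from __future__ import annotations


def monthly_cost(baseline: int, peak_extra: int, peak_hours: float,
                 on_demand_rate: float, committed_rate: float,
                 hours_in_month: float = 730.0) -> dict[str, float]:
    """Compare all-On-Demand against 'commit for the baseline, On-Demand for peaks'.

    Rates are per instance-hour and purely illustrative.
    """
    always_on_hours = baseline * hours_in_month
    burst_hours = peak_extra * peak_hours
    all_on_demand = (always_on_hours + burst_hours) * on_demand_rate
    mixed = always_on_hours * committed_rate + burst_hours * on_demand_rate
    return {"all_on_demand": all_on_demand, "mixed": mixed,
            "savings": all_on_demand - mixed}


if __name__ == "__main__":
    # Hypothetical prices for illustration only; look up real rates for your Region.
    result = monthly_cost(baseline=2, peak_extra=2, peak_hours=200,
                          on_demand_rate=0.10, committed_rate=0.06)
    for key, value in result.items():
        print(f"{key}: ${value:.2f}")
```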

Spot Instances

Spot uses spare EC2 capacity at steep discounts but can be interrupted with a two-minute warning. It is great for fault-tolerant, stateless, or batch jobs where interruptions are acceptable.

Performance vs Cost

Purchasing options don’t change raw performance, but they determine how much capacity you can afford. Spot trades reliability for price, so your Auto Scaling design must tolerate sudden instance terminations.
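One way a Spot worker can notice an interruption is to poll the instance metadata service. When an interruption is scheduled, the `spot/instance-action` path returns a small JSON document; otherwise it returns HTTP 404. This is a minimal sketch using IMDSv2 with only the standard library, and the fetch function works only from inside an EC2 instance. The polling loop and the response to a notice (stop taking new work, checkpoint, exit) are left to your application. The sample timestamp below is hypothetical.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS_BASE = "http://169.254.169.254/latest"  # EC2 instance metadata service


def parse_instance_action(body: str, now: datetime) -> tuple[str, float]:
    """Return (action, seconds_until_action) from a spot/instance-action document.

    The document looks like {"action": "terminate", "time": "2017-09-18T08:22:00Z"}.
    """
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return doc["action"], (when - now).total_seconds()


def fetch_instance_action(timeout: float = 2.0) -> str | None:
    """Read the Spot interruption notice via IMDSv2; None means nothing is scheduled."""
    token_req = urllib.request.Request(
        f"{IMDS_BASE}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    action_req = urllib.request.Request(
        f"{IMDS_BASE}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:   # no interruption scheduled
            return None
        raise


if __name__ == "__main__":
    sample = '{"action": "terminate", "time": "2024-01-01T00:02:00Z"}'
    now = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    print(parse_instance_action(sample, now))  # -> ('terminate', 120.0)
```

Parsing is kept separate from fetching so that the time arithmetic can be tested off EC2. Once the instance is gone, the Auto Scaling group replaces the lost capacity.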

AWS Auto Scaling Basics: Groups, Policies, Health

What is an Auto Scaling Group?

An ASG is a logical group of EC2 instances defined by a launch template, minimum/maximum/desired capacity, and the subnets (ideally spread across Availability Zones) it launches into. It adds or removes instances automatically.

How Scaling Works

You typically attach an ASG to a load balancer and define scaling policies, such as keeping average CPU at 50%. The ASG launches instances when load rises and terminates them when load falls, staying within its minimum and maximum capacity.

Health and Reliability

ASGs use EC2 status checks by default and can optionally add ELB health checks. Unhealthy instances are terminated and replaced, supporting the Reliability pillar by keeping the workload functioning correctly over time.
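The pieces of an ASG map directly onto the parameters of boto3's `create_auto_scaling_group` call. This sketch only builds the keyword arguments, so it runs without AWS credentials. The group, template, subnet, and ARN values are hypothetical placeholders, and the 300-second grace period is an assumption to tune to your instances' boot time.

```python
from __future__ import annotations


def build_asg_request(name: str, launch_template: str, subnet_ids: list[str],
                      target_group_arns: list[str], min_size: int = 2,
                      max_size: int = 10, desired: int = 2) -> dict:
    """Keyword arguments for boto3's autoscaling create_auto_scaling_group call."""
    if not min_size <= desired <= max_size:
        raise ValueError("capacity must satisfy min_size <= desired <= max_size")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": launch_template, "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Comma-separated subnet IDs; subnets in different AZs make the group multi-AZ.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Register instances with a load balancer target group.
        "TargetGroupARNs": list(target_group_arns),
        # Use ELB health checks in addition to the default EC2 status checks.
        "HealthCheckType": "ELB",
        # Seconds to wait after launch before health checks count against an instance.
        "HealthCheckGracePeriod": 300,
    }


if __name__ == "__main__":
    # All names and IDs below are hypothetical placeholders.
    params = build_asg_request(
        name="web-asg",
        launch_template="web-lt",
        subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
        target_group_arns=["arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web/abc123"],
    )
    print(params)
    # To apply it (requires boto3 and AWS credentials):
    # boto3.client("autoscaling").create_auto_scaling_group(**params)
```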

Scaling Policies: Target Tracking, Step, and Scheduled

Target Tracking

Target tracking is like a thermostat. You set a target metric value (for example, CPU 50%), and Auto Scaling adjusts capacity to keep the metric near that target.
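The thermostat idea has two parts, shown below as a sketch. The first is the policy itself, built as keyword arguments for boto3's `put_scaling_policy` using the predefined `ASGAverageCPUUtilization` metric. The second is a simple proportional rule that gives intuition for how far capacity moves. That rule is an illustrative approximation, not AWS's internal algorithm, and the policy name is a hypothetical convention.

```python
import math


def target_tracking_policy(asg_name: str, target_cpu: float = 50.0) -> dict:
    """Keyword arguments for boto3 autoscaling put_scaling_policy: keep average CPU near target."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",   # hypothetical naming convention
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
            "TargetValue": target_cpu,
        },
    }


def proportional_capacity(current: int, metric: float, target: float,
                          min_size: int, max_size: int) -> int:
    """Illustrate the thermostat with a proportional rule, clamped to the ASG limits.

    If 4 instances run at 75% CPU and the target is 50%, about 4 * 75 / 50 = 6
    instances would bring the average back near 50%.
    """
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))


if __name__ == "__main__":
    print(target_tracking_policy("web-asg"))
    print(proportional_capacity(current=4, metric=75, target=50, min_size=2, max_size=10))  # -> 6
```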

Step Scaling

Step scaling uses CloudWatch alarms and step adjustments. For example, if CPU > 60% add 1 instance; if CPU > 80% add 3. It gives granular control but needs more tuning.
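The 60%/80% example can be expressed as a step scaling policy, again as keyword arguments for boto3's `put_scaling_policy`. Step bounds are offsets from the alarm threshold. This sketch assumes the alarm fires at 60% average CPU, so an offset of 20 corresponds to 80%. The small evaluator is a simplified model of the scale-out case: lower bound inclusive, upper bound exclusive.

```python
from __future__ import annotations


def step_policy(asg_name: str) -> dict:
    """put_scaling_policy kwargs for the example; assumes the triggering alarm threshold is 60% CPU."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-step-out",   # hypothetical naming convention
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": [
            # CPU between 60% and 80% (offset 0 to 20 above the threshold): add 1 instance
            {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 20.0,
             "ScalingAdjustment": 1},
            # CPU at or above 80% (offset 20 and up): add 3 instances
            {"MetricIntervalLowerBound": 20.0, "ScalingAdjustment": 3},
        ],
    }


def step_adjustment(metric: float, threshold: float, steps: list[dict]) -> int:
    """Return the adjustment whose interval (relative to the threshold) contains the breach."""
    breach = metric - threshold
    for step in steps:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:
            return step["ScalingAdjustment"]
    return 0  # no step matched: no change


if __name__ == "__main__":
    steps = step_policy("web-asg")["StepAdjustments"]
    for cpu in (55, 70, 90):
        print(cpu, "->", step_adjustment(cpu, threshold=60.0, steps=steps))
```

The policy does nothing by itself. A CloudWatch alarm on the group's average CPU must list the policy ARN, which `put_scaling_policy` returns, in its alarm actions. That extra wiring is part of why step scaling needs more tuning than target tracking.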

Scheduled Scaling

Scheduled scaling runs at fixed times, such as increasing capacity at 08:00 and reducing at 20:00. It suits predictable patterns and is often combined with dynamic policies.
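The 08:00/20:00 pattern becomes two scheduled actions, built here as keyword arguments for boto3's `put_scheduled_update_group_action`. `Recurrence` uses Unix cron syntax (minute, hour, day of month, month, day of week), evaluated in the given `TimeZone`. The capacity numbers and action names are hypothetical.

```python
from __future__ import annotations


def scheduled_actions(asg_name: str, tz: str = "UTC") -> list[dict]:
    """Keyword arguments for boto3 autoscaling put_scheduled_update_group_action."""
    common = {"AutoScalingGroupName": asg_name, "TimeZone": tz}
    return [
        # 08:00 every day: raise the floor before the daytime load arrives.
        {**common, "ScheduledActionName": "morning-scale-up",
         "Recurrence": "0 8 * * *", "MinSize": 4, "MaxSize": 12, "DesiredCapacity": 6},
        # 20:00 every day: lower the floor so capacity can shrink overnight.
        {**common, "ScheduledActionName": "evening-scale-down",
         "Recurrence": "0 20 * * *", "MinSize": 2, "MaxSize": 12, "DesiredCapacity": 2},
    ]


if __name__ == "__main__":
    for action in scheduled_actions("web-asg", tz="UTC"):
        print(action)
        # boto3.client("autoscaling").put_scheduled_update_group_action(**action)
```

When a dynamic policy is also attached, it keeps adjusting capacity within whatever MinSize and MaxSize the latest scheduled action set. The morning action therefore acts as a floor rather than a fixed size.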

Designing a High-Performing, Elastic Web Tier

E-Commerce Use Case

E-commerce site with low night traffic, daytime steady load, and flash-sale spikes. Needs low latency, high availability across AZs, and strong cost control.

Instances and ASG

Pick general purpose instances for stateless web servers. Create a multi-AZ Auto Scaling group with a launch template and attach an Application Load Balancer for HTTP/HTTPS.

Policies and Cost Tweaks

Use target tracking on CPU, scheduled scaling before flash sales, and optional step scaling on ALB requests. Offload background jobs to Spot-based workers behind SQS.
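Before a flash sale, someone has to pick a maximum capacity (and possibly a scheduled pre-scale size) that covers the forecast peak. This sketch shows the arithmetic, spreading instances evenly across AZs. The per-instance throughput must come from your own load tests. The forecast figures in the example are hypothetical.

```python
import math


def max_capacity_for_peak(peak_rps: float, rps_per_instance: float, az_count: int) -> int:
    """Estimate an ASG MaxSize that covers a peak, split evenly across AZs.

    rps_per_instance is what one instance sustains at your target utilization.
    """
    needed = math.ceil(peak_rps / rps_per_instance)
    per_az = math.ceil(needed / az_count)   # even spread keeps AZs balanced
    return per_az * az_count


if __name__ == "__main__":
    # Hypothetical forecast: 9,000 requests/s peak, 400 requests/s per instance, 3 AZs.
    print(max_capacity_for_peak(9000, 400, 3))  # -> 24
```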

Thought Exercise: Right-Sizing and Scaling Choices

Work through this scenario mentally. You do not need to write code; just reason step by step.

Scenario: A startup runs an API in one Region. Current setup:

  • 4 `m5.large` instances in an Auto Scaling group.
  • Average CPU is 18%, memory 30%, network well below limits.
  • Traffic pattern: predictable weekday peak from 09:00–18:00, low on nights and weekends.
  • Management complains that the EC2 bill is too high.

Questions to think through:

  1. Right-sizing
     • Given the utilization, would you scale the instance size up or down? Would you consider a different family (for example, `t3.medium`)? Why?
  2. Purchasing model
     • The workload is steady on weekdays and has been running for over a year. Would you recommend On-Demand, Savings Plans, RIs, or Spot for the main API instances? Why?
  3. Scaling policies
     • Which combination of scaling policies would you choose to handle the weekday peaks and nightly lows? Consider target tracking vs step vs scheduled scaling.
  4. Performance efficiency
     • How does your design embody the performance efficiency pillar, which "focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve"?

Pause and sketch an answer. Then compare against this reference approach:

  • Right-size to fewer, smaller instances (for example, `t3.medium`, or `m6g.medium` if the software runs on Arm-based Graviton) and adjust desired capacity.
  • Use Compute Savings Plans or RIs for baseline capacity.
  • Combine scheduled scaling (weekday patterns) with target tracking for unexpected spikes.
  • Ensure stateless design so scaling does not hurt user experience.
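To test the right-sizing idea with numbers, you can project today's total demand onto a candidate fleet. This sketch assumes demand spreads evenly and scales linearly. It ignores per-instance overhead, peaks, and per-vCPU speed differences between generations. The specs used are the published ones for m5.large (2 vCPUs, 8 GiB) and t3.medium (2 vCPUs, 4 GiB).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    name: str
    vcpus: int
    mem_gib: float


def project_utilization(cur: InstanceSpec, cur_count: int, cpu_pct: float,
                        mem_pct: float, new: InstanceSpec, new_count: int) -> tuple[float, float]:
    """Project average CPU% and memory% if today's total demand moves to a new fleet."""
    busy_vcpus = cur.vcpus * cur_count * cpu_pct / 100      # vCPUs' worth of work
    used_gib = cur.mem_gib * cur_count * mem_pct / 100      # memory in use
    new_cpu = 100 * busy_vcpus / (new.vcpus * new_count)
    new_mem = 100 * used_gib / (new.mem_gib * new_count)
    return new_cpu, new_mem


M5_LARGE = InstanceSpec("m5.large", vcpus=2, mem_gib=8)
T3_MEDIUM = InstanceSpec("t3.medium", vcpus=2, mem_gib=4)

if __name__ == "__main__":
    for count in (4, 3, 2):
        cpu, mem = project_utilization(M5_LARGE, 4, 18, 30, T3_MEDIUM, count)
        print(f"{count} x t3.medium: CPU ~{cpu:.0f}%, memory ~{mem:.0f}%")
```

Under these assumptions, four t3.medium instances land near 18% CPU and 60% memory. That CPU level sits below the t3.medium 20% baseline, so credits accrue on average. Three instances push CPU to about 24% and memory to about 80%. Two would need about 120% of their memory. So in this scenario memory, not CPU, limits how far you can consolidate.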

Quiz 1: Instance Families and Right-Sizing

Check your understanding of instance families and right-sizing.

A company runs an in-memory analytics engine that keeps several hundred GB of data in RAM and occasionally writes results to S3. CPU is moderate, but if memory is constrained, performance drops sharply. Which EC2 choice is MOST appropriate to improve performance while controlling cost?

  A) m6i.4xlarge general purpose instances with larger EBS volumes
  B) r6i.4xlarge memory optimized instances with similar vCPU count
  C) c6i.4xlarge compute optimized instances with higher CPU clock speed
  D) i4i.4xlarge storage optimized instances with high local NVMe SSD

Answer: B) r6i.4xlarge memory optimized instances with similar vCPU count

This workload is clearly memory bound: it keeps hundreds of GB in RAM, and performance drops when memory is constrained. Memory optimized R6i instances provide more memory per vCPU (8 GiB per vCPU, so 128 GiB on r6i.4xlarge versus 64 GiB on m6i.4xlarge), making option B the best fit. General purpose (A) may not provide enough memory density. Compute optimized (C) targets CPU-bound workloads. Storage optimized (D) focuses on local disk IOPS, which is not the bottleneck here.

Quiz 2: Auto Scaling Policies

Test your understanding of scaling policy types.

Your web application experiences predictable traffic: it increases sharply at 08:30 on weekdays and drops after 19:00. You also want to automatically add capacity if CPU suddenly spikes during a marketing campaign. Which combination of scaling policies is MOST appropriate?

  A) Target tracking only, based on average CPU utilization
  B) Step scaling only, based on CPU thresholds
  C) Scheduled scaling for weekday patterns plus target tracking on CPU
  D) Scheduled scaling for weekday patterns plus step scaling on a fixed schedule

Answer: C) Scheduled scaling for weekday patterns plus target tracking on CPU

You have both predictable daily patterns and potential unexpected spikes. Scheduled scaling handles the known weekday increase and decrease. Target tracking on CPU automatically adds or removes instances when CPU deviates from the target, even during unplanned campaigns. Step scaling alone does not address predictable time-based changes as cleanly, and using step scaling on a fixed schedule (option D) misunderstands how step scaling works.

Key Term Flashcards: EC2 and Auto Scaling

Review these cards to reinforce key terms and exam-ready definitions.

General purpose instances (A, T, M)
EC2 families that provide a balance of compute, memory, and networking resources. Suitable for a wide range of workloads including web servers, application servers, and small databases.
Compute optimized instances (C)
EC2 families with a high ratio of CPU to memory, ideal for compute-bound applications like high-performance web servers, batch processing, and scientific modeling.
Memory optimized instances (R, X)
EC2 families designed to deliver fast performance for workloads that process large data sets in memory, such as in-memory databases and real-time big data analytics.
Storage optimized instances (I, D, H)
EC2 families optimized for workloads that require high, sequential read and write access to very large data sets on local storage, such as NoSQL databases and data warehousing.
Accelerated computing instances (P, G, Trn, Inf, F)
EC2 families that use hardware accelerators like GPUs, FPGAs, or custom ASICs for tasks such as machine learning, graphics rendering, and high-performance computing.
Right-sizing
The process of matching EC2 instance types and sizes to workload performance and utilization characteristics, aiming to use the smallest instance that still meets requirements.
On-Demand Instances
EC2 purchasing option where you pay for compute capacity by the second or hour with no long-term commitments, best for short-term, unpredictable workloads.
Spot Instances
EC2 purchasing option that uses spare capacity at significant discounts but can be interrupted with a two-minute warning, suitable for fault-tolerant and flexible workloads.
Auto Scaling group (ASG)
A logical group of EC2 instances managed together for automatic scaling and health replacement, defined by a launch template/configuration and capacity limits.
Target tracking scaling policy
An Auto Scaling policy type where you specify a target value for a metric (such as average CPU) and the service adjusts capacity to keep the metric near that target.
Step scaling policy
An Auto Scaling policy that uses CloudWatch alarms and step adjustments to change capacity by different amounts depending on how far a metric deviates from thresholds.
Scheduled scaling
An Auto Scaling feature that changes the desired capacity of an Auto Scaling group at specific times and dates, useful for predictable traffic patterns.
Performance efficiency pillar (Well-Architected)
The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
Cost optimization pillar (Well-Architected)
The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.

Connecting Back to the Well-Architected Pillars and Next Steps

Pillars in Action

You used EC2 and Auto Scaling to apply performance efficiency, cost optimization, and reliability, choosing the right instance families, right-sizing, and building elastic, multi-AZ designs.

Exam Readiness

Be ready to map workloads to families, spot over-provisioning, weigh On-Demand against Savings Plans, RIs, and Spot, and match scaling policies to patterns like predictable peaks or sudden spikes.

Your Study Path

Your next Skarp mock exam and gap guide will show where EC2 and Auto Scaling are solid for you and where to focus. Upcoming modules will combine these compute choices with managed databases.

Key Terms

Right-sizing
The process of matching EC2 instance types and sizes to workload performance and utilization characteristics, aiming to use the smallest instance that still meets requirements.
Spot Instances
EC2 purchasing option that uses spare capacity at significant discounts but can be interrupted with a two-minute warning, suitable for fault-tolerant and flexible workloads.
Scheduled scaling
An Auto Scaling feature that changes the desired capacity of an Auto Scaling group at specific times and dates, useful for predictable traffic patterns.
On-Demand Instances
EC2 purchasing option where you pay for compute capacity by the second or hour with no long-term commitments, best for short-term, unpredictable workloads.
Step scaling policy
An Auto Scaling policy that uses CloudWatch alarms and step adjustments to change capacity by different amounts depending on how far a metric deviates from thresholds.
Auto Scaling group (ASG)
A logical group of EC2 instances managed together for automatic scaling and health replacement, defined by a launch template/configuration and capacity limits.
Cost optimization pillar
The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
General purpose instances
EC2 families that provide a balance of compute, memory, and networking resources, suitable for a wide range of common workloads.
Memory optimized instances
EC2 families designed for workloads that process large data sets in memory.
Compute optimized instances
EC2 families with a high ratio of CPU to memory, ideal for compute-bound workloads.
Storage optimized instances
EC2 families optimized for workloads requiring high, sequential read and write access to large data sets on local storage.
Performance efficiency pillar
The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
AWS Well-Architected Framework
The AWS Well-Architected Framework provides a consistent set of best practices for customers and partners to evaluate architectures, and a set of questions you can use to evaluate how well an architecture is aligned to AWS best practices.
Target tracking scaling policy
An Auto Scaling policy type where you specify a target value for a metric (such as average CPU) and the service adjusts capacity to keep the metric near that target.
Accelerated computing instances
EC2 families that use hardware accelerators like GPUs, FPGAs, or custom ASICs for specialized compute-intensive tasks.
