Chapter 25 of 26

Integrating Sustainability and Operational Excellence into AWS Architectures

While not separate exam domains, sustainability and operational excellence are increasingly relevant in real-world designs. This module shows how to weave these considerations into architectures without losing focus on core exam objectives.

Big Picture: Why Sustainability and Operational Excellence Matter

Why This Module

Sustainability and operational excellence now shape how modern AWS workloads are designed and run. They appear in exam scenarios even when not named explicitly.

Link to Other Pillars

You have already seen Security, Reliability, Performance Efficiency, and Cost Optimization. Sustainability was added as the sixth pillar in 2021, and Operational Excellence, itself one of the six pillars, ties the others together.

Focus of This Module

We will: 1) define the Sustainability pillar, 2) spot patterns that reduce environmental impact, and 3) use automation and observability to support excellence.

Exam Angle

On the exam, the best answer often meets requirements while minimizing idle resources, using managed/serverless services, and automating operations to reduce waste.

The Sustainability Pillar: Canonical Definition and Scope

Canonical Definition

AWS: "The sustainability pillar focuses on minimizing the environmental impacts of running cloud workloads by maximizing utilization and minimizing the resources required, and by reducing the energy required to deliver business value."

What This Really Means

You aim to deliver the same or more business value with fewer resources and less energy: avoid idle capacity, avoid unnecessary work, and reduce data bloat.

Utilization and Waste

High utilization without harming performance or reliability is good. Idle EC2 instances, over-sized databases, or unused storage are both costly and less sustainable.

Patterns to Watch For

Right-sizing, serverless and managed services, auto-scaling, and data lifecycle policies are all sustainability-friendly patterns that also often save money.

Operational Excellence: What It Is and Why It Supports All Pillars

What Is Operational Excellence

Operational excellence is about how you organize, run, and evolve workloads: clear processes, automation, observability, and continuous improvement.

Key Practices

Use runbooks, automate repetitive tasks, define infrastructure as code, and instrument workloads with logs, metrics, and traces.

Supports All Pillars

Automation and observability improve security, reliability, performance efficiency, cost optimization, and sustainability by reducing errors and waste.

Exam Signals

Prefer answers that use infrastructure as code, built-in monitoring and alarms, automated scaling and backups, and documented operational procedures.

Designing for Efficient Compute Utilization

Scenario Setup

Retail web app with variable traffic needs high availability, security, cost-effectiveness, and good performance. How do we design compute for this?

Option A: Fixed EC2 Fleet

Two large EC2 instances in an Auto Scaling group (min 2, max 4) running 24/7 with manual SSH deployments. Over-provisioned and error-prone.

Option B: Fargate + CloudFront

Use ECS on Fargate behind an ALB, auto scale tasks, serve static content via S3 + CloudFront, and deploy via CI/CD pipelines.

Why B Is Better

B scales with demand, improves utilization, simplifies patching and deployments, and uses caching to reduce repeated compute and network load.
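To make the utilization argument concrete, here is a minimal sketch that compares a fleet sized for peak with capacity that follows demand. The hourly demand figures, the unit size, and the function names are illustrative assumptions, not measurements from a real workload.

```python
import math

def average_utilization(demand, capacity):
    """Return total demand divided by total provisioned capacity."""
    return sum(demand) / sum(capacity)

def scaled_capacity(demand, unit_size, minimum_units):
    """Provision just enough units each hour to cover demand, never below the minimum."""
    return [max(minimum_units, math.ceil(d / unit_size)) * unit_size for d in demand]

# Hypothetical hourly demand (requests per second) over one day.
hourly_demand = [50] * 8 + [200] * 8 + [800] * 4 + [200] * 4

# Fixed fleet: sized for peak with headroom and left running all day.
fixed = [1000] * len(hourly_demand)

# Scaled fleet: units of 100 req/s, with two units kept for availability.
scaled = scaled_capacity(hourly_demand, unit_size=100, minimum_units=2)

print(f"fixed fleet utilization:  {average_utilization(hourly_demand, fixed):.0%}")
print(f"scaled fleet utilization: {average_utilization(hourly_demand, scaled):.0%}")
```

Under these assumed numbers the fixed fleet sits mostly idle while the scaled fleet stays close to its provisioned capacity, which is the pattern the exam rewards when requirements are still met.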

Data and Storage: Lifecycle, Tiering, and Deletion

Why Data Management Matters

Unmanaged data grows endlessly, driving up storage, backup, and compute costs. Sustainable designs actively manage data through its lifecycle.

S3 Lifecycle Policies

Transition infrequently accessed objects to S3 Standard-IA, S3 Intelligent-Tiering, or an S3 Glacier storage class, and automatically expire (delete) data once its defined retention period ends.
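A lifecycle rule of this kind might look like the following sketch. It builds the configuration as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the prefix, the day counts, the rule ID, and the helper names are assumptions chosen for illustration.

```python
def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Build an S3 lifecycle configuration that tiers objects down, then expires them."""
    # S3 requires objects to be at least 30 days old before moving to Standard-IA.
    if ia_after_days < 30:
        raise ValueError("Standard-IA transitions need at least 30 days")
    if not (ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("day counts must increase from tiering to expiration")
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }

def apply_lifecycle(bucket_name, configuration):
    """Apply the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=configuration
    )

# Example: logs move to Standard-IA at 30 days, Glacier at 90, and expire after a year.
config = build_lifecycle_configuration("logs/", 30, 90, 365)
```

Keeping the builder separate from the API call lets the retention policy be reviewed and tested without touching a real bucket.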

Right-Sizing Databases

Choose the smallest RDS/Aurora instances that meet needs, or use Aurora Serverless v2 for variable workloads to avoid idle capacity.
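Right-sizing can be thought of as picking the smallest class that covers observed peak usage plus some headroom. The sketch below assumes a small hand-written catalog and a 20% headroom factor; in practice the inputs would come from monitoring data, and the catalog here is only an illustrative subset.

```python
from typing import NamedTuple, Optional

class InstanceClass(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int

# Illustrative subset of database instance classes, ordered smallest first.
CATALOG = [
    InstanceClass("db.t3.medium", 2, 4),
    InstanceClass("db.m5.large", 2, 8),
    InstanceClass("db.m5.xlarge", 4, 16),
    InstanceClass("db.m5.2xlarge", 8, 32),
]

def smallest_fit(peak_vcpus: float, peak_memory_gib: float,
                 headroom: float = 0.2) -> Optional[InstanceClass]:
    """Return the smallest catalog entry covering peak usage plus headroom, or None."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    candidates = [c for c in CATALOG
                  if c.vcpus >= need_cpu and c.memory_gib >= need_mem]
    return min(candidates, key=lambda c: (c.vcpus, c.memory_gib), default=None)

# Hypothetical observed peaks: 1.5 vCPUs busy and 6 GiB of memory in use.
choice = smallest_fit(1.5, 6)
print(choice.name if choice else "no class in the catalog is large enough")
```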

Retention, Compression, Partitioning

Set explicit log/backup retention, use AWS Backup lifecycles, and for analytics use compressed columnar formats and partitions to scan less data.
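The effect of columnar formats and partitions on scanned data can be estimated with simple arithmetic. This is a deliberately simplified model, and every number below (dataset size, share of partitions and columns read, compression ratio) is an assumption for illustration only.

```python
def estimated_scan_gb(total_gb, partition_fraction, column_fraction, compression_ratio):
    """Rough estimate of data scanned by one query.

    total_gb           -- uncompressed size of the whole dataset
    partition_fraction -- share of partitions the query touches (0..1)
    column_fraction    -- share of columns the query reads (0..1)
    compression_ratio  -- uncompressed size divided by compressed size
    """
    return total_gb * partition_fraction * column_fraction / compression_ratio

# Unpartitioned, uncompressed row data: every query reads everything.
full_scan = estimated_scan_gb(2000, 1.0, 1.0, 1.0)

# Monthly partitions (1 of 12 read), a quarter of the columns, 4x compression.
optimized = estimated_scan_gb(2000, 1 / 12, 0.25, 4.0)

print(f"full scan: {full_scan:.0f} GB, optimized: {optimized:.1f} GB")
```

Less data scanned means less compute and I/O per query, which helps cost and sustainability at the same time.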

Automation and Observability as Enablers of Excellence

Role of Automation

Automation via IaC, CI/CD, and Systems Manager reduces manual work and errors, and makes it easy to roll out consistent changes across environments.
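As a small illustration of infrastructure as code, the sketch below generates a CloudFormation template for a CloudWatch Logs log group with an explicit retention period. The log group names and retention values are hypothetical; the point is that one function yields consistent definitions for every environment.

```python
import json

def log_group_template(log_group_name, retention_days):
    """Return a minimal CloudFormation template defining one log group."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Log group with explicit retention (illustrative sketch)",
        "Resources": {
            "AppLogGroup": {
                "Type": "AWS::Logs::LogGroup",
                "Properties": {
                    "LogGroupName": log_group_name,
                    "RetentionInDays": retention_days,
                },
            }
        },
    }

# Same structure for every environment; only the parameters differ.
dev = log_group_template("/example/app/dev", 7)
prod = log_group_template("/example/app/prod", 90)

print(json.dumps(prod, indent=2))
```

Because the template is generated rather than clicked together in the console, it can be code-reviewed, versioned, and deployed through a pipeline.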

Automated Operations

Use Systems Manager Automation, Patch Manager, and Lambda triggered by events to handle patching, backups, and simple remediation automatically.

Observability Basics

CloudWatch metrics, logs, and alarms plus X-Ray traces and centralized log analysis give you visibility into health, performance, and usage.
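One way to express an alarm in code is to build the parameters for boto3's CloudWatch `put_metric_alarm` call. The cluster name, service name, threshold, and helper functions below are assumptions for illustration; the alarm watches average CPU for an ECS service.

```python
def cpu_alarm_parameters(cluster, service, threshold_percent):
    """Build put_metric_alarm arguments for sustained high CPU on an ECS service."""
    return {
        "AlarmName": f"{service}-high-cpu",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 300,            # five-minute datapoints
        "EvaluationPeriods": 3,   # alarm after three consecutive breaches
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "GreaterThanThreshold",
    }

def create_alarm(parameters):
    """Create the alarm (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("cloudwatch").put_metric_alarm(**parameters)

# Hypothetical cluster and service names.
params = cpu_alarm_parameters("retail-cluster", "web", 80)
```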

Why It Matters

With observability, you can find idle resources, detect issues early, and measure optimization impact, directly supporting all pillars including sustainability.
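Finding idle resources can be as simple as averaging a utilization metric per resource and flagging the ones below a threshold. The sample data, resource names, and the 5% threshold below are made up; in a real account the datapoints would come from CloudWatch.

```python
from statistics import mean

def find_idle(cpu_by_resource, threshold_percent=5.0):
    """Return names of resources whose average CPU is below the threshold."""
    return sorted(
        name
        for name, samples in cpu_by_resource.items()
        if samples and mean(samples) < threshold_percent
    )

# Hypothetical average CPU percentages collected over several days.
samples = {
    "web-1": [35.0, 42.5, 38.0],
    "batch-old": [0.8, 1.1, 0.9],
    "report-db": [3.0, 4.0, 2.5],
    "no-data": [],
}

print(find_idle(samples))
```

Resources with no datapoints are skipped rather than flagged, since missing telemetry is a separate problem from low utilization.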

Thought Exercise: Balancing Sustainability, Cost, and Performance

Work through this scenario mentally and pick a design direction. There is no single perfect answer, but some options align better with sustainability and operational excellence.

Scenario:

You are designing an internal reporting system for a medium-sized company. Requirements:

  • Analysts run heavy SQL queries on sales data a few times per day.
  • Data volume: about 2 TB, growing slowly.
  • Queries are not latency-critical; a few minutes is acceptable.
  • The team wants to minimize ongoing operational effort.

You are comparing three approaches:

  1. Always-on RDS for PostgreSQL on a large instance, with data loaded from operational databases every hour.
  2. Amazon Redshift cluster sized for peak load, running 24/7, with nightly batch loads.
  3. S3-based data lake with data stored as compressed Parquet in S3, queried via Amazon Athena, with Glue jobs to transform data.

Questions to reflect on:

  • Which option best minimizes idle capacity while still meeting requirements?
  • Which option reduces the need for manual operations (patching, scaling, backups)?
  • How would you add observability to ensure queries remain within acceptable performance?

Pause for a moment and decide which option you would recommend and why. Then compare your reasoning with the worked scenario that follows.

Worked Scenario: Choosing a Sustainable Analytics Architecture

Reviewing Option 1: RDS

RDS is simple but runs 24/7, even when idle. RDS automates backups and patching, but you still choose and resize the instance class yourself, so you carry more operational work and pay for idle capacity.

Reviewing Option 2: Redshift

Redshift is powerful for analytics but also runs 24/7. You still manage cluster sizing and maintenance, leading to potential over-provisioning.

Reviewing Option 3: S3 + Athena + Glue

Serverless, pay-per-use querying and ETL on S3 data. No clusters to manage, and you can use compressed columnar formats to scan less data.

Why Option 3 Wins Here

Option 3 minimizes idle capacity, reduces operations overhead, and aligns with sustainability and cost goals while meeting relaxed latency requirements.
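The idle-capacity argument can be checked with back-of-the-envelope arithmetic. The run count and query duration below are assumptions loosely based on the scenario ("a few times per day", "a few minutes").

```python
MINUTES_PER_DAY = 24 * 60

def busy_fraction(runs_per_day, minutes_per_run):
    """Share of the day an always-on system actually spends serving queries."""
    return runs_per_day * minutes_per_run / MINUTES_PER_DAY

# Assumed workload: four heavy queries per day, about five minutes each.
fraction = busy_fraction(runs_per_day=4, minutes_per_run=5)
print(f"busy {fraction:.1%} of the day, idle {1 - fraction:.1%}")
```

With a usage pattern this sparse, an always-on instance or cluster would be idle for nearly the whole day, which is why pay-per-query fits the requirements.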

Quick Check: Sustainability Pillar and Resource Utilization

Test your understanding of the Sustainability pillar and how it relates to resource utilization.

Which design choice most clearly aligns with the Sustainability pillar for a batch processing workload that runs once per hour?

  1. Provision a large EC2 instance that runs 24/7 and triggers a cron job every hour.
  2. Use AWS Lambda with an EventBridge rule to trigger processing, ensuring functions run only when needed.
  3. Run the workload on a fixed-size Amazon EMR cluster that you manually start once and keep running.
  4. Deploy the workload on an over-provisioned RDS instance to ensure there is always spare capacity.

Answer: Option 2. Use AWS Lambda with an EventBridge rule to trigger processing, ensuring functions run only when needed.

The Sustainability pillar focuses on minimizing environmental impacts by maximizing utilization and minimizing resources required. Using AWS Lambda with EventBridge means compute runs only when needed, with no idle capacity between runs. A 24/7 EC2 or always-on EMR cluster wastes resources between jobs, and over-provisioned RDS does not fit a batch processing pattern.
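A minimal sketch of the Lambda side of this pattern is shown below. It assumes the function is invoked by an EventBridge scheduled rule, whose events carry `"source": "aws.events"` and an ISO 8601 `time` field; `process_batch` is a hypothetical placeholder for the real work.

```python
def process_batch(run_time):
    """Hypothetical placeholder for the hourly batch job."""
    # Real processing would go here; this sketch only reports what it was asked to do.
    return {"run_time": run_time, "records_processed": 0}

def handler(event, context):
    """Lambda entry point: runs only when the schedule fires, then exits."""
    if event.get("source") != "aws.events":
        raise ValueError("expected an EventBridge scheduled event")
    return process_batch(event["time"])

# Local invocation with a hand-written event shaped like a scheduled event.
sample_event = {
    "source": "aws.events",
    "detail-type": "Scheduled Event",
    "time": "2024-01-01T12:00:00Z",
}
print(handler(sample_event, None))
```

No compute is reserved between invocations, so there is nothing to sit idle for the rest of each hour.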

Quick Check: Operational Excellence and Observability

Test how well you can recognize operational excellence patterns in AWS designs.

A team wants to improve operational excellence for their microservices on AWS. Which combination of actions best supports this goal?

  1. Use manual deployments through the console and rely on instance-level logs only.
  2. Adopt CloudFormation for infrastructure, set up CloudWatch metrics and alarms, and automate deployments via CodePipeline.
  3. Increase EC2 instance sizes to reduce the chance of CPU throttling and avoid the need for monitoring.
  4. Disable detailed monitoring to reduce CloudWatch costs and focus only on application logs.

Answer: Option 2. Adopt CloudFormation for infrastructure, set up CloudWatch metrics and alarms, and automate deployments via CodePipeline.

Operational excellence emphasizes automation, observability, and repeatable processes. Using CloudFormation, CloudWatch metrics and alarms, and automated deployments via CodePipeline directly supports this. Manual deployments, avoiding monitoring, or just over-provisioning compute do not align with operational excellence.

Key Term Review: Sustainability and Operational Excellence

Use these flashcards to reinforce core definitions and patterns.

AWS Well-Architected Framework
The AWS Well-Architected Framework provides a consistent set of best practices for customers and partners to evaluate architectures, and a set of questions you can use to evaluate how well an architecture is aligned to AWS best practices.
Sustainability pillar (definition)
The sustainability pillar focuses on minimizing the environmental impacts of running cloud workloads by maximizing utilization and minimizing the resources required, and by reducing the energy required to deliver business value.
Performance efficiency pillar (definition)
The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
Cost optimization pillar (definition)
The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
Operational excellence (concept)
The AWS Well-Architected pillar focused on how you organize, run, and evolve workloads using clear processes, automation, observability, and continuous improvement to support all other pillars.
Infrastructure as Code (IaC)
The practice of defining and provisioning infrastructure using machine-readable templates or code (for example, AWS CloudFormation, AWS CDK), enabling repeatable, automated deployments.
Serverless sustainability advantage
Serverless services like AWS Lambda, Fargate, and Athena typically run only when needed and scale automatically, reducing idle capacity and improving utilization.
S3 Lifecycle policy
Configuration on an S3 bucket or prefix that automatically transitions objects between storage classes or deletes them after specified periods, supporting cost and sustainability goals.
Observability
The ability to understand the internal state of a system through telemetry such as logs, metrics, and traces, enabling detection of issues and optimization opportunities.
Right-sizing
Adjusting resource types and sizes (for example, EC2 instance types, database classes) to match actual workload needs, avoiding both over-provisioning and under-provisioning.

Common Exam Patterns and Traps

Over-Provisioning Trap

Watch for answers that use very large, always-on instances "to be safe". Prefer auto scaling or serverless when they meet requirements.

Data Hoarding Trap

Keeping all logs or backups forever in S3 Standard is rarely best. Look for lifecycle policies and explicit retention periods.

Manual Ops Trap

Manual checks and ad-hoc changes are red flags. Prefer designs with CloudWatch alarms, Systems Manager, and infrastructure as code.

Telemetry Trap

Disabling monitoring to save a little money usually backfires. Sufficient observability is needed to optimize cost and sustainability long term.

Key Terms

Serverless
A cloud execution model where the cloud provider automatically manages infrastructure provisioning, scaling, and maintenance, and you are billed based on actual usage rather than pre-allocated capacity.
Right-sizing
Adjusting resource configurations (such as EC2 instance types, database classes, and storage tiers) to closely match actual workload requirements, reducing both over-provisioning and under-provisioning.
Observability
The ability to understand the internal state of a system based on the data it produces, such as logs, metrics, and traces, enabling effective monitoring, troubleshooting, and optimization.
S3 Lifecycle policy
A configuration on an Amazon S3 bucket or prefix that automatically transitions objects between storage classes or expires (deletes) them after specified periods, supporting cost and sustainability goals.
Sustainability pillar
The sustainability pillar focuses on minimizing the environmental impacts of running cloud workloads by maximizing utilization and minimizing the resources required, and by reducing the energy required to deliver business value.
Operational excellence
The AWS Well-Architected pillar focused on how you organize, run, and evolve workloads using clear processes, automation, observability, and continuous improvement to support all other pillars.
Cost optimization pillar
The cost optimization pillar includes the continual process of refinement and improvement of a system over its entire lifecycle to build and operate cost-aware systems that achieve business outcomes and minimize costs.
Infrastructure as Code (IaC)
The practice of managing and provisioning infrastructure using machine-readable definition files or code, such as AWS CloudFormation or AWS CDK, enabling consistent, automated deployments.
Performance efficiency pillar
The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintain that efficiency as demand changes and technologies evolve.
AWS Well-Architected Framework
The AWS Well-Architected Framework provides a consistent set of best practices for customers and partners to evaluate architectures, and a set of questions you can use to evaluate how well an architecture is aligned to AWS best practices.
