Skarp

Chapter 6 of 10

Storage, Databases, and Network Building Blocks

Connect the dots between where data lives, how it’s structured, and how it travels across AWS so you can quickly recognize which storage, database, and networking services best fit common exam scenarios.

15 min read

Step 1: The Big Picture – Data at Rest, in Use, and in Motion

Three Core Questions

In AWS, most scenarios boil down to: 1) Where does the data live? 2) How is it structured and queried? 3) How does it travel securely and reliably?

Link to Earlier Modules

Compute runs the code that uses data. IAM and security control who can access that data. Now we focus on storage, databases, and networking.

Three Mental Buckets

Think in three buckets: Storage = where bits sit; Databases = how data is organized; Networking = how components talk to each other securely.

Step 2: Core AWS Storage Categories and When to Use Them

Object Storage: S3

Amazon S3 stores data as objects in buckets. Ideal for images, videos, logs, backups, and static website assets. Think: large, unstructured data at web scale.

Block Storage: EBS

Amazon EBS provides virtual disks for EC2 instances. Use it for OS volumes, databases on EC2, and apps needing low-latency disk attached to a single instance.

File Storage: EFS and FSx

EFS is a scalable NFS file system for Linux EC2 across AZs. FSx provides specialized file systems like Windows File Server or Lustre for specific workloads.

Archive and Backup

S3 Glacier storage classes offer low-cost archival storage; retrieval ranges from milliseconds (Instant Retrieval) to hours (Deep Archive). AWS Backup centralizes and automates backups for EBS, RDS, EFS, DynamoDB, and more.
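The four categories above can be read as a small decision rule. The sketch below is one way to write that rule down; the `StorageNeed` type, its field names, and `pick_storage` are hypothetical names invented for illustration, not an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageNeed:
    """Hypothetical description of a workload's storage requirements."""
    access: str                # "object", "block", or "file"
    os_family: str = "linux"   # only matters for file storage
    archival: bool = False     # rarely read, kept mainly for retention


def pick_storage(need: StorageNeed) -> str:
    """Return the storage category from this step that matches the need."""
    if need.archival:
        return "S3 Glacier"
    if need.access == "object":
        return "Amazon S3"
    if need.access == "block":
        return "Amazon EBS"
    if need.access == "file":
        # EFS is the NFS option for Linux; FSx covers Windows File Server and Lustre.
        return "Amazon EFS" if need.os_family == "linux" else "Amazon FSx"
    raise ValueError(f"unknown access pattern: {need.access!r}")


print(pick_storage(StorageNeed(access="file")))
```

Real scenarios carry more nuance (cost, throughput, retrieval time), so treat this as a memory aid rather than a complete selection guide.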

Step 3: Storage Decision Scenarios

Static Website Assets

Website images and CSS for global users: store in Amazon S3 and optionally use CloudFront to cache and deliver them closer to users worldwide.

Databases on EC2

Self-managed PostgreSQL on a single EC2 instance: use EBS volumes for low-latency block storage attached directly to that instance.

Shared Workspace

Multiple Linux EC2 instances need to read and write the same files: choose Amazon EFS, a managed, shared NFS file system.

Compliance Archives

Regulated data kept for 7–10 years and rarely accessed: use S3 Glacier tiers, especially Deep Archive, to minimize long-term storage cost.
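For the compliance-archive scenario, moving objects into Deep Archive is usually expressed as an S3 lifecycle rule. The sketch below only builds the rule as a Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument; it does not call AWS. The rule ID, the `compliance/` prefix, and the 30-day delay are placeholder assumptions.

```python
import json

# Assumed values for illustration only: rule ID, prefix, and day count.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-compliance-records",
            "Status": "Enabled",
            # Apply the rule only to objects under this key prefix.
            "Filter": {"Prefix": "compliance/"},
            # After 30 days, move objects to the Deep Archive storage class.
            "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

Printing the structure as JSON makes it easy to compare against a rule created in the console.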

Step 4: Relational vs Non-Relational Databases on AWS

Relational Databases

Relational databases use tables, rows, and columns with SQL, joins, and ACID transactions. On AWS, think Amazon RDS and Amazon Aurora for managed relational engines.

Non-Relational Databases

Non-relational (NoSQL) databases are schema-flexible and scale horizontally. Key AWS options: DynamoDB for key-value/doc and ElastiCache for in-memory caching.

Clues for RDS/Aurora

Look for complex joins, transactions, or an existing engine such as MySQL, PostgreSQL, Oracle, or SQL Server. These point to Amazon RDS or Aurora as the right service.

Clues for DynamoDB/ElastiCache

Look for millions of requests per second, very low latency, or serverless NoSQL for DynamoDB; look for caching and offloading reads for ElastiCache.
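"Caching and offloading reads" usually means a cache-aside pattern: check the cache first and query the database only on a miss. Here is a minimal sketch of that pattern, with plain dictionaries standing in for both ElastiCache and the database; the class name, `profiles` data, and keys are all hypothetical.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Check the cache first; fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._load_from_db = load_from_db
        self._cache: Dict[str, str] = {}   # stand-in for an ElastiCache cluster
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]        # cache hit: no database work
        self.db_reads += 1
        value = self._load_from_db(key)    # cache miss: query the database
        if value is not None:
            self._cache[key] = value       # populate the cache for later reads
        return value


# Hypothetical "database": a dict standing in for a relational table.
profiles = {"user-1": "Ada"}
store = CacheAside(profiles.get)
store.get("user-1")
store.get("user-1")
print(store.db_reads)  # prints 1
```

The second read is served from the cache, which is the read offloading that points to ElastiCache in exam scenarios. A production cache would also need expiry and invalidation, which this sketch leaves out.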

Step 5: Database Thought Exercise – Match the Workload

Match each workload to the most appropriate AWS database service at a high level. Think it through before peeking at the suggested answers.

  1. E-commerce order system
  • Requirements: track orders, payments, inventory; strong consistency and transactions; existing team knows PostgreSQL.
  • Which service? Write down your guess.
  2. Mobile game leaderboard
  • Requirements: extremely high read/write throughput, simple access patterns (userID → score), auto scaling, low ops overhead.
  • Which service?
  3. User session cache for a web app
  • Requirements: ultra-fast access, data can be lost without breaking the system, primarily key-based lookups.
  • Which service?
  4. Existing on-prem Oracle database migration with minimal code changes
  • Requirements: lift-and-shift, keep Oracle engine, reduce management overhead.
  • Which service?

Scroll down for suggested mappings.

Suggested mappings

  1. E-commerce order system → Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition.
  2. Mobile game leaderboard → Amazon DynamoDB.
  3. User session cache → Amazon ElastiCache (Redis engine is common).
  4. On-prem Oracle migration → Amazon RDS for Oracle.

As you review, focus on why each choice fits: transactions vs scale vs caching vs engine compatibility.
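The reasoning behind these mappings can be sketched as a short function that checks the clues in order. `pick_database` and its parameter names are hypothetical and exist only for this illustration; the Aurora engine list reflects its MySQL and PostgreSQL compatibility.

```python
from typing import Optional

# Aurora is compatible with these two engines; others stay on RDS.
AURORA_COMPATIBLE = {"mysql", "postgresql"}


def pick_database(*, cache_only: bool = False,
                  key_value_at_scale: bool = False,
                  needs_transactions: bool = False,
                  engine: Optional[str] = None) -> str:
    """Map the workload clues from this exercise to a service family."""
    if cache_only:
        # Data can be lost and is looked up by key: caching.
        return "Amazon ElastiCache"
    if needs_transactions or engine is not None:
        # Transactions or a named SQL engine: relational.
        if engine is None or engine in AURORA_COMPATIBLE:
            return "Amazon RDS or Amazon Aurora"
        return "Amazon RDS"
    if key_value_at_scale:
        # Simple key-based access at very high throughput: NoSQL.
        return "Amazon DynamoDB"
    raise ValueError("no clue matched; gather more requirements")


print(pick_database(needs_transactions=True, engine="postgresql"))
print(pick_database(key_value_at_scale=True))
print(pick_database(cache_only=True))
print(pick_database(engine="oracle"))
```

The order of the checks mirrors the closing advice: caching first, then transactions and engine compatibility, then scale.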

Step 6: Foundational AWS Networking Concepts

Amazon VPC

A VPC is your logically isolated network in AWS. You control IP ranges, subnets, routing, and gateways, like a virtual data center in the cloud.
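To make "you control IP ranges and subnets" concrete, the sketch below uses Python's standard `ipaddress` module to carve a VPC-sized range into smaller subnet ranges. The `10.0.0.0/16` range, the `/24` subnet size, and the subnet names are assumptions chosen for illustration; nothing here talks to AWS.

```python
import ipaddress

# Assumed VPC range; any private IPv4 range would work the same way.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Hypothetical subnet names: two public and two private.
names = ["public-a", "public-b", "private-a", "private-b"]

# subnets(new_prefix=24) yields consecutive /24 blocks inside the VPC range;
# zip stops after the four names are used up.
layout = dict(zip(names, vpc.subnets(new_prefix=24)))

for name, subnet in layout.items():
    print(f"{name}: {subnet} ({subnet.num_addresses} addresses)")
```

Each subnet range sits inside the VPC range and none of them overlap, which is the constraint AWS enforces when you create subnets.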

Subnets and Internet Access

A public subnet has a route to an Internet Gateway, so resources in it that have a public IP address can reach the internet and be reached from it. Private subnets have no direct internet route and often host databases or internal services.

IGW and NAT

An Internet Gateway connects your VPC to the public internet. NAT gateways let private subnet instances make outbound internet calls without being directly reachable.
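What makes a subnet public or private is its route table, not a setting on the subnet itself. The sketch below models a route table as a dictionary from destination to target and classifies it by where the default route points. The resource IDs are made up; the only AWS convention relied on is that Internet Gateway IDs start with `igw-` and NAT gateway IDs start with `nat-`.

```python
from typing import Dict

DEFAULT_ROUTE = "0.0.0.0/0"


def classify_subnet(routes: Dict[str, str]) -> str:
    """Return "public" if the default route targets an Internet Gateway."""
    target = routes.get(DEFAULT_ROUTE, "")
    return "public" if target.startswith("igw-") else "private"


def has_outbound_internet(routes: Dict[str, str]) -> bool:
    """True if the default route leads out through an IGW or a NAT gateway."""
    target = routes.get(DEFAULT_ROUTE, "")
    return target.startswith(("igw-", "nat-"))


# Hypothetical route tables (IDs are invented for illustration).
public_routes = {"10.0.0.0/16": "local", DEFAULT_ROUTE: "igw-0a1b2c3d"}
private_nat_routes = {"10.0.0.0/16": "local", DEFAULT_ROUTE: "nat-0e5f6a7b"}
isolated_routes = {"10.0.0.0/16": "local"}

for label, table in [("public", public_routes),
                     ("private with NAT", private_nat_routes),
                     ("isolated", isolated_routes)]:
    print(label, classify_subnet(table), has_outbound_internet(table))
```

The middle case is the common exam pattern: the subnet stays private, yet its instances can still start outbound connections through the NAT gateway.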

Route 53 and CloudFront

Route 53 is AWS DNS that maps domain names to IPs. CloudFront is a CDN that caches content at edge locations to reduce latency for global users.

Hybrid Connectivity

Site-to-Site VPN gives encrypted connectivity over the internet. Direct Connect provides dedicated, private network links with more consistent performance.

Step 7: Putting It Together – A Simple 3-Tier Web App

Scenario Overview

A 3-tier web app: users access a website, EC2 instances run the app, user photos and profiles are stored, and an on-prem network needs secure DB access.

Networking Layer

Use a VPC with public and private subnets. Place the ALB and NAT gateway in public subnets and the EC2 instances and RDS in private subnets. Attach an IGW for internet access, and use Site-to-Site VPN or Direct Connect for the on-prem connection.

Storage Layer

Store user-uploaded photos in Amazon S3, possibly behind CloudFront. Use EBS volumes for EC2 OS and app data that require block storage.

Database and DNS

Run Amazon RDS (or Aurora) in private subnets for profile data. Optionally add ElastiCache. Route 53 maps www.example.com to the ALB endpoint.
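The placement rules for this 3-tier design can be written down as data and checked mechanically. The sketch below is a hypothetical self-check, not an AWS tool: the component names, the `PLACEMENT` table, and the `misplaced` helper are invented to mirror the layout described in this step.

```python
from typing import Dict, List

# Where each component of the scenario lives, per the design in this step.
PLACEMENT: Dict[str, str] = {
    "Application Load Balancer": "public",
    "NAT gateway": "public",
    "EC2 app servers": "private",
    "RDS database": "private",
}

# Only these components are meant to sit in public subnets.
ALLOWED_PUBLIC = {"Application Load Balancer", "NAT gateway"}


def misplaced(placement: Dict[str, str]) -> List[str]:
    """List components in a public subnet that should be private."""
    return sorted(
        name for name, tier in placement.items()
        if tier == "public" and name not in ALLOWED_PUBLIC
    )


print(misplaced(PLACEMENT))  # prints []

# A deliberately wrong variant: the database moved to a public subnet.
bad = dict(PLACEMENT, **{"RDS database": "public"})
print(misplaced(bad))  # prints ['RDS database']
```

S3, CloudFront, and Route 53 are left out of the table because they are not placed in VPC subnets in this scenario.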

Step 8: Quick Check – Storage and Databases

Test your understanding of storage and database choices.

A startup needs to store clickstream logs from a high-traffic website for analytics. The data volume is huge, structure is flexible, and they want low-cost, durable storage that can later be queried with analytics tools. Which AWS service is the best primary storage choice?

  A) Amazon RDS for PostgreSQL
  B) Amazon DynamoDB
  C) Amazon S3
  D) Amazon EBS

Answer: C) Amazon S3

Amazon S3 is ideal for large volumes of semi-structured data like logs. It is durable, low-cost, and integrates with analytics tools such as Amazon Athena, EMR, and Glue. RDS and DynamoDB are databases, not the best fit for raw log storage at this scale; EBS is tied to a single instance and not optimized for massive, shared data lakes.

Step 9: Quick Check – Networking Basics

Check your grasp of foundational networking concepts.

You have an EC2-based application server in a private subnet that needs to download OS updates from the internet, but must not be directly reachable from the internet. What should you use?

  A) Attach an Internet Gateway directly to the instance
  B) Place the instance in a public subnet
  C) Use a NAT Gateway in a public subnet
  D) Use Amazon CloudFront

Answer: C) Use a NAT Gateway in a public subnet

A NAT Gateway in a public subnet allows instances in private subnets to initiate outbound connections to the internet while remaining unreachable from the internet. An Internet Gateway attaches to a VPC, not to an individual instance, and moving the instance to a public subnet with a public IP would expose it to inbound traffic. CloudFront is a CDN, not a solution for outbound access from private subnets.

Step 10: Flashcard Review – Key Terms

Flip through these cards to reinforce core storage, database, and networking terms.

Amazon S3
Object storage service for storing and retrieving any amount of data, often used for static assets, backups, and data lakes.
Amazon EBS
Block storage volumes attached to EC2 instances, suitable for OS volumes and databases running on a single instance.
Amazon EFS
Managed, scalable NFS file system for Linux, allowing multiple EC2 instances to share the same files.
Amazon RDS
Managed relational database service supporting engines like MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server.
Amazon DynamoDB
Fully managed NoSQL key-value and document database, designed for high throughput and low latency at scale.
Amazon ElastiCache
Managed in-memory caching service compatible with Redis and Memcached, used to speed up applications and reduce database load.
Amazon VPC
Virtual Private Cloud; a logically isolated section of AWS where you define your own networking environment.
Public vs Private Subnet
Public subnet has a route to an Internet Gateway; private subnet does not, and typically hosts internal resources like databases.
Amazon Route 53
Highly available and scalable DNS service used to translate domain names into IP addresses and route traffic.
Amazon CloudFront
Content Delivery Network (CDN) that caches content at edge locations to reduce latency for users worldwide.
AWS Site-to-Site VPN
Encrypted connection over the public internet between your on-premises network and your AWS VPC.
AWS Direct Connect
Dedicated, private network connection from your premises to AWS, providing more consistent performance than VPN.

Key Terms

Subnet
Segment of a VPC's IP address range where you place resources; can be public or private based on routing.
Amazon S3
Simple Storage Service; scalable, durable object storage for unstructured data such as files, logs, and backups.
AWS Backup
Centralized backup service for automating and managing backups across AWS resources.
Amazon EBS
Elastic Block Store; block-level storage volumes that attach to EC2 instances for low-latency disk access.
Amazon EFS
Elastic File System; managed NFS file system for Linux workloads, accessible from multiple EC2 instances.
Amazon FSx
Family of managed file systems (e.g., Windows File Server, Lustre) optimized for specific workloads.
Amazon RDS
Relational Database Service; managed service for traditional SQL databases such as MySQL and PostgreSQL.
Amazon VPC
Virtual Private Cloud; isolated virtual network environment in AWS where you control IP ranges, subnets, and routing.
S3 Glacier
Set of S3 storage classes (Instant Retrieval, Flexible Retrieval, Deep Archive) optimized for low-cost archival storage.
NAT Gateway
Managed network address translation service enabling instances in private subnets to access the internet outbound.
Amazon Aurora
Cloud-optimized relational database compatible with MySQL and PostgreSQL, offering high performance and availability.
Amazon DynamoDB
Managed NoSQL key-value and document database designed for massive scale and low latency.
Amazon Route 53
Scalable DNS and domain registration service that routes users to applications using domain names.
Internet Gateway
Horizontally scaled, redundant VPC component that allows communication between resources in a VPC and the internet.
Amazon CloudFront
Content Delivery Network that caches and serves content from edge locations to reduce latency.
AWS Direct Connect
Service that provides a dedicated network connection from your premises to AWS for consistent performance.
Amazon ElastiCache
Managed in-memory data store and cache, compatible with Redis and Memcached.
AWS Site-to-Site VPN
Service that creates an IPsec VPN connection between your on-premises network and your AWS VPC.
