Module 7: Databases, Analytics, and AI/ML Services at a Glance - AWS Cloud Practitioner CLF-C02: Focused Exam Prep

Module 7 Overview: Why Databases, Analytics, and AI/ML Matter for CLF-C02

In this 10‑minute module, you’ll connect what you already know about compute, networking, and storage to three more CLF‑C02 pillars:

Databases – where and how applications store data
Analytics – how organizations query and analyze data at scale
AI/ML services – how AWS exposes machine learning and AI capabilities without you building models from scratch

By the end, you should be able to read an exam scenario and quickly recognize:

“This sounds like RDS vs DynamoDB.”
“This is clearly an Athena / Redshift / Kinesis situation.”
“This is using SageMaker / Rekognition / Comprehend style managed AI/ML.”

We’ll stay high‑level and conceptual (as expected for CLF‑C02) and focus on:

Picking relational vs NoSQL on AWS
Matching analytics services to use cases
Spotting AI/ML managed services in exam questions

> Context reminder: In Module 5 you saw IAM and security; in Module 6 you saw compute, networking, and storage. This module layers data and intelligence on top of that foundation.

Step 1: Relational vs NoSQL on AWS – The Big Picture

Before we name services, lock in the two main database styles the exam cares about:

Relational databases (SQL)

Think tables with rows and columns
Strong schemas (defined structure): each column has a type
Use SQL for queries: `SELECT * FROM Orders WHERE CustomerId = 123;`
Great when:
Data is highly structured
You need joins, transactions, and referential integrity
Classic business apps: ERP, CRM, e‑commerce backend

NoSQL / key‑value databases

Think flexible, schema‑less or semi‑structured data
Access by primary key (e.g., `UserId 12345`)
Designed for massive scale and high performance
Great when:
You need single‑digit millisecond or microsecond latency at scale
You store user profiles, session data, IoT data, etc.
Your access pattern is mostly: “Get this item by key”

On AWS, the CLF‑C02 blueprint expects you to recognize:

Amazon RDS as the relational choice
Amazon DynamoDB as the NoSQL key‑value choice
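The two styles are easier to remember once you see them side by side. This is a small local analogy in Python, not RDS or DynamoDB. The standard-library `sqlite3` module stands in for a relational engine, and a plain `dict` stands in for a key-value store. The table names, keys, and data are made up, and nothing here touches AWS.

```python
import sqlite3

# Relational style (stand-in for an engine you might run on RDS):
# fixed schema, SQL, and joins across tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Orders (
        OrderId    INTEGER PRIMARY KEY,
        CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId),
        Total      REAL NOT NULL
    );
    INSERT INTO Customers VALUES (123, 'Ana');
    INSERT INTO Orders VALUES (1, 123, 40.0), (2, 123, 15.5);
    """
)

# One query joins two tables and aggregates the result.
order_summary = conn.execute(
    """
    SELECT c.Name, COUNT(o.OrderId), SUM(o.Total)
    FROM Customers AS c
    JOIN Orders AS o ON o.CustomerId = c.CustomerId
    WHERE c.CustomerId = ?
    GROUP BY c.Name
    """,
    (123,),
).fetchone()
print(order_summary)  # ('Ana', 2, 55.5)

# Key-value style (stand-in for a DynamoDB-like store): fetch an item by its key.
# Items can carry different attributes without any schema change.
sessions = {
    "12345": {"cart": ["sku-1", "sku-7"], "theme": "dark"},
    "67890": {"cart": []},
}
print(sessions["12345"]["cart"])  # ['sku-1', 'sku-7']
```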

In the next steps, we’ll compare these two directly so you can choose the right one in scenarios.

Step 2: Amazon RDS vs Amazon DynamoDB – Side‑by‑Side

Use this mental table for exam questions:

| Feature / Scenario | Amazon RDS (Relational) | Amazon DynamoDB (NoSQL key‑value) |
|---|---|---|
| Data model | Tables, rows, columns, fixed schema | Tables of items with flexible attributes (key‑value / document style) |
| Typical query style | Complex SQL with joins and aggregates | Get/put an item by primary key; simple queries and scans |
| Scaling | Mostly vertical (bigger instance) plus read replicas; storage auto scaling | Horizontal, fully managed, scales automatically |
| Use cases | Financial apps, order management, inventory, legacy apps | User profiles, shopping carts, gaming leaderboards, IoT, ad tech |
| Management | AWS manages DB engine installation, patching, and backups; you still choose the instance size | Fully serverless; no servers or instances to manage |
| Availability / durability | Multi‑AZ deployment option, automated backups | Built‑in high availability: data replicated across multiple AZs |
| Pricing model (conceptual) | Pay for DB instance hours + storage | Pay per request (on‑demand) or for provisioned read/write capacity, plus storage |

Quick scenario checks

“We need ACID transactions and complex joins across multiple tables.”

→ This screams relational → Amazon RDS.

“We need to store millions of user session objects with millisecond latency and unpredictable traffic spikes.”

→ This screams NoSQL key‑value → Amazon DynamoDB.

“The app already uses MySQL on‑prem and we’re lifting and shifting to AWS.”

→ Use Amazon RDS for MySQL.

“We want a fully serverless database for a mobile game leaderboard with huge spikes.”

→ Use Amazon DynamoDB.

> For CLF‑C02, you are not expected to configure tables or indexes. You just need to recognize which service fits which story.
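Going beyond what CLF‑C02 tests, here is what "get/put item by primary key" can look like for the shopping‑cart story. This is a minimal sketch. It assumes a client created with `boto3.client("dynamodb")` and a hypothetical table named `ShoppingCarts` whose partition key is `UserId`. The client is passed in as a parameter, so the functions can be tried without AWS access.

```python
from __future__ import annotations

# Hypothetical table: partition key "UserId" (string).
TABLE_NAME = "ShoppingCarts"


def save_cart(dynamodb, user_id: str, skus: list[str]) -> None:
    """Write (or overwrite) one cart item, addressed by its primary key."""
    dynamodb.put_item(
        TableName=TABLE_NAME,
        Item={
            "UserId": {"S": user_id},                 # "S" = string attribute
            "Skus": {"L": [{"S": s} for s in skus]},  # "L" = list attribute
        },
    )


def load_cart(dynamodb, user_id: str) -> list[str] | None:
    """Fetch one cart by key; return None when no item exists."""
    response = dynamodb.get_item(
        TableName=TABLE_NAME,
        Key={"UserId": {"S": user_id}},
    )
    item = response.get("Item")  # "Item" is absent from the response if nothing matched
    if item is None:
        return None
    return [entry["S"] for entry in item["Skus"]["L"]]
```

The low-level client uses typed attribute values such as `{"S": ...}`. The higher-level `boto3.resource("dynamodb").Table(...)` interface accepts plain Python values instead.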

Step 3: Choose RDS or DynamoDB (Thought Exercise)

Decide whether each scenario is a better fit for Amazon RDS or Amazon DynamoDB. Think it through before checking the suggested answer.

Hospital patient records system needs strict schemas, complex reporting, and transactions across multiple tables.

Your pick: `RDS` or `DynamoDB`?
Suggested answer: RDS – strict schemas, complex reporting queries, and ACID transactions across multiple tables fit a relational database.

Mobile game storing per‑player inventory and scores, with millions of players logging in during a tournament.

Your pick: `RDS` or `DynamoDB`?
Suggested answer: DynamoDB – key‑value, massive scale, spiky workloads.

Legacy CRM currently running on Microsoft SQL Server on‑premises, being migrated to AWS with minimal code changes.

Your pick: `RDS` or `DynamoDB`?
Suggested answer: RDS (SQL Server engine) – easiest lift‑and‑shift.

IoT sensor data from millions of devices sending frequent small updates.

Your pick: `RDS` or `DynamoDB`?
Suggested answer: DynamoDB – high write throughput and scalable key‑value access.

As you practice, try to label the data model (relational vs key‑value) and the performance pattern (steady vs spiky, complex queries vs simple key lookups).
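The labeling habit above can be written down as a tiny rule of thumb. This is a hypothetical study aid, not an AWS tool. The function name and its yes/no inputs are made up, and real exam questions may need more nuance than four flags.

```python
def suggest_database(
    needs_joins_or_multi_table_transactions: bool = False,
    migrating_existing_sql_engine: bool = False,
    mostly_key_lookups: bool = False,
    spiky_massive_scale: bool = False,
) -> str:
    """Hypothetical rule of thumb mapping scenario clues to RDS or DynamoDB."""
    # Relational clues win: joins, cross-table transactions, or an existing
    # MySQL/PostgreSQL/SQL Server/... application being lifted and shifted.
    if needs_joins_or_multi_table_transactions or migrating_existing_sql_engine:
        return "Amazon RDS"
    # Key-value clues: simple lookups by key, huge scale, unpredictable spikes.
    if mostly_key_lookups or spiky_massive_scale:
        return "Amazon DynamoDB"
    return "Not enough clues; reread the scenario"


# The four scenarios from this step:
print(suggest_database(needs_joins_or_multi_table_transactions=True))      # hospital records
print(suggest_database(mostly_key_lookups=True, spiky_massive_scale=True))  # game inventory
print(suggest_database(migrating_existing_sql_engine=True))                 # legacy CRM
print(suggest_database(mostly_key_lookups=True, spiky_massive_scale=True))  # IoT sensors
```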

Step 4: Analytics Basics – Athena, Redshift, and Kinesis

Analytics services show up in CLF‑C02 scenario questions whenever you see “analyze large amounts of data” or “real‑time streaming”.

1. Amazon Athena – Query data in Amazon S3 with SQL

Serverless, interactive query service
You point Athena at data stored in Amazon S3 (e.g., logs, CSV, JSON, Parquet) and run SQL queries
You don’t manage servers or data warehouses
Great for ad‑hoc analysis and log analysis
You pay per query (based on data scanned)

> Think: “I already have data in S3; I just want to run SQL on it without loading it anywhere.”
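Here is a rough sketch of "point Athena at S3 and run SQL". It submits a query and waits for it to finish. It assumes a client from `boto3.client("athena")` and a hypothetical database with an `access_logs` table defined over log files in S3. It also assumes a placeholder S3 location for query results. Athena runs queries asynchronously, which is why the sketch polls for status.

```python
from __future__ import annotations

import time

# Hypothetical table "access_logs" defined over log files stored in S3.
QUERY = """
SELECT status, COUNT(*) AS hits
FROM access_logs
WHERE status >= 500
GROUP BY status
ORDER BY hits DESC
"""


def run_athena_query(athena, database: str, output_s3_uri: str,
                     poll_seconds: float = 1.0) -> tuple[str, str]:
    """Submit QUERY, poll until it stops running, return (query_id, final_state)."""
    started = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3_uri},  # results are written to S3
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state not in ("QUEUED", "RUNNING"):
            return query_id, state  # SUCCEEDED, FAILED, or CANCELLED
        time.sleep(poll_seconds)
```

After a `SUCCEEDED` state, the rows can be read with `get_query_results(QueryExecutionId=...)`. Athena bills by data scanned, so storing logs in a columnar format such as Parquet usually lowers cost.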

2. Amazon Redshift – Data warehouse for large‑scale analytics

Fully managed data warehouse optimized for analytics on large datasets (GBs to PBs)
Stores data in columnar format for fast analytical queries
Integrates with S3 (e.g., Redshift Spectrum can query data in S3)
Great for BI dashboards, reporting, complex analytical queries across many tables

> Think: “We need a central analytics database for the whole company, with dashboards and complex joins over huge data volumes.”
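The next sketch shows that Redshift is queried with ordinary SQL, including joins across warehouse tables. It uses the Redshift Data API, which submits SQL without managing database connections. It assumes a client from `boto3.client("redshift-data")`, a Redshift Serverless workgroup, and hypothetical `sales` and `regions` tables.

```python
# Hypothetical query: a fact table (sales) joined to a dimension table (regions).
DASHBOARD_SQL = """
SELECT r.region_name, SUM(s.amount) AS revenue
FROM sales AS s
JOIN regions AS r ON r.region_id = s.region_id
GROUP BY r.region_name
ORDER BY revenue DESC
"""


def submit_dashboard_query(redshift_data, workgroup: str, database: str) -> str:
    """Submit the query and return its statement Id (the Data API is asynchronous)."""
    response = redshift_data.execute_statement(
        # For a provisioned cluster, pass ClusterIdentifier (plus DbUser or SecretArn) instead.
        WorkgroupName=workgroup,
        Database=database,
        Sql=DASHBOARD_SQL,
    )
    return response["Id"]
```

The returned Id can be checked with `describe_statement(Id=...)`. Once the statement finishes, `get_statement_result(Id=...)` returns the rows.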

3. Amazon Kinesis – Real‑time streaming data platform

Family of services for collecting, processing, and analyzing streaming data in real time
Common use: Kinesis Data Streams for ingesting clickstreams, IoT data, application logs
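Other variants: Kinesis Data Firehose (load streams into S3/Redshift/OpenSearch; now renamed Amazon Data Firehose) and Kinesis Data Analytics (process streams with SQL or Apache Flink; now renamed Amazon Managed Service for Apache Flink)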

> Think: “Data is continuously arriving and we want to process it in seconds, not hours.”
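To make "ingesting clickstreams" concrete, here is a minimal producer sketch. It writes one click event to a Kinesis data stream. It assumes a client from `boto3.client("kinesis")` and a hypothetical stream name. The event fields are made up for illustration.

```python
import json


def send_click_event(kinesis, stream_name: str, user_id: str, page: str):
    """Put one click event onto a Kinesis data stream; return (shard_id, sequence_number)."""
    payload = json.dumps({"user_id": user_id, "page": page}).encode("utf-8")
    response = kinesis.put_record(
        StreamName=stream_name,
        Data=payload,
        # Records that share a partition key go to the same shard, so one
        # user's clicks keep their relative order.
        PartitionKey=user_id,
    )
    return response["ShardId"], response["SequenceNumber"]
```

High-volume producers usually batch records with `put_records` instead of sending them one at a time.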

In exam questions:

Athena → ad‑hoc queries on S3
Redshift → data warehouse / BI / complex analytics at scale
Kinesis → real‑time streaming ingestion and processing

Step 5: Matching Analytics Services to Scenarios

Visualize three different teams and which service fits them best.

Scenario A – Operations team analyzing S3 logs

The ops team stores application logs and access logs in S3.
They sometimes need to run SQL queries to find errors or performance issues.
They don’t want to set up servers.

Best fit: Amazon Athena

Why: Data already in S3, needs ad‑hoc SQL, serverless.

---

Scenario B – BI team building executive dashboards

The BI team pulls data from multiple systems: sales, marketing, support.
They want a central warehouse where data is cleaned and structured.
They run complex queries and connect dashboard tools (e.g., QuickSight, Tableau).

Best fit: Amazon Redshift

Why: Purpose‑built data warehouse for large‑scale analytics.

---

Scenario C – Streaming clickstream data from a website

A high‑traffic website generates click events continuously.
The company wants to detect anomalies and trends in near real time.
Later, they want to store the data in S3 or Redshift for deeper analysis.

Best fit: Amazon Kinesis (e.g., Kinesis Data Streams + Firehose)

Why: Purpose‑built for real‑time streaming ingestion and processing.

> On CLF‑C02, the exam usually gives enough hints: “real‑time streaming” → Kinesis, “query data in S3 with SQL” → Athena, “enterprise data warehouse” → Redshift.

Step 6: AI/ML at a High Level – SageMaker, Rekognition, Comprehend

AWS has many AI/ML services, but CLF‑C02 focuses on recognizing high‑level use cases, not on building models.

1. Amazon SageMaker – Build, train, and deploy ML models

End‑to‑end machine learning platform
Used by data scientists and ML engineers
Covers the full ML lifecycle: data preparation, training, tuning, deployment, MLOps
You choose algorithms, provide data, and manage model versions (with lots of automation/assistance)

> Think: “We have data and ML expertise; we want to build our own models on AWS.”
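After a custom model is trained and deployed with SageMaker, applications often call it through a real-time endpoint. Here is a minimal sketch. It assumes a client from `boto3.client("sagemaker-runtime")` and a hypothetical endpoint named `churn-predictor`. It also assumes the model accepts one CSV row and returns a single churn probability as plain text. Real input and output formats depend entirely on how the model was built.

```python
from __future__ import annotations


def predict_churn(runtime, endpoint_name: str, features: list[float]) -> float:
    """Send one CSV row to a deployed endpoint and parse a single-number reply."""
    csv_row = ",".join(str(value) for value in features)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumption: the model was built to accept CSV
        Body=csv_row,
    )
    # response["Body"] is a readable stream; assumption: it holds e.g. "0.87".
    return float(response["Body"].read().decode("utf-8"))
```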

2. Amazon Rekognition – Image and video analysis

Pre‑trained computer vision service
Can detect objects, people, text, scenes, inappropriate content in images and videos
Use cases: content moderation, face detection, object detection, text in images

> Think: “We want to analyze images/videos with an API, no ML expertise required.”
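"Analyze images with an API" can be as short as two calls. Here is a minimal sketch for one image stored in S3. It assumes a client from `boto3.client("rekognition")`. The bucket and object names are placeholders you supply.

```python
def analyze_upload(rekognition, bucket: str, key: str, min_confidence: float = 80.0) -> dict:
    """Return face bounding boxes and moderation label names for one image in S3."""
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    faces = rekognition.detect_faces(Image=image)
    moderation = rekognition.detect_moderation_labels(Image=image, MinConfidence=min_confidence)
    return {
        # Each box is given as ratios of the image width/height (e.g. to blur faces).
        "face_boxes": [face["BoundingBox"] for face in faces["FaceDetails"]],
        "unsafe_labels": [label["Name"] for label in moderation["ModerationLabels"]],
    }
```

Stored videos use separate asynchronous operations, such as `start_face_detection` and `start_content_moderation`.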

3. Amazon Comprehend – Natural language processing (NLP)

Pre‑trained NLP service
Can do sentiment analysis, entity recognition (names, places, organizations), key phrase extraction, language detection, and topic modeling
Works on unstructured text: reviews, support tickets, social media posts

> Think: “We want to extract meaning from text (sentiment, topics, entities) via an API.”
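Here is a minimal sketch of "extract meaning from text via an API". It asks Comprehend for the overall sentiment and the named entities in one piece of text. It assumes a client from `boto3.client("comprehend")` and English input.

```python
def analyze_text(comprehend, text: str, language_code: str = "en") -> dict:
    """Return the overall sentiment and the (text, type) pairs of detected entities."""
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = comprehend.detect_entities(Text=text, LanguageCode=language_code)
    return {
        "sentiment": sentiment["Sentiment"],  # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        # Entity types include values such as ORGANIZATION and LOCATION.
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
    }
```

For thousands of documents, Comprehend also offers batch and asynchronous job operations, so you do not need one call per document.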

For CLF‑C02, remember this pattern:

SageMaker → custom ML models, full ML pipeline
Rekognition → images/videos
Comprehend → text and sentiment

Step 7: Spot the Right AI/ML Service (Thought Exercise)

Match each idea with SageMaker, Rekognition, or Comprehend.

Analyze product reviews to determine if customers are happy or frustrated.

Your pick?
Suggested answer: Comprehend (sentiment analysis on text).

Automatically blur faces in user‑uploaded videos for privacy.

Your pick?
Suggested answer: Rekognition (face detection in images/videos).

Build a custom model to predict which customers are likely to churn, using your historical data.

Your pick?
Suggested answer: SageMaker (custom predictive model using your data).

Extract entities like company names and locations from thousands of legal documents.

Your pick?
Suggested answer: Comprehend (entity recognition in text).

Detect inappropriate or unsafe content in images uploaded to a social media app.

Your pick?
Suggested answer: Rekognition (content moderation for images).

Step 8: Quick Check – Databases

Test your understanding of RDS vs DynamoDB.

A startup needs to store shopping cart data for millions of users with unpredictable traffic spikes. They need single‑digit millisecond response times and minimal operational overhead. Which AWS database service is the best fit?

A) Amazon RDS
B) Amazon DynamoDB
C) Amazon Redshift
D) Amazon Athena

Answer: B) Amazon DynamoDB

**Amazon DynamoDB** is a fully managed, serverless NoSQL key‑value database that handles massive scale and spiky workloads with single‑digit millisecond latency. RDS is relational and instance‑based, Redshift is a data warehouse, and Athena is a query service for data in S3.

Step 9: Quick Check – Analytics and AI/ML

Test your ability to map scenarios to analytics and AI/ML services.

Your company stores all application logs in Amazon S3 and wants to run occasional SQL queries on this data without managing any servers or loading the data into a database. Which service is the best fit?

A) Amazon Redshift
B) Amazon Athena
C) Amazon Kinesis Data Streams
D) Amazon SageMaker

Answer: B) Amazon Athena

**Amazon Athena** lets you run SQL queries directly on data stored in Amazon S3, with no servers to manage. Redshift is a data warehouse that you provision and typically load data into, Kinesis handles real‑time streaming data, and SageMaker is for building ML models.

Step 10: Flashcard Review – Key Services at a Glance

Use these cards to reinforce the main services and when to use them.

Amazon RDS: Managed **relational database service** supporting engines like MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server. Best for structured data, SQL queries, and transactional workloads.
Amazon DynamoDB: Fully managed, **serverless NoSQL key‑value and document database**. Designed for massive scale, low latency, and spiky workloads such as user profiles, sessions, and IoT data.
Amazon Athena: Serverless **interactive query service** that lets you use SQL to analyze data directly in **Amazon S3**. Ideal for ad‑hoc queries and log analysis without provisioning infrastructure.
Amazon Redshift: Fully managed **data warehouse** optimized for large‑scale analytical queries and BI workloads. Stores data in columnar format and integrates with S3.
Amazon Kinesis: Family of services for **real‑time streaming data** ingestion and processing (e.g., Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics).
Amazon SageMaker: End‑to‑end **machine learning platform** to build, train, tune, and deploy custom ML models using your own data.
Amazon Rekognition: Pre‑trained **computer vision** service for analyzing images and videos: object and scene detection, face detection, text in images, and content moderation.
Amazon Comprehend: Pre‑trained **natural language processing (NLP)** service for sentiment analysis, entity recognition, key phrase extraction, and language detection on text.

Key Terms

Amazon RDS: Amazon Relational Database Service; a managed service for relational databases like MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server.
Amazon Athena: A serverless interactive query service that lets you analyze data directly in Amazon S3 using standard SQL.
Amazon Kinesis: A family of AWS services for real‑time collection, processing, and analysis of streaming data such as logs, clickstreams, and IoT telemetry.
Data warehouse: A centralized repository optimized for analytical queries and reporting across large volumes of historical data from multiple sources.
NoSQL database: A broad class of databases that are non‑relational and often schema‑less or schema‑flexible, such as key‑value, document, or wide‑column stores. Designed for scalability and flexible data models.
Streaming data: Data that is continuously generated and delivered in small records, often from sources like clickstreams, IoT devices, or application logs.
Amazon DynamoDB: AWS's fully managed, serverless NoSQL key‑value and document database service designed for high throughput and low latency.
Amazon Redshift: AWS's fully managed, petabyte‑scale data warehouse service for running complex analytical queries and powering BI workloads.
Amazon SageMaker: A fully managed service that provides tools to build, train, and deploy machine learning models at scale.
Amazon Comprehend: A managed natural language processing service that uses machine learning to find insights and relationships in text, including sentiment and entities.
Amazon Rekognition: A managed computer vision service that adds image and video analysis to applications via API calls.
Relational database: A database that organizes data into tables with rows and columns and enforces a fixed schema. Typically queried with SQL and supports joins and transactions.