Chapter 17 of 20
Monitoring and Insights: Azure Monitor, Logs, Metrics, and Alerts
Track the health and performance of Azure resources using metrics, logs, and alerts so you can detect and respond to issues proactively.
Big Picture: Why Monitoring Matters in Azure
From Deployment to Operations
You have learned how to deploy Azure resources with the Portal, CLI, PowerShell, and IaC. Now you need to keep those resources healthy and performant once they are running.
What Azure Monitor Does
Azure Monitor is the central monitoring service that collects telemetry from Azure resources, subscriptions, and even some on-premises or other-cloud workloads, then lets you analyze, visualize, and act on it.
Exam-Relevant Goals
For AZ-900 you must recognize Azure Monitor, distinguish metrics vs logs, explain alerts and action groups, and understand what a Log Analytics workspace is used for.
On-Call Mindset
Imagine you are the junior cloud engineer on call. Azure Monitor is your toolbox for spotting issues early and responding before users feel the pain.
Azure Monitor Architecture: The Umbrella View
Three-Layer Mental Model
Think of Azure Monitor as three layers: data sources (what is observed), data stores (where telemetry is kept), and insights/actions (how you visualize and respond).
Data Sources
Data sources include Azure resources, the Azure platform (activity logs), your application code via Application Insights, and guest OS/custom sources via agents.
Data Stores
Metrics go into a metrics database optimized for numeric time-series data. Logs go into Log Analytics workspaces, which support flexible schemas and are queried with Kusto Query Language (KQL).
Insights and Actions
On top of metrics and logs, you use dashboards, workbooks, insights views, and alerts to visualize health and trigger notifications or automation.
Metrics vs Logs: Core Concepts and Exam Traps
What Are Metrics?
Metrics are numeric measurements taken at regular intervals, like CPU percentage every minute. They are structured, lightweight, and great for near real-time charts and thresholds.
What Are Logs?
Logs are records of events or data entries, often semi-structured, with rich detail about what happened, who did it, and any error messages or parameters.
Examples of Each
Metrics: VM CPU %, disk IOPS, requests/sec. Logs: activity log events, firewall logs, application exceptions and traces stored in a Log Analytics workspace.
Exam Rule of Thumb
Metrics answer "how is it behaving over time?" Logs answer "what exactly happened?" Remember: metrics are not where you find detailed error messages.
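To make the difference concrete, here is a minimal Python sketch of the two data shapes. The class names, field names, and sample values are hypothetical and chosen only for illustration; they are not the schema Azure Monitor actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MetricSample:
    """One numeric data point in a time series (hypothetical shape)."""
    timestamp: datetime
    name: str
    value: float  # a single number and nothing else


@dataclass
class LogRecord:
    """One event carrying free-form detail (hypothetical shape)."""
    timestamp: datetime
    source: str
    message: str
    properties: dict = field(default_factory=dict)


# Three one-minute CPU samples: easy to chart, average, or compare to a threshold.
cpu = [
    MetricSample(datetime(2024, 1, 1, 9, minute), "Percentage CPU", value)
    for minute, value in enumerate([40.0, 85.0, 91.0])
]
average_cpu = sum(sample.value for sample in cpu) / len(cpu)

# One log record: it says what happened, where, and with which parameters.
error = LogRecord(
    timestamp=datetime(2024, 1, 1, 9, 1),
    source="web-app",
    message="SQL timeout during login",
    properties={"operation": "Login", "durationMs": 30000},
)

print(average_cpu)
print(error.message, error.properties)
```

Notice that the metric samples can tell you CPU was high, but only the log record carries the error text.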
Scenario Walkthrough: Choosing Metrics vs Logs
Student Portal Problem
Your Azure web app for a student portal has slow and failing logins at certain times. You must decide whether to use metrics or logs to investigate.
Start With Metrics
In Azure Monitor Metrics for the App Service, you plot Requests, HTTP 5xx, and Average Response Time over 7 days and see spikes around 9–10 AM.
Then Use Logs
In Azure Monitor Logs, you query application logs around the spikes and find repeated SQL timeout exceptions during login operations.
Choosing the Right Tool
Metrics reveal patterns and thresholds; logs reveal detailed causes. For exam scenarios, map "trend" to metrics and "detailed what/why" to logs.
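The two-step investigation can be sketched in plain Python: use the metric to find when the problem happens, then read the logs from that window to find why. All data below is made up for illustration, and the helper names are hypothetical; a real investigation would use Azure Monitor Metrics and a KQL query rather than in-memory lists.

```python
from datetime import datetime, timedelta

# Hypothetical hourly HTTP 5xx counts (metric samples): (hour start, count).
http_5xx = [
    (datetime(2024, 3, 4, 8), 2),
    (datetime(2024, 3, 4, 9), 140),
    (datetime(2024, 3, 4, 10), 3),
]

# Hypothetical application log records: (timestamp, message).
app_logs = [
    (datetime(2024, 3, 4, 8, 15), "Login succeeded"),
    (datetime(2024, 3, 4, 9, 5), "SqlException: timeout expired during Login"),
    (datetime(2024, 3, 4, 9, 40), "SqlException: timeout expired during Login"),
    (datetime(2024, 3, 4, 10, 20), "Login succeeded"),
]


def spike_hours(samples, threshold):
    """Step 1 (metrics): return the hours whose count exceeds the threshold."""
    return [start for start, count in samples if count > threshold]


def logs_in_hour(logs, hour_start):
    """Step 2 (logs): return the messages recorded within that hour."""
    hour_end = hour_start + timedelta(hours=1)
    return [message for ts, message in logs if hour_start <= ts < hour_end]


spikes = spike_hours(http_5xx, threshold=50)
causes = {hour: logs_in_hour(app_logs, hour) for hour in spikes}

for hour, messages in causes.items():
    print(hour, messages)
```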
Log Analytics Workspace: The Log Hub
What Is a Log Analytics Workspace?
A Log Analytics workspace is a logical container in a region where Azure Monitor stores log tables that you can query with Kusto Query Language (KQL).
Why Centralize Logs?
Multiple resources can send logs to one workspace, letting you correlate events across VMs, firewalls, gateways, and more in a single query surface.
Control and Retention
You use Azure role-based access control (RBAC) to control who can access workspace data, and you configure retention to balance investigation needs against storage cost.
Exam Associations
If a question mentions querying logs with KQL or correlating logs from multiple resources, the answer usually involves a Log Analytics workspace.
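As a rough mental model, you can picture a workspace as a set of named tables that share one query surface. The Python sketch below is a toy model only: the table names, field names, and rows are hypothetical, and a real workspace is queried with KQL rather than Python.

```python
from datetime import datetime

# Toy workspace: table name -> list of rows. All names and rows are made up.
workspace = {
    "VmEvents": [
        {"time": datetime(2024, 3, 4, 9, 2), "resource": "vm-web-01", "event": "High CPU"},
    ],
    "FirewallLogs": [
        {"time": datetime(2024, 3, 4, 9, 1), "resource": "fw-01", "event": "Denied 10.0.0.7"},
        {"time": datetime(2024, 3, 4, 11, 0), "resource": "fw-01", "event": "Allowed 10.0.0.9"},
    ],
}


def correlate(tables, start, end):
    """Return rows from every table within [start, end), ordered by time."""
    rows = [
        {"table": name, **row}
        for name, table_rows in tables.items()
        for row in table_rows
        if start <= row["time"] < end
    ]
    return sorted(rows, key=lambda row: row["time"])


timeline = correlate(workspace, datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 10, 0))
for row in timeline:
    print(row["time"], row["table"], row["resource"], row["event"])
```

Because both resources write to the same place, one query produces a single timeline; with separate stores you would have to merge the results by hand.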
Alerts and Action Groups: From Detection to Response
Why Alerts?
Alerts let Azure Monitor move from passive dashboards to proactive notifications and automation when something goes wrong or crosses a threshold.
Alert Rule Components
An alert rule defines the signal (a metric, a log query, or an activity log event), the condition (threshold or logic), the scope (resources), and the action group to execute.
What Is an Action Group?
An action group is a reusable set of actions: emails, SMS, push, voice, webhooks, Logic Apps, or ITSM connectors that run when an alert fires.
Key Distinction
Alert rules detect conditions; action groups perform responses. For exam questions, pair "detect" with alerts and "notify or automate" with action groups.
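The split between detecting and responding can be modeled in a few lines of Python. This is a conceptual sketch, not the Azure SDK: the `AlertRule` and `ActionGroup` classes, the example email address, and the webhook URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ActionGroup:
    """Reusable set of responses; knows nothing about conditions."""
    name: str
    actions: List[Callable[[str], str]]

    def run(self, message: str) -> List[str]:
        return [action(message) for action in self.actions]


@dataclass
class AlertRule:
    """Detects a condition on a signal; delegates the response."""
    name: str
    scope: str
    signal: str
    condition: Callable[[float], bool]
    action_group: ActionGroup

    def evaluate(self, value: float) -> List[str]:
        if self.condition(value):
            message = f"{self.name} fired on {self.scope}: {self.signal}={value}"
            return self.action_group.run(message)
        return []


def send_email(message: str) -> str:
    # Stand-in for a real notification; only returns a description.
    return f"email to oncall@example.com: {message}"


def call_webhook(message: str) -> str:
    # Stand-in for a real HTTP call; only returns a description.
    return f"POST https://example.com/hook: {message}"


on_call = ActionGroup("on-call", [send_email, call_webhook])
rule = AlertRule("High CPU", "vm-prod-01", "Percentage CPU", lambda v: v > 90, on_call)

quiet = rule.evaluate(42.0)   # condition not met, no actions run
fired = rule.evaluate(95.0)   # condition met, every action in the group runs
print(quiet)
print(fired)
```

The same `on_call` group could be attached to any number of rules, which is the sense in which action groups are reusable.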
Hands-On Flow: Creating a Simple Metric Alert (Portal)
Alert Scenario
You want to be notified if a production VM's CPU stays above 80% for 10 minutes. This is a classic metric alert use case.
Define Scope and Condition
In Azure Monitor Alerts, create an alert rule, select the VM as the scope, choose the Percentage CPU metric, and set the condition: average value greater than 80, evaluated over a 10-minute window.
Attach an Action Group
Create or select an action group that sends you an email (and optionally triggers automation) when the alert condition is met.
Conceptual Exam Link
Remember: metric alerts watch numeric signals for thresholds; log alerts watch query results. Both use action groups to send notifications or run actions.
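The condition in this scenario can be sketched as a small Python function. It assumes one CPU sample per minute and an average aggregation over the window; the function name and sample values are hypothetical, and the real evaluation happens inside Azure Monitor.

```python
from statistics import mean


def metric_alert_fires(samples, threshold, window):
    """Return True if the average of the last `window` samples exceeds `threshold`.

    Assumes one sample per minute, so window=10 means the last 10 minutes.
    """
    if len(samples) < window:
        return False  # not enough data to evaluate the window yet
    return mean(samples[-window:]) > threshold


# Sustained load: the last 10 minutes average well above 80.
sustained = [35, 50, 82, 85, 88, 90, 86, 84, 91, 87, 83, 89]

# Brief spike: one high minute after nine quiet ones.
brief_spike = [30] * 9 + [95]

print(metric_alert_fires(sustained, threshold=80, window=10))
print(metric_alert_fires(brief_spike, threshold=80, window=10))
```

Averaging over a window is what keeps a single noisy minute from paging you while still catching sustained pressure.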
Thought Exercise: Matching Scenarios to Monitor Features
Apply what you have learned by mapping real-world needs to Azure Monitor features.
For each scenario, pause and decide which Azure Monitor feature(s) you would use. Then check the suggested answer.
- Scenario A: Your manager wants a dashboard showing CPU, memory, and disk performance for all production VMs in one view, updated every few minutes.
- What would you use?
- Answer: Use metrics for each VM and build a dashboard or workbook in Azure Monitor to visualize those metrics.
- Scenario B: Security asks, "Who deleted the 'Test-RG' resource group yesterday, and from which IP address?"
- What would you use?
- Answer: Use the activity log, which records control-plane operations such as deletes along with who initiated them. Send it to a Log Analytics workspace if you need to query it with KQL alongside other logs. This is a logs scenario, not metrics.
- Scenario C: You want to automatically open a helpdesk ticket if your web app returns more than 100 HTTP 500 errors in 5 minutes.
- What would you use?
- Answer: Create a log alert on a KQL query that counts HTTP 500s, and attach an action group that calls a webhook or ITSM connector to create tickets.
- Scenario D: You need to analyze a specific user's failed login attempts over the last month.
- What would you use?
- Answer: Use logs (for example, Application Insights or sign-in logs sent to a Log Analytics workspace) and run a query filtered by that user ID.
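Scenario C is worth sketching, because it shows what a log alert condition amounts to: count matching records in a time window and compare the count to a threshold. The Python below is an illustration on made-up request records with a hypothetical helper name; in Azure the counting would be done by a KQL query, and the ticket would be opened by the action group.

```python
from datetime import datetime, timedelta


def count_in_window(records, status, end, window):
    """Count records with the given status code in the window ending at `end`."""
    start = end - window
    return sum(1 for ts, code in records if code == status and start < ts <= end)


now = datetime(2024, 3, 4, 9, 30)

# Made-up request log: 120 HTTP 500s within the last four minutes,
# one old HTTP 500 from ten minutes ago, and one successful request.
requests = [(now - timedelta(seconds=2 * i), 500) for i in range(120)]
requests.append((now - timedelta(minutes=10), 500))
requests.append((now, 200))

errors = count_in_window(requests, status=500, end=now, window=timedelta(minutes=5))
should_open_ticket = errors > 100

print(errors, should_open_ticket)
```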
Quiz 1: Metrics vs Logs Basics
Test your understanding of metrics and logs.
You need to identify exactly which error messages occurred on an Azure web app during a 10-minute outage yesterday. Which Azure Monitor data type is most appropriate?
- A) Metrics, because they show performance trends over time
- B) Logs, because they store detailed event and error information
- C) Azure Resource Manager templates, because they define the infrastructure
- D) Action groups, because they send notifications when issues occur
Answer: B) Logs, because they store detailed event and error information
Metrics show numeric trends (like CPU or response time) but not full error details. Logs store detailed event and error information and are the right choice for investigating what exactly happened. ARM templates define infrastructure, and action groups control how alerts notify or automate; neither stores error messages.
Quiz 2: Alerts and Action Groups
Check your understanding of Azure Monitor alerts.
You configure an alert so that when CPU usage on a VM exceeds 90% for 15 minutes, an email is sent to the on-call engineer. In Azure Monitor, what is the role of the action group in this setup?
- A) It defines the metric condition that triggers the alert.
- B) It stores the CPU metrics for the VM.
- C) It defines who gets notified and which actions run when the alert fires.
- D) It creates the VM and assigns the correct size.
Answer: C) It defines who gets notified and which actions run when the alert fires.
The alert rule defines the metric condition (CPU > 90% for 15 minutes). The action group defines the response: who is notified and which actions (such as email, SMS, or webhooks) are executed when the alert fires.
Key Term Flashcards: Azure Monitor Essentials
Flip through these cards to reinforce core monitoring terms.
- Azure Monitor
- Azure Monitor is the central service in Azure for collecting, analyzing, and acting on telemetry from Azure resources, the Azure platform, and some on-premises or other-cloud resources.
- Metric (in Azure Monitor)
- A metric is a numeric measurement collected at regular intervals, optimized for near real-time monitoring and trend analysis (for example, CPU percentage, requests per second).
- Log (in Azure Monitor)
- A log is a record of an event or data entry, often semi-structured, stored in a Log Analytics workspace and queried with Kusto Query Language for detailed analysis and auditing.
- Log Analytics workspace
- A Log Analytics workspace is a logical container in Azure where Azure Monitor stores log data in tables that can be queried with Kusto Query Language.
- Alert rule
- An alert rule defines the signal, condition, scope, and associated action group that determine when Azure Monitor should treat a situation as a problem and trigger a response.
- Action group
- An action group is a reusable collection of notification and automation preferences (such as email, SMS, push, voice, webhooks, and Logic Apps) that run when an alert fires.
- Activity log
- The activity log records control-plane operations on Azure resources, such as create, update, delete, and role assignments, and is useful for auditing and troubleshooting management actions.
- Application Insights
- Application Insights is a feature of Azure Monitor that provides application performance monitoring, including request rates, response times, failures, and user behavior telemetry.
Putting It Together: Monitoring Patterns and Exam Tips
Pattern: Health Monitoring
Use metrics for quick health checks, attach metric alerts with action groups for critical thresholds, and visualize everything with dashboards or workbooks.
Pattern: Incident Troubleshooting
Check metrics for spikes, then dive into logs in a Log Analytics workspace or Application Insights, and review activity logs for recent configuration changes.
Pattern: Auditing and Compliance
Rely on activity logs and other logs stored in a Log Analytics workspace to answer "who did what, when" and to meet retention requirements.
Exam Mapping
Numeric thresholds → metrics; KQL and detailed events → logs; notifications and automation → alerts plus action groups. Keep this mapping in mind for AZ-900 questions.