Chapter 15 of 21
Monitoring and Health: Azure Monitor, Logs, and Service Health
See how Azure keeps a pulse on your resources with metrics, logs, alerts, and service health notifications that help you respond before issues grow.
Big Picture: Why Monitoring Matters in Azure
Control vs. Monitoring
Identity, governance, and security tools help you control your Azure environment. Monitoring is about keeping a continuous pulse so you can see what is happening and react quickly.
Azure Monitor as a Hub
Azure Monitor is the central hub for collecting, analyzing, and acting on data about your Azure resources, applications, and even some on-premises or other cloud resources.
Why Monitoring Matters
Monitoring supports reliability and operations by detecting problems early, helping you troubleshoot, and driving alerts or automation when issues appear.
Key Components in This Module
You will learn about Azure Monitor, metrics, logs, alerts, Log Analytics, Application Insights, and Azure Service Health, and how to choose between them in exam scenarios.
Azure Monitor Overview: What It Covers
Unified Monitoring Platform
Azure Monitor is a platform service that unifies monitoring across your Azure and hybrid resources, rather than each service having its own isolated monitoring solution.
Core Capabilities
At the fundamentals level, Azure Monitor collects metrics and logs, lets you visualize that data, and lets you create alerts that notify you (for example, by email) or trigger automation so you can respond.
Data Sources
Data comes from Azure resources, platform logs such as the Activity Log, application telemetry collected by Application Insights, and guest OS data from VMs when you use the Azure Monitor agent.
Key Experiences
You should recognize Metrics explorer, Logs (Log Analytics), Alerts, and dashboards/workbooks as the main Azure Monitor experiences you might use or see in exam questions.
Metrics vs Logs: The Two Core Data Types
What Are Metrics?
Metrics are numeric, time-series values like CPU percentage or requests per second. They are optimized for fast queries, charts, and near-real-time alerting.
What Are Logs?
Logs are detailed records of events or data points, often text or structured data, stored in a Log Analytics workspace for deep querying and analysis.
Different Questions They Answer
Metrics answer “How is the system behaving over time?” while logs answer “Exactly what happened, when, and in what order?”
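A rough way to see the difference is a small Python sketch. It uses plain Python with no Azure SDK, and the records and field names are hypothetical. The sketch takes the same request events and answers both kinds of question:

- Counting requests per minute turns detailed events into a metric-style series. That series is good for charts and thresholds, but it loses detail.
- Filtering the raw records recovers exactly which request failed and when.

Treat this as an analogy for the two data shapes, not a description of how Azure Monitor stores or computes them.

```python
from collections import Counter
from datetime import datetime

# Hypothetical request log records: each entry is one detailed event.
request_logs = [
    {"time": "2024-05-01T10:00:12", "path": "/api/orders", "status": 200},
    {"time": "2024-05-01T10:00:40", "path": "/api/orders", "status": 500},
    {"time": "2024-05-01T10:01:05", "path": "/api/cart", "status": 200},
    {"time": "2024-05-01T10:01:30", "path": "/api/cart", "status": 200},
    {"time": "2024-05-01T10:01:55", "path": "/api/orders", "status": 200},
]


def requests_per_minute(records):
    """Metric-style view: one number (request count) per minute bucket."""
    buckets = Counter()
    for record in records:
        ts = datetime.fromisoformat(record["time"])
        buckets[ts.replace(second=0, microsecond=0)] += 1
    return sorted(buckets.items())


def failed_requests(records):
    """Log-style view: exactly which requests failed, and when."""
    return [(r["time"], r["path"]) for r in records if r["status"] >= 500]


for minute, count in requests_per_minute(request_logs):
    print(minute.strftime("%H:%M"), count)
print(failed_requests(request_logs))
```

The per-minute counts (2, then 3) tell you how busy the app was, but only the log-style filter tells you that the 10:00:40 call to `/api/orders` returned a 500.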
Alerts from Metrics vs Logs
Metric alerts trigger on threshold breaches in metrics; log alerts trigger when a log query finds patterns like many failures or specific error codes.
Scenario Walkthrough: Picking Metrics or Logs
Performance Issue Scenario
A web app is slow, and you want CPU and response-time trends. This is a metrics problem: chart CPU and request duration, and consider setting a metric alert.
Security Investigation Scenario
You suspect a brute-force attack and need each failed login with IP and timestamp. This is a logs problem: query security or sign-in logs in Log Analytics.
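In Log Analytics you would express this as a KQL query that filters on time and sign-in result, then uses `summarize count() by` the IP column. The Python below only mirrors that filter-then-count logic. The records, field names, 5-minute window, and 3-failure cutoff are all hypothetical; real sign-in logs have their own schema.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical sign-in records (illustrative fields, not a real log schema).
signins = [
    {"time": "2024-05-01T09:00:05", "ip": "203.0.113.7", "user": "ana", "success": False},
    {"time": "2024-05-01T09:00:35", "ip": "203.0.113.7", "user": "ana", "success": False},
    {"time": "2024-05-01T09:01:10", "ip": "203.0.113.7", "user": "ben", "success": False},
    {"time": "2024-05-01T09:02:40", "ip": "203.0.113.7", "user": "ana", "success": False},
    {"time": "2024-05-01T09:03:00", "ip": "198.51.100.2", "user": "ben", "success": False},
    {"time": "2024-05-01T09:03:30", "ip": "198.51.100.2", "user": "ben", "success": True},
]


def suspicious_ips(records, window_end, window=timedelta(minutes=5), min_failures=3):
    """IPs with at least `min_failures` failed sign-ins in the window ending at `window_end`."""
    window_start = window_end - window
    failures = Counter(
        r["ip"]
        for r in records
        if not r["success"]
        and window_start <= datetime.fromisoformat(r["time"]) <= window_end
    )
    return [(ip, n) for ip, n in failures.most_common() if n >= min_failures]


def failed_attempts_from(records, ip):
    """The per-event detail an investigator needs: timestamp and targeted user."""
    return [(r["time"], r["user"]) for r in records if r["ip"] == ip and not r["success"]]


print(suspicious_ips(signins, datetime(2024, 5, 1, 9, 5)))
print(failed_attempts_from(signins, "203.0.113.7"))
```

The count flags the IP, but the investigation relies on the individual records: each failed attempt, when it happened, and which account it targeted.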
Audit Trail Scenario
You must know who deleted which resources and when. Use logs, especially the Azure Activity Log, which is often forwarded to a Log Analytics workspace for longer retention and richer querying.
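The audit question boils down to "filter operations, then list caller, resource, and time." The Python sketch below shows that shape on hypothetical Activity Log entries. The operation names follow Azure's `<provider>/<resource type>/<action>` naming pattern, but the entries, callers, and field names are invented for illustration.

```python
# Hypothetical Activity Log entries; only the operation-name pattern mirrors Azure.
activity_log = [
    {"time": "2024-05-03T14:02:00Z", "caller": "ana@contoso.com",
     "operation": "Microsoft.Compute/virtualMachines/delete", "resource": "vm-web-01"},
    {"time": "2024-05-03T14:05:00Z", "caller": "ben@contoso.com",
     "operation": "Microsoft.Storage/storageAccounts/write", "resource": "stlogs01"},
    {"time": "2024-05-10T09:30:00Z", "caller": "ben@contoso.com",
     "operation": "Microsoft.Network/publicIPAddresses/delete", "resource": "pip-web-01"},
]


def deletion_report(entries):
    """Who deleted which resource, and when, oldest first.

    Sorting works on the ISO 8601 strings because they share one format.
    """
    rows = [
        (e["time"], e["caller"], e["resource"])
        for e in entries
        if e["operation"].endswith("/delete")  # ignore create/update ("write") operations
    ]
    return sorted(rows)


for when, who, what in deletion_report(activity_log):
    print(f"{when}  {who}  deleted {what}")
```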
Autoscale Scenario
You want a virtual machine scale set to add VMs when CPU is high. Autoscale relies on metrics, so you define metric-based rules such as average CPU > 70% over 10 minutes.
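A metric-based autoscale rule can be sketched in a few lines of Python. This is a hypothetical model, not the Azure autoscale engine. The 70% threshold, 10-minute window, instance limits of 2 and 10, and the paired scale-in rule at 30% are assumptions chosen for illustration. Real autoscale settings typically pair scale-out with scale-in rules and also include a cool-down period, which this sketch omits.

```python
from statistics import mean


def autoscale_decision(cpu_last_10_min, current_instances, minimum=2, maximum=10,
                       scale_out_above=70.0, scale_in_below=30.0):
    """Return the new instance count for one evaluation.

    cpu_last_10_min: per-minute average CPU % samples covering the last 10 minutes.
    """
    avg_cpu = mean(cpu_last_10_min)
    if avg_cpu > scale_out_above and current_instances < maximum:
        return current_instances + 1   # scale out by one instance
    if avg_cpu < scale_in_below and current_instances > minimum:
        return current_instances - 1   # scale in by one instance
    return current_instances           # within range or at a limit: no change


print(autoscale_decision([75] * 10, current_instances=3))            # sustained high CPU
print(autoscale_decision([95, 95] + [60] * 8, current_instances=3))  # short spike only
```

Averaging over the window is the design choice that matters here. A sustained 75% adds an instance, while a two-minute spike to 95% averages out to 67% and leaves the count unchanged.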
Log Analytics and Application Insights (High Level)
What Is Log Analytics?
Log Analytics is the log data platform in Azure Monitor. Logs from many sources are stored in a Log Analytics workspace and queried with Kusto Query Language (KQL) in the Logs blade.
What Is Application Insights?
Application Insights is an application performance monitoring (APM) feature of Azure Monitor that tracks requests, dependencies, exceptions, and user behavior.
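Application Insights gathers this telemetry for you through its SDKs or auto-instrumentation, so you would not write code like this yourself. The Python sketch below only models the shape of request and dependency telemetry, to show why linking them helps: a failed request can be traced to the dependency call that failed inside it. The class names, fields, and targets (`orders-db`, `payments-api`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Dependency:
    target: str          # a database or downstream API the request called
    duration_ms: float
    success: bool


@dataclass
class Request:
    name: str
    duration_ms: float
    success: bool
    dependencies: list[Dependency] = field(default_factory=list)


def summarize(requests):
    """Failure rate, average duration, and which dependencies failed inside failed requests."""
    if not requests:
        return {"failure_rate": 0.0, "avg_duration_ms": 0.0, "failing_dependencies": {}}
    failed = [r for r in requests if not r.success]
    failing_deps = Counter(
        d.target for r in failed for d in r.dependencies if not d.success
    )
    return {
        "failure_rate": len(failed) / len(requests),
        "avg_duration_ms": sum(r.duration_ms for r in requests) / len(requests),
        "failing_dependencies": dict(failing_deps),
    }


telemetry = [
    Request("GET /orders", 120, True, [Dependency("orders-db", 40, True)]),
    Request("GET /orders", 900, False, [Dependency("orders-db", 850, False)]),
    Request("GET /cart", 80, True),
    Request("POST /checkout", 1000, False,
            [Dependency("payments-api", 950, False), Dependency("orders-db", 30, True)]),
]
print(summarize(telemetry))
```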
How They Relate
Workspace-based Application Insights stores its telemetry in a Log Analytics workspace, so you can query app telemetry with the same KQL tools you use for other logs.
When to Use Which
Think Log Analytics for infrastructure/platform/security logs across resources; think Application Insights for end-to-end monitoring of a specific application.
Alerts in Azure Monitor: How You Get Notified
Why Alerts Matter
Alerts turn monitoring data into action. Instead of manually watching dashboards, you define rules that notify you or trigger automation when conditions are met.
Core Alert Components
Every alert involves a signal (metrics, logs, or the Activity Log), an alert rule that defines the condition, and usually an action group that defines what to do when the alert fires.
Alert Types
Recognize metric alerts (numeric thresholds), log alerts (log queries), and activity log alerts (specific operations like resource deletions).
Example Scenario
A metric alert on VM CPU > 80% for 10 minutes can use its action group to email the operations team or run an automation runbook. A log alert could watch for many failed sign-ins in a short period.
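The three alert components (signal, rule condition, action group) can be modeled in a short Python sketch. In practice you configure alert rules in the portal or through templates rather than writing code like this. The sketch makes several assumptions:

- The class names are hypothetical.
- "Percentage CPU" is used as an example metric name.
- The condition uses Average aggregation over the window.
- The action group simply appends messages to a list that stands in for an email inbox.

The sketch also notifies on every evaluation that breaches the threshold. Azure alert rules additionally track whether an alert is already active.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class ActionGroup:
    """What to do when an alert fires: here, a list of callables such as 'send email'."""
    name: str
    actions: list[Callable[[str], None]]

    def notify(self, message: str) -> None:
        for action in self.actions:
            action(message)


@dataclass
class MetricAlertRule:
    """Signal (metric name) + condition (average over a window > threshold) + action group."""
    metric: str
    threshold: float
    window_minutes: int
    action_group: ActionGroup

    def evaluate(self, per_minute_values: list[float]) -> bool:
        if len(per_minute_values) < self.window_minutes:
            return False                      # not enough data to judge yet
        window = per_minute_values[-self.window_minutes:]
        average = mean(window)
        if average > self.threshold:
            self.action_group.notify(
                f"{self.metric} averaged {average:.1f} > {self.threshold} "
                f"over {self.window_minutes} min"
            )
            return True
        return False


outbox = []                                    # stands in for an email inbox
ops_team = ActionGroup("ops-team", [outbox.append])
cpu_rule = MetricAlertRule("Percentage CPU", 80.0, 10, ops_team)

print(cpu_rule.evaluate([85] * 10))            # sustained breach: fires
print(cpu_rule.evaluate([50] * 5 + [90] * 5))  # average is 70: does not fire
print(outbox)
```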
Azure Service Health: When Azure Has Issues
What Is Azure Service Health?
Azure Service Health keeps you informed about Azure service issues, planned maintenance, and advisories that might impact your subscriptions and regions.
Service Issues and Maintenance
Service Health shows active incidents and planned maintenance for Azure services you use, along with impact and updates from Microsoft.
Health Advisories
Advisories communicate important changes or recommendations, such as deprecations or configuration guidance affecting your workloads.
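Service Health narrows Azure-wide events down to the ones that might affect the services and regions you use. The Python sketch below only illustrates that idea; Service Health does this for you. The event records, service names, and regions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEvent:
    kind: str      # "service issue", "planned maintenance", or "health advisory"
    service: str
    region: str
    title: str


def events_for_me(events, my_services, my_regions):
    """Keep only events for services and regions you actually use."""
    return [e for e in events if e.service in my_services and e.region in my_regions]


all_events = [
    ServiceEvent("service issue", "Storage", "East US", "Elevated errors accessing blobs"),
    ServiceEvent("planned maintenance", "Virtual Machines", "West Europe", "Host updates"),
    ServiceEvent("health advisory", "App Service", "East US", "Upcoming runtime retirement"),
]

for event in events_for_me(all_events, {"Storage", "App Service"}, {"East US"}):
    print(f"[{event.kind}] {event.service} / {event.region}: {event.title}")
```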
Service Health vs Azure Monitor
Azure Monitor watches your resources’ performance and logs; Azure Service Health tells you when the Azure platform itself is having issues or maintenance.
Thought Exercise: Choosing the Right Monitoring Tool
Scenario A: Platform Issue
Your app cannot access storage in one region. You suspect a platform incident. Which tool confirms if Azure Storage in that region is having an issue?
Scenario B: API Failures
Users see HTTP 500 errors. You want failed requests, stack traces, and dependency calls. Which Azure Monitor capability focuses on end-to-end app behavior?
Scenario C: High CPU Alert
You want an email when a VM’s CPU is above 85% for 15 minutes. Which data type and feature do you use to implement this?
Scenario D: Deletion Audit
You need a monthly report of who deleted which resources. Which log source and analysis tool will you rely on?
Quick Check: Metrics, Logs, and Service Health
Test your understanding of the core distinctions before we move on.
You manage an Azure web app. Users report slow performance at peak times, and you want a simple chart of average response time and CPU usage over the last 24 hours, plus a threshold-based alert. Which Azure capability is the BEST primary choice?
- A) Azure Service Health, because it shows platform incidents and response times
- B) Azure Monitor metrics explorer with a metric alert on CPU and request duration
- C) Application Insights with a log-based alert on HTTP 500 errors only
- D) Azure Activity Log with an alert on administrative operations
Answer: B) Azure Monitor metrics explorer with a metric alert on CPU and request duration
You want numeric performance data over time (CPU and response time) and a threshold-based alert. That is a classic metrics use case. Azure Monitor metrics explorer lets you chart these metrics, and a metric alert can trigger when thresholds are crossed. Service Health is for platform issues, Activity Log is for management operations, and focusing only on HTTP 500 errors would miss general slowness.
Quick Check: Log Analytics and Application Insights
Another short quiz to reinforce the high-level roles of Log Analytics and Application Insights.
Your team wants to analyze detailed security events and VM system logs across multiple subscriptions using a query language, and then create alerts when suspicious patterns occur. Which Azure Monitor feature is MOST appropriate as the central place to store and query this data?
- A) Azure Service Health
- B) Application Insights
- C) Log Analytics workspace (Azure Monitor Logs)
- D) Azure Activity Log in the Azure portal without any workspace
Answer: C) Log Analytics workspace (Azure Monitor Logs)
A Log Analytics workspace (Azure Monitor Logs) is the central store for log data from many sources, including security and VM logs, and supports Kusto Query Language (KQL) plus log-based alerts. Application Insights focuses on application telemetry, Service Health is about Azure platform status, and viewing the Activity Log alone does not provide a unified, queryable log store across all your security and VM logs.
Key Term Flashcards: Monitoring and Health
Flip through these cards to reinforce the most important terms for AZ-900.
- Azure Monitor
- A centralized Azure platform service that collects, analyzes, and acts on telemetry (metrics and logs) from Azure and hybrid resources, supporting visualization, alerting, and integration with other tools.
- Metric (Azure Monitor)
- A numeric, time-series value (such as CPU percentage or requests per second) optimized for fast querying and charting, commonly used for real-time performance monitoring and threshold-based alerts.
- Log (Azure Monitor)
- A detailed record of events or data points, often text or structured data, stored in a Log Analytics workspace and queried using Kusto Query Language (KQL) for troubleshooting, auditing, and analysis.
- Log Analytics workspace
- A special Azure resource used by Azure Monitor to store log data from many sources, enabling centralized querying and analysis with Kusto Query Language (KQL).
- Application Insights
- An application performance monitoring (APM) feature of Azure Monitor that collects application-level telemetry such as requests, dependencies, exceptions, and user behavior for end-to-end app monitoring.
- Azure Service Health
- An Azure experience that informs you about Azure service issues, planned maintenance, and health advisories that specifically impact your subscriptions and regions.
- Metric alert
- An Azure Monitor alert type that triggers when a metric (such as CPU or response time) crosses a defined threshold over a specified period.
- Log alert
- An Azure Monitor alert type based on a log query that triggers when the query results meet a defined condition, such as many failed sign-ins in a short time.
- Activity log alert
- An Azure Monitor alert type that fires when specific operations appear in the Azure Activity Log, such as when a resource is deleted or a policy is updated.
- Service Health alert
- A notification configured in Azure Service Health that informs you when new service issues, planned maintenance, or health advisories affect your Azure services or regions.
How Monitoring Supports Reliability and Exam Scenarios
Monitoring and Reliability
Metrics and logs help you detect failures quickly, support autoscaling and capacity planning, and provide evidence for uptime and performance targets.
Monitoring and Security
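Security tools such as Microsoft Sentinel work with data stored in Log Analytics workspaces, and you use logs and alerts to spot threats. Under the shared responsibility model, Microsoft monitors the Azure platform, while you monitor your own resources and apps.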
Common Scenario Patterns
Platform issues → Service Health; performance thresholds/autoscale → metrics; deep investigation → logs/Log Analytics; audit of operations → Activity Log.
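As a study aid, those patterns can be written down as a lookup table. The Python below is a hypothetical helper, not an Azure feature. It maps each pattern to the usual first choice and falls back to a reminder to identify the data type first.

```python
# Hypothetical study aid: scenario pattern -> usual first-choice tool.
SCENARIO_TO_TOOL = {
    "platform issue": "Azure Service Health",
    "performance threshold": "Azure Monitor metrics + metric alert",
    "autoscale": "Azure Monitor metrics (autoscale rule)",
    "deep investigation": "Azure Monitor Logs (Log Analytics workspace, KQL)",
    "audit of operations": "Azure Activity Log",
}


def pick_tool(scenario: str) -> str:
    """Return the usual first choice for a scenario pattern (case-insensitive)."""
    return SCENARIO_TO_TOOL.get(scenario.strip().lower(), "Unclear: identify the data type first")


print(pick_tool("Platform issue"))
print(pick_tool("audit of operations"))
```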
Link to Your Study Path
Your next diagnostic and mock exam will probe these choices. Any gaps will show up in your gap guide and spaced review for targeted practice.