Skarp

Chapter 17 of 20

Monitoring and Health: Azure Monitor, Logs, and Service Health

Turn raw metrics and logs into insight by learning how Azure Monitor and Service Health help you detect issues, set alerts, and understand platform incidents.

27 min read

Big Picture: Why Monitoring Matters in Azure

From Deployment to Operations

You have learned how to create Azure resources with the portal, CLI, PowerShell, and ARM templates. Next, you must ensure those resources stay healthy and performant once they are running.

Four Key Services

For AZ-900 monitoring and health, focus on: Azure Monitor, metrics and logs (plus alerts), Azure Service Health, and Azure Advisor. These topics are conceptual, but they come up often in exam questions.

Exam Angle

Expect scenario questions like: which service checks VM CPU, where to see Azure-wide incidents, or which tool recommends cost savings. You need to pick the right service for each job.

Azure Monitor Overview: The Central Monitoring Hub

What Is Azure Monitor?

Azure Monitor is the central service for collecting and analyzing telemetry from Azure resources, applications, and some external workloads. It turns raw data into useful insights.

The Monitoring Pipeline

Conceptually, Azure Monitor has four stages: Collect data from many sources, Store it as metrics or logs, Analyze and visualize it, and Respond through alerts and automated actions.
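The four stages can be sketched as a toy pipeline. This is only an illustration of the flow, not an Azure API: the function names, sample values, and the 80% threshold are all assumptions made up for this example.

```python
from statistics import mean

# Toy stand-in for the four stages; none of this calls Azure.

def collect():
    """Collect: pretend telemetry arriving from hypothetical sources."""
    return [
        {"kind": "metric", "name": "cpu_percent", "value": 72.0},
        {"kind": "metric", "name": "cpu_percent", "value": 91.0},
        {"kind": "metric", "name": "cpu_percent", "value": 95.0},
        {"kind": "log", "message": "Timeout calling database"},
    ]

def store(items):
    """Store: keep numeric metrics and detailed log records separately."""
    metrics = [i for i in items if i["kind"] == "metric"]
    logs = [i for i in items if i["kind"] == "log"]
    return metrics, logs

def analyze(metrics):
    """Analyze: reduce the metric samples to a single average."""
    return mean(m["value"] for m in metrics)

def respond(average, threshold=80.0):
    """Respond: decide whether someone should be notified."""
    return "notify on-call" if average > threshold else "no action"

metrics, logs = store(collect())
average_cpu = analyze(metrics)
decision = respond(average_cpu)
print(f"avg CPU {average_cpu:.1f}% -> {decision}")
```

Each function maps to one stage, which is the part worth remembering for the exam; the real service does far more at every step.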

Exam Memory Hook

When a question is about monitoring Azure resources in general, think of Azure Monitor as the umbrella that covers metrics, logs, dashboards, and alerts.

Metrics vs Logs: Two Core Data Types

What Are Metrics?

Metrics are numeric values captured at regular intervals, like CPU percentage or request count per minute. They are lightweight, fast to query, and great for real-time charts and alerts.

What Are Logs?

Logs are detailed records with many fields, often including text. Examples are activity logs, resource logs, and application error logs. They are ideal for troubleshooting and auditing.

Exam Shortcut

Numeric performance over time? Think metrics. Detailed events, errors, and who-did-what? Think logs. This distinction appears often in AZ-900 questions.
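To make the difference concrete, here is a minimal sketch of what each data type might look like as a record. The class names, field names, and sample values are hypothetical and do not reflect the actual Azure Monitor schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MetricSample:
    """A metric: one number at one point in time."""
    timestamp: datetime
    name: str
    value: float

@dataclass
class LogRecord:
    """A log: a richer record with several descriptive fields."""
    timestamp: datetime
    operation: str
    caller: str
    resource: str
    status: str

# Hypothetical data for illustration only.
cpu_samples = [
    MetricSample(datetime(2024, 1, 1, 9, 0), "Percentage CPU", 42.0),
    MetricSample(datetime(2024, 1, 1, 9, 1), "Percentage CPU", 88.5),
]
activity_records = [
    LogRecord(datetime(2024, 1, 1, 9, 5), "Create VM",
              "alice@example.com", "vm-web-01", "Succeeded"),
    LogRecord(datetime(2024, 1, 1, 9, 30), "Delete VM",
              "bob@example.com", "vm-web-01", "Succeeded"),
]

# Metric question: how high did CPU get?
peak_cpu = max(s.value for s in cpu_samples)

# Log question: who deleted the VM, and when?
deleters = [(r.caller, r.timestamp)
            for r in activity_records if r.operation == "Delete VM"]

print(peak_cpu)
print(deleters)
```

The metric answers a "how much" question with a single number, while the log answers a "who did what" question by filtering on descriptive fields.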

Scenario Walkthrough: Using Metrics and Logs Together

Step 1: Start with Metrics

Users say your web app is slow. You open the Metrics blade and chart CPU, HTTP 5xx errors, and response time. You see spikes during peak hours: metrics show that a problem exists.

Step 2: Use Logs for Root Cause

Next, you open Logs or Application Insights and inspect exceptions and failed requests. You discover repeated timeouts when calling the database: logs explain why it is failing.

Step 3: Fix and Alert

You scale or optimize the database and create an alert rule. For example, if HTTP 5xx errors exceed a threshold for 5 minutes, Azure Monitor emails the on-call team.
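The first two steps of this walkthrough can be sketched in a few lines. The error counts, exception names, and the spike threshold below are invented for illustration; in practice you would read these from the Metrics and Logs experiences rather than from hard-coded data.

```python
from collections import Counter

# Hypothetical per-minute HTTP 5xx counts (minute offset -> error count).
errors_per_minute = {0: 1, 1: 0, 2: 14, 3: 19, 4: 2}

# Hypothetical log records: (minute offset, exception type).
exception_logs = [
    (0, "NullReference"),
    (2, "DatabaseTimeout"),
    (2, "DatabaseTimeout"),
    (3, "DatabaseTimeout"),
    (3, "NullReference"),
    (4, "DatabaseTimeout"),
]

SPIKE_THRESHOLD = 10  # assumed cut-off for "too many errors in a minute"

# Step 1: metrics show THAT a problem exists, and when.
spike_minutes = sorted(
    minute for minute, count in errors_per_minute.items()
    if count > SPIKE_THRESHOLD
)

# Step 2: logs from the same minutes explain WHY.
causes = Counter(
    exc for minute, exc in exception_logs if minute in spike_minutes
)
top_cause, top_count = causes.most_common(1)[0]

print(spike_minutes)
print(top_cause, top_count)
```

The metric narrows the time range, and the log query over that range surfaces the dominant error, which mirrors the move from "something is wrong" to "the database calls are timing out".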

Azure Monitor Alerts: From Data to Action

What Is an Alert Rule?

An alert rule tells Azure Monitor which signal to watch, what condition to check (for example, CPU > 80% for 10 minutes), and what actions to take when that condition is met.
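As a rough sketch of how such a condition could be evaluated, the example below averages the most recent samples over a time window and compares the result to a threshold. The `AlertRule` class here is hypothetical, and averaging is an assumption: Azure Monitor offers several aggregation types and evaluation frequencies that this toy version ignores.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class AlertRule:
    """Hypothetical rule: signal + condition (threshold over a window)."""
    signal: str
    threshold: float
    window_minutes: int

    def is_fired(self, samples):
        """samples: one value per minute, oldest first."""
        if len(samples) < self.window_minutes:
            return False  # not enough data to cover the window
        window = samples[-self.window_minutes:]
        return mean(window) > self.threshold

rule = AlertRule(signal="Percentage CPU", threshold=80.0, window_minutes=10)

brief_spike = [40.0] * 9 + [99.0]   # one hot minute only
sustained_load = [85.0] * 10        # hot for the whole window

print(rule.is_fired(brief_spike))
print(rule.is_fired(sustained_load))
```

Evaluating over a window rather than a single sample is what keeps a short spike from paging anyone while still catching sustained load.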

Signals and Types

Alerts can be based on metrics (fast thresholds), logs (query results), or activity logs (management operations). Each type suits different monitoring needs.

Action Groups

Action groups define who gets notified and how: email, SMS, push, webhooks, Logic Apps, Functions, or ITSM. They are reusable across multiple alert rules.
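The reuse idea can be illustrated with a small model in which two alert rules point at the same action group. The classes, addresses, and URL are hypothetical, and nothing is actually sent; the sketch only shows why one shared definition is convenient.

```python
from dataclasses import dataclass, field

@dataclass
class ActionGroup:
    """Hypothetical action group: who to notify and how."""
    name: str
    emails: list = field(default_factory=list)
    webhooks: list = field(default_factory=list)

    def notify(self, message):
        """Return the notifications that would go out (nothing is sent)."""
        sent = [f"email to {e}: {message}" for e in self.emails]
        sent += [f"POST {url}: {message}" for url in self.webhooks]
        return sent

@dataclass
class AlertRule:
    """Hypothetical alert rule that delegates actions to a group."""
    name: str
    action_group: ActionGroup

    def fire(self):
        return self.action_group.notify(f"Alert fired: {self.name}")

on_call = ActionGroup(
    "on-call-team",
    emails=["oncall@example.com"],
    webhooks=["https://example.com/hook"],
)

# Two different rules share one action group.
cpu_rule = AlertRule("High CPU", on_call)
errors_rule = AlertRule("HTTP 5xx spike", on_call)

# Updating the group once changes who both rules notify.
on_call.emails.append("lead@example.com")

cpu_notifications = cpu_rule.fire()
error_notifications = errors_rule.fire()
print(len(cpu_notifications), len(error_notifications))
```

Because both rules reference the same group, adding a recipient in one place updates every alert that uses it.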

Azure Service Health: When Azure Itself Has Issues

What Is Azure Service Health?

Azure Service Health shows how Azure platform incidents, planned maintenance, and advisories affect your subscriptions and resources. It focuses on the health of the Azure platform itself rather than on your own applications.

Types of Information

You see service issues (ongoing incidents), planned maintenance, and health advisories (for example, security or retirement notices) that may require your attention.

Service Health vs Azure Monitor

Azure Monitor tracks your resource telemetry. Azure Service Health reports Azure platform problems and maintenance. Use Service Health for regional outages and official incident updates.

Azure Advisor: Built-in Recommendations

What Is Azure Advisor?

Azure Advisor is a recommendation engine that analyzes your Azure resources and suggests improvements for reliability, security, performance, cost, and operational excellence.

Recommendation Categories

Advisor groups recommendations into five areas: Reliability, Security, Performance, Cost, and Operational Excellence. Each item shows impact and how to fix it.

Exam Clue

If the question mentions proactive suggestions or best-practice improvements across cost, performance, or security, think Azure Advisor, not Azure Monitor or Service Health.

Thought Exercise: Match the Service to the Scenario

Use this quick exercise to practice mapping real-world needs to the correct Azure service. Think through each scenario and decide whether it calls for Azure Monitor, metrics, logs, Azure Service Health, or Azure Advisor.

  1. Scenario A: Your finance team wants to cut monthly Azure costs. They ask which VMs are consistently underused and could be resized or shut down.
  • Which service do you start with and why?
  2. Scenario B: Users in one region report that they cannot access your app at all. You suspect a wider Azure problem.
  • Where do you look for confirmation and official updates?
  3. Scenario C: You want to be notified if your production database’s DTU or vCore usage exceeds 80% for more than 10 minutes.
  • Which data type (metric or log) is best suited, and which service sets up the alert?
  4. Scenario D: Your security team wants to know who deleted a critical VM and when.
  • Do you check metrics or logs, and which specific log type is most relevant?
  5. Scenario E: After a performance incident, you need detailed error messages and stack traces from your web API to debug an intermittent bug.
  • Are you primarily using metrics or logs, and through which Azure Monitor experience?

Take 2–3 minutes to answer these in your own words. Then compare your answers with the guidance from the earlier sections:

  • Azure Advisor for cost and best-practice recommendations.
  • Azure Service Health for platform-wide incidents and maintenance.
  • Metrics for numeric thresholds and alerts.
  • Logs (including activity logs and application logs) for who-did-what and deep troubleshooting.
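The guidance above amounts to a lookup from the kind of need to the tool that serves it. The sketch below encodes that mapping for the five scenarios; the wording of the keys is an informal paraphrase chosen for this example, not official terminology.

```python
# Informal mapping from the kind of need to the tool to reach for.
SERVICE_FOR_NEED = {
    "cost or best-practice recommendation": "Azure Advisor",
    "platform incident or planned maintenance": "Azure Service Health",
    "numeric threshold alert": "Azure Monitor metric alert",
    "who did what to a resource": "Azure Monitor activity log",
    "detailed errors and stack traces": "Azure Monitor logs (Application Insights)",
}

# The five scenarios, reduced to the need each one describes.
scenarios = {
    "A": "cost or best-practice recommendation",
    "B": "platform incident or planned maintenance",
    "C": "numeric threshold alert",
    "D": "who did what to a resource",
    "E": "detailed errors and stack traces",
}

def pick_service(need):
    """Return the suggested tool for a given kind of need."""
    return SERVICE_FOR_NEED[need]

answers = {label: pick_service(need) for label, need in scenarios.items()}

for label, service in answers.items():
    print(f"Scenario {label}: {service}")
```

Reducing each exam scenario to its underlying need first, and only then picking the service, is usually faster than matching on product names in the question.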

Quiz 1: Concepts of Metrics, Logs, and Alerts

Test your understanding of core monitoring concepts.

You want to be notified if a virtual machine's CPU usage stays above 85% for 15 minutes. Which combination is the BEST fit?

  A) Create a log alert in Azure Monitor based on activity logs, and send an email.
  B) Create a metric alert in Azure Monitor on the Percentage CPU metric, and attach an action group.
  C) Use Azure Service Health to create a service issues alert for high CPU.
  D) Use Azure Advisor to generate a performance recommendation when CPU is high.

Answer: B) Create a metric alert in Azure Monitor on the Percentage CPU metric, and attach an action group.

CPU usage over time is numeric performance data, which is a metric. The correct approach is to create a metric alert in Azure Monitor on the Percentage CPU metric and link an action group for notifications. Activity logs do not track CPU usage, Azure Service Health is for platform issues, and Azure Advisor gives recommendations but does not provide real-time threshold alerts.

Quiz 2: Azure Monitor vs Service Health vs Advisor

Check your ability to pick the right service for each scenario.

Your application is down in one region. You suspect an Azure-wide incident and want official information on which services and regions are affected and when it is expected to be resolved. Which Azure feature should you use FIRST?

  A) Azure Monitor metrics for your application resources
  B) Azure Service Health to view service issues and advisories
  C) Azure Advisor to check reliability recommendations
  D) Activity logs in Azure Monitor

Answer: B) Azure Service Health to view service issues and advisories

Azure Service Health is designed to show Azure platform incidents, planned maintenance, and health advisories that affect your subscriptions. It is the right place to see official details about regional or service-wide outages. Azure Monitor metrics and activity logs focus on your resources, and Azure Advisor is for recommendations, not real-time incident updates.

Exam-Focused Wrap-Up and Next Steps

Core Takeaways

Azure Monitor handles metrics, logs, and alerts for your resources. Metrics are numeric time-series; logs are detailed records. Alerts plus action groups turn data into notifications and automation.

Platform vs Recommendations

Azure Service Health reports Azure platform issues and maintenance. Azure Advisor gives proactive recommendations for reliability, security, performance, cost, and operational excellence.

Link to Your Study Path

These topics sit in Azure Management and Governance. Your next Skarp diagnostic and spaced review will reinforce metrics vs logs, alerts, Service Health, and Advisor scenarios.

Key Terms

Log
A detailed record with multiple fields, often including text, such as activity logs, resource logs, and application logs. Logs are typically stored in a Log Analytics workspace and are used for troubleshooting, auditing, and deep analysis.
Metric
A numeric value collected at regular intervals (time-series data), such as CPU percentage or request count. Metrics are lightweight and ideal for near real-time monitoring and threshold alerts.
Alert rule
A configuration in Azure Monitor that defines which signal to watch (metric, log, or activity log), the condition that triggers an alert, and the actions to take when that condition is met.
Action group
A reusable set of notification and action preferences (email, SMS, push, webhooks, Logic Apps, Functions, ITSM) that Azure Monitor uses when an alert is fired.
Activity log
A type of log that records management operations on Azure resources, such as who created, modified, or deleted a resource and when. Useful for auditing and investigating changes.
Azure Advisor
A recommendation engine that analyzes your Azure resources and usage to suggest improvements in reliability (high availability), security, performance, cost, and operational excellence.
Azure Monitor
The central Azure service for collecting, analyzing, and acting on telemetry from Azure resources, applications, and some external workloads. It works with metrics, logs, alerts, and dashboards.
Service issues
Ongoing Azure platform problems where a service is not working as expected in one or more regions. Service Health shows impact, status, and updates.
Health advisories
Important Azure notices, such as security advisories, required configuration changes, or feature retirements, that may affect how you use Azure services.
Planned maintenance
Upcoming Azure platform maintenance events that may affect your resources, allowing you to plan around potential downtime or performance impact.
Azure Service Health
An Azure experience that provides information about Azure platform service issues, planned maintenance, and health advisories that affect your subscriptions and resources.
Log Analytics workspace
A special Azure resource where Azure Monitor stores log data so that it can be queried and analyzed with Kusto Query Language (KQL).
