What is Monitoring?

Continuous observation of services and CIs using metrics, logs, and other data to detect conditions that may require attention.

What is an Event?

Any detectable occurrence with significance for service or CI management, often classified as informational, warning, or exception.

What is Observability?

The ability to understand a system’s internal state from its external outputs, typically via metrics, logs, and traces.

Design, Architecture, and Configuration: Supporting Reliable Services — ITIL Foundation (Version 5) Exam Prep Bootcamp

1. From Tickets to Design: Why This Module Matters

From Operations Back to Design

ITIL 5 says reliable services are created long before go-live. Design, architecture, configuration, and monitoring practices shape how stable and observable a service will be.

Where This Fits in the SMS

These practices are part of the Service Management System (SMS). They provide the structure and information that make incidents, problems, and changes easier to handle.

Four Big Ideas

We focus on: 1) service design considerations, 2) configuration management and CIs, 3) monitoring, events, and observability, and 4) dependency mapping and impact analysis.

Link to Previous Modules

Good design and solid configuration data reduce incidents, speed up troubleshooting, and make change and release decisions less risky.

2. Service Design in ITIL 5: What Are We Designing?

What Is Service Design?

In ITIL 5, service design is about how value will be delivered reliably. It covers much more than the user interface: it also addresses how the service will stay reliable, be supported, and be changed.

Utility and Warranty

Utility asks: does the service do what users need? Warranty asks: can it do this reliably, with enough capacity, availability, security, and continuity?

Non-functional Requirements

NFRs like response time, uptime, data retention, and recovery time must be captured and designed early, not guessed after go-live.
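An uptime target is easier to reason about once it is converted into a downtime budget. The sketch below is illustrative only: the function name and the 30-day period are assumptions made for this example, not ITIL definitions.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Compare a few common targets over the assumed 30-day period.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, which is the kind of figure a design team can test against planned maintenance and recovery times.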

Designing for Operations

Good design decides in advance how the service will be monitored, what events and logs it will produce, and who supports which components.

Designing for Change and Resilience

Design also considers safe deployment, redundancy and failover, and how to roll back if a release causes issues.

3. Example: Designing a Student Portal for Reliability

Student Portal Scenario

Your university designs a new student portal. A simple design lists features, but an ITIL 5-aligned design also plans reliability and operations.

Warranty and NFRs

The team sets uptime targets, performance goals, and security requirements, such as MFA for payments and stricter uptime during registration weeks.

Monitoring and Events

They define metrics like logins per minute and error rates. High CPU on the database or repeated payment failures generate events and alerts.
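As a rough illustration of how metrics can turn into events, the sketch below compares sampled values against thresholds. The metric names and threshold values are hypothetical; in practice they would be derived from the agreed non-functional requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    warning: float
    exception: float


# Hypothetical thresholds for the student portal scenario.
THRESHOLDS = [
    Threshold("db_cpu_percent", warning=75.0, exception=90.0),
    Threshold("payment_failures_per_minute", warning=3.0, exception=10.0),
]


def evaluate(metrics: dict[str, float]) -> list[tuple[str, str]]:
    """Return (metric, category) pairs for every metric that crosses a threshold."""
    events = []
    for t in THRESHOLDS:
        value = metrics.get(t.metric)
        if value is None:
            continue  # metric was not reported in this sample
        if value >= t.exception:
            events.append((t.metric, "exception"))
        elif value >= t.warning:
            events.append((t.metric, "warning"))
    return events
```

A sample with database CPU at 95 and four payment failures per minute would yield one exception event and one warning event under these example thresholds.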

Dependencies

The portal depends on web servers, app servers, a database, a payment gateway, and an identity provider. These dependencies must be documented.

Designing for Change and Recovery

They choose blue/green deployments and daily backups with tested restores, so changes and failures can be handled with minimal disruption.
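The idea behind blue/green deployment can be sketched as a toy model: two environments exist, only one receives live traffic, and a release is installed on the idle one before traffic is switched. The class and method names below are hypothetical and stand in for whatever load balancer or platform feature a real team would use.

```python
class BlueGreenRouter:
    """Toy model of blue/green deployment: two environments, one is live."""

    def __init__(self) -> None:
        self.versions = {"blue": "1.0", "green": None}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str) -> None:
        """Install the new version on the idle environment; live traffic is untouched."""
        self.versions[self.idle] = version

    def switch(self, health_check) -> bool:
        """Move traffic to the idle environment only if it passes the health check."""
        candidate = self.idle
        if self.versions[candidate] is None or not health_check(candidate):
            return False  # keep serving from the current environment
        self.live = candidate
        return True

    def rollback(self) -> None:
        """Return traffic to the other environment, which still holds the old version."""
        self.live = self.idle
```

Because the previous version stays installed on the now-idle environment, rolling back in this model is only a traffic switch, which is why the approach limits disruption when a release causes issues.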

4. Configuration Management and CIs: The Service Map

What Is a CI?

A Configuration Item (CI) is any component that must be managed to deliver a service: servers, apps, databases, APIs, contracts, even documentation.

Service Configuration Management

This ITIL 5 practice ensures accurate, reliable information about CIs and their relationships is available when needed, often via a configuration management database (CMDB) or configuration management system (CMS).

Why CI Information Matters

Good CI data helps assess impact and risk of changes, speeds up incident and problem diagnosis, and supports compliance and audits.

Relationships Between CIs

Relationships such as "runs on", "depends on", or "provided by" show how components connect and which services rely on which infrastructure.
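One simple way to picture CI relationships is as a list of (source, kind, target) records. The sketch below is a minimal illustration, not a CMDB schema; the CI names are hypothetical and loosely follow the student portal example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    source: str  # CI the relationship starts from
    kind: str    # e.g. "runs on", "depends on", "provided by"
    target: str  # CI the relationship points to


# Hypothetical CI relationship records.
RELATIONSHIPS = [
    Relationship("student-portal", "depends on", "portal-app"),
    Relationship("portal-app", "runs on", "app-server-01"),
    Relationship("portal-app", "depends on", "portal-db"),
    Relationship("student-portal", "depends on", "payment-gateway"),
    Relationship("payment-gateway", "provided by", "external-payment-provider"),
]


def related_to(ci: str, kind: Optional[str] = None) -> list[str]:
    """List the CIs that `ci` points to, optionally filtered by relationship kind."""
    return [
        r.target
        for r in RELATIONSHIPS
        if r.source == ci and (kind is None or r.kind == kind)
    ]
```

For example, `related_to("portal-app", "runs on")` returns only the server hosting the application, while omitting `kind` returns every CI the application points to.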

Right Level of Detail

Capture enough detail to understand dependencies and risk, but not so much that the CMDB becomes impossible to maintain or use.

5. Thought Exercise: Identify CIs and Relationships

Imagine a simple online bookstore service.

It includes:

A web front-end.
An application server.
A database.
A payment service from an external provider.
A shipping integration API.

Your task (mentally or on paper):

List at least 5 CIs in this environment.
For each CI, note one relationship to another CI using phrases like `depends on` or `runs on`.
Decide which relationship would be most useful during an incident where payments are failing.

Reflect:

Which CIs and relationships would you want to see first in the CMDB when the service desk receives many "payment failed" incidents?
How would this configuration information help the incident manager or change authority decide what to do next?

6. Monitoring, Events, and Observability: Seeing Inside the Service

Monitoring Basics

Monitoring means continuously observing services and CIs using metrics (such as CPU usage, response time, and error rates), logs, and traces.

What Is an Event?

An event is any detectable occurrence that matters for management. It can be informational, a warning, or an exception indicating something is wrong.

Event Management

Event management is the practice of detecting events, interpreting them, and deciding which actions or workflows should be triggered.
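The "decide which action" step can be pictured as a lookup from event category to response. The mapping below is a hypothetical example of such a policy; real organizations define their own categories, actions, and workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ci: str
    category: str  # "informational", "warning", or "exception"
    message: str


# Hypothetical policy: which action each event category triggers.
ACTIONS = {
    "informational": "log only",
    "warning": "notify support team",
    "exception": "open incident",
}


def decide_action(event: Event) -> str:
    """Return the action configured for the event's category."""
    try:
        return ACTIONS[event.category]
    except KeyError:
        raise ValueError(f"unknown event category: {event.category!r}") from None
```

Keeping the policy in one table makes it easy to review which events create incidents and which are only recorded.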

Observability

Observability is the ability to understand a system’s internal state from outputs like metrics, logs, and traces. It must be designed into the system and depends on accurate CI data.
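One common way to make outputs easier to interpret is to emit structured log lines that share a trace identifier across components, so a single transaction can be followed end to end. The sketch below uses only the Python standard library; the field names and component names are assumptions for illustration.

```python
import json
import logging
import uuid


def format_log(trace_id: str, component: str, message: str, **fields) -> str:
    """Build one JSON log line; the shared trace_id links lines across components."""
    record = {"trace_id": trace_id, "component": component, "message": message}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("portal")
    trace_id = uuid.uuid4().hex  # one ID per user transaction
    log.info(format_log(trace_id, "web", "payment request received"))
    log.info(format_log(trace_id, "payment-client", "gateway call failed", status=502))
```

Filtering logs by one `trace_id` then shows the path of a single payment attempt and where it failed.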

Link to Other Practices

Incident, problem, and change practices all rely on monitoring and events to detect issues, find root causes, and verify the impact of changes.

7. Dependency Mapping and Impact Analysis

What Is Dependency Mapping?

Dependency mapping shows how CIs and services connect, often as diagrams from users to apps, APIs, databases, and infrastructure.

What Is Impact Analysis?

Impact analysis uses dependency information to assess the consequences of an event, incident, problem, or change on services and customers.

Database Failure Example

A failed database affects both the student portal and a reporting dashboard, but the impact on the portal is higher because students cannot register.
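Finding the services affected by a failed CI amounts to walking the dependency map upward from that CI. The sketch below shows one way to do this with a breadth-first search; the CI names and the dependency map are hypothetical and only mirror the database example.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the CIs in its list.
DEPENDS_ON = {
    "student-portal": ["portal-app"],
    "reporting-dashboard": ["reporting-app"],
    "portal-app": ["shared-db"],
    "reporting-app": ["shared-db"],
    "shared-db": ["db-server-01"],
}

# CIs that represent business-facing services.
SERVICES = {"student-portal", "reporting-dashboard"}


def impacted_services(failed_ci: str) -> set[str]:
    """Return the services that depend, directly or indirectly, on failed_ci.

    The failed CI itself is included if it is a service.
    """
    # Invert the map so we can ask "who depends on X?"
    dependents: dict[str, list[str]] = {}
    for ci, targets in DEPENDS_ON.items():
        for target in targets:
            dependents.setdefault(target, []).append(ci)

    seen = {failed_ci}
    queue = deque([failed_ci])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen & SERVICES
```

The traversal only identifies which services are affected; judging that the portal matters more than the dashboard still requires business context, such as whether registration is open.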

Use Across Practices

Change enablement, incident management, and continuity all rely on dependency maps and impact analysis to prioritize and decide actions.

8. Quiz: CIs and Impact

Test your understanding of configuration items and impact analysis.

A team wants to know which business services will be affected if a particular database server is taken down for maintenance. Which ITIL 5 concept are they mainly using?

A) Event categorization
B) Dependency mapping and impact analysis
C) Service level monitoring
D) Request fulfillment

Answer: B) Dependency mapping and impact analysis

They are using dependency mapping and impact analysis: looking at CI relationships to see which services depend on the database and what the impact will be.

9. Quiz: Monitoring and Events

Check your understanding of monitoring, events, and observability.

Which is the BEST example of observability-oriented design for a new payment microservice?

A) Documenting the user interface text in three languages
B) Configuring detailed metrics, logs, and traces that show transaction paths and errors
C) Ensuring the microservice uses the latest programming language version
D) Limiting access to the source code repository