Chapter 10 of 13
Design, Architecture, and Configuration: Supporting Reliable Services
Stable services start long before go‑live. This chapter shows how ITIL 5’s design and configuration-related practices set the stage for resilience, observability, and smooth operations.
1. From Tickets to Design: Why This Module Matters
From Operations Back to Design
ITIL 5 says reliable services are created long before go-live. Design, architecture, configuration, and monitoring practices shape how stable and observable a service will be.
Where This Fits in the SMS
These practices are part of the Service Management System (SMS). They provide the structure and information that make incidents, problems, and changes easier to handle.
Four Big Ideas
We focus on: 1) service design considerations, 2) configuration management and CIs, 3) monitoring, events, and observability, and 4) dependency mapping and impact analysis.
Link to Previous Modules
Good design and solid configuration data reduce incidents, speed up troubleshooting, and make change and release decisions less risky.
2. Service Design in ITIL 5: What Are We Designing?
What Is Service Design?
In ITIL 5, service design is about how value will be delivered reliably. It goes well beyond the user interface to cover how the service stays reliable, how it is supported, and how it absorbs change.
Utility and Warranty
Utility asks: does the service do what users need? Warranty asks: can it do this reliably, with enough capacity, availability, security, and continuity?
Non-functional Requirements
NFRs such as response time, uptime, data retention, and recovery time must be captured and designed for early, not guessed after go-live.
Designing for Operations
Good design decides in advance how the service will be monitored, what events and logs it will produce, and who supports which components.
Designing for Change and Resilience
Design also considers safe deployment, redundancy and failover, and how to roll back if a release causes issues.
3. Example: Designing a Student Portal for Reliability
Student Portal Scenario
Your university designs a new student portal. A simple design lists features, but an ITIL 5-aligned design also plans reliability and operations.
Warranty and NFRs
The team sets uptime targets, performance goals, and security requirements, such as MFA for payments and stricter uptime during registration weeks.
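An uptime target is easier to discuss once it is converted into a downtime budget. The following is a minimal sketch of that arithmetic; the function name and the 99.5% and 99.9% targets are illustrative assumptions, not figures from the scenario.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: int) -> float:
    """Return the minutes of downtime an uptime target allows over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# Hypothetical targets: 99.5% over a normal 30-day month,
# 99.9% over a 14-day registration window.
normal = downtime_budget_minutes(99.5, 30)
registration = downtime_budget_minutes(99.9, 14)
print(f"Normal month: {normal:.1f} minutes")
print(f"Registration window: {registration:.1f} minutes")
```

Under these assumed targets, the normal month allows roughly 216 minutes of downtime, while the stricter registration window allows only about 20 minutes, which shows why the stricter target has to be designed for rather than hoped for.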
Monitoring and Events
They define metrics like logins per minute and error rates. High CPU on the database or repeated payment failures generate events and alerts.
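As a rough illustration of how metric samples might turn into events, the sketch below checks two samples against thresholds. The thresholds, CI names, and severity labels are assumptions made for the example; a real monitoring tool would define its own.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str    # the CI that produced the event (hypothetical names)
    severity: str  # "warning" or "exception" in this sketch
    message: str


# Hypothetical thresholds chosen only for illustration.
CPU_WARNING_PERCENT = 85.0
PAYMENT_FAILURE_LIMIT = 5


def evaluate_sample(db_cpu_percent: float, payment_failures: int) -> list[Event]:
    """Compare one sample of each metric with its threshold and emit events."""
    events = []
    if db_cpu_percent >= CPU_WARNING_PERCENT:
        events.append(
            Event("portal-db", "warning", f"CPU at {db_cpu_percent:.0f}%")
        )
    if payment_failures >= PAYMENT_FAILURE_LIMIT:
        events.append(
            Event("payment-gateway", "exception",
                  f"{payment_failures} payment failures in the sample window")
        )
    return events


for event in evaluate_sample(db_cpu_percent=92.0, payment_failures=7):
    print(event.severity, event.source, event.message)
```

A sample below both thresholds produces no events, which keeps alert noise down; deciding where those thresholds sit is part of the design work.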
Dependencies
The portal depends on web servers, app servers, a database, a payment gateway, and an identity provider. These dependencies must be documented.
Designing for Change and Recovery
They choose blue/green deployments and daily backups with tested restores, so changes and failures can be handled with minimal disruption.
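The blue/green idea can be sketched as a router that only moves traffic to the idle environment after it passes a health check, and that can switch back if the release misbehaves. The class and method names here are hypothetical, and the health check is a stand-in for whatever tests the team would actually run.

```python
from typing import Callable


class BlueGreenRouter:
    """Tracks which of two identical environments receives live traffic."""

    def __init__(self, live: str = "blue") -> None:
        self.live = live

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, health_check: Callable[[str], bool]) -> bool:
        """Switch traffic to the idle environment only if it is healthy."""
        candidate = self.idle
        if health_check(candidate):
            self.live = candidate
            return True
        return False

    def roll_back(self) -> None:
        """Return traffic to the previously live environment."""
        self.live = self.idle


router = BlueGreenRouter()
router.release(lambda env: True)  # assume the new version passed its checks
print("Live after release:", router.live)
router.roll_back()
print("Live after rollback:", router.live)
```

Because the previous environment is left untouched until the next release, rolling back is a traffic switch rather than a redeployment.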
4. Configuration Management and CIs: The Service Map
What Is a CI?
A Configuration Item (CI) is any component that must be managed to deliver a service: servers, apps, databases, APIs, contracts, even documentation.
Service Configuration Management
This ITIL 5 practice ensures accurate, reliable information about CIs and their relationships is available when needed, often via a CMDB or CMS.
Why CI Information Matters
Good CI data helps assess impact and risk of changes, speeds up incident and problem diagnosis, and supports compliance and audits.
Relationships Between CIs
Relationships such as "runs on", "depends on", or "provided by" show how components connect and which services rely on which infrastructure.
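One simple way to picture CI relationships is as a list of (source, relationship, target) records. The sketch below uses the student portal components with hypothetical CI names; it is not a CMDB schema, only an illustration of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    kind: str    # e.g. "runs on", "depends on", "provided by"
    target: str


# Hypothetical CI names for the student portal example.
RELATIONSHIPS = [
    Relationship("student-portal", "depends on", "portal-app"),
    Relationship("portal-app", "runs on", "app-server-01"),
    Relationship("portal-app", "depends on", "portal-db"),
    Relationship("portal-app", "depends on", "payment-gateway"),
    Relationship("payment-gateway", "provided by", "payment-supplier-contract"),
]


def related_to(ci: str, relationships: list[Relationship]) -> list[Relationship]:
    """Return every relationship in which the CI appears on either side."""
    return [r for r in relationships if ci in (r.source, r.target)]


for r in related_to("payment-gateway", RELATIONSHIPS):
    print(f"{r.source} --{r.kind}--> {r.target}")
```

Looking up the payment gateway returns both the application that depends on it and the supplier contract behind it, which is the kind of context a support team needs during a payment incident.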
Right Level of Detail
Capture enough detail to understand dependencies and risk, but not so much that the CMDB becomes impossible to maintain or use.
5. Thought Exercise: Identify CIs and Relationships
Imagine a simple online bookstore service.
It includes:
- A web front-end.
- An application server.
- A database.
- A payment service from an external provider.
- A shipping integration API.
Your task (mentally or on paper):
- List at least 5 CIs in this environment.
- For each CI, note one relationship to another CI using phrases like `depends on` or `runs on`.
- Decide which relationship would be most useful during an incident where payments are failing.
Reflect:
- Which CIs and relationships would you want to see first in the CMDB when the service desk receives many "payment failed" incidents?
- How would this configuration information help the incident manager or change authority decide what to do next?
6. Monitoring, Events, and Observability: Seeing Inside the Service
Monitoring Basics
Monitoring means continuously observing services and CIs using metrics (such as CPU usage, response time, and error rates), logs, and traces.
What Is an Event?
An event is any detectable occurrence that matters for management. It can be informational, a warning, or an exception indicating something is wrong.
Event Management
Event management is the practice of detecting events, interpreting them, and deciding which actions or workflows should be triggered.
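The "decide which action to trigger" step can be pictured as a lookup from event type to response. In this sketch the three event types come from the text, while the action names and the fallback are assumptions for illustration.

```python
# Hypothetical mapping from event type to the workflow it triggers.
ACTIONS = {
    "informational": "record in log",
    "warning": "notify support team",
    "exception": "open incident",
}


def route_event(severity: str) -> str:
    """Return the action for an event, falling back to manual review."""
    return ACTIONS.get(severity, "review manually")


def route_all(events: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pair each event source with the action chosen for its severity."""
    return [(source, route_event(severity)) for source, severity in events]


sample = [
    ("portal-web", "informational"),
    ("portal-db", "warning"),
    ("payment-gateway", "exception"),
]
for source, action in route_all(sample):
    print(f"{source}: {action}")
```

Keeping the mapping in one place makes the response rules easy to review and change, and the fallback ensures that an unrecognised event type is not silently dropped.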
Observability
Observability is the degree to which a system's internal state can be understood from outputs like metrics, logs, and traces. It is achieved through intentional design and is supported by good CI data.
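One common way to design for observability is to emit structured log lines that carry a trace identifier, so that a single transaction can be followed across components. The sketch below uses only Python's standard `logging`, `json`, and `uuid` modules; the field names and the service name are assumptions for illustration.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Fields passed through `extra=`; names are illustrative.
            "service": getattr(record, "service", "unknown"),
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # defaults to stderr
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = build_logger("payment-service")
trace_id = str(uuid.uuid4())  # one ID shared by every step of a transaction
context = {"service": "payment-service", "trace_id": trace_id}
logger.info("payment request received", extra=context)
logger.info("payment authorised", extra=context)
```

Because both lines share the same `trace_id`, a support engineer can filter the logs for that value and see the path a single payment took.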
Link to Other Practices
Incident, problem, and change practices all rely on monitoring and events to detect issues, find root causes, and verify the impact of changes.
7. Dependency Mapping and Impact Analysis
What Is Dependency Mapping?
Dependency mapping shows how CIs and services connect, often as diagrams from users to apps, APIs, databases, and infrastructure.
What Is Impact Analysis?
Impact analysis uses dependency information to assess the consequences of an event, incident, problem, or change on services and customers.
Database Failure Example
A failed database affects both the student portal and a reporting dashboard, but the impact on the portal is higher because students cannot register.
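The database example can be sketched as a walk over a dependency map: start at the failed CI and follow "depends on" links backwards to find everything upstream. The CI names and the map itself are hypothetical; a real map would come from the CMDB.

```python
from collections import deque

# Hypothetical map: each CI lists the CIs it depends on.
DEPENDS_ON = {
    "student-portal": ["portal-app"],
    "reporting-dashboard": ["reporting-app"],
    "portal-app": ["shared-db", "identity-provider"],
    "reporting-app": ["shared-db"],
}


def impacted_by(failed_ci: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every CI that directly or indirectly depends on the failed CI."""
    # Invert the map so we can walk from a CI to the CIs that rely on it.
    dependents: dict[str, list[str]] = {}
    for ci, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, []).append(ci)

    impacted: set[str] = set()
    queue = deque([failed_ci])
    while queue:  # breadth-first traversal of the inverted map
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


print(sorted(impacted_by("shared-db", DEPENDS_ON)))
```

The traversal only says what is affected. Deciding that the portal matters more than the dashboard still requires business context, such as whether registration is open.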
Use Across Practices
Change enablement, incident management, and continuity all rely on dependency maps and impact analysis to prioritize and decide actions.
8. Quiz: CIs and Impact
Test your understanding of configuration items and impact analysis.
A team wants to know which business services will be affected if a particular database server is taken down for maintenance. Which ITIL 5 concept are they mainly using?
- A) Event categorization
- B) Dependency mapping and impact analysis
- C) Service level monitoring
- D) Request fulfillment
Answer: B) Dependency mapping and impact analysis
They are using dependency mapping and impact analysis: looking at CI relationships to see which services depend on the database and what the impact will be.
9. Quiz: Monitoring and Events
Check your understanding of monitoring, events, and observability.
Which is the BEST example of observability-oriented design for a new payment microservice?
- A) Documenting the user interface text in three languages
- B) Configuring detailed metrics, logs, and traces that show transaction paths and errors
- C) Ensuring the microservice uses the latest programming language version
- D) Limiting access to the source code repository
Answer: B) Configuring detailed metrics, logs, and traces that show transaction paths and errors
Observability is about being able to understand internal behavior from external outputs. Detailed metrics, logs, and traces directly support this.
10. Flashcards: Key Terms Review
Review the core terms from this module. Try to define each term before flipping the card.
- Service design (in ITIL 5): A set of activities focused on designing services and their supporting practices, processes, and components so they deliver value reliably, including utility, warranty, monitoring, support, and change-readiness.
- Configuration Item (CI): Any component that needs to be managed to deliver a service, such as hardware, software, documentation, SLAs, or supplier contracts.
- Service configuration management: The practice of ensuring that accurate and reliable information about CIs and their relationships is available when and where it is needed.
- Monitoring: Continuous observation of services and CIs using metrics, logs, and other data to detect conditions that may require attention.
- Event: Any detectable occurrence with significance for service or CI management, often classified as informational, warning, or exception.
- Event management: The practice of detecting events, making sense of them, and determining the appropriate control actions or workflows.
- Observability: The degree to which the internal state of a system can be understood from its external outputs (metrics, logs, traces), usually achieved by intentional design.
- Dependency mapping: Identifying and documenting how CIs relate to each other and to services, often visualized as a map or diagram.
- Impact analysis: Assessing the potential or actual consequences of an event, incident, problem, or change on services, users, and the organization.
Key Terms
- Event: A detectable occurrence that has significance for the management of a service or CI.
- Monitoring: Continuous observation of services and CIs using metrics, logs, and other data.
- Observability: The ability to understand a system’s internal state from its external outputs, typically via metrics, logs, and traces.
- Service design: Activities focused on designing services and their supporting elements so they deliver value reliably, including utility, warranty, monitoring, and support.
- Impact analysis: The assessment of the potential or actual effects of an event, incident, problem, or change on services and stakeholders.
- Event management: The practice of detecting events, interpreting them, and deciding what response, if any, is required.
- Dependency mapping: The process of identifying and documenting how CIs and services depend on and relate to each other.
- Configuration Item (CI): Any component that must be managed to deliver a service, such as hardware, software, documentation, SLAs, or supplier contracts.
- Service configuration management: The ITIL practice that ensures accurate, reliable information about CIs and their relationships is available when needed.
- CMDB (Configuration Management Database): A database used to store configuration records throughout their lifecycle and the relationships between them.