Chapter 14 of 14

Module 14: Metrics, Reporting, and Continuous Improvement in DORA Programs

Define how to measure the effectiveness of a DORA program using operational and risk metrics, and how to drive continuous improvement over time.

15 min read

Step 1 – Why Metrics Matter in DORA (and What ‘Good’ Looks Like)

Under the Digital Operational Resilience Act (DORA), in force since January 2023 and applicable since 17 January 2025, financial entities must prove that their digital operational resilience framework is effective and improving over time.

Metrics and reporting are the backbone of that proof. For an advanced DORA program, you need three layers of measurement:

  1. Compliance metrics

Are we meeting DORA’s explicit requirements?

  • % of ICT incidents reported to the competent authority within required timelines
  • % of critical ICT third‑party contracts including DORA‑aligned clauses
  • Coverage of mandatory policies (e.g., ICT risk management, incident management, testing)
  2. Operational resilience KPIs (Key Performance Indicators)

How well is our ICT estate performing in supporting business services?

  • Service availability / uptime for critical business services
  • Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR) for incidents
  • % of successful execution of critical processes during major incidents
  3. Risk‑oriented KRIs (Key Risk Indicators)

How much digital operational risk are we running? Is it trending up or down?

  • Number of critical vulnerabilities older than X days
  • Dependency concentration on a single cloud provider
  • Volume and severity of control test failures over time

Goal of this module: by the end, you should be able to:

  • Design a metrics and reporting framework that links DORA requirements to resilience outcomes.
  • Communicate these metrics upwards (board, C‑suite) and outwards (regulators, supervisors).
  • Embed continuous improvement loops so that each incident, test, and audit measurably strengthens your DORA operating model.

Keep in mind: the European Supervisory Authorities (ESAs) continue to publish Regulatory Technical Standards (RTS) and Guidelines that refine expectations on incident reporting, testing, and third‑party risk. Your metrics framework must be adaptable to these updates.

Step 2 – Building a DORA Metrics Hierarchy (from RTS to Dashboard)

To avoid a chaotic list of metrics, construct a top‑down hierarchy that starts from DORA obligations and ends with concrete indicators.

1. Map DORA Articles to Measurement Domains

Typical domains (non‑exhaustive):

  • ICT Risk Management (Art. 5–15)
  • ICT‑related Incident Management & Classification (Art. 17–23)
  • Digital Operational Resilience Testing (Art. 24–27)
  • ICT Third‑Party Risk Management (Art. 28–44)
  • Information Sharing (Art. 45)

2. For Each Domain, Define 3–5 Strategic Questions

Example for ICT incident management:

  • How quickly do we detect and contain critical incidents?
  • Are we classifying and reporting incidents correctly under the latest RTS?
  • Are the root causes of major incidents being eliminated over time?

3. Translate Questions into KPIs and KRIs

Use a simple template (a structured sketch follows this list):

  • Name: e.g., Critical Incident Mean Time to Recover (MTTR)
  • Type: KPI (performance) or KRI (risk)
  • Unit: minutes / hours / % / count
  • Scope: critical business services, entire ICT landscape, specific provider
  • Data source: ITSM tool, SIEM, GRC platform, CMDB, etc.
  • Frequency: real‑time, daily, monthly, quarterly
  • Thresholds / Limits: target, warning, breach
  • Owner: responsible role (e.g., Head of IT Ops, CISO)
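
As a minimal illustration, the template above could be captured as a structured record so that every metric in the catalogue carries the same fields. The structure, field names, and example values below are assumptions for illustration; they are not mandated by DORA or any RTS.

```python
from dataclasses import dataclass
from enum import Enum


class MetricType(Enum):
    KPI = "performance"
    KRI = "risk"


@dataclass
class MetricDefinition:
    """One entry in an illustrative DORA metric catalogue."""
    name: str
    metric_type: MetricType
    unit: str                  # e.g. "minutes", "%", "count"
    scope: str                 # e.g. "critical business services"
    data_source: str           # e.g. "ITSM tool", "SIEM", "GRC platform"
    frequency: str             # e.g. "monthly", "quarterly"
    target: float              # desired level
    warning_threshold: float   # amber
    breach_threshold: float    # red / escalation
    owner: str                 # accountable role


# Hypothetical example entry
critical_mttr = MetricDefinition(
    name="Critical Incident Mean Time to Recover (MTTR)",
    metric_type=MetricType.KPI,
    unit="minutes",
    scope="critical business services",
    data_source="ITSM tool",
    frequency="monthly",
    target=120,
    warning_threshold=180,
    breach_threshold=240,
    owner="Head of IT Operations",
)
```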

4. Align with Business Services and Impact Tolerances

DORA pushes you to think in terms of critical or important functions. Metrics should be:

  • Service‑centric: e.g., availability of payments processing rather than generic server uptime.
  • Impact‑oriented: linked to customer harm, market disruption, or safety of client assets.
  • Proportional: more granular for systemically important entities; simpler for small firms.

This hierarchy forms the blueprint for the rest of the module: we’ll now fill it with concrete measures for incidents, testing, third‑party risk, and remediation.

Step 3 – Concrete KPIs & KRIs for Digital Resilience

Below is a sample metric catalogue for a mid‑sized bank. Focus on structure, not memorization.

A. ICT Incident Management Metrics

Performance KPIs

  • MTTD (Mean Time to Detect) – Critical Incidents
  • Definition: Average time between incident occurrence and detection for DORA‑reportable incidents.
  • Target: < 15 minutes.
  • Why it matters: Long MTTD implies weak monitoring or alert fatigue.
  • MTTR (Mean Time to Recover) – Critical Services
  • Definition: Average time from incident start to full restoration of critical services.
  • Target: Aligned with business impact tolerances (e.g., 2 hours for retail payments).
  • % of Incidents Classified According to RTS Taxonomy
  • Definition: Incidents tagged with the correct category, cause, and impact per latest ESA incident classification standards.
  • Target: 100% for DORA‑reportable incidents.
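
Both MTTD and MTTR above are simple averages over incident timestamps. The sketch below shows one possible calculation; the field names (occurred_at, detected_at, restored_at) are assumptions for illustration and will differ by ITSM tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    occurred_at: datetime   # when the incident actually started
    detected_at: datetime   # when monitoring or users detected it
    restored_at: datetime   # when the critical service was fully restored


def mttd_minutes(incidents: list[Incident]) -> float:
    """Mean Time to Detect: average occurrence-to-detection gap, in minutes."""
    return mean((i.detected_at - i.occurred_at).total_seconds() / 60 for i in incidents)


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean Time to Recover: average occurrence-to-restoration gap, in minutes."""
    return mean((i.restored_at - i.occurred_at).total_seconds() / 60 for i in incidents)
```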

Risk KRIs

  • Number of DORA‑reportable Incidents per Quarter
  • Trend analysis: Is the number rising because risk is worsening, or because detection has improved?
  • Must be interpreted with context (e.g., major technology transformation).
  • % of Major Incidents with Repeat Root Cause within 12 Months
  • High value indicates poor effectiveness of corrective actions.

B. ICT Testing & Control Effectiveness Metrics

Performance KPIs

  • Annual Coverage of Critical Services by Scenario‑based Testing
  • Definition: Proportion of critical/important functions tested against severe but plausible scenarios.
  • Target: 100% over a defined cycle (e.g., 1–3 years, depending on RTS).
  • Penetration Test Defect Closure Rate
  • Definition: % of high/critical findings closed within agreed SLA (e.g., 30 days).
  • Target: > 95%.

Risk KRIs

  • Number of Open Critical Vulnerabilities > X Days (e.g., 30)
  • Indicates exposure window to known high‑risk vulnerabilities.
  • Control Failure Rate in Key ICT Processes
  • E.g., % of change requests implemented without formal approval.

C. Third‑Party / Cloud Risk Metrics

Performance KPIs

  • % of Critical ICT Third‑Party Contracts with DORA‑Compliant Clauses
  • Scope: Data location, access, audit & inspection rights, exit strategies, incident notification SLAs, etc.
  • Target: 100% for newly signed and renewed contracts; migration plan for legacy contracts.
  • Average Time for Third‑Party Incident Notification
  • Definition: Time between incident occurrence at provider and notification to your entity.
  • Compare against contractual SLAs and DORA’s reporting timelines.

Risk KRIs

  • Concentration Risk Index for Cloud Providers
  • Simple proxy: % of critical workloads hosted by a single provider.
  • More advanced: Herfindahl‑Hirschman Index (HHI) across providers.
  • Number of Critical Findings from Third‑Party Audits / On‑Site Inspections
  • Trend and severity over time.
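
A minimal sketch of both concentration measures mentioned above: the single-provider share and an HHI across providers (the sum of squared shares, here shares of critical workloads). The input data is hypothetical.

```python
def provider_shares(workloads_by_provider: dict[str, int]) -> dict[str, float]:
    """Share of critical workloads per provider (0..1)."""
    total = sum(workloads_by_provider.values())
    return {p: n / total for p, n in workloads_by_provider.items()}


def largest_provider_share(workloads_by_provider: dict[str, int]) -> float:
    """Simple proxy: share of critical workloads on the single largest provider."""
    return max(provider_shares(workloads_by_provider).values())


def hhi(workloads_by_provider: dict[str, int]) -> float:
    """Herfindahl-Hirschman Index: sum of squared shares (1.0 = full concentration)."""
    return sum(s ** 2 for s in provider_shares(workloads_by_provider).values())


# Hypothetical distribution of critical workloads
workloads = {"Cloud A": 70, "Cloud B": 20, "On-prem": 10}
print(largest_provider_share(workloads))  # 0.70
print(hhi(workloads))                     # 0.49 + 0.04 + 0.01 = 0.54
```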

When designing your own catalogue, force yourself to answer: What decision will this metric inform? If you cannot answer convincingly, the metric is probably noise.

Step 4 – Design a Metric from Scratch (Thought Exercise)

You are the DORA program lead for a payments institution heavily reliant on one cloud provider. The board is worried about concentration risk and recovery from a cloud outage.

Task: Draft one KPI and one KRI that you would present to the board. Use the template below.

  1. KPI – Performance of Recovery from Cloud Incidents
  • Name:
  • Definition (what exactly is measured?):
  • Unit (e.g., minutes, %, count):
  • Data Source (e.g., incident management tool, cloud monitoring):
  • Frequency (e.g., monthly, quarterly):
  • Target / Threshold:
  • Owner:
  2. KRI – Cloud Concentration Risk
  • Name:
  • Definition:
  • Unit:
  • Data Source:
  • Frequency:
  • Limit / Risk Appetite (what level is too high?):
  • Escalation Rule (what happens if breached?):

Reflect:

  • Would these metrics enable the board to challenge management?
  • Could they be used by supervisors to assess whether your cloud exit strategy and resilience testing are realistic?
  • Are you mixing up performance (how well you respond) and risk (how exposed you are)?

Write down your answers and refine them until each metric is:

  • Specific (no ambiguity in definition)
  • Actionable (breaches trigger clear responses)
  • Comparable over time and across business units.

Step 5 – Metrics for ICT Incidents, Testing, and Remediation

DORA and its RTS on incident reporting and testing expect traceability from:

> incident or control failure → classification → root cause → remediation → re‑testing → closure

Your metrics should therefore cover the full lifecycle.

1. Incident Lifecycle Metrics

  • Detection Quality
  • % of critical incidents detected by automated monitoring vs. user reports
  • False positive rate of monitoring alerts (too high → alert fatigue; overly suppressed alerting → blind spots).
  • Response & Communication
  • % of major incidents with communication to affected customers within defined timeframe
  • Adherence to internal escalation timelines (e.g., N minutes to notify CISO, COO).
  • Regulatory Reporting
  • % of DORA‑reportable incidents submitted within RTS deadlines (initial, intermediate, final reports).
  • Number of regulator follow‑up queries per incident (proxy for report quality).
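
As one illustration, the regulatory reporting KPI above (% of reports submitted within RTS deadlines) could be computed per report type as sketched below. The data structure and the 24-hour window in the example are assumptions; use the deadlines in the applicable RTS, not these placeholder values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RegulatoryReport:
    incident_id: str
    report_type: str          # "initial", "intermediate", "final"
    clock_start: datetime     # when the reporting clock started (per the applicable RTS)
    submitted_at: datetime    # when the report reached the competent authority


def pct_within_deadline(reports: list[RegulatoryReport], deadline: timedelta) -> float:
    """Share of reports submitted within the given deadline, as a percentage."""
    if not reports:
        return 100.0
    on_time = sum(1 for r in reports if r.submitted_at - r.clock_start <= deadline)
    return 100.0 * on_time / len(reports)


# Hypothetical sample: two initial notifications, one submitted late
sample = [
    RegulatoryReport("INC-001", "initial",
                     datetime(2025, 3, 1, 9, 0), datetime(2025, 3, 1, 20, 0)),
    RegulatoryReport("INC-002", "initial",
                     datetime(2025, 3, 5, 14, 0), datetime(2025, 3, 7, 8, 0)),
]
print(pct_within_deadline(sample, deadline=timedelta(hours=24)))  # 50.0
```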

2. Testing & Scenario Coverage Metrics

Link your testing program (e.g., threat‑led penetration testing, scenario testing, failover exercises) to metrics:

  • Coverage
  • % of critical business services covered by at least one severe‑but‑plausible scenario test in the last X years
  • % of critical third‑party dependencies included in major resilience exercises.
  • Effectiveness
  • % of tests where recovery objectives (RTO/RPO, maximum tolerable downtime) were met
  • Number of critical findings per test and average time to remediate.
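
A sketch of the two effectiveness measures above, assuming each test record captures the critical services it exercised and whether recovery objectives were met. The record layout and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ResilienceTest:
    test_id: str
    services_covered: set[str]   # critical services exercised by the test
    rto_met: bool                # were recovery time objectives achieved?


def coverage_pct(tests: list[ResilienceTest], critical_services: set[str]) -> float:
    """% of critical services covered by at least one test (assumes critical_services is non-empty)."""
    covered = set().union(*(t.services_covered for t in tests)) if tests else set()
    return 100.0 * len(covered & critical_services) / len(critical_services)


def rto_met_pct(tests: list[ResilienceTest]) -> float:
    """% of tests where recovery objectives were met."""
    return 100.0 * sum(t.rto_met for t in tests) / len(tests) if tests else 100.0
```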

3. Remediation & Control Improvement Metrics

DORA supervisors will be skeptical if the same issues reappear. Track:

  • Remediation Timeliness
  • % of high/critical issues closed within SLA (e.g., 30/60/90 days).
  • Average age of open high‑risk findings.
  • Sustainability of Fixes
  • % of issues that reoccur within 12–24 months (indicates superficial fixes).
  • Number of incidents linked to previously identified but unremediated vulnerabilities.
  • Control Maturity
  • Use a simple 1–5 maturity scale (Initial → Optimized) for key control domains (e.g., backup & restore, change management) and track year‑on‑year improvement.
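
Two of the remediation indicators above (average age of open high-risk findings and the number of findings past SLA) can be derived mechanically from the findings register, as in this minimal sketch. Field names and the 30-day SLA are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Finding:
    finding_id: str
    severity: str               # "critical", "high", "medium", "low"
    opened_on: date
    closed_on: Optional[date]   # None while still open


def open_high_risk(findings: list[Finding]) -> list[Finding]:
    return [f for f in findings if f.closed_on is None and f.severity in ("critical", "high")]


def average_open_age_days(findings: list[Finding], as_of: date) -> float:
    """Average age in days of open critical/high findings."""
    ages = [(as_of - f.opened_on).days for f in open_high_risk(findings)]
    return sum(ages) / len(ages) if ages else 0.0


def overdue_count(findings: list[Finding], as_of: date, sla_days: int = 30) -> int:
    """Open critical/high findings older than the (assumed) remediation SLA."""
    return sum(1 for f in open_high_risk(findings) if (as_of - f.opened_on).days > sla_days)
```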

Together, these metrics operationalize the idea that every incident and test is a learning opportunity, not just a compliance burden.

Step 6 – Quick Check: Are These Good DORA Metrics?

Evaluate a proposed metric for a DORA program.

Which of the following is the best‑designed metric for DORA incident management?

  1. "Reduce incidents"
  2. "Number of ICT incidents per year"
  3. "% of DORA‑reportable incidents for critical services where initial regulatory notification was submitted within the RTS deadline, measured quarterly"
  4. "Average number of emails sent during incident response"

Answer: Option 3 – "% of DORA‑reportable incidents for critical services where initial regulatory notification was submitted within the RTS deadline, measured quarterly"

Option 3 is specific (DORA‑reportable incidents for critical services), measurable (%), time‑bound (RTS deadline, measured quarterly), and clearly linked to a DORA obligation (timely incident reporting). Options 1 and 2 are too vague or context‑free; option 4 measures an activity that is not clearly linked to resilience outcomes.

Step 7 – Board, Management, and Regulator Reporting

Different stakeholders need different views of the same underlying data.

1. Board‑Level Reporting

Boards care about risk appetite, impact on critical services, and accountability, not raw technical detail.

Key principles:

  • Aggregation & trend: 3–7 top‑level indicators per domain (incidents, testing, third‑party, cyber).
  • Narrative: Explain why metrics changed and what management is doing.
  • Risk appetite linkage: Explicitly show where metrics are within, at, or beyond risk appetite.

Example board dashboard items:

  • Heatmap of critical business services vs. resilience status (green/amber/red).
  • Trend of DORA‑reportable incidents by severity and root cause category.
  • Cloud concentration index vs. board‑approved limit.
  • Testing coverage of critical services and major findings outstanding.
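
To make the "within, at, or beyond risk appetite" principle concrete, a traffic-light status for each dashboard item can be derived mechanically from board-approved thresholds. This is a minimal sketch that assumes higher values are worse (as for incident counts or concentration); metrics where higher is better need the opposite comparison. The example limits are hypothetical.

```python
def rag_status(value: float, warning: float, breach: float) -> str:
    """Classify a metric where higher values are worse (e.g. incident counts, concentration)."""
    if value >= breach:
        return "red"     # beyond risk appetite: escalate to the board
    if value >= warning:
        return "amber"   # at the edge of appetite: management attention
    return "green"       # within appetite


# Hypothetical board-approved limits for the cloud concentration index (share of workloads)
print(rag_status(value=0.70, warning=0.60, breach=0.80))  # "amber"
```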

2. Executive / Senior Management Reporting

Executives need more granular operational detail to manage trade‑offs.

Examples:

  • Metrics by business unit or region.
  • Drill‑down into top recurring root causes (e.g., change management, capacity, third‑party issues).
  • Cost vs. resilience indicators (e.g., incremental cost of redundancy vs. reduction in downtime).

3. Reporting to Supervisory Authorities

Supervisors expect:

  • Formal incident reports in line with the latest RTS (structure, fields, timelines).
  • Evidence of lessons learned and follow‑up actions after major incidents.
  • Documentation of testing programs, methodologies, and results, especially for advanced tests (e.g., threat‑led penetration testing).
  • Clear governance evidence: minutes of board discussions, risk appetite statements, and decisions based on metrics.

Your challenge as a DORA leader is to re‑use the same data but tailor visualization, granularity, and narrative to each audience.

Step 8 – Draft a One‑Slide Board Dashboard (Thought Exercise)

Imagine you have only one slide to brief the board on DORA metrics this quarter.

Task: Sketch the content of that slide (mentally or on paper) using the structure below.

  1. Top‑Left: Overall Resilience Status
  • A simple traffic‑light summary for 3–4 domains:
  • ICT incidents
  • Testing & exercises
  • Third‑party risk
  • Vulnerability management
  2. Top‑Right: Key Metrics vs. Appetite
  • Choose 3–5 metrics, e.g.:
  • Number of DORA‑reportable incidents (trend last 4 quarters)
  • Cloud concentration index vs. limit
  • % of critical services tested in last 12 months
  • % of critical vulnerabilities > 30 days
  3. Bottom‑Left: Major Issues and Actions
  • List 3 bullets:
  • Key issue
  • Root cause
  • Management action & target date
  4. Bottom‑Right: Ask from the Board
  • E.g., approval for investment, risk appetite adjustment, or endorsement of a remediation plan.

Reflect:

  • Are you overloading the board with detail?
  • Does every element on the slide support a decision or oversight responsibility?
  • Would a supervisor, reading the slide, see clear evidence of active board engagement on DORA?

Step 9 – Continuous Improvement: Closing the Loop

DORA expects iterative strengthening of your ICT risk framework, not a one‑off project. Metrics should feed into formal feedback loops.

1. Lessons‑Learned Loop

For each major incident or failed test:

  1. Capture
  • What happened?
  • Which controls failed or were absent?
  • How did metrics behave (e.g., MTTD, MTTR, capacity utilization)?
  2. Analyze
  • Root cause analysis (technical, process, human, third‑party).
  • Were risk appetite and thresholds realistic?
  • Were early‑warning KRIs triggered but ignored?
  3. Improve
  • Update controls, procedures, and training.
  • Adjust metrics (new KRIs? different thresholds?).
  • Feed changes into testing scenarios and business impact analyses.

2. Audit and Assurance Loop

Internal audit, external audit, and independent reviews provide:

  • Challenge to your metrics design (are they biased or incomplete?).
  • Validation of data quality and reporting accuracy.
  • Recommendations that should be tracked with the same remediation metrics (timeliness, recurrence).

3. Change & Transformation Loop

Major technology or organizational changes (e.g., cloud migration, core banking replacement) should:

  • Trigger a re‑assessment of KRIs (new dependencies, new failure modes).
  • Lead to updated stress scenarios and test plans.
  • Be accompanied by temporary risk indicators (e.g., change‑related incident rate) until the new state stabilizes.

A mature DORA program can show a regulator concrete evidence that:

> Metrics → Insights → Decisions → Changes → Better Metrics → Improved Outcomes.

Step 10 – Staying Aligned with RTS and Future EU Developments

Since DORA entered into force in January 2023 and became applicable on 17 January 2025, the ESAs have been issuing Regulatory Technical Standards (RTS), Implementing Technical Standards (ITS), and Guidelines that refine what data must be collected and reported.

Your metrics framework must be maintainable under regulatory change.

1. Design for Flexibility

  • Use a central data model for incidents, assets, providers, and controls.
  • Avoid hard‑coding regulatory fields in multiple tools; instead, map RTS requirements to this central model.
  • Maintain a data dictionary (a sketch follows this list) that documents:
  • Field name and definition
  • Source system
  • Regulatory references (e.g., DORA article, specific RTS/ITS paragraph)
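
One way to keep this maintainable is to store the data dictionary itself as structured data, with an explicit regulatory reference per field. The layout below is an assumption, and the cited article and RTS/ITS references are illustrative placeholders to show the idea; validate any real mapping against the latest legal texts.

```python
from dataclasses import dataclass


@dataclass
class DataDictionaryEntry:
    field_name: str
    definition: str
    source_system: str
    regulatory_reference: str   # DORA article and/or RTS/ITS reference (keep current)


# Illustrative entries; references are placeholders to be validated against the latest texts
data_dictionary = [
    DataDictionaryEntry(
        field_name="incident_classification",
        definition="Category assigned under the ESA incident classification criteria",
        source_system="ITSM tool",
        regulatory_reference="DORA Art. 18 and the related RTS on incident classification",
    ),
    DataDictionaryEntry(
        field_name="provider_criticality",
        definition="Whether the ICT third party supports a critical or important function",
        source_system="GRC platform / register of information",
        regulatory_reference="DORA Art. 28 and the ITS on the register of information",
    ),
]
```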

2. Monitor the Regulatory Pipeline

Track updates not only to DORA RTS/ITS but also to related EU legislation, for example:

  • NIS2 Directive (network and information systems security) – especially where your entity is also an essential/important entity under NIS2, to avoid duplicate or inconsistent metrics.
  • EU Cybersecurity Act and related certification schemes that may introduce new assurance expectations.
  • Sectoral guidance from the ECB, EBA, ESMA, and EIOPA on outsourcing, cloud, and ICT risk.

3. Versioning and Backward Compatibility

When RTS or guidance change:

  • Version your metrics: keep historical definitions so time‑series remain interpretable.
  • Document breaks in series (e.g., incident classification schema changed in 2025 Q3).
  • Use mapping tables to reconcile old and new categories where feasible.
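
A minimal sketch of reconciling old and new incident categories with a mapping table, so that a time series remains interpretable across a documented break in series. The category names and the change date are hypothetical.

```python
from datetime import date

# Hypothetical break in series: classification schema changed in 2025 Q3
SCHEMA_CHANGE_DATE = date(2025, 7, 1)

# Mapping from pre-change categories (v1) to post-change categories (v2)
CATEGORY_MAP_V1_TO_V2 = {
    "system failure": "availability",
    "malicious attack": "security",
    "third-party outage": "third-party",
}


def normalise_category(raw_category: str, incident_date: date) -> str:
    """Report all incidents in v2 terms, translating pre-change records via the mapping table."""
    if incident_date < SCHEMA_CHANGE_DATE:
        return CATEGORY_MAP_V1_TO_V2.get(raw_category, "unmapped (review manually)")
    return raw_category
```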

The objective is to be able to show, at any supervisory review:

  • How your metrics currently align to the latest RTS and guidance.
  • How and why they have evolved over the last few years.

Step 11 – Flashcard Review: Key Terms and Ideas

Flip through these cards to consolidate your understanding of DORA metrics and continuous improvement.

Key Performance Indicator (KPI) in a DORA context
A quantitative measure of how effectively ICT operations support digital resilience objectives (e.g., service availability, MTTR), focused on performance and outcomes rather than just compliance.
Key Risk Indicator (KRI) in a DORA context
A forward‑looking metric that signals changes in exposure to ICT and operational risk (e.g., number of critical vulnerabilities > 30 days, cloud concentration index), helping anticipate incidents before they materialize.
DORA‑reportable incident
An ICT‑related incident that meets thresholds defined in DORA and its RTS (e.g., impact on critical services, duration, number of users affected) and must be reported to competent authorities within specified timelines.
Testing coverage metric
A measure of how much of your critical/important functions, systems, and third‑party dependencies are exercised through resilience tests (e.g., % of critical services tested against severe‑but‑plausible scenarios in the last 12 months).
Lessons‑learned loop
A structured process that takes incidents, test results, and audit findings, performs root cause analysis, and feeds improvements back into controls, procedures, training, and metrics to strengthen resilience over time.
Cloud concentration risk metric
An indicator that quantifies dependency on specific cloud or ICT providers (e.g., % of critical workloads on one provider or HHI), used to assess resilience and exit strategy feasibility.
Risk appetite linkage
The explicit connection between metrics (e.g., number of major incidents, maximum tolerable downtime) and board‑approved limits or tolerances, enabling governance bodies to judge whether risk is acceptable.
RTS‑driven data model
A structured representation of incidents, assets, providers, and controls that is designed to capture all mandatory fields required by DORA RTS/ITS, enabling consistent reporting and easier adaptation when rules change.

Key Terms

Risk appetite
The amount and type of risk that an organization is willing to accept in pursuit of its objectives, typically approved by the board and expressed via limits and thresholds for key metrics.
ICT‑related incident
An unplanned event in ICT systems or services that disrupts the availability, authenticity, integrity, or confidentiality of data or services.
KRI (Key Risk Indicator)
A metric that provides an early signal of increasing risk exposure in a specific area, helping management take preventive action.
MTTD (Mean Time to Detect)
Average elapsed time between the occurrence of an incident and its detection by monitoring or other means.
MTTR (Mean Time to Recover)
Average elapsed time from the start of an incident to full restoration of normal service.
KPI (Key Performance Indicator)
A metric that measures how effectively an organization achieves operational or strategic objectives, focusing on performance outcomes (e.g., service availability, MTTR).
Third‑party concentration risk
The risk that over‑reliance on one or a few ICT or cloud service providers creates a single point of failure that could severely impact critical services.
RTS (Regulatory Technical Standards)
Legally binding technical standards developed by the ESAs under DORA to specify detailed requirements, for example on incident reporting, testing, and third‑party risk data.
Digital operational resilience testing
The set of tests, exercises, and assessments (e.g., scenario testing, penetration testing, failover exercises) used to evaluate an entity’s ability to withstand and recover from ICT‑related disruptions.
DORA (Digital Operational Resilience Act)
EU Regulation (EU) 2022/2554 establishing a harmonized framework for digital operational resilience of the financial sector, in force since January 2023 and applicable since 17 January 2025.