Depending on the industry vertical you work in, the term “operational resilience” may have come up as a topic of conversation over the past year. Many of my clients have asked the two most common initial questions: What is operational resilience? And then, inevitably: Don’t we do that already? Statements such as “I have a mature business continuity or disaster recovery program” often follow. So, what is the difference? Is it something we should pay attention to? How relevant is it to what we do? I cannot say how relevant it is to each individual company or methodology, but I can share what has become my usual answer.
Operational resilience is the name given to a methodology developed by the Bank of England in the United Kingdom and applied to financial institutions and insurance companies. These firms are required to have the methods in place and are audited against them under FCA and PRA regulations. What followed was the growth of a different outlook on recovery and resilience in the industry. It has been spreading, sometimes under different names, but it essentially turns a reactive approach into a healthier, proactive one. Fairly soon after, Europe adopted DORA, the Digital Operational Resilience Act; Australia followed with APRA’s CPS 230; the Basel Committee on Banking Supervision (BCBS) has weighed in as well; and there will undoubtedly be others to follow. There will also undoubtedly be variances in the methods each of these regulations adopts for operational resilience. But should operational resilience be something we consider if we are not regulated? I’ll let you be the judge, but consider how we look after our health. To look after our heart proactively, we might adjust our diet, be aware of how much sodium, cholesterol, or sugar we’re consuming, exercise regularly, and monitor our levels, rather than wait for a disastrous event to recover from. Operational resilience appears to take the same approach, acting as a proactive stepping stone between managing risk and BCDR. In that case, it warrants a look and an evaluation simply as good practice.
Essentially, we review the products or services we offer as a company, establish the processes or functions that support them, determine the impact tolerances of those products or services, and test them against scenarios. Once they are tested, we monitor them. Fig 1 illustrates the operational resilience flow.
Fig 1: Operational Resilience Flow
But this all sounds similar to what we already do, so what’s the difference? How does this differ from disaster recovery or business continuity? Here’s the differentiator: when we do an impact assessment and determine a recovery time objective for our applications or our processes, what does that mean? It means we need to recover after a disaster within that timeframe for the business to survive. Did you notice the key word there? “After”. Our focus is often on how to recover and how quickly we can recover. But as with the heart analogy, wouldn’t it be better to be aware of metrics that could warn us that things were heading in the wrong direction, so we could turn the proverbial ship around before the disaster happened? That sounds like a healthier and more proactive approach. But this is the real world, and disaster can and WILL happen. All the seismographs that monitor the faults around the world do not prevent earthquakes. For that reason, disaster recovery and business continuity can never go away, but perhaps we can add a layer to them that incorporates operational or maybe even enterprise resilience.
What if, instead of just measuring how long a process can be down before there’s a problem, we measure and monitor how many people are needed? How many people can call out sick before the process performance is degraded? How many products, transactions, or cups of coffee can we produce if we are down a person, a computer, a booth, a store, a manufacturing plant, or maybe a single-source vendor? How will that affect customer satisfaction, and will we breach SLAs, if we’re seeing a trend that is heading towards a disaster or crisis? People I’ve spoken with in the industry have varying levels of program maturity in their organizations. Some components of what has been mentioned here may already be part of your program. I would submit that if that’s the case, then the argument for operational resilience being a good practice has been made, as you may already be doing part of it.
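To make the staffing question concrete, here is a minimal sketch in Python. It is an illustration only, not part of any regulation or standard: the `Process` class, its field names, the coffee stand figures, and the assumption that output scales linearly with headcount are all hypothetical. It estimates how many people can be absent before a process drops below its impact tolerance, and flags a warning one absence before that point.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """A hypothetical process that supports a product or service."""
    name: str
    staff: int                # people normally available
    output_per_person: float  # units (e.g. cups of coffee) per person per day
    min_output: float         # impact tolerance: lowest acceptable daily output


def daily_output(process: Process, absent: int) -> float:
    """Daily output with `absent` people out (assumes linear scaling)."""
    present = max(process.staff - absent, 0)
    return present * process.output_per_person


def tolerable_absences(process: Process) -> int:
    """Largest number of absences that keeps output at or above the tolerance."""
    absent = 0
    while (absent < process.staff
           and daily_output(process, absent + 1) >= process.min_output):
        absent += 1
    return absent


def status(process: Process, absent: int, warn_margin: int = 1) -> str:
    """Return 'breach', 'warning' (within warn_margin absences of a breach), or 'ok'."""
    if daily_output(process, absent) < process.min_output:
        return "breach"
    if daily_output(process, absent + warn_margin) < process.min_output:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    stand = Process("coffee stand", staff=10, output_per_person=40, min_output=300)
    print("tolerable absences:", tolerable_absences(stand))
    for absent in range(5):
        print(absent, daily_output(stand, absent), status(stand, absent))
```

The "warning" state is the proactive part: it is the signal to act before the tolerance is breached rather than to recover afterwards. A real program would track more than headcount, but the same pattern applies to machines, sites, or vendors.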
We have multiple disciplines in the field of BCDR. I like to look at it with this example: If I asked an IT person to solve a problem, they might develop a program. If I asked a businessperson to solve the same problem, they’d come up with a project. If I asked a first responder to solve the problem, they’d come up with an emergency response. We have different BCDR disciplines for a reason: they were developed by people with different purviews and focuses. We have often found them operating independently and in silos, with some overlap. I see resilience as a major step towards tying many of the disciplines together cohesively. Ultimately, our goal is to ensure the viability and, dare I say, “resilience” of our organizations. I am very curious to see how it develops and whether it truly becomes a best practice that is here to stay.