Sunday, December 23, 2012

Antifragile ITSM

This post is about applying antifragile thinking to ITSM, particularly in the realm of Change Management. I'm a big fan of Nassim Nicholas Taleb, the author of Fooled by Randomness and The Black Swan. I'm currently reading his latest, Antifragile, and it is as thought-provoking as his previous works.

A key point in NNT's works is that we often fall into traps because we believe that we can predict the future based on our understanding of the past. While this is true in some domains, we can get ourselves into a lot of trouble when we do so blindly.

A simple example is building a house in an area prone to flooding. The high-water mark is often used to indicate where it is safe to build. But before the flood that set that mark, the previous high-water mark was considered safe too! Therefore, we cannot blindly assume that we have seen the worst-case scenario.

In the world of Change Management, managing risk is of the highest concern. Most risk schemes use some combination of projected impact and estimated probability to assign risk. The difficulty, as we have seen, is that probability only works when we actually know the likelihood of outcomes. And in the real world, we don't!
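To make the problem concrete, here is a minimal sketch of a conventional risk score. The formula (impact times probability), the 1-5 impact scale, and the function name risk_score are assumptions for illustration, not taken from any particular framework or tool.

```python
def risk_score(impact: int, probability: float) -> float:
    """Hypothetical conventional score: projected impact (1-5) weighted by
    the estimated probability that the Change fails."""
    return impact * probability


# The same Change, scored with three different guesses at its failure
# probability. The impact value is an assumption: a failure would take
# down a critical Service.
impact = 5
for guess in (0.01, 0.05, 0.20):
    print(f"p={guess:.2f} -> score={risk_score(impact, guess):.2f}")
```

The score moves by a factor of twenty across guesses that are all defensible when we have little history to go on, so the number is only as good as a probability we cannot really know.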

So what should we do instead? NNT suggests that we use fragility as a risk measure. The idea is simple: if a variable has more downside than upside, we need to limit our exposure to it. If it has more upside than downside, then we don't need to worry about it. Let's see how this idea works on the "7 Rs" that are often used in Change Management.

| Variable | Fragile | Robust | Antifragile |
|---|---|---|---|
| Who RAISED the Change? | An individual limited by what they know and expect | Standard Change with pre-defined expectations | Group effort that accounts for multiple perspectives |
| What is the REASON for the Change? | Service reasons directly related to desired outcomes | Standard Change with pre-defined relationships | Technical reasons unrelated to the IT Service(s) affected |
| What RETURN will the Change deliver? | A failed Change will reduce the value of overall Services more than a successful Change will increase their value | The Change has no effect on the value of overall Services regardless of success/failure | A failed Change will reduce the value of overall Services less than a successful Change will increase their value |
| What RISKS are there if we do or do not carry out the Change? | The Change could interrupt one or more Services if it fails | The Change will not positively or negatively affect IT Services | The Change will only affect Services already unavailable (e.g. Emergency Change to restore service) |
| What RESOURCES will be required to perform this Change? | The Change requires resources limited in their availability | The Change only requires resources in ample supply | The Change will require less resources than it supplies |
| Who is RESPONSIBLE for this Change being performed? | The Change can only be performed by a subset of the team with confidence | The Change quality is the same regardless of who executes it | The value of learning the organization receives from others performing the Change is greater than the potential reduced quality of the Change |
| What RELATIONSHIPS are there between this and other Changes? | The Change may affect other Changes negatively | The Change does not interact with other Changes | The Change will only positively impact other Changes |
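
The comparison behind the table can be sketched in a few lines of code. This is only an illustration: the names Exposure and classify, the 0-3 scale, and the sample scores are all hypothetical, and treating a balanced variable as robust is a simplification of the table's "no effect either way" column.

```python
from enum import Enum


class Exposure(Enum):
    FRAGILE = "fragile"
    ROBUST = "robust"
    ANTIFRAGILE = "antifragile"


def classify(downside: float, upside: float) -> Exposure:
    """Compare how much a variable could hurt us with how much it could help.

    Both arguments are rough, non-negative magnitudes on the same scale
    (assumed here: 0 = none, 3 = severe). No probabilities are involved.
    """
    if downside < 0 or upside < 0:
        raise ValueError("downside and upside must be non-negative")
    if downside > upside:
        return Exposure.FRAGILE
    if upside > downside:
        return Exposure.ANTIFRAGILE
    return Exposure.ROBUST  # balanced, including "no effect either way"


# Hypothetical assessment of one Change against the 7 Rs: (downside, upside).
assessment = {
    "RAISED": (2, 1),
    "REASON": (1, 1),
    "RETURN": (3, 1),
    "RISKS": (2, 0),
    "RESOURCES": (0, 0),
    "RESPONSIBLE": (1, 2),
    "RELATIONSHIPS": (0, 1),
}

# The variables where exposure should be limited before approving the Change.
fragile = [
    name
    for name, (down, up) in assessment.items()
    if classify(down, up) is Exposure.FRAGILE
]
print(fragile)
```

Nothing in this sketch asks how likely a failure is. It only asks which way the asymmetry points, which is a question we have a much better chance of answering honestly.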

Thinking in this way causes us to focus less on trying to predict what will happen and more on limiting downside. The benefit is that even when we are lousy at predicting the future, we are less likely to be hurt by it. And since we are all lousy at prediction, we end up with a system that is less impacted by our inevitable failures.

How can this thinking be applied to other areas of ITSM? More to come on this topic in the future.