So you want to look at resilience… You have just had an issue / outage / scare and you decide that the time is ripe to tackle this problem head on… But what are you really after? What is your motivation to fix / look at resilience approaches, and who do you think might fix it for you?
Resilience means many different things to many people, and it can also be taken in a different context depending on who is offering a view…
Business resilience? Application resilience? Infrastructure resilience? Each will have a different view. Somehow you have to make the whole lot align, with the core focus on one thing: what the business needs / wants / desires and the outcome that is required.
So I am a lowly infrastructure fella, and I guess I can't even start to comprehend some of the stuff higher up the stack. But one thing I have learnt, and continue to learn, is that you have to start at the top and work down (i.e. understand what your business needs to achieve when implementing resilience approaches / business continuance, and work it through the rest of the org).
If you start trying to fix / understand resilience at the infrastructure layer, forgetting that you're servicing the business and without understanding what is important to your company's role in life, you probably have bigger issues than *just* trying to implement a resilience theme.
So, back to what I guess we should do (looking at things through an infrastructure lens; after all, that is what I do…) when servicing a greater call in life and working with / within / assisting a resilience programme of work:
- Understand the business requirements (what is actually required) and focus on these as core items
- Identify the application environments that correlate to the business (and let the business tell you its critical business processes & services; don't guess on their behalf!)
- Ensure that you have good views of:
  - the business applications
  - the business services and the technology map that provides each function
  - a decent service catalogue
  - a decent product catalogue
  - a proper inventory of your technology, with a relevant CMDB in place that can provide the appropriate views (the prior three bullet points are good "lenses" that you should be able to apply to a CMDB to get the views that are required)
- An ability to put business applications into appropriate "buckets" of criticality, allowing you to focus your work on the business services that are most important and relevant to the business
- Application developers: don't presume that the infrastructure dudes are taking care of resilience and have built HA / DR / resumption approaches into the infrastructure layer so that you don't need to worry about it
- Infrastructure dudes: don't presume that application developers have coded the app to recover from a disaster scenario; they may be presuming that infrastructure is taking care of everything
- Infrastructure dudes, meet application dudes; application dudes, meet infrastructure dudes. It's good to work together on this stuff
- Ensure that common resumption principles are agreed to and adhered to across technology. For example, when a DR event happens, we always fix-forward (i.e. application owners, developers & infrastructure guys all work forwards from a common point to ensure consistency is achieved).
Part 2 to follow…