BrainBlog for Tanium by Jason Bloomberg
Cybersecurity threats and their resulting breaches are top of mind for CIOs today. Managing such risks, however, is just one aspect of the entire IT risk management landscape that CIOs must address.
Equally important is reliability risk – the risks inherent in IT’s essential fragility. Issues might occur at anytime, anywhere across the complex hybrid IT landscape, potentially slowing or bringing down services.
Addressing such cybersecurity and reliability risks in separate silos is a recipe for failure. Collaboration across the respective responsible teams is essential for effective risk management.
Such collaboration is both an organizational and a technological challenge – and the organizational aspects depend upon the right technology.
The key to solving complex IT ops problems collaboratively, in fact, is to build a common engineering approach to managing risk across the concerns of the security and operations (ops) teams – in other words, a holistic approach to managing risk.
Risk management starting point: site reliability engineering
By engineering, we mean a formal, quantitative approach to measuring and managing operational risks that can lead to reliability issues. The starting point for such an approach is site reliability engineering (SRE).
Jason Bloomberg President, Intellyx tanium.com SRE is a modern technique for managing the risks inherent in running complex, dynamic software deployments – risks like downtime, slowdowns, and the like that might have root causes anywhere, including the network, the software infrastructure, or deployed applications.
The practice of SRE requires dealing with ongoing tradeoffs. The ops team must be able to make fact-based judgments about whether to increase a service’s reliability (and hence, its cost), or lower its reliability and cost to increase the speed of development of the applications providing the service.
Read the entire BrainBlog here.