SIOS iQ: Next-Generation Machine Learning for Holistic Operational Analysis

The Importance of Machine Learning

As innovation in big data analysis techniques accelerates, the bar for operational analytics rises apace. The sheer quantity of different types of operational data, from log files to real-time performance metrics, is no longer an excuse for delivering incomplete or inaccurate information. Today, IT organizations require thorough, comprehensive, and real-time insights into their production environments.

Traditional operational analytics approaches focus on recording and reporting discrete events – for example, CPU utilization exceeding a particular threshold. However, these traditional event-based approaches fall short in today’s increasingly complex and dynamically changing environments.

Threshold-based alerting not only generates alert storms of useless information, bogging down the admin’s management tools and leading to the “crying wolf” problem of alert fatigue, but also misses important correlations that can indicate severe problems.

What first-generation operational analytics tools lack is machine learning. Machine learning is absolutely necessary for tracking patterns of behavior over time across the entire environment – network, storage, compute, and applications – automatically learning the characteristics of normal behavior in order to identify anomalous patterns.

First-generation operational analytics tools offer trending and averaging analytical approaches, which fall short in today’s dynamic environments. Because such older products lack the machine learning capabilities of more modern tools, false positives are common, and it soon becomes impractical to detect all but the most obvious issues.

Today, however, many of the vendors in the operational analytics market understand the importance of machine learning, and have implemented it in one way or another in their analytics tooling.

The overarching goal of these tools is to find and resolve the root causes of performance issues and other problems. To find such root causes accurately, these tools require machine learning for anomaly detection, which is the ability to identify behavior that is out of the norm.

The reason anomaly detection requires machine learning is straightforward: the tool must learn what constitutes normal behavior in order to recognize anomalous behavior. Straightforward to be sure, but this basic principle still faces an important challenge: what if “normal” behavior is so dynamic, or the quantity of data so diverse and voluminous, that the machine learning algorithm can’t get a good read on it?

Next-Generation Machine Learning: Topological Behavior Analysis

In emerging dynamic environments with multiple layers of both physical and virtual components, related objects influence one another’s real-time operational data (such as input/output performance and latency) in ways that are both significant and subtle. Accurately calling out anomalous behavior requires understanding and learning this interplay over time.

California-based SIOS Technology Corp. has risen to this challenge with an advanced approach to machine learning called topological behavior analysis (TBA).

TBA classifies the interplay among real-time operational data into clusters, forming nodes of common behaviors and building relationships, or edges, between nodes that have data points in common. Because each node represents multiple data points, the overall network provides a compressed version of extremely high-dimensional data.
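The node-and-edge construction described above closely resembles Mapper, a published algorithm from topological data analysis. The source does not describe TBA’s internals, so the following Python sketch is only an illustration of that general idea under stated assumptions, not SIOS’s implementation: samples are bucketed into overlapping ranges of a chosen “lens” value, the samples in each range are clustered, and clusters that share samples are joined by an edge. The function names and parameters (`mapper_graph`, `lens`, `n_intervals`, `overlap`, `link_dist`) are hypothetical.

```python
from itertools import combinations


def _dist(p, q):
    """Euclidean distance between two metric vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def single_linkage(points, members, link_dist):
    """Group member indices into clusters; points within link_dist are chained together."""
    remaining = set(members)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            cur = frontier.pop()
            near = {k for k in remaining if _dist(points[cur], points[k]) <= link_dist}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.25, link_dist=1.0):
    """Mapper-style graph: nodes are clusters of samples, edges join nodes sharing samples."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0
    nodes = []
    for i in range(n_intervals):
        # Overlapping intervals over the lens values, so neighbouring clusters can share samples
        start = lo + (i - overlap) * width
        end = lo + (i + 1 + overlap) * width
        members = [k for k, v in enumerate(values) if start <= v <= end]
        for cluster in single_linkage(points, members, link_dist):
            if cluster not in nodes:
                nodes.append(cluster)
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges


# Hypothetical samples, e.g. (latency_ms, queue_depth) pairs
samples = [(float(x), 0.0) for x in range(10)]
nodes, edges = mapper_graph(samples, lens=lambda p: p[0], n_intervals=3)
print(len(nodes), sorted(edges))  # 3 [(0, 1), (1, 2)]
```

Ten samples collapse into three connected nodes, which is the kind of compression the paragraph above describes; on real operational data the lens function and clustering distance would need careful tuning.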

TBA is a significant improvement over traditional machine learning in four fundamental ways. First, it facilitates the analysis of all data points, rather than taking a statistical approach to identifying clusters of behaviors from a narrow set of objects – an approach that can miss subtle distinctions.

Second, it learns composite behavior: the behavior of the entire infrastructure, consisting of multiple interrelated components across the operational environment, taken together holistically to uncover important issues that less sophisticated approaches might miss.
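To illustrate why composite behavior matters, the sketch below uses the Mahalanobis distance, a standard multivariate statistic swapped in here purely for illustration (it is not TBA). Two hypothetical metrics, request rate and CPU utilization, normally move together; a sample can sit inside each metric’s usual range yet break their learned relationship, which a per-metric check does not catch. The function names and data are assumptions.

```python
from statistics import mean


def fit_joint(samples):
    """Learn the joint behaviour of two metrics: means, std devs, inverse covariance."""
    xs, ys = zip(*samples)
    mx, my = mean(xs), mean(ys)
    n = len(samples)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy ** 2
    inv = (syy / det, -sxy / det, sxx / det)  # [[a, b], [b, c]] of the inverse covariance
    return {"mean": (mx, my), "std": (sxx ** 0.5, syy ** 0.5), "inv": inv}


def per_metric_z(point, model):
    """Each metric judged on its own, as a single-metric check would."""
    (mx, my), (sx, sy) = model["mean"], model["std"]
    return (point[0] - mx) / sx, (point[1] - my) / sy


def mahalanobis(point, model):
    """Distance from the learned joint behaviour, taking the correlation into account."""
    (mx, my), (a, b, c) = model["mean"], model["inv"]
    dx, dy = point[0] - mx, point[1] - my
    return (a * dx * dx + 2 * b * dx * dy + c * dy * dy) ** 0.5


# Hypothetical history of (requests/s, CPU %): CPU tracks request rate closely
history = [(10, 5.5), (20, 9.5), (30, 15.5), (40, 19.5),
           (50, 25.5), (60, 29.5), (70, 35.5), (80, 39.5)]
model = fit_joint(history)
suspect = (70, 10.0)  # high request rate but unusually low CPU
print([round(z, 2) for z in per_metric_z(suspect, model)])  # each |z| is only about 1
print(round(mahalanobis(suspect, model), 1))  # joint distance is far above 3
```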

Third, it provides the foundation for determining the causal relationships behind root cause analysis. TBA identifies all impacted objects in order to determine the root cause, leading to an accurate recommendation for correction.
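One simple way to move from a set of impacted objects to a root-cause candidate is to walk a dependency topology. The Python sketch below assumes a hypothetical `depends_on` map and hypothetical object names. It treats an anomalous object whose direct dependencies are all healthy as a candidate cause, then lists the anomalous objects that depend on it through a chain of anomalous objects. This is a heuristic for illustration only; the source does not say how TBA determines causality.

```python
def root_causes(depends_on, anomalous):
    """Anomalous objects none of whose direct dependencies are also anomalous."""
    return sorted(obj for obj in anomalous
                  if not any(dep in anomalous for dep in depends_on.get(obj, [])))


def impacted_by(root, depends_on, anomalous):
    """Anomalous objects that depend on `root` through a chain of anomalous objects."""
    dependents = {}
    for obj, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(obj)
    seen, stack = set(), [root]
    while stack:
        for nxt in dependents.get(stack.pop(), []):
            if nxt in anomalous and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)


# Hypothetical topology: each object maps to the objects it relies on
topology = {
    "app-orders": ["vm-web1", "vm-db1"],
    "vm-web1": ["host-a"],
    "vm-db1": ["host-a", "datastore-2"],
    "host-a": [],
    "datastore-2": [],
}
anomalous = {"app-orders", "vm-db1", "datastore-2"}
for cause in root_causes(topology, anomalous):
    print(cause, "->", impacted_by(cause, topology, anomalous))
# datastore-2 -> ['app-orders', 'vm-db1']
```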

Finally, TBA can accurately predict the impact of changes for a variety of purposes, such as workload addition or configuration changes.

Another important benefit of this next-generation analysis is that it eliminates the need for thresholding. For example, in a traditional operational environment, an admin might set a threshold value for storage latency. Whenever the latency exceeded that value, the management tool would send an alert.

This approach is unable to adapt to dynamic changes in the environment or understand patterns of expected behavior. Alert storms are the unfortunate result.
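The contrast between a fixed threshold and learned normal behavior shows up even with a technique far simpler than TBA. The sketch below, with hypothetical data and function names, learns a per-hour baseline for storage latency from past days. Note that it still relies on a sensitivity factor `k`, so it only approximates the threshold-free behavior the article attributes to TBA.

```python
from statistics import mean, stdev


def static_alerts(series, threshold):
    """Hours whose value exceeds a fixed, hand-set threshold."""
    return [h for h, v in enumerate(series) if v > threshold]


def learn_baseline(history):
    """Per-hour mean and standard deviation learned from past days (lists of 24 values)."""
    return [(mean(vals), stdev(vals)) for vals in zip(*history)]


def baseline_alerts(series, baseline, k=4.0, min_std=0.1):
    """Hours that deviate more than k standard deviations from that hour's learned norm."""
    return [h for h, v in enumerate(series)
            if abs(v - baseline[h][0]) > k * max(baseline[h][1], min_std)]


def day(normal=5.0, backup=25.0, jitter=0.0):
    """Hypothetical storage latency (ms): a nightly backup runs at 01:00 and 02:00."""
    return [(backup if h in (1, 2) else normal) + jitter for h in range(24)]


history = [day(jitter=-0.5), day(), day(jitter=0.5)]
today = day()
today[14] = 12.0  # a genuine midday slowdown, still below the fixed threshold

print(static_alerts(today, threshold=20.0))             # [1, 2]
print(baseline_alerts(today, learn_baseline(history)))  # [14]
```

The fixed threshold fires on the scheduled backup every night and misses the midday slowdown; the learned baseline does the opposite.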

With TBA, however, the management tool looks at the behaviors of interrelated objects over time. It is smart enough to know that exceeding a particular value may or may not be a problem, depending on whether the metric in question causes an issue with some other metric or affects some other object, such as the performance of an application.
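The sketch below illustrates judging a metric by its impact on another object. Storage latency anomalies, measured against a learned mean and standard deviation, are escalated only when application response time is also anomalous in the same interval, and are suppressed otherwise. The names, data, and same-interval rule are assumptions for illustration, not SIOS iQ’s logic.

```python
from statistics import mean, stdev


def zscores(series, history):
    """Standardise a series against the mean and standard deviation of its history."""
    mu, sd = mean(history), stdev(history)
    return [(v - mu) / sd for v in series]


def classify(storage, app, storage_hist, app_hist, k=3.0):
    """For each interval with anomalous storage latency, escalate only if the app is affected."""
    verdicts = {}
    for t, (sz, az) in enumerate(zip(zscores(storage, storage_hist), zscores(app, app_hist))):
        if abs(sz) > k:
            verdicts[t] = "escalate" if abs(az) > k else "suppress"
    return verdicts


storage_hist = [4, 5, 6, 5, 4, 6, 5, 5]            # storage latency history, ms
app_hist = [100, 110, 90, 100, 105, 95, 100, 100]  # application response time history, ms
storage_now = [5, 15, 5, 15]
app_now = [100, 101, 100, 160]
print(classify(storage_now, app_now, storage_hist, app_hist))
# {1: 'suppress', 3: 'escalate'}
```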

Machine learning and anomaly detection alone are insufficient to identify root causes of such problems. However, with TBA, the analysis leads automatically from issue identification to root cause identification to solution recommendation, eliminating the need for any manual research towards final resolution.

The end result of TBA, therefore, is a holistic approach to operations management that can accurately identify issues and their root causes across dynamic, complex, distributed environments, in real time.

Topological Behavior Analysis in SIOS iQ

SIOS iQ is one next-generation operational analytics application that offers TBA-driven machine learning. SIOS iQ learns the complex interactions and behavior of objects (CPU, storage, network, applications) in the enterprise operational environment in order to deliver a simple, intelligent identification of the root cause of each problem with specific recommendations for resolution.

SIOS iQ’s machine learning leverages TBA to create a graph that relates CPU, app, memory, storage, and network behaviors. This next-generation approach identifies patterns of behavior across applications and infrastructure – detecting anomalies along with correlated events that can predict issues and uncover their underlying causes.
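A graph relating behaviors across components can be approximated, very loosely, by linking metrics whose time series are strongly correlated. The Python sketch below uses Pearson correlation with a hypothetical cutoff `min_abs_r` and hypothetical metric names; it assumes no series is constant. SIOS iQ’s actual graph construction is not described in the source and is presumably far richer.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def behavior_graph(metrics, min_abs_r=0.8):
    """Edges between metrics whose behaviour is strongly (positively or negatively) correlated."""
    edges = {}
    for a, b in combinations(sorted(metrics), 2):
        r = pearson(metrics[a], metrics[b])
        if abs(r) >= min_abs_r:
            edges[(a, b)] = round(r, 3)
    return edges


metrics = {  # hypothetical samples over five intervals
    "app.response_ms": [100, 120, 150, 200, 260],
    "storage.latency_ms": [5, 6, 8, 11, 14],
    "cpu.util_pct": [40, 42, 39, 41, 40],
}
print(behavior_graph(metrics))  # only the app/storage pair is linked
```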

TBA is the secret sauce behind SIOS iQ’s approach to solving problems like compute contention. There is no need to manually configure or set thresholds using SIOS iQ’s learning technology. Instead, SIOS iQ learns the complex interactions of the infrastructure, automatically discovers object relationships, and finds subtle behavior anomalies – accurately linking these back to the group of impacted objects, as well as the underlying cause of any issues. An example of the SIOS iQ interface appears in the image below.

Screenshot: SIOS iQ PERC Dashboard (Source: SIOS)

The Intellyx Take

Topological Behavior Analysis is similar in many respects to the semantic analysis behind such technologies as natural language processing and semantic search. Semantic analysis takes unstructured text as input and creates a network topology of entities, thus distilling the essence of meaning from the context of the text.

The TBA that SIOS uses, in contrast, takes a diverse, real-time stream of operational data as its input. As a result, the challenges are somewhat different from semantic analysis, as SIOS iQ can expect a certain amount of structure from its inputs – log files look like log files and so on.

But while this expectation of structure simplifies the entity extraction challenge, the sheer volume of diverse data feeds complicates the analysis. The fact that SIOS can deliver real-time insights into root causes of issues, therefore, underscores the true innovation that SIOS has brought to the market with its SIOS iQ product.

SIOS is an Intellyx client. Intellyx retains full editorial control over the content of this article.
