AutoMAP: Diagnose Your Microservice-based Web Applications Automatically
Abstract
The high complexity and dynamics of the microservice architecture make its application diagnosis extremely challenging. Static troubleshooting approaches may fail to obtain reliable model applies for frequently changing situations. Even if we know the calling dependency of services, we lack a more dynamic diagnosis mechanism due to the existence of indirect fault propagation. Besides, algorithm based on single metric usually fail to identify the root cause of anomaly, as single type of metric is not enough to characterize the anomalies occur in diverse services. In view of this, we design a novel tool, named AutoMAP, which enables dynamic generation of service correlations and automated diagnosis leveraging multiple types of metrics. In AutoMAP, we propose the concept of anomaly behavior graph to describe the correlations between services associated with different types of metrics. Two binary operations, as well as a similarity function on behavior graph are defined to help AutoMAP choose appropriate diagnosis metric in any particular scenario. Following the behavior graph, we design a heuristic investigation algorithm by using forward, self, and backward random walk, with an objective to identify the root cause services. To demonstrate the strengths of AutoMAP, we develop a prototype and evaluate it in both simulated environment and real-work enterprise cloud system. Experimental results clearly indicate that AutoMAP achieves over 90% precision, which significantly outperforms other selected baseline methods. AutoMAP can be quickly deployed in a variety of microservice-based systems without any system knowledge. It also supports introduction of various expert knowledge to improve accuracy.