Lightweight and Adaptive Service API Performance Monitoring in Highly Dynamic Cloud Environment
Abstract
Cloud platforms and services usually provide an API layer as a decoupled, language-agnostic interface for both front-end client integration and back-end data and/or function access. Because APIs are the interaction endpoints, their availability and performance have a significant impact on the quality of the end-user or client experience. However, the extreme dynamics, complexity, and scale of current cloud platforms challenge the applicability of existing performance monitoring and anomaly detection approaches in terms of timeliness, accuracy, and scalability. This paper presents a novel approach to API performance monitoring, which recognizes performance problems by the deviation of response time from a baseline response time / throughput model that is created and continuously updated through online learning. In the post-detection phase, an MIC (Maximal Information Criteria) based correlation algorithm is used to group alerts into higher-level and more informative hyper-alerts for end-user notification. We prototyped our solution for a large-scale commercial cloud platform, evaluated it using three months of API performance metrics data, and compared it with a couple of existing representative algorithms and tools. The results show that our approach is able to detect API performance anomalies with a high F1-score. Compared to an existing Granger-based approach, our approach achieves a nearly one-fold increase in F1-score. Moreover, our approach outperforms several state-of-the-art approaches in alert reduction ratio.
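As an illustration only, the detection idea summarized above (a response time / throughput baseline that is updated online, with alerts raised on deviation from it) might be sketched as follows. The abstract does not specify the model form, so everything here is an assumption: the linear baseline, the class name `OnlineBaseline`, the `threshold` and `warmup` parameters, and the choice not to learn from samples flagged as anomalous are all hypothetical and are not taken from the prototype.

```python
import math


class OnlineBaseline:
    """Hypothetical online baseline: response_time ~ a + b * throughput."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # assumed limit, in multiples of the RMS residual
        self.warmup = warmup        # assumed number of samples absorbed before flagging
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0     # running sum of squared deviations of throughput
        self.sxy = 0.0     # running co-deviation of throughput and response time
        self.res_n = 0
        self.res_sq = 0.0  # running sum of squared prediction errors

    def predict(self, throughput):
        """Baseline response time expected at the given throughput."""
        slope = self.sxy / self.sxx if self.sxx > 0 else 0.0
        return self.mean_y + slope * (throughput - self.mean_x)

    def observe(self, throughput, response_time):
        """Return True if the sample deviates from the baseline, else learn from it."""
        residual = response_time - self.predict(throughput)
        if self.n >= self.warmup and self.res_n > 0:
            rms = math.sqrt(self.res_sq / self.res_n)
            if rms > 0 and abs(residual) > self.threshold * rms:
                return True  # assumption: anomalous samples do not update the baseline
        if self.n >= 2:
            # Track prediction errors once a slope estimate exists.
            self.res_n += 1
            self.res_sq += residual * residual
        # Welford-style incremental update of means and (co)variance sums.
        self.n += 1
        dx = throughput - self.mean_x
        self.mean_x += dx / self.n
        self.mean_y += (response_time - self.mean_y) / self.n
        self.sxx += dx * (throughput - self.mean_x)
        self.sxy += dx * (response_time - self.mean_y)
        return False
```

Each call to `observe(throughput, response_time)` either folds the sample into the baseline or reports it as a deviation, so memory use stays constant per API regardless of how long the monitor runs. A production detector would likely also need to age out old samples so that the baseline can follow workload drift; that is omitted here for brevity.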