Ontology-driven root-cause analytics for user-reported symptoms in managed IT systems
Abstract
Enterprise users of IT services seek real-time, contextual insights during system failures in both cloud-provisioned and legacy management systems. Current IT management systems mostly provide front-office automation support, such as ticket categorization and scheduling, using a generalized set of troubleshooting instructions. In this paper, we propose an IT management system that provides real-time insights into user-perceived failures expressed in natural language (e.g., 'Why is the application not responding?'). We achieve this with an underlying knowledge graph that helps discover the possible topology patterns, each comprising multiple interdependent systems that serve a specific purpose. Based on the detected topology patterns, the proposed system composes multiple debugging workflows to generate accurate operational insights. User interactions are 'system agnostic': they do not depend on knowledge of the underlying system topology. This significantly augments the self-assist scenarios of end users and front-office agents before they engage with IT support teams. We demonstrate the proposed approach as a cloud application with a natural-language interface, using an experimental setup involving a standard ticket management system.
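
As a rough illustration of the idea summarized above, the following minimal sketch maps a natural-language symptom to an entry component, walks a toy dependency graph to recover one topology pattern, and composes a debugging workflow from per-component checks. Everything in it is an assumption made for illustration: the component names, the `DEPENDS_ON`, `SYMPTOM_ENTRY`, and `CHECKS` tables, the keyword matching, and the breadth-first traversal are hypothetical stand-ins and are not taken from the proposed system.

```python
from collections import deque

# Hypothetical knowledge graph: each component maps to the components it depends on.
DEPENDS_ON = {
    "web_app": ["app_server"],
    "app_server": ["database", "auth_service"],
    "auth_service": ["database"],
    "database": [],
}

# Hypothetical mapping from symptom phrases to the component the user interacts with.
SYMPTOM_ENTRY = {"not responding": "web_app"}

# Hypothetical debugging check attached to each component.
CHECKS = {
    "web_app": "probe the HTTP health endpoint",
    "app_server": "inspect process status and recent error logs",
    "database": "test connectivity and query latency",
    "auth_service": "verify that token issuance succeeds",
}


def find_entry_component(question):
    """Return the entry component for a symptom, or None if nothing matches."""
    text = question.lower()
    for phrase, component in SYMPTOM_ENTRY.items():
        if phrase in text:
            return component
    return None


def discover_topology(entry):
    """Breadth-first walk over dependencies; returns components in visit order."""
    seen = [entry]
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for dep in DEPENDS_ON.get(node, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


def compose_workflow(question):
    """Compose an ordered list of (component, check) steps for a user question."""
    entry = find_entry_component(question)
    if entry is None:
        return []
    return [(component, CHECKS[component]) for component in discover_topology(entry)]


if __name__ == "__main__":
    for component, check in compose_workflow("Why is application not responding?"):
        print(f"{component}: {check}")
```

In this sketch the user never names a server or database; the question alone selects the starting point, which is one simple way to read the 'system agnostic' interaction described above.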