Summarizing Process Traces for Analysis Tasks: An Intuitive and User-controlled Approach
Abstract
Domains such as business processes and workflows require working with multi-dimensional ordered objects, and this data must be analyzed for operational insights. For example, in business process analysis, users cluster process traces to discover simpler per-cluster process models. Such applications require the ability to measure the similarity between data objects. However, measuring the similarity between sequence-based data objects is computationally expensive. We present an intuitive and user-controlled approach to summarizing sequence-based multi-dimensional data. Our summarization schemes provide a trade-off between the quality and the efficiency of analysis tasks. We also derive an error model for summary-based similarity under an edit-distance constraint. Evaluation results over real-world datasets show the effectiveness of our methods.