Automated discovery of declarative process models with correlated data conditions

Volodymyr Leno, Marlon Dumas, Fabrizio Maria Maggi, Marcello La Rosa, Artem Polyvyanyy
2019 Information Systems  
Automated process discovery techniques enable users to generate business process models from event logs extracted from enterprise information systems. Traditional techniques in this field generate procedural process models (e.g., in the BPMN notation). When dealing with highly variable processes, the resulting procedural models are often too complex to be practically usable. An alternative approach is to discover declarative process models, which represent the behavior of the process as a set of constraints. Declarative process discovery techniques have been shown to produce simpler models than procedural ones, particularly for processes with high variability. However, the bulk of approaches for automated discovery of declarative process models focus on the control-flow perspective, ignoring the data perspective. This paper addresses the problem of discovering declarative process models with data conditions. Specifically, the paper tackles the problem of discovering constraints that involve two activities of the process such that each of these two activities is associated with a condition that must hold when the activity occurs. The paper presents and compares two approaches to the problem of discovering such conditions. The first approach uses clustering techniques in conjunction with a rule mining technique, while the second approach relies on redescription mining techniques. The two approaches (and their variants) are empirically compared using a combination of synthetic and real-life event logs. The experimental results show that the former approach outperforms the latter when it comes to re-discovering constraints artificially injected in a log. Also, the former approach is in most cases more computationally efficient. On the other hand, redescription mining discovers rules with higher confidence (and lower support), suggesting that it may be used to discover constraints that hold for smaller subsets of cases of a process.
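
As an illustration of the kind of constraint the abstract describes (two activities, each with a data condition that must hold when it occurs), the following minimal Python sketch checks one such constraint against a toy event log. It is not the paper's discovery method: the function name `response_fulfilment`, the activity names, the attributes, the thresholds, and the "response" semantics (an activation must eventually be followed by a matching target in the same trace) are all assumptions chosen for the example, and the counts it returns are simple tallies, not the support and confidence measures used in the paper.

```python
from typing import Callable, Dict, List, Tuple

# Assumed toy representation: an event is (activity name, data payload).
Event = Tuple[str, Dict[str, object]]
Trace = List[Event]
Condition = Callable[[Dict[str, object]], bool]


def response_fulfilment(
    log: List[Trace],
    activation: str,
    activation_cond: Condition,
    target: str,
    target_cond: Condition,
) -> Tuple[int, int]:
    """Count activations and how many of them are fulfilled.

    An activation is an occurrence of `activation` whose payload satisfies
    `activation_cond`. It is fulfilled if a later event in the same trace
    is an occurrence of `target` whose payload satisfies `target_cond`.
    """
    activations = 0
    fulfilled = 0
    for trace in log:
        for i, (name, payload) in enumerate(trace):
            if name == activation and activation_cond(payload):
                activations += 1
                # Look only at events after the activation in this trace.
                if any(n == target and target_cond(p) for n, p in trace[i + 1:]):
                    fulfilled += 1
    return activations, fulfilled


# Hypothetical log with three cases (all names and values are made up).
toy_log: List[Trace] = [
    [("Submit claim", {"amount": 1500}), ("Audit", {"level": "senior"})],
    [("Submit claim", {"amount": 2000}), ("Audit", {"level": "junior"})],
    [("Submit claim", {"amount": 200}), ("Pay", {})],
]

activations, fulfilled = response_fulfilment(
    toy_log,
    "Submit claim", lambda d: d["amount"] > 1000,
    "Audit", lambda d: d["level"] == "senior",
)
print(activations, fulfilled)
```

In this toy log, the first two cases activate the constraint and only the first one satisfies the target condition. A discovery technique would have to search for the two conditions themselves, whereas this sketch only evaluates conditions that are already given.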
doi:10.1016/ fatcat:57bd2tnz5nahbhvl4a7kdj6lee