Group-based privacy preservation techniques for process mining

Majid Rafiei, Wil M.P. van der Aalst
2021 Data & Knowledge Engineering  
Process mining techniques help to improve processes using event data. Such data are widely available in information systems. However, they often contain highly sensitive information. For example, healthcare information systems record event data that can be utilized by process mining techniques to improve the treatment process, reduce patients' waiting times, improve resource productivity, etc. However, the recorded event data include highly sensitive information related to treatment activities.
Responsible process mining should provide insights about the underlying processes, yet, at the same time, it should not reveal sensitive information. In this paper, we discuss the challenges regarding directly applying existing well-known group-based privacy preservation techniques, e.g., k-anonymity, l-diversity, etc., to event data. We provide formal definitions of attack models and introduce an effective group-based privacy preservation technique for process mining. Our technique covers the main perspectives of process mining including control-flow, time, case, and organizational perspectives. The proposed technique provides interpretable and adjustable parameters to handle different privacy aspects. We employ real-life event data and evaluate both data utility and result utility to show the effectiveness of the privacy preservation technique. We also compare this approach with other group-based approaches for privacy-preserving event data publishing.

Process mining employs event data to discover, analyze, and improve the real processes [1]. Indeed, it provides fact-based insights into the actual processes using event logs. There are many algorithms and techniques in the field of process mining. However, the three basic types of process mining are (1) process discovery, where the goal is to learn real process models from event logs, (2) conformance checking, where the aim is to find commonalities and discordances between a process model and an event log, and (3) process re-engineering (enhancement), where the aim is to extend or improve a process model using different aspects of the available data.

An event log is a collection of events where each event is described by its attributes [1]. The typical attributes required for the main process mining algorithms are case identifier, activity, timestamp, and resource.
The case identifier refers to the entity that the event belongs to, the activity refers to the activity associated with the event, the timestamp is the time that the event occurred, and the resource is the activity performer. In human-centered processes, case identifiers refer to persons. For example, in a patient treatment process, the case identifiers refer to the patients whose data are recorded. Moreover, the resource attribute often refers to the persons performing activities, e.g., in the healthcare context, the resources refer to the doctors or nurses performing activities for the patients. The event attributes are not limited to the above-mentioned ones, and an event may also carry other case-related attributes, so-called case attributes, e.g., age, salary, disease, etc., which could be considered as sensitive person-specific information. Table 1 shows a sample event log.
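To make the event attributes concrete, the following is a minimal Python sketch of an event log and of a naive group-size check in the spirit of k-anonymity. It is not the paper's technique or its attack models, and the data are not the paper's Table 1: the class `Event`, the sample `EVENTS`, and the helper names are hypothetical. The sketch assumes, purely for illustration, that a case's ordered activity sequence (its control-flow trace) acts as the quasi-identifier.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str         # entity the event belongs to (e.g., a patient)
    activity: str        # activity associated with the event
    timestamp: datetime  # time at which the event occurred
    resource: str        # performer of the activity


# Hypothetical events, invented for illustration and deliberately unordered.
EVENTS = [
    Event("c1", "Visit", datetime(2021, 1, 1, 10, 0), "doctor_1"),
    Event("c1", "Registration", datetime(2021, 1, 1, 9, 0), "nurse_1"),
    Event("c1", "Blood test", datetime(2021, 1, 1, 9, 30), "nurse_2"),
    Event("c2", "Registration", datetime(2021, 1, 2, 9, 0), "nurse_1"),
    Event("c2", "Visit", datetime(2021, 1, 2, 9, 45), "doctor_2"),
    Event("c3", "Registration", datetime(2021, 1, 3, 8, 0), "nurse_1"),
    Event("c3", "Blood test", datetime(2021, 1, 3, 8, 20), "nurse_2"),
    Event("c3", "Visit", datetime(2021, 1, 3, 9, 0), "doctor_1"),
]

# Hypothetical case attributes (sensitive, person-specific information).
CASE_ATTRIBUTES = {
    "c1": {"age": 34, "disease": "flu"},
    "c2": {"age": 51, "disease": "diabetes"},
    "c3": {"age": 47, "disease": "flu"},
}


def traces(events):
    """Map each case to its activity sequence, ordered by timestamp."""
    by_case = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        by_case[event.case_id].append(event.activity)
    return {case: tuple(activities) for case, activities in by_case.items()}


def variant_counts(events):
    """Count how many cases share each distinct activity sequence."""
    return Counter(traces(events).values())


def satisfies_k_anonymity(events, k):
    """Naive check: every activity sequence is shared by at least k cases."""
    return min(variant_counts(events).values()) >= k
```

In this toy log, case `c2` is the only one with the sequence Registration, Visit, so an observer who knows that sequence could single out the case and look up its case attributes. Real event logs often contain many such rare sequences, which hints at why applying group-based techniques directly to event data is challenging.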
doi:10.1016/j.datak.2021.101908 fatcat:oedeqwmj4nebbm3tlyoh3nzf6e