Mining Subjectively Interesting Attributed Subgraphs [article]

Anes Bendimerad, Ahmad Mel, Jefrey Lijffijt, Marc Plantevit, Céline Robardet, Tijl De Bie
2019 arXiv   pre-print
Community detection in graphs, data clustering, and local pattern mining are three mature fields of data mining and machine learning. In recent years, attributed subgraph mining has been emerging as a powerful new data mining task at the intersection of these areas. Given a graph and a set of attributes for each vertex, attributed subgraph mining aims to find cohesive subgraphs for which (a subset of) the attributes have exceptional values in some sense. While research on this task can borrow
... m the three abovementioned fields, the principled integration of graph and attribute data poses two challenges: the definition of a pattern language that is intuitive and lends itself to efficient search strategies, and the formalization of the interestingness of such patterns. We propose an integrated solution to both of these challenges. The proposed pattern language improves upon prior work in being both highly flexible and intuitive. We show how an effective and principled algorithm can enumerate patterns of this language. The proposed approach for quantifying the interestingness of patterns of this language is rooted in information theory, and is able to account for prior knowledge on the data. Prior work typically quantifies interestingness based on the cohesion of the subgraph and on the exceptionality of its attributes separately, combining these in a parametrized trade-off. Instead, in our proposal this trade-off is implicitly handled in a principled, parameter-free manner. Extensive empirical results confirm that the proposed pattern syntax is intuitive, and that the interestingness measure aligns well with actual subjective interestingness.
arXiv:1905.03040v1 fatcat:hlkax52by5bg3a2gbveelewpwu

Sequential recommendation with metric models based on frequent sequences [article]

Corentin Lonjarret, Roch Auburtin, Céline Robardet, Marc Plantevit
2020 arXiv   pre-print
Modeling user preferences (long-term history) and user dynamics (short-term history) is of the greatest importance for building efficient sequential recommender systems. The challenge lies in successfully combining the user's whole history with their recent actions (sequential dynamics) to provide personalized recommendations. Existing methods capture the sequential dynamics of a user using fixed-order Markov chains (usually first-order chains) regardless of the user, which limits both the impact
... the past of the user on the recommendation and the ability to adapt its length to the user profile. In this article, we propose to use frequent sequences to identify the most relevant part of the user history for the recommendation. The most salient items are then used in a unified metric model that embeds items based on user preferences and sequential dynamics. Extensive experiments demonstrate that our method outperforms the state of the art, especially on sparse datasets. We show that considering sequences of varying lengths improves the recommendations, and we also emphasize that these sequences provide explanations for the recommendations.
arXiv:2008.05587v1 fatcat:c5vtmnimuvhx3orm7ty3sfu62i

Exceptional contextual subgraph mining

Mehdi Kaytoue, Marc Plantevit, Albrecht Zimmermann, Anes Bendimerad, Céline Robardet
2017 Machine Learning  
Many relational data result from the aggregation of several individual behaviors described by some characteristics. For instance, a bike-sharing system may be modeled as a graph where vertices stand for bike-share stations and connections represent bike trips made by users from one station to another. Stations and trips are described by additional information such as the description of the geographical environment of the stations (business vs. residential area, closeness to POI, elevation,
... ization density, etc.), or properties of the bike trips (timestamp, user profile, weather, events and other special conditions about the trip). Identifying highly connected components (such as communities or quasi-cliques) in this graph provides interesting insights into global usages but does not capture mobility profiles that characterize a subpopulation. To tackle this problem we propose an approach rooted in exceptional model mining to find exceptional contextual subgraphs, i.e., subgraphs generated from a context or a description of the individual behaviors that is exceptional (behaves in a different way) compared to the whole augmented graph. The dependency between a context and an edge is assessed by a χ² test and the weighted relative accuracy measure is used to only retain contexts that strongly characterize connected subgraphs. We present an original algorithm that uses sophisticated pruning techniques to restrict the search space of vertices, context refinements, and edges to be considered. An experimental evaluation on synthetic data and two real-life datasets demonstrates the effectiveness of the proposed pruning mechanisms, as well as the relevance of the discovered patterns.
doi:10.1007/s10994-016-5598-0 fatcat:nr22capolbep5mjgfvyjspu3li
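
The weighted relative accuracy measure named in the abstract above can be illustrated with a short sketch. It uses the standard subgroup-discovery definition, WRAcc = (n/N) · (p − p0), and is not taken from the paper's implementation; the function name `wracc`, its parameters, and the toy counts are assumptions made for illustration only.

```python
def wracc(n_sub: int, pos_sub: int, n_all: int, pos_all: int) -> float:
    """Weighted relative accuracy of a subgroup (standard definition).

    n_sub   -- number of objects covered by the subgroup (e.g. a context)
    pos_sub -- covered objects that have the target property
    n_all   -- number of objects in the whole dataset
    pos_all -- objects in the whole dataset that have the target property
    """
    if n_all <= 0:
        raise ValueError("n_all must be positive")
    if n_sub == 0:
        return 0.0  # an empty subgroup carries no information
    coverage = n_sub / n_all                  # how general the subgroup is
    gain = pos_sub / n_sub - pos_all / n_all  # how unusual its target rate is
    return coverage * gain


# Hypothetical toy numbers: 200 trips in total, 50 of them on a given edge;
# a context covers 40 trips, 30 of which use that edge.
score = wracc(n_sub=40, pos_sub=30, n_all=200, pos_all=50)
print(round(score, 3))
```

A positive score means the target is over-represented in the subgroup; multiplying by the coverage penalizes very small subgroups.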

Interpretable Summaries of Black Box Incident Triaging with Subgroup Discovery [article]

Youcef Remil, Anes Bendimerad, Marc Plantevit, Céline Robardet, Mehdi Kaytoue
2021 arXiv   pre-print
The need for predictive maintenance comes with an increasing number of incidents reported by monitoring systems and equipment/software users. In the front line, on-call engineers (OCEs) have to quickly assess the degree of severity of an incident and decide which service to contact for corrective actions. To automate these decisions, several predictive models have been proposed, but the most efficient models are opaque (i.e., black boxes), strongly limiting their adoption. In this paper, we propose
... an efficient black box model based on 170K incidents reported to our company over the last 7 years and emphasize the need to automate triage when incidents are massively reported on thousands of servers running our product, an ERP. Recent developments in eXplainable Artificial Intelligence (XAI) help provide global explanations of the model and, most importantly, local explanations for each model prediction/outcome. Sadly, providing a human with an explanation for each outcome is not feasible when dealing with a large number of daily predictions. To address this problem, we propose an original data-mining method rooted in Subgroup Discovery, a pattern mining technique with the natural ability to group objects that share similar explanations of their black box predictions and provide a description for each group. We evaluate this approach and present our preliminary results, which give us good hope of effective adoption by OCEs. We believe that this approach provides a new way to address the problem of model-agnostic outcome explanation.
arXiv:2108.03013v1 fatcat:lq4rvfwdmvgvdip3c3s6onvzly

Fenêtres sur cube

Yoann Pitarch, Anne Laurent, Marc Plantevit, Pascal Poncelet
2010 Ingénierie des Systèmes d'Information  
Nowadays, many applications (e.g., real-time monitoring, traffic analysis...) must cope with a potentially infinite stream of multidimensional data. In such a context, it is no longer possible to exploit these data at a low level of granularity, so new aggregation approaches that take these constraints into account are needed. We adapt OLAP technologies to a real-time context to propose a structure that (1) enables
... efficient multidimensional and multilevel analysis and (2) satisfies a critical constraint in data streams: storage space. Traditionally, the history of fine-grained data is consulted only over the recent past, and storing it beyond that period becomes superfluous. We propose to aggregate it according to the evolution of the stream over time by extending the principle of time windows to all hierarchical dimensions, and we introduce precision functions to determine when a level of granularity becomes superfluous. These functions are combined to obtain a compact structure that is quick to maintain. ABSTRACT. Real-time surveillance systems and other dynamic environments often generate tremendous volumes of multidimensional stream data. This volume is too large to be scanned multiple times, and stream data arrive at a rather low level of abstraction. It is therefore unrealistic to store such data, for two main reasons: the technical limits of today's computers, and the fact that analysts are mostly interested in higher levels of abstraction. To discover such high-level characteristics, one may need to perform on-line multi-level and multi-dimensional analytical processing of stream data. In this paper, we propose a compact architecture to perform such analysis. Since time and space are critical in the context of stream analysis, our architecture is based on two techniques. First, a tilted-time model is used to compress the temporal dimension: the more recent the data, the finer the granularity at which it is recorded. Second, recent data are mostly queried at a fine level of precision. We therefore extend the tilted-time model to other multi-level dimensions: precision levels that are never queried are not materialized. Based on this design methodology, a stream cube can be constructed and maintained incrementally with a low amount of memory and a reasonable computation cost.
KEYWORDS: OLAP, data streams, data summarization, time windows.
doi:10.3166/isi.15.1.9-33 fatcat:xrdugvmdhjfcrfiak4ci4o7obi
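
The tilted-time idea described in the abstract above (recent data kept fine-grained, older data kept only as coarser aggregates) can be sketched as follows. This is a generic illustration of a tilted time window over a single temporal dimension, not the structure proposed in the paper, which also covers other hierarchical dimensions and uses precision functions; the class name `TiltedTimeWindow`, the `level_sizes` parameter, and the choice of summation as the aggregate are all assumptions.

```python
class TiltedTimeWindow:
    """Keeps recent values at fine granularity and older ones as coarser sums."""

    def __init__(self, level_sizes):
        # level_sizes[i]: number of level-i slots merged into one slot of
        # level i + 1. For the last level it is the number of slots retained
        # before the oldest one is dropped.
        self.level_sizes = list(level_sizes)
        self.levels = [[] for _ in self.level_sizes]

    def add(self, value):
        """Register one new measurement at the finest level."""
        self._push(0, value)

    def _push(self, level, value):
        slots = self.levels[level]
        slots.append(value)
        is_last = level == len(self.levels) - 1
        if is_last:
            if len(slots) > self.level_sizes[level]:
                slots.pop(0)  # forget the oldest coarse summary
        elif len(slots) == self.level_sizes[level]:
            total = sum(slots)  # aggregate the full fine-grained level
            slots.clear()
            self._push(level + 1, total)


# Hypothetical usage: 4 quarter-hour slots per hour, at most 24 hourly slots.
window = TiltedTimeWindow([4, 24])
for _ in range(10):
    window.add(1)
print(window.levels)
```

Memory stays bounded by the sum of the level sizes, whatever the length of the stream, which is the property the abstract relies on.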

Sequential Data Mining for Information Extraction from Texts

Thierry Charnois, Marc Plantevit, Christophe Rigotti, Bruno Crémilleux
2009 Revue TAL  
This article shows the value of using patterns produced by data mining methods in natural language processing (NLP) applied to medical and genetic biology, particularly in information extraction tasks. We propose an approach for learning linguistic patterns with a data mining method based on sequential patterns and on a so-called recursive mining of the patterns themselves. One original feature of our approach is that it dispenses with
... analysis while still producing symbolic results that are intelligible to the user, unlike numerical methods, which remain difficult to interpret. It requires no linguistic resources other than the training corpus. For the recognition of biological named entities, we propose a method based on a new type of pattern that integrates a sequence and its context. ABSTRACT. This paper shows the benefit of using data mining methods for Biological Natural Language Processing. A method for discovering linguistic patterns based on recursive sequential pattern mining is proposed. It requires neither sentence parsing nor any resource other than a training data set. It produces understandable results, and we show its value for the extraction of relations between named entities. For the named entity recognition problem, we propose a method based on a new kind of pattern that takes into account the sequence and its context. KEYWORDS: information extraction, data mining, sequential patterns and LSR patterns, NLP applied to biological and genetic texts.
dblp:journals/tal/CharnoisPRC09 fatcat:jq4vn5s7sbaidkni5yg3qd6l2y

Sequential Patterns to Discover and Characterise Biological Relations [chapter]

Peggy Cellier, Thierry Charnois, Marc Plantevit
2010 Lecture Notes in Computer Science  
doi:10.1007/978-3-642-12116-6_46 fatcat:tdob7272ovc6ncgf7h3gkscqlm

Local Pattern Detection in Attributed Graphs [chapter]

Jean-François Boulicaut, Marc Plantevit, Céline Robardet
2016 Lecture Notes in Computer Science  
doi:10.1007/978-3-319-41706-6_8 fatcat:hotg2gtv3jawhf7hdbsu4o4zi4

Gibbs Sampling Subjectively Interesting Tiles [chapter]

Anes Bendimerad, Jefrey Lijffijt, Marc Plantevit, Céline Robardet, Tijl De Bie
2020 Lecture Notes in Computer Science  
The local pattern mining literature has long struggled with the so-called pattern explosion problem: the size of the set of patterns found exceeds the size of the original data. This causes computational problems (enumerating a large set of patterns will inevitably take a substantial amount of time) as well as problems for interpretation and usability (trawling through a large set of patterns is often impractical). Two complementary research lines aim to address this problem. The first aims to
... evelop better measures of interestingness, in order to reduce the number of uninteresting patterns that are returned [6, 10]. The second aims to avoid an exhaustive enumeration of all 'interesting' patterns (where interestingness is quantified in a more traditional way, e.g. frequency), by directly sampling from this set in a way that more 'interesting' patterns are sampled with higher probability [2]. Unfortunately, the first research line does not reduce computational cost, while the second may miss out on the most interesting patterns. In this paper, we combine the best of both worlds for mining interesting tiles [8] from binary databases. Specifically, we propose a new pattern sampling approach based on Gibbs sampling, where the probability of sampling a pattern is proportional to its subjective interestingness [6], an interestingness measure reported to better represent true interestingness. The experimental evaluation confirms the theory, but also reveals an important weakness of the proposed approach which we speculate is shared with any other pattern sampling approach. We thus conclude with a broader discussion of this issue, and a forward look.
doi:10.1007/978-3-030-44584-3_7 fatcat:6djzaqymsbcbtbqlocobtd2gdi

Trend Mining in Dynamic Attributed Graphs [chapter]

Elise Desmier, Marc Plantevit, Céline Robardet, Jean-François Boulicaut
2013 Lecture Notes in Computer Science  
Many applications create a strong demand for discovering important patterns in dynamic attributed graphs. In this paper, we introduce the problem of discovering trend sub-graphs in dynamic attributed graphs. This new kind of pattern relies on the graph structure and the temporal evolution of the attribute values. Several interestingness measures are introduced to focus on the most relevant patterns with regard to the graph structure, the vertex attributes, and the time. We design an efficient algorithm
... that benefits from various constraint properties and provide an extensive empirical study on several real-world dynamic attributed graphs.
doi:10.1007/978-3-642-40988-2_42 fatcat:i47usocbejccpcf3adv53hgwhy


Marc Plantevit, Anne Laurent, Maguelonne Teisseire
2006 Proceedings of the 9th ACM international workshop on Data warehousing and OLAP - DOLAP '06  
[12], Plantevit et al. [13], and Yu et al. [17]. They aim at discovering patterns that take time into account and that involve several dimensions. ...
doi:10.1145/1183512.1183518 dblp:conf/dolap/PlantevitLT06 fatcat:lgjponvs6jeljft34unypqm2oe

Une méthode pour caractériser les communautés des réseaux dynamiques à attributs [article]

Günce Keziban Orman, Vincent Labatut, Marc Plantevit, Jean-François Boulicaut
2013 arXiv   pre-print
To do so, we apply the post-processing method defined in (Plantevit and Cremilleux, 2009) to compute the growth rates of sequences of classified articles. ...
arXiv:1312.4676v1 fatcat:qyalaox4bzhnvesrn2pt42smue

Mining exceptional closed patterns in attributed graphs

Anes Bendimerad, Marc Plantevit, Céline Robardet
2017 Knowledge and Information Systems  
Geo-located social media provide a large amount of information describing urban areas based on user descriptions and comments. Such data make it possible to identify meaningful city neighborhoods on the basis of the footprints left by a large and diverse population that uses this type of media. In this paper, we present some methods to exhibit the predominant activities and their associated urban areas to automatically describe a whole city. Based on a suitably attributed graph model, our
... identifies neighborhoods with homogeneous and exceptional characteristics. We introduce the novel problem of exceptional subgraph mining in attributed graphs and propose a complete algorithm that benefits from closure operators, new upper bounds and pruning properties. We also define an approach to sample the space of closed exceptional subgraphs within a given time budget. Experiments performed on 10 real datasets are reported and demonstrate the relevance of both approaches, and also show their limits.
doi:10.1007/s10115-017-1109-2 fatcat:ogjujeunbzctnajowywb2dyoaa

What effects topological changes in dynamic graphs?

Mehdi Kaytoue, Yoann Pitarch, Marc Plantevit, Céline Robardet
2015 Social Network Analysis and Mining  
To describe the dynamics taking place in networks that structurally change over time, we propose an approach to search for vertex attributes whose value changes impact the topology of the graph. In several applications, it appears that the variations of a group of attributes are often followed by some structural changes in the graph that one may assume they generate. We formalize the triggering pattern discovery problem as a method jointly rooted in sequence mining and graph analysis. We apply
... ur approach on three real-world dynamic graphs of different natures (a co-authoring network, an airline network, and a social bookmarking system), assessing the relevance of the triggering pattern mining approach.
doi:10.1007/s13278-015-0294-9 fatcat:fjlzvtmoyzespb6chauerwf2cy

Condensed Representation of Sequential Patterns According to Frequency-Based Measures [chapter]

Marc Plantevit, Bruno Crémilleux
2009 Lecture Notes in Computer Science  
Condensed representations of patterns are at the core of many data mining works and there are a lot of contributions handling data described by items. In this paper, we tackle sequential data and we define an exact condensed representation for sequential patterns according to the frequency-based measures. These measures are often used, typically in order to evaluate classification rules. Furthermore, we show how to infer the best patterns according to these measures, i.e., the patterns which
... imize them. These patterns are immediately obtained from the condensed representation so that this approach is easily usable in practice. Experiments conducted on various datasets demonstrate the feasibility and the interest of our approach.
doi:10.1007/978-3-642-03915-7_14 fatcat:xkna4ruy3zdnnhc6bq47vdcwee