Learning Pretopological Spaces for Lexical Taxonomy Acquisition [chapter]

Guillaume Cleuziou, Gaël Dias
2015 Lecture Notes in Computer Science  
In this paper, we propose a new methodology for semisupervised acquisition of lexical taxonomies. Our approach is based on the theory of pretopology that offers a powerful formalism to model semantic relations and transforms a list of terms into a structured term space by combining different discriminant criteria. In order to learn a parameterized pretopological space, we define the Learning Pretopological Spaces strategy based on genetic algorithms. In particular, rare but accurate pieces of
... owledge are used to parameterize the different criteria defining the pretopological term space. Then, a structuring algorithm is used to transform the pretopological space into a lexical taxonomy. Results over three standard datasets evidence improved performances against state-of-the-art associative and pattern-based approaches.
doi:10.1007/978-3-319-23525-7_30 fatcat:zl54hjnirfeltbiy7fqbd5qrl4

Disjunctive Learning with a Soft-Clustering Method [chapter]

Guillaume Cleuziou, Lionel Martin, Christel Vrain
2003 Lecture Notes in Computer Science  
In the case of concept learning from positive and negative examples, it is rarely possible to find a unique discriminating conjunctive rule; in most cases, a disjunctive description is needed. This problem, known as disjunctive learning, is mainly solved by greedy methods, iteratively adding rules until all positive examples are covered. Each rule is determined by discriminating properties, where the discriminating power is computed from the learning set. Each rule defines a subconcept of
... t to be learned with these methods. The final set of sub-concepts is then highly dependent from both the learning set and the learning method. In this paper, we propose a different strategy: we first build clusters of similar examples thus defining subconcepts, and then we characterize each cluster by a unique conjunctive definition. The clustering method relies on a similarity measure designed for examples described in first order logic. The main particularity of our clustering method is to build "soft clusters", i.e. allowing some objects to belong to different groups. Once clusters have been built, we learn first-order rules defining the clusters, using a general-to-specific method: each step consists in adding a literal that covers all examples of a group and rejects as many negative examples as possible. This strategy limits some drawbacks of greedy algorithms and induces a strong reduction of the hypothesis space: for each group (subconcept), the search space is reduced to the set of rules that cover all the examples of the group and reject the negative examples of the concept.
doi:10.1007/978-3-540-39917-9_7 fatcat:txgo63b6qzftznw6yzgisn3vem

Genre and Domain Processing in an Information Retrieval Perspective [chapter]

Céline Poudat, Guillaume Cleuziou
2003 Lecture Notes in Computer Science  
The massive amount of textual data on the Web raises numerous classification problems. Although the notion of domain is widely acknowledged in the IR field, the applicative concept of genre could solve its weaknesses by taking into account the linguistic properties and the document structures of the texts. Two clustering methods are proposed here to illustrate the complementarity of the notions to characterize a closed scientific article corpus. The results are planned to be used in a Web-based
... application. Relevance of a Domain and Genre Coupling in IR Although it is accepted that IR with textual data has to work with texts rather than with sentences or words, the various variables of discourse analysis (domain,
doi:10.1007/3-540-45068-8_73 fatcat:jssxxqpfgfbhloo245j57kvj3u

Informative Polythetic Hierarchical Ephemeral Clustering

Gaël Dias, Guillaume Cleuziou, David Machado
2011 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology  
Ephemeral clustering has been studied for more than a decade, although with low user acceptance. According to us, this situation is mainly due to (1) an excessive number of generated clusters, which makes browsing difficult and (2) low quality labeling, which introduces imprecision within the search process. In this paper, our motivation is twofold. First, we propose to reduce the number of clusters of Web page results, but keeping all different query meanings. For that purpose, we propose a
... polythetic methodology based on an informative similarity measure, the InfoSimba, and a new hierarchical clustering algorithm, the HISGK-means. Second, a theoretical background is proposed to define meaningful cluster labels embedded in the definition of the HISGK-means algorithm, which may elect as best label, words outside the given cluster. To confirm our intuitions, we propose a new evaluation framework, which shows that we are able to extract most of the important query meanings but generating much less clusters than state-of-the-art systems.
doi:10.1109/wi-iat.2011.123 dblp:conf/webi/DiasCM11 fatcat:3fkd2qebunaoticqinspmcrnry

Learning Pretopological Spaces to Model Complex Propagation Phenomena: A Multiple Instance Learning Approach Based on a Logical Modeling [article]

Gaëtan Caillaut, Guillaume Cleuziou
2018 arXiv   pre-print
Cleuziou and Dias (2015) were the first to tackle the problem of learning a pseudo-closure operator from observations.  ...  Cleuziou and Dias (2015) propose the Learn Pretopological Space (LPS) algorithm which makes use of a genetic algorithm.  ... 
arXiv:1805.01278v1 fatcat:ks652kv3s5di3a5gqkfbmykb7y

Generalization of c-means for identifying non-disjoint clusters with overlap regulation

Chiheb-Eddine ben N'Cir, Guillaume Cleuziou, Nadia Essoussi
2014 Pattern Recognition Letters  
doi:10.1016/j.patrec.2014.03.007 fatcat:b76ompqxgfhj3pnbeqvdyhm4oy

On the Impact of Lexical and Linguistic Features in Genre- and Domain-Based Categorization [chapter]

Guillaume Cleuziou, Céline Poudat
2007 Lecture Notes in Computer Science  
Classification in genres and domains is a major field of research for Information Retrieval (scientific and technical watch, datamining, etc.) and the selection of appropriate descriptors to characterize and classify texts is particularly crucial to that effect. Most of practical experiments consider that domains are correlated to the content level (words, tokens, lemmas, etc.) and genres to the morphosyntactic or linguistic one (function words, POS, etc.). However, currently used variables are
... generally not accurate enough to be applied to the categorization task. The present study assesses the impact of the lexical and linguistic levels in the field of genre and domain categorization. The empirical results we obtained demonstrate how important it is to select an appropriate tagset that meets the requirement of the task. The results also assess the efficiency of the linguistic level for both genre- and domain-based categorization.
doi:10.1007/978-3-540-70939-8_53 fatcat:wlpda4juujcedasf525naze45i

An extended version of the k-means method for overlapping clustering

Guillaume Cleuziou
2008 Pattern Recognition (ICPR), Proceedings of the International Conference on  
This paper deals with overlapping clustering, a trade off between crisp and fuzzy clustering. It has been motivated by recent applications in various domains such as information retrieval or biology. We show that the problem of finding a suitable coverage of data by overlapping clusters is not a trivial task. We propose a new objective criterion and the associated algorithm OKM that generalizes the k-means algorithm. Experiments show that overlapping clustering is a good alternative and indicate that OKM outperforms other existing methods.
doi:10.1109/icpr.2008.4761079 dblp:conf/icpr/Cleuziou08 fatcat:sylgheldjnbfrkpklayd2bne6y

Query log driven web search results clustering

Jose G. Moreno, Gaël Dias, Guillaume Cleuziou
2014 Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval - SIGIR '14  
Different important studies in Web search results clustering have recently shown increasing performances motivated by the use of external resources. Following this trend, we present a new algorithm called Dual C-Means, which provides a theoretical background for clustering in different representation spaces. Its originality relies on the fact that external resources can drive the clustering process as well as the labeling task in a single step. To validate our hypotheses, a series of
... are conducted over different standard datasets and in particular over a new dataset built from the TREC Web Track 2012 to take into account query logs information. The comprehensive empirical evaluation of the proposed approach demonstrates its significant advantages over traditional clustering and labeling techniques.
doi:10.1145/2600428.2609583 dblp:conf/sigir/MorenoDC14 fatcat:7njxxc5rqjghnmfoqxgrjsqkfm

A Proximity Measure and a Clustering Method for Concept Extraction in an Ontology Building Perspective [chapter]

Guillaume Cleuziou, Sylvie Billot, Stanislas Lew, Lionel Martin, Christel Vrain
2006 Lecture Notes in Computer Science  
In this paper, we study the problem of clustering textual units in the framework of helping an expert to build a specialized ontology. This work has been achieved in the context of a French project, called Biotim, handling botany corpora. Building an ontology, either automatically or semi-automatically is a difficult task. We focus on one of the main steps of that process, namely structuring the textual units occurring in the texts into classes, likely to represent concepts of the domain. The
... proach that we propose relies on the definition of a new non-symmetrical measure for evaluating the semantic proximity between lemma, taking into account the contexts in which they occur in the documents. Moreover, we present a non-supervised classification algorithm designed for the task at hand and that kind of data. The first experiments performed on botanical data have given relevant results.
doi:10.1007/11875604_77 fatcat:o3rmqssrgrc5rnt5o64me55fq4

QASSIT: A Pretopological Framework for the Automatic Construction of Lexical Taxonomies from Raw Texts

Guillaume Cleuziou, Davide Buscaldi, Gaël Dias, Vincent Levorato, Christine Largeron
2015 Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)  
As starting point, we consider the work from (Cleuziou et al., 2011) which introduced a set of new statistically-based criteria (e.g.  ...  This formalism, as reviewed by (Belmandt, 2011) is commonly used to model complex propagation phenomena thanks to a pseudo-closure operator, recently employed in (Cleuziou et al., 2011) for LT acquisition  ... 
doi:10.18653/v1/s15-2159 dblp:conf/semeval/CleuziouBDLL15 fatcat:zj5hcqkyrnda3cyhzpmranwalu

A pretopological framework for the automatic construction of lexical-semantic structures from texts

Guillaume Cleuziou, Davide Buscaldi, Vincent Levorato, Gaël Dias
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
In this paper, we present a new approach for the automatic generation of lexical-semantic structures from texts. In particular, we propose a pretopological framework to formalize and combine various hypotheses on textual data in order to automatically derive a structure similar to common lexical-semantic knowledge bases such as WordNet. In addition, we define a new metric to intrinsically evaluate structures.
doi:10.1145/2063576.2063990 dblp:conf/cikm/CleuziouBLD11 fatcat:ll3xjizce5grrmkeebhwr2ydeu

QASSIT at SemEval-2016 Task 13: On the integration of Semantic Vectors in Pretopological Spaces for Lexical Taxonomy Acquisition

Guillaume Cleuziou, Jose G. Moreno
2016 Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)  
This paper presents our participation to the SemEval "Task 13: Taxonomy Extraction Evaluation (TExEval-2)" (Bordea et al., 2016). This year, we propose the combination of recent semantic vectors representation into a methodology for semisupervised and auto-supervised acquisition of lexical taxonomies from raw texts. In our proposal, first similarities between concepts are calculated using semantic vectors, then a pretopological space is defined from which a preliminary structure is
... . Finally, a genetic algorithm is used to optimize two different functions, the quality of the added relationships in the taxonomy and the quality of the structure. Experiments show that our proposal has a competitive performance when compared with the other participants achieving the second position in the general rank.
doi:10.18653/v1/s16-1205 dblp:conf/semeval/CleuziouM16 fatcat:flzasxrvqfe3zkcdxlaen7y2qu

Structuring Natural Language Data by Learning Rewriting Rules [chapter]

Guillaume Cleuziou, Lionel Martin, Christel Vrain
Lecture Notes in Computer Science  
The discovery of relationships between concepts is a crucial point in ontology learning (OL). In most cases, OL is achieved from a collection of domain-specific texts, describing the concepts of the domain and their relationships. A natural way to represent the description associated to a particular text is to use a structured term (or tree). We present a method for learning transformation rules, rewriting natural language texts into trees, where the input examples are couples (text, tree). The
... learning process produces an ordered set of rules such that, applying these rules to a text gives the corresponding tree.
doi:10.1007/978-3-540-73847-3_18 fatcat:hkj5hoaiffgnhbr76uvp5rllxu

Mapping General-Specific Noun Relationships to WordNet Hypernym/Hyponym Relations [chapter]

Gaël Dias, Raycho Mukelov, Guillaume Cleuziou
Lecture Notes in Computer Science  
In this paper, we propose a new methodology based on directed graphs and the TextRank algorithm to automatically induce general-specific noun relations from web corpora frequency counts. Different asymmetric association measures are implemented to build the graphs upon which the TextRank algorithm is applied and produces an ordered list of nouns from the most general to the most specific. Experiments are conducted based on the WordNet noun hierarchy and both quantitative and qualitative evaluations are proposed.
doi:10.1007/978-3-540-87696-0_19 fatcat:5dneyxgi3bfm3m5jai5feqvc2m