A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is application/pdf.
Newton Trees
[chapter]
2010
Lecture Notes in Computer Science
This paper presents Newton trees, a redefinition of probability estimation trees (PETs) based on a stochastic understanding of decision trees that follows the principle of attraction (relating mass and distance through the Inverse Square Law). The structure, application and graphical representation of Newton trees provide a way to make their stochastically driven predictions compatible with user intelligibility, so preserving one of the most desirable features of decision trees, comprehensibility. Unlike almost all existing decision tree learning methods, which use different kinds of partitions depending on the attribute datatype, the construction of prototypes and the derivation of probabilities from distances are identical for every datatype (nominal and numerical, but also structured). We present a way of graphically representing the original stochastic probability estimation trees using a user-friendly gravitation simile. We include experiments showing that Newton trees outperform other PETs in probability estimation and accuracy.
doi:10.1007/978-3-642-17432-2_18
fatcat:nw5tg6tvtjbzbexto3vj7utbqu
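A minimal sketch of the inverse-square attraction idea described above, not the authors' implementation: the class prototypes, their masses and the normalisation used here are illustrative assumptions.

```python
import numpy as np

def attraction_probabilities(x, prototypes, masses):
    """Toy illustration of inverse-square attraction: each class exerts an
    'attraction' proportional to its mass and inversely proportional to the
    squared distance from the instance to the class prototype; attractions
    are then normalised into a probability distribution."""
    eps = 1e-12  # avoid division by zero when x coincides with a prototype
    attractions = {}
    for cls, proto in prototypes.items():
        d2 = np.sum((np.asarray(x) - np.asarray(proto)) ** 2) + eps
        attractions[cls] = masses[cls] / d2
    total = sum(attractions.values())
    return {cls: a / total for cls, a in attractions.items()}

# Hypothetical two-class example: prototypes and masses (class counts) at a node.
probs = attraction_probabilities(
    x=[1.0, 2.0],
    prototypes={"pos": [0.5, 2.0], "neg": [3.0, 0.0]},
    masses={"pos": 30, "neg": 70},
)
print(probs)
```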
Using negotiable features for prescription problems
2010
Computing
Data mining is usually concerned with the construction of accurate models from data, which are usually applied to well-defined problems that can be clearly isolated and formulated independently from other problems. Although much computational effort is devoted to their training and statistical evaluation, model deployment can also represent a scientific problem when several data mining models have to be used together, constraints appear on their application, or they have to be included in decision processes based on different rules, equations and constraints. In this paper we address the problem of combining several data mining models for objects and individuals in a common scenario, where we can not only affect decisions as the result of a change in one or more data mining models, but also have to solve several optimisation problems, such as choosing one or more inputs to get the best overall result, or readjusting probabilities after a failure. We illustrate the point in the area of Customer Relationship Management (CRM), where we deal with the general problem of prescription between products and customers. We introduce the concept of negotiable feature, which leads to an extended taxonomy of CRM problems of greater complexity, since each new negotiable feature implies a new degree of freedom. In this context, we introduce several new problems and techniques, such as data mining model inversion (by ranging over the inputs or by changing classification problems into regression problems by function inversion), expected profit estimation and curves, global optimisation through a Monte Carlo method, and several negotiation strategies in order to solve this maximisation problem.
doi:10.1007/s00607-010-0129-5
fatcat:f25lrbqfinhhbhtn73zhzi4qfq
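A hedged sketch of the expected-profit and Monte Carlo optimisation ideas mentioned in the abstract, assuming a probabilistic classifier over customer features plus a negotiable price; the model, cost and price range are hypothetical, not the paper's exact formulation.

```python
import numpy as np

def expected_profit(model, customer, price, cost):
    """Expected profit at a given price: P(buy | customer, price) * margin.
    'model' is any classifier with predict_proba over (customer features + price)."""
    features = np.append(customer, price).reshape(1, -1)
    p_buy = model.predict_proba(features)[0, 1]
    return p_buy * (price - cost)

def best_price_montecarlo(model, customer, cost, low, high, n_samples=1000, seed=0):
    """Crude Monte Carlo / random-search maximisation of the expected profit curve."""
    rng = np.random.default_rng(seed)
    prices = rng.uniform(low, high, n_samples)
    profits = [expected_profit(model, customer, p, cost) for p in prices]
    i = int(np.argmax(profits))
    return prices[i], profits[i]
```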
Bagging Decision Multi-trees
[chapter]
2004
Lecture Notes in Computer Science
Ensemble methods improve accuracy by combining the predictions of a set of different hypotheses. A well-known method for generating hypothesis ensembles is Bagging. One of the main drawbacks of ensemble methods in general, and Bagging in particular, is the huge amount of computational resources required to learn, store, and apply the set of models. Another problem is that, even using the bootstrap technique, many simple models are similar, thus limiting the ensemble diversity. In this work, we investigate an optimisation technique based on sharing the common parts of the models of an ensemble formed by decision trees in order to minimise both problems. Concretely, we employ a structure called decision multi-tree, which can simultaneously contain a set of decision trees and hence consider the "repeated" parts just once. A thorough experimental evaluation is included to show that the proposed optimisation technique pays off in practice.
doi:10.1007/978-3-540-25966-4_4
fatcat:6shcv2wvlbfvhb3xtwdyxrp3ee
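For context, a minimal sketch of the plain Bagging baseline that the decision multi-tree structure is designed to optimise; the shared multi-tree itself is not shown here, and the dataset and ensemble size are arbitrary choices.

```python
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Minimal bagging of decision trees: each tree is trained on a bootstrap sample
# and predictions are combined by majority vote.
X, y = load_iris(return_X_y=True)
trees = []
for seed in range(25):
    Xb, yb = resample(X, y, random_state=seed)   # bootstrap sample
    trees.append(DecisionTreeClassifier(random_state=seed).fit(Xb, yb))

def bagged_predict(x):
    votes = Counter(int(t.predict([x])[0]) for t in trees)
    return votes.most_common(1)[0][0]

print(bagged_predict(X[0]))
```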
Forgetting and consolidation for incremental and cumulative knowledge acquisition systems
[article]
2015
arXiv
pre-print
The absence of forgetting was masterfully described by Jorge Luis Borges in his tale "Funes, the Memorious" (1942): "To think is to forget a difference, to generalise, to abstract. ...
arXiv:1502.05615v1
fatcat:tx65kjibczbbhbvv6pfzhifnwq
Shared Ensemble Learning Using Multi-trees
[chapter]
2002
Lecture Notes in Computer Science
Decision tree learning is a machine learning technique that allows us to generate accurate and comprehensible models. Accuracy can be improved by ensemble methods, which combine the predictions of a set of different trees. However, a large amount of resources is necessary to generate the ensemble. In this paper, we introduce a new ensemble method that minimises the usage of resources by sharing the common parts of the components of the ensemble. For this purpose, we learn a decision multi-tree instead of a decision tree. We call this new approach shared ensembles. The use of a multi-tree produces an exponential number of hypotheses to be combined, which provides better results than boosting/bagging. We performed several experiments showing that the technique allows us to obtain accurate models and improves the use of resources with respect to classical ensemble methods.
doi:10.1007/3-540-36131-6_21
fatcat:uk4c5v2bsbf2dcu65lxjtvnwt4
Aggregative quantification for regression
2013
Data mining and knowledge discovery
The problem of estimating the class distribution (or prevalence) for a new unlabelled dataset (from a possibly different distribution) is a very common problem which has been addressed in one way or another in the past decades. This problem has been recently reconsidered as a new task in data mining, renamed quantification, when the estimation is performed as an aggregation (and possible adjustment) of a single-instance supervised model (e.g., a classifier). However, the study of quantification has been limited to classification, while it is clear that this problem also appears, perhaps even more frequently, with other predictive problems, such as regression. In this case, the goal is to determine a distribution or an aggregated indicator of the output variable for a new unlabelled dataset. In this paper, we introduce a comprehensive new taxonomy of quantification tasks, distinguishing between the estimation of the whole distribution and the estimation of some indicators (summary statistics), for both classification and regression. This distinction is especially useful for regression, since predictions are numerical values that can be aggregated in many different ways, as in multi-dimensional hierarchical data warehouses. We focus on aggregative quantification for regression and see that the approaches borrowed from classification do not work. We present several techniques based on segmentation which are able to produce accurate estimations of the expected value and the distribution of the output variable. We show experimentally that these methods especially excel in the relevant scenarios where training and test distributions dramatically differ.
doi:10.1007/s10618-013-0308-z
fatcat:fuutiyazfzb77lgqedydrtezzm
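A sketch of the naive "regress & average" aggregative quantifier that serves as the baseline the segmentation-based techniques are designed to improve on; the choice of regressor is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def naive_regression_quantifier(X_train, y_train, X_unlabelled):
    """Naive aggregative quantification for regression: estimate the mean of the
    output variable on an unlabelled set by averaging single-instance predictions
    of a regressor trained on the labelled data."""
    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
    preds = model.predict(X_unlabelled)
    return float(np.mean(preds))
```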
Quantification via Probability Estimators
2010
2010 IEEE International Conference on Data Mining
Quantification is the name given to a novel machine learning task which deals with correctly estimating the number of elements of one class in a set of examples. The output of a quantifier is a real value; since the training instances are the same as in a classification problem, a natural approach is to train a classifier and to derive a quantifier from it. Some previous works have shown that just classifying the instances and counting the examples belonging to the class of interest (classify & count) typically yields bad quantifiers, especially when the class distribution may vary between training and test. Hence, adjusted versions of classify & count have been developed by using modified thresholds. However, previous works have explicitly discarded (without a deep analysis) any possible approach based on the probability estimations of the classifier. In this paper, we present a method based on averaging the probability estimations of a classifier with a very simple scaling that does perform reasonably well, showing that probability estimators for quantification capture a richer view of the problem than methods based on a threshold.
doi:10.1109/icdm.2010.75
dblp:conf/icdm/BellaFHR10
fatcat:dy7iprpesnaobog6sgs5xmpxqq
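A hedged sketch of the probability-average idea and one plausible scaling, analogous to the adjusted classify & count correction; the exact scaling used in the paper may differ.

```python
import numpy as np

def probability_average(model, X):
    """Estimate positive prevalence by averaging the classifier's probability
    estimates instead of counting hard decisions (binary case, class 1 positive)."""
    return float(np.mean(model.predict_proba(X)[:, 1]))

def scaled_probability_average(model, X_test, X_train, y_train):
    """One plausible scaling of the probability average: map the raw average
    through the average probabilities observed on the training positives and
    negatives, mirroring the adjusted classify & count correction."""
    y_train = np.asarray(y_train)
    pa_test = probability_average(model, X_test)
    p_train = model.predict_proba(X_train)[:, 1]
    pa_pos = p_train[y_train == 1].mean()
    pa_neg = p_train[y_train == 0].mean()
    est = (pa_test - pa_neg) / (pa_pos - pa_neg)
    return float(np.clip(est, 0.0, 1.0))
```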
Similarity-Binning Averaging: A Generalisation of Binning Calibration
[chapter]
2009
Lecture Notes in Computer Science
In this paper we revisit the problem of classifier calibration, motivated by the issue that existing calibration methods ignore the problem attributes (i.e., they are univariate). These methods only use the estimated probability as input and ignore other important information, such as the original attributes of the problem. We propose a new calibration method, inspired by binning-based methods, in which the calibrated probabilities are obtained from k instances from a dataset. Bins are constructed by including the k most similar instances, considering not only estimated probabilities but also the original attributes. This method has been experimentally evaluated with respect to two calibration measures, including a comparison with other traditional calibration methods. The results show that the new method outperforms the most commonly used calibration methods.
doi:10.1007/978-3-642-04394-9_42
fatcat:kwdrt4kzgzfcthli63di4g5wea
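A sketch of the similarity-binning idea under stated assumptions: instances are augmented with their estimated probability, the k most similar calibration instances are retrieved, and their labels are averaged; the distance and weighting choices are not taken from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def similarity_binning_averaging(model, X_cal, y_cal, X_new, k=10):
    """For each new instance, build a 'bin' from the k most similar calibration
    instances in the space of (original attributes + estimated probability) and
    return the average of their true labels as the calibrated probability."""
    X_cal, X_new = np.asarray(X_cal), np.asarray(X_new)
    p_cal = model.predict_proba(X_cal)[:, 1:2]
    p_new = model.predict_proba(X_new)[:, 1:2]
    Z_cal = np.hstack([X_cal, p_cal])   # attributes + estimated probability
    Z_new = np.hstack([X_new, p_new])
    nn = NearestNeighbors(n_neighbors=k).fit(Z_cal)
    _, idx = nn.kneighbors(Z_new)
    return np.asarray(y_cal)[idx].mean(axis=1)   # one calibrated probability per new instance
```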
On the effect of calibration in classifier combination
2012
Applied intelligence (Boston)
doi:10.1007/s10489-012-0388-2
fatcat:nidj62jkzjaxrn3eumsnl6ygli
Data Mining Strategies for CRM Negotiation Prescription Problems
[chapter]
2010
Lecture Notes in Computer Science
In some data mining problems, there are input features that can be freely modified at prediction time. Examples appear in retailing, prescription or control (prices, warranties, medicine doses, delivery times, temperatures, etc.). If a traditional model is learned, many possible values for the special attribute will have to be tried to attain the maximum profit. In this paper, we exploit the relationship between these modifiable (or negotiable) input features and the output to (1) change the problem presentation, possibly turning a classification problem into a regression problem, and (2) maximise profits and derive negotiation strategies. We illustrate our proposal with a paradigmatic Customer Relationship Management (CRM) problem: maximising the profit of a retailing operation where the price is the negotiable input feature. Different negotiation strategies have been experimentally tested to estimate optimal prices, showing that strategies based on negotiable features get higher profits.
doi:10.1007/978-3-642-13022-9_52
fatcat:yfhpxlswsvg4nncr335npj2mfe
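A speculative sketch of reframing the (customer, price) classification problem as a regression of the maximum acceptable price; the way the training set is built here (the hypothetical offer_log of per-customer histories) is purely illustrative and not the paper's procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def invert_to_regression(offers):
    """Turn buy/not-buy observations at different prices into a regression target:
    for each customer, take the highest price at which a sale was observed as
    the 'maximum acceptable price'. 'offers' is a list of
    (customer_features, [(price, bought), ...]) pairs."""
    X, y = [], []
    for customer_features, history in offers:
        accepted = [price for price, bought in history if bought]
        if accepted:
            X.append(customer_features)
            y.append(max(accepted))
    return np.array(X), np.array(y)

# A regressor over the inverted problem directly predicts the price to offer.
# X, y = invert_to_regression(offer_log)        # offer_log is hypothetical
# price_model = GradientBoostingRegressor().fit(X, y)
```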
CASP-DM: Context Aware Standard Process for Data Mining
[article]
2017
arXiv
pre-print
We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) which addresses specific challenges of machine learning and data mining for context and model reuse handling. This new general context-aware process model is mapped onto the CRISP-DM reference model, proposing some new or enhanced outputs.
arXiv:1709.09003v1
fatcat:giwxqiy7rbc63bdftzghqnfe2i
Learning with Configurable Operators and RL-Based Heuristics
[chapter]
2013
Lecture Notes in Computer Science
In this paper, we push forward the idea of machine learning systems for which the operators can be modified and fine-tuned for each problem. This allows us to propose a learning paradigm where users can write (or adapt) their operators, according to the problem, data representation and the way the information should be navigated. To achieve this goal, data instances, background knowledge, rules, programs and operators are all written in the same functional language, Erlang. Since changing operators affects how the search space needs to be explored, heuristics are learnt as a result of a decision process based on reinforcement learning where each action is defined as a choice of operator and rule. As a result, the architecture can be seen as a 'system for writing machine learning systems' or as a way to explore new operators.
doi:10.1007/978-3-642-37382-4_1
fatcat:7oqe67suyjegxaevdiucnqx7du
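A toy Python sketch of the reinforcement-learning heuristic described above, with each action being an (operator, rule) pair; the actual system is written in Erlang, and the states, rewards and hyperparameters here are placeholders.

```python
import random
from collections import defaultdict

# Tabular Q-learning where an action is a choice of (operator, rule).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
Q = defaultdict(float)                       # Q[(state, (operator, rule))]

def choose_action(state, actions):
    """Epsilon-greedy selection over the available (operator, rule) pairs."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """Standard Q-learning update after applying an operator/rule and observing
    the resulting search state and reward."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```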
On the definition of a general learning system with user-defined operators
[article]
2013
arXiv
pre-print
In this paper, we push forward the idea of machine learning systems whose operators can be modified and fine-tuned for each problem. This allows us to propose a learning paradigm where users can write (or adapt) their operators, according to the problem, data representation and the way the information should be navigated. To achieve this goal, data instances, background knowledge, rules, programs and operators are all written in the same functional language, Erlang. Since changing operators affects how the search space needs to be explored, heuristics are learnt as a result of a decision process based on reinforcement learning where each action is defined as a choice of operator and rule. As a result, the architecture can be seen as a 'system for writing machine learning systems' or a way to explore new operators where policy reuse (as a kind of transfer learning) is allowed. States and actions are represented in a Q matrix which is actually a table, from which a supervised model is learnt. This makes it possible to have a more flexible mapping between old and new problems, since we work with an abstraction of rules and actions. We include some examples showing reuse and the application of the system gErl to IQ problems. In order to evaluate gErl, we test it against some structured problems: a selection of IQ test tasks and some experiments on structured prediction problems (list patterns).
arXiv:1311.4235v1
fatcat:7pqwyo7vpzhmtimdumfcfowayu
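A small sketch of the step where the tabular Q function is replaced by a supervised model, so that values generalise to unseen (state, action) descriptions; the feature encoding and the choice of regressor are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def q_table_to_model(q_table, encode):
    """Learn a supervised model from a tabular Q function.
    q_table: dict mapping (state, action) -> Q value.
    encode: function producing a feature vector for a (state, action) pair."""
    X = np.array([encode(s, a) for (s, a) in q_table])
    y = np.array(list(q_table.values()))
    return RandomForestRegressor(random_state=0).fit(X, y)
```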
Probabilistic class hierarchies for multiclass classification
2018
Journal of Computational Science
The improvement in the performance of classifiers has been the focus of attention of many researchers over the last few decades. Obtaining accurate predictions becomes more complicated as the number of classes increases. Most families of classification techniques generate models that define decision boundaries trying to separate the classes as well as possible. As an alternative, in this paper, we propose to hierarchically decompose the original multiclass problem by reducing the number of classes involved in each local subproblem. This is done by deriving a similarity matrix from the misclassification errors given by a first classifier that is learned for this purpose, and then using the similarity matrix to build a tree-like hierarchy of specialised classifiers. We then present two approaches to solve the multiclass problem: the first one traverses the tree of classifiers in a top-down manner, similar to the way some hierarchical classification methods deal with hierarchical domains; the second one is inspired by the way probabilistic decision trees compute class membership probabilities. To improve the efficiency of our methods, we propose a criterion to reduce the size of the hierarchy. We experimentally evaluate all of the proposals on a collection of multiclass datasets, showing that, in general, the generated classifier hierarchies outperform the original (flat) multiclass classification.
doi:10.1016/j.jocs.2018.01.006
fatcat:k5zhijhhlbgo7kq7qm7ivfznpi
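A sketch of the first step described in the abstract: deriving a class-similarity matrix from the confusion matrix of a first classifier and clustering the classes into a hierarchy; the symmetrisation and linkage choices are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def class_hierarchy_from_confusion(conf):
    """Treat classes that a first classifier confuses often as similar, and
    cluster them hierarchically. 'conf' is a confusion matrix (numpy array)."""
    conf = conf / conf.sum(axis=1, keepdims=True)    # row-normalise
    similarity = (conf + conf.T) / 2.0               # symmetric confusion-based similarity
    distance = 1.0 - similarity
    np.fill_diagonal(distance, 0.0)
    return linkage(squareform(distance, checks=False), method="average")

# Each merge in the returned linkage defines a node where a specialised
# classifier (separating the two groups of classes) could be trained.
```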
Identifying the Machine Learning Family from Black-Box Models
[chapter]
2018
Lecture Notes in Computer Science
We address the novel question of determining which kind of machine learning model is behind the predictions when we interact with a black-box model. This may allow us to identify families of techniques whose models exhibit similar vulnerabilities and strengths. In our method, we first consider how an adversary can systematically query a given black-box model (oracle) to label an artificially-generated dataset. This labelled dataset is then used for training different surrogate models (each one trying to imitate the oracle's behaviour). The method has two different approaches. First, we assume that the family of the surrogate model that achieves the maximum Kappa metric against the oracle labels corresponds to the family of the oracle model. The other approach, based on machine learning, consists in learning a meta-model that is able to predict the model family of a new black-box model. We compare these two approaches experimentally, giving us insight into how explanatory and predictable our concept of family is.
doi:10.1007/978-3-030-00374-6_6
fatcat:c2abzp4eojcvxl3w224eky4xfe
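A sketch of the first approach in the abstract: label synthetic queries with the oracle, train one surrogate per candidate family, and pick the family whose surrogate agrees most with the oracle (highest Cohen's kappa); the query distribution and the set of candidate families are assumptions.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def guess_family(oracle_predict, n_features, n_queries=2000, seed=0):
    """Query the black-box oracle on artificial data, fit one surrogate per
    candidate family, and return the family with the highest agreement (kappa)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_queries, n_features))     # artificial query points
    y = oracle_predict(X)                            # oracle labels
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    families = {
        "tree": DecisionTreeClassifier(random_state=seed),
        "knn": KNeighborsClassifier(),
        "linear": LogisticRegression(max_iter=1000),
    }
    kappas = {name: cohen_kappa_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
              for name, m in families.items()}
    return max(kappas, key=kappas.get), kappas
```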
Showing results 1 — 15 out of 2,270 results