ON THE GRAPH EDIT DISTANCE COST: PROPERTIES AND APPLICATIONS

ALBERT SOLÉ-RIBALTA, FRANCESC SERRATOSA, ALBERTO SANFELIU
2012, International Journal of Pattern Recognition and Artificial Intelligence
We model the edit distance as a function in a labelling space. A labelling space is a Euclidean space whose coordinates are the edit costs. Through this model, we define a class of cost: a region of the labelling space in which all edit costs yield the same optimal labelling. Moreover, we characterise the distance value through the labelling space. This new point of view on the edit distance gives us the opportunity to define some interesting properties that are useful for
... better understanding of the edit distance. Finally, we show the usefulness of these properties through some applications.
... distances should be a metric and thus fulfil the four metric properties: (1) non-negativity, (2) identity of indiscernibles, (3) symmetry, and (4) triangle inequality. Several graph distances appear in the literature [20-22], but probably the best-known one is the Graph Edit Distance [23]. The Graph Edit Distance has been applied extensively [24], and numerous algorithms to compute it can therefore be found in the literature, such as [25-29]. Moreover, some theoretical papers describe properties of a particular definition of the Graph Edit Distance [30, 31] and [32].
Tailoring the Graph Edit Distance to a particular problem requires some application-dependent functions to be defined. Defining and specifying these functions optimally is not a trivial undertaking, and several works have addressed this task; the most relevant are [5, 33, 34] and [35]. The main contribution of these works has been to prove that the String Edit Distance and the Graph Edit Distance contain several classes of cost. In this article, we define some previously undescribed properties of the Graph Edit Distance. Our specific definition and interpretation of the Graph Edit Distance allows each class of cost to be described by a plane equation, and allows the shape of each class of cost to be described as well. These new properties have two uses: on the one hand, they can improve the performance of existing algorithms; on the other hand, they can be used to develop more efficient graph algorithms. An interesting survey summarizing the most important contributions on the Graph Edit Distance has recently been published [24]. The aim of this paper is to go a step beyond the findings on edit costs presented by Professor Horst Bunke and his colleagues.
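The four metric properties listed above can be checked numerically on any precomputed matrix of pairwise distances. The following is a minimal sketch; `metric_violations` is a hypothetical helper, not something from the paper. It assumes the matrix is indexed by pairwise-distinct objects, so a zero off-diagonal entry counts as a violation of the identity of indiscernibles, and it uses a small tolerance for floating-point distances.

```python
def metric_violations(d, tol=1e-9):
    """Return the set of metric properties violated by distance matrix d.

    Hypothetical helper: d[i][j] is the distance between objects i and j,
    and the objects are assumed to be pairwise distinct.
    """
    n = len(d)
    bad = set()
    for i in range(n):
        for j in range(n):
            # (1) Non-negativity: no distance may be negative.
            if d[i][j] < -tol:
                bad.add("non-negativity")
            # (2) Identity of indiscernibles: zero exactly on the diagonal.
            if (i == j) != (abs(d[i][j]) <= tol):
                bad.add("identity of indiscernibles")
            # (3) Symmetry: d(i, j) == d(j, i).
            if abs(d[i][j] - d[j][i]) > tol:
                bad.add("symmetry")
            # (4) Triangle inequality: d(i, k) <= d(i, j) + d(j, k).
            for k in range(n):
                if d[i][k] > d[i][j] + d[j][k] + tol:
                    bad.add("triangle inequality")
    return bad


if __name__ == "__main__":
    points = [0, 1, 3]
    d = [[abs(a - b) for b in points] for a in points]
    print(metric_violations(d))                 # set()
    print(metric_violations([[0, 1], [2, 0]]))  # {'symmetry'}
```

For instance, absolute differences between the numbers 0, 1 and 3 form a matrix with no violations, whereas `[[0, 1], [2, 0]]` violates only symmetry. The triple loop is cubic in the number of objects, which is fine for the small matrices such a sanity check is meant for.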
Bunke and his colleagues presented new ideas about the existence of classes of edit costs in strings [32] and graphs [30]. They showed that the edit costs can be clustered into classes within which the edit distance behaves similarly. Here we present a new methodology for representing these classes of cost, together with some of their properties, and we show the usefulness of this methodology and these properties through some applications.
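To make the labelling-space view concrete, here is a minimal brute-force sketch, not the authors' formulation. It assumes a simplified cost model with three constant costs (node substitution between differing labels, node insertion/deletion, and insertion/deletion of unlabelled edges), and the names `count_vector`, `labellings` and `edit_distance` are hypothetical. For a fixed labelling, the edit cost is the dot product of the cost vector with that labelling's operation counts, i.e. a plane through the origin of the cost space. The edit distance is the minimum over all labellings, and the set of cost vectors for which one count vector is optimal plays the role of a class of cost.

```python
from itertools import product

# Assumed representation: a graph is (labels, edges), where labels[i] is the
# label of node i and edges lists undirected edges (u, v) with u != v.


def count_vector(labels1, edges1, labels2, edges2, f):
    """Operation counts induced by labelling f from G1 to G2.

    f[i] is the G2 node assigned to G1 node i, or None if i is deleted.
    Returns (n_sub, n_node, n_edge): substitutions between differing labels,
    node deletions + insertions, and edge deletions + insertions.
    """
    used = [j for j in f if j is not None]
    n_sub = sum(1 for i, j in enumerate(f)
                if j is not None and labels1[i] != labels2[j])
    n_node = f.count(None) + (len(labels2) - len(used))
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # Images of G1 edges whose endpoints are both kept; f is injective,
    # so distinct G1 edges have distinct images.
    image = {frozenset(f[u] for u in e) for e in e1
             if all(f[u] is not None for u in e)}
    kept = image & e2
    n_edge = (len(e1) - len(kept)) + (len(e2) - len(kept))
    return (n_sub, n_node, n_edge)


def labellings(n1, n2):
    """Yield every injective partial map from G1 nodes to G2 nodes."""
    for f in product([*range(n2), None], repeat=n1):
        used = [j for j in f if j is not None]
        if len(used) == len(set(used)):
            yield f


def edit_distance(g1, g2, costs):
    """Brute-force edit distance: minimise costs . counts over labellings.

    Returns (distance, count vector of the first optimal labelling found).
    """
    best = None
    for f in labellings(len(g1[0]), len(g2[0])):
        counts = count_vector(*g1, *g2, f)
        value = sum(c * n for c, n in zip(costs, counts))
        if best is None or value < best[0]:
            best = (value, counts)
    return best


if __name__ == "__main__":
    g1 = (["a"], [])
    g2 = (["b"], [])
    print(edit_distance(g1, g2, (1, 1, 1)))  # (1, (1, 0, 0)): substitute
    print(edit_distance(g1, g2, (3, 1, 1)))  # (2, (0, 2, 0)): delete + insert
```

In the usage example, substituting the single node costs `c_sub` while deleting and re-inserting it costs `2 * c_node`, so under this toy model the plane `c_sub = 2 * c_node` separates two classes of cost: on one side the substitution labelling is optimal, on the other the deletion-plus-insertion labelling is. Enumerating all labellings is exponential in the number of nodes, so this sketch only suits tiny graphs.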
doi:10.1142/s021800141260004x