Implicit predicates for handling disjunctive fuzzy information in fuzzy databases

Jae Dong Yang
<span title="2002-10-24">2002</span> <i title="Wiley"> <a target="_blank" rel="noopener" href="" style="color: black;">International Journal of Intelligent Systems</a> </i> &nbsp;
There are two aspects of uncertainty: vagueness and ambiguity. This paper proposes the Implicit Predicate (IP) to support both in fuzzy databases by allowing disjunctive fuzzy information. An IP is a descriptor attached to an unknown value that is ambiguous as well as vague in nature. We demonstrate that the IP is a promising construct that can not only deal with disjunctive fuzzy information but also make sophisticated concept-based matching possible by coupling thesauri with fuzzy databases. We also propose a query evaluation mechanism that derives exact answers from disjunctive fuzzy information by fully exploiting IPs.

Introduction

Uncertainty in information systems can be classified into two categories: vagueness and ambiguity. While vagueness is associated with the difficulty of drawing a sharp boundary around a concept, ambiguity is related to the difficulty of choosing between two or more alternative concepts when assigning an element to them. For example, suppose we are given the disjunctive information that Smith's major is either computer science or electronics. If we try to judge whether he majors in µ-processor, we meet both kinds of uncertainty: vagueness arises in deciding whether µ-processor is included in computer science or not, whereas ambiguity arises in deciding which one µ-processor belongs to, computer science or electronics.

It is widely accepted that fuzzy database models provide a mathematical foundation for dealing with vagueness [2][8][9][11][14][18][19][20]. In the past decade, various data types that enhance the semantic expressiveness of fuzzy databases have been proposed [9][11][18][19][20], constraints associated with the data types have been introduced [11][14], and fuzzy relational operators supporting them have been developed [2][8]. These approaches, however, have a well-known drawback: they fail to support disjunctive fuzzy information because of the inherent limitation of the Horn clauses on which they are based [12].

On the other hand, the concept of uncertainty has been developed independently in databases. This development has centered mainly on null values, which denote unknown values or inapplicable values. To explain the semantic difference between them, suppose that a null value ω appears in the husband attribute of a relation, "female."
To a married woman, ω in the attribute denotes a value that exists but is unknown at present, whereas, to a spinster, it denotes a nonexistent value that cannot be applied to the attribute. As most inapplicable values can be removed if database schemas are appropriately designed [6], it is mainly the unknown values that most research is concerned with.

Numerous studies on unknown values have been carried out in two directions over the last two decades. One direction is to propose data types capable of allowing disjunctive information [3][4][6][13][15][16][17]. It includes the design of query evaluation mechanisms as well as operator sets for manipulating them effectively. For example, in a relation R, the disjunctive information that Marry's lab is one of db and ai may be represented as a tuple t = <marry/NAME, [db, ai]/LAB>, and t can be evaluated as an answer to the query "retrieve researchers whose lab is db or ai," in spite of its incompleteness. The other direction is to reformulate the unknown value problem in terms of logic [5][15][17]. For example, in [17], the previous tuple t may be represented as (∀LAB)(R(marry, LAB) ← NULL(marry, LAB)) ∧ (NULL(marry, LAB) ← LAB = db ∨ LAB = ai). Given the query "LAB = db ∨ LAB = ai," t can be its answer, because both R(marry, db) and R(marry, ai) satisfy it.

While the approaches to unknown values introduced so far provide powerful data types that allow disjunctive information, they cannot in turn handle the other kind of uncertainty, i.e., vagueness. It is therefore necessary to extend the data types to handle vagueness as well. Such an extension may drastically enhance the semantic expressiveness of a piece of disjunctive information by encoding vagueness into it. For example, db in the previous disjunctive information "db or ai" may be viewed as a vague concept, in that it can be interpreted as a generalization of object-oriented db, relational db, and so on.
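The treatment of the tuple t = <marry/NAME, [db, ai]/LAB> described above can be illustrated with a minimal sketch. It assumes a representation that is not taken from the paper: each attribute value is a Python set of alternatives, so a known value is a one-element set. The function names `is_exact_answer` and `is_possible_answer` are hypothetical, and the "possible answer" check is added only for contrast.

```python
# Hypothetical representation: every attribute value is a set of alternatives.
# A known value is a one-element set; a disjunctive value lists all candidates.

def is_exact_answer(value, accepted):
    """True when every alternative of `value` satisfies the query."""
    return set(value) <= set(accepted)

def is_possible_answer(value, accepted):
    """True when at least one alternative of `value` satisfies the query."""
    return bool(set(value) & set(accepted))

# The tuple t = <marry/NAME, [db, ai]/LAB> from the text.
t = {"NAME": {"marry"}, "LAB": {"db", "ai"}}

query = {"db", "ai"}      # "retrieve researchers whose lab is db or ai"
narrow_query = {"db"}     # a narrower query, added here for contrast

print(is_exact_answer(t["LAB"], query))            # whichever lab is true, the query holds
print(is_exact_answer(t["LAB"], narrow_query))     # lab might still turn out to be ai
print(is_possible_answer(t["LAB"], narrow_query))  # lab might turn out to be db
```

Under this sketch, t qualifies as an answer to "db or ai" despite its incompleteness because both alternatives satisfy the query, which mirrors the argument that both R(marry, db) and R(marry, ai) satisfy it.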
This paper proposes the Implicit Predicate (IP) to handle unknown values that are ambiguous as well as vague in a fuzzy database. The IP construct was already introduced in earlier work [16][17] as a descriptor associated with an attribute. It describes the subtle semantics of unknown values occurring in the attribute. A later work [18] widened the application area of the framework by tailoring it to handle fuzzy information and to evolve the information with inference rules. In comparison with [16][17][18], this paper differs significantly in that it 1) deals with unknown values encompassing disjunctive fuzzy information, 2) provides concept-based fuzzy matching, and 3) makes such matching more sophisticated by using fuzzy membership functions structured like a thesaurus. We first show how the IP can provide such facilities and then propose a new query evaluation mechanism that extracts exact answers by fully exploiting IPs.

This paper proceeds as follows. In section 2, we briefly explain database world assumptions to clarify the semantic interpretation of fuzzy information. Section 3 introduces the IP construct and then defines the structure of our fuzzy database augmented with IPs. In section 4, we develop a query evaluation mechanism to support IPs, followed by conclusions in section 5.

Database World Assumptions

In general, databases may respond to a query based on the Closed World Assumption (CWA) [5][12], the Open World Assumption (OWA) [5], or the Expanded Closed World Assumption (ECWA) [6]. Under CWA, negative facts are accepted on the basis of a failure to find a proof. More specifically, if no proof of a fact exists, its negation is assumed true. OWA differs from CWA in that no such assumption is made: a negation is not accepted unless it is proved in the given database. The third alternative, ECWA, can explicitly state incomplete information using a disjunction of facts.
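The difference between CWA and OWA described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: "proof" is reduced to membership in a set of stored facts, and the fact base `FACTS`, the function `holds`, and the researcher name `kim` are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical fact base; "proving" a fact is reduced to looking it up.
FACTS = {("lab", "kim", "db")}

def holds(fact: Tuple[str, str, str], assumption: str) -> Optional[bool]:
    """Answer a ground query under CWA or OWA.

    Returns True when the fact is stored. When it is not stored, CWA
    assumes its negation (False), while OWA leaves it undetermined (None).
    """
    if fact in FACTS:
        return True
    if assumption == "CWA":
        return False   # failure to find a proof: negation assumed true
    if assumption == "OWA":
        return None    # no proof of the negation either: unknown
    raise ValueError(f"unsupported assumption: {assumption}")

print(holds(("lab", "kim", "db"), "CWA"))
print(holds(("lab", "kim", "ai"), "CWA"))
print(holds(("lab", "kim", "ai"), "OWA"))
```

ECWA is not covered by this sketch, since it additionally requires a representation for disjunctions of facts.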
A piece of information is disjunctive when it is represented as a disjunction of facts. All true facts must either be proved from the part of [...]

Note that a user input has no impact on our database when condition 3 is met. In other words, $IP_t(A, l_k)$ is not produced, since the information already stored is more informative. Definition 12 shows a way to derive the most specific information from more than one source of information. To verify this formally, we provide the following proposition.

Proposition 2 For $IP_t(A, l_k)$ in Definition 12, $IP_t(A, l_k) \preceq_\alpha IP_t(A, l_i)$ and $IP_t(A, l_k) \preceq_\alpha IP_t(A, l_i + 1)$.

Proof According to Definition 10, $\preceq_\alpha$ is reflexive. Next, by Definition 12, if $IP_t(A, l_i) \preceq_\alpha IP_t(A, l_i + 1)$, then $IP_t(A, l_k) = IP_t(A, l_i) \preceq_\alpha IP_t(A, l_i)$ and hence $IP_t(A, l_k) \preceq_\alpha IP_t(A, l_i + 1)$. The same is true when $IP_t(A, l_i + 1) \preceq_\alpha IP_t(A, l_i)$. Next, suppose $IP_t(A, l_k) \preceq_\alpha IP_t(A, l_i)$ does not hold when $IP_t(A, l_k) = IP_t(A, l_i) \wedge IP_t(A, l_i + 1)$. Then by Definition 10, for some $c \in |IP_t(A, l_k)|$, $IP_t(c/A, l_i) = 0$, which means that $IP_t(c/A, l_i) = 0$ for some $c$ satisfying $IP_t(c/A, l_k) = 1$ according to Definition 9. But this is impossible by Definition 6. In the same fashion, $IP_t(A, l_k) \preceq_\alpha IP_t(A, l_i + 1)$ can be proved as well.

[...] $\preceq_\alpha IP_t(A, l_i)$ for some $0 < \alpha \le 1$.

Proposition 3 Let $IP_t(A, l_i)$ and $IP_t(A, l_j)$ be IPs for $t[A] = \omega$. Then $IP_t(A, l_j) \preceq_\alpha IP_t(A, l_i)$ with some $\alpha > 0$ if $l_j \ge l_i + 1$ and $l_j$ is an odd natural number.

Proof Suppose $IP_t(A, l_i)$ evolves into the following form through a series of user information about ω, satisfying (condition 1) of Definition 12 successively. [...]

[...] IP construct. It tries to obtain exact answers from disjunctive fuzzy information described by IPs. We begin by defining a primitive constituting a composite query.
Definition 13 A mono-variable query factor, or briefly monoquery, $Q_m(A)$ is given as $Q_m(A) = \bigvee_{i=1}^{s} T_i(A)$, where $T_i(A)$, $i = 1, \ldots, s$, are term predicates.

A set of exact answers for a monoquery $Q_m(A)$ may be given as $\{\, t \in r \mid Q_m(c/A) = 1 \text{ where } t[A] = c \in D \,\}$. For unknown values $t[A] = \omega$, however, the notion of exact answer needs to be extended. Suppose we get a set $Q = \{\, t' \in r \mid Q_m(T/A) \ge \alpha \,\}$, where $t'[A]$ is a fuzzy term $T$ and $\alpha$, $0 < \alpha \le 1$, is a threshold value. Obviously, $t \in Q$ at a given point in time if we replace $T$ in $t[A]$ with ω and let $IP_t(A) = T(A)$. But can $t$ remain in $Q$ regardless of the lapse of time? The answer is "yes," since $IP_t(A)$ can only be evolved. This means that it satisfies $Q_m(A)$ with an equal or higher degree as time passes. We can hence guarantee that the tuple $t$ in the set satisfies $Q_m(A)$ with α as a minimum degree of possibility, whatever value $t[A]$ may turn out to be later. The following definition gives the extended exact answer set, where $IP_t(A)$ is given as a disjunction of term predicates.

Definition 14 Let the exact answer set for a monoquery $Q_m(A)$ with a threshold value α, $0 < \alpha \le 1$, be $\|Q_m(A)\|_{\alpha*}$. Then it is defined as follows. [...]

Intuitively, this definition says that $t$ can exactly satisfy $Q_m(A)$ only when $IP_t(A)$ is more specialized than $Q_m(A)$ with a degree higher than α. This exact answer set may be further developed by using Zadeh's min-max operator.

Lemma 1 Let α, $0 < \alpha \le 1$, be a threshold value. Then [...]

Proof This lemma can be directly proved by Definition 10 and Definition 14.

We provide two examples to illustrate this lemma.

Example 8 Let the query be $Q_m(\mathrm{PROJ}) = \mathrm{case}(\mathrm{PROJ})$ and let $IP_{t_2}(\mathrm{karl}, \mathrm{PROJ}, l_3) = IP_{t_2}(\mathrm{karl}, \mathrm{PROJ}, l_1) \wedge IP_{t_2}(\mathrm{karl}, \mathrm{PROJ}, l_2)$. Then, because $IP_{t_2}(\mathrm{karl}, \mathrm{PROJ}, l_3) \preceq_\alpha Q_m(\mathrm{PROJ})$, $t_2 \in \|Q_m(\mathrm{PROJ})\|_{\alpha*}$, where $\alpha = \min(\mathrm{case}(\mathrm{omt}), \mathrm{case}(\mathrm{uml})) = \min(0.91, 0.93) = 0.91$ for $\{\mathrm{omt}, \mathrm{uml}\} = |IP_{t_2}(\mathrm{karl}, \mathrm{PROJ}, l_3)|$.
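The computation in Example 8 can be reproduced with a short sketch. The statement of Lemma 1 is not reproduced in this text, so the code assumes, based only on Example 8, that the guaranteed degree is the minimum query membership over every value the unknown may still take. The function names and the dictionary encoding of the membership function `case` are hypothetical.

```python
def exact_degree(alternatives, query_membership):
    """Smallest query membership over all values the unknown may take.

    Assumption inferred from Example 8: this minimum is the degree alpha
    guaranteed for the tuple. Values absent from the membership table
    are treated as having degree 0.
    """
    return min(query_membership.get(c, 0.0) for c in alternatives)

def is_exact_answer(alternatives, query_membership, alpha):
    """True when the guaranteed degree reaches the threshold alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return exact_degree(alternatives, query_membership) >= alpha

# Membership degrees of the query predicate case(PROJ), taken from Example 8.
case = {"omt": 0.91, "uml": 0.93}

# The values karl's unknown PROJ may take, i.e. |IP_t2(karl, PROJ, l_3)|.
support = {"omt", "uml"}

print(exact_degree(support, case))           # min(0.91, 0.93)
print(is_exact_answer(support, case, 0.90))  # guaranteed degree reaches 0.90
print(is_exact_answer(support, case, 0.92))  # omt only reaches 0.91
```

Taking the minimum reflects the requirement that the tuple satisfy the query whatever value the unknown turns out to be, so a single candidate with degree 0 rules the tuple out for every admissible threshold.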
Example 9 Let the query be $Q_m(\mathrm{LAB}) = \mathrm{db}(\mathrm{LAB}) \vee \mathrm{kb}(\mathrm{LAB})$ and let $IP_{t_1}(\mathrm{marry}, \mathrm{LAB}) = \mathrm{s/w}(\mathrm{LAB})$. Then $t_1 \notin \|Q_m(\mathrm{LAB})\|_{\alpha*}$ for any $0 < \alpha \le 1$, since $IP_{t_1}(\mathrm{LAB}) \preceq_\alpha Q_m(\mathrm{LAB})$ does not hold. Intuitively, $t_1$ cannot be an exact answer, since $t_1[\mathrm{LAB}]$ could later turn out to be se.

Recall that once $t \in \|Q_m(A)\|_{\alpha*}$, it should always satisfy $Q_m(A)$ with a degree of at least α, whatever value $t[A]$ may turn out to be later. This entails that α is guaranteed as a minimum degree over every possible IP evolution. More formally, if $t \in \|Q_m(A)\|_{\alpha*}$ for $IP_t(A, l_1)$, it should only be possible that $t \in \|Q_m(A)\|_{\beta*}$, $\beta \ge \alpha$, for all IP t (A,
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="">doi:10.1002/int.10059</a> <a target="_blank" rel="external noopener" href="">fatcat:ll7b7dhpszb65e4yxax3uehgjq</a> </span>