An intelligent approach to handling imperfect information in concept-based natural language queries

Vesper Owei
2002 ACM Transactions on Information Systems  
Missing information, imprecision, inconsistency, vagueness, uncertainty, and ignorance abound in information systems. Such imperfection is a fact of life in database systems. Although these problems are widely studied in relational database systems, this is not the case in conceptual query systems. And yet, concept-based query languages have been proposed, and some are already commercial products. It is therefore imperative to study these problems in concept-based query languages, with a view to prescribing formal approaches to dealing with them. In this article, we do just that for a concept-based natural language query system that we developed. A methodology for handling and resolving each type of imperfection is developed. The proposed approaches are automated as much as possible, with the user mainly serving an assistive function.

[...] nonexperts, it is only to be expected that ill-formulated queries will become very common. It is therefore imperative that query languages designed to support a broad range of users be functionally powerful enough to help users avoid or resolve queries containing imperfect information. The problem with imperfect queries is that, depending on the nature of the imperfection, they may either retrieve no data, because they cannot be processed, or retrieve erroneous data without the user realizing it. In business applications, this could mean basing decisions on erroneous data or introducing severe inefficiencies into business processes. In either case, the effect on an organization could be disastrous. In this article, therefore, we develop approaches for handling imperfection in concept-based natural language queries. The approaches are applied specifically to the Conceptual Query Language with Natural Language (CQL/NL) presented in Owei [2000]. We examine what imperfection means in the context of concept-based query languages, and then identify and define the types of imperfect queries possible in CQL/NL. We then develop a methodology for handling and resolving each type of imperfection. The proposed approaches are automated wherever possible, with the user mainly playing a role that assists the system.
Techniques for dealing with imperfect information in queries and DBs have been studied primarily in the context of relational database systems, as we discuss in the next section on related research. This is not the case for other DB systems, especially concept-based DB systems, which are introduced and defined in Section 3. The claim that imperfect information in concept-based DB systems is not widely studied is made clearer in Section 4, where we introduce and define the types of query imperfection addressed in this article. Section 5 deals with semantically mismatched conceptual queries. Section 6 examines missing information in conceptual queries. Inexplicit conceptual queries are addressed in Section 7. Section 8 deals with ambiguity in concept-based queries. Additional discussion of imperfect information in database systems is given in Section 9. The article concludes in Section 10.

LITERATURE REVIEW

Early studies of imperfect information in information systems focused on defining and classifying it. This is perhaps to be expected, since the nature and ramifications of the problem must be understood and clearly delineated before it can be solved. Bonissone and Tong [1985] argue that imperfect information is of three types, namely, uncertainty, incompleteness, and imprecision. In their taxonomy, incompleteness arises from the absence of a value, and imprecision from the existence of a value that cannot be measured with suitable precision. Uncertainty arises from forming a subjective opinion about the truth of a fact whose certainty is not known. The studies in Bosc and Prade [1993] and Dubois and Prade [1988] go further, drawing a finer distinction that adds vagueness and inconsistency as categories. Information is considered vague if it is fuzzily imprecise. For example, the predicate "tall"
doi:10.1145/568727.568729