The ρ operator

Kemafor Anyanwu, Amit Sheth
2002 SIGMOD Record  
In this paper, we introduce an approach that supports querying for Semantic Associations on the Semantic Web. Semantic Associations capture complex relationships between entities involving sequences of predicates, and sets of predicate sequences that interact in complex ways. Detecting such associations is at the heart of many research and analytical activities that are crucial to applications in national security and business intelligence. This in combination with the improving ability to
more » ... ify entities in documents as part of automatic semantic annotation, gives a very powerful capability for semantic analysis of large amounts of heterogeneous content.
doi:10.1145/637411.637418 fatcat:ji2sk3rdxvfbjjdw7fdbk4kdlq

PrefixSolve

Sidan Gao, Kemafor Anyanwu
2013 Proceedings of the 22nd international conference on World Wide Web - WWW '13  
Uncovering the "nature" of the connections between a set of entities e.g. passengers on a flight and organizations on a watchlist can be viewed as a Multi-Source Multi-Destination (MSMD) Path Query problem on labeled graph data models such as RDF. Using existing graph-navigational path finding techniques to solve MSMD problems will require queries to be decomposed into multiple single-source or destination path subqueries, each of which is solved independently. Navigational techniques on
more » ... sident graphs typically generate very poor I/O access patterns for large, disk-resident graphs and for MSMD path queries, such poor access patterns may be repeated if common graph exploration steps exist across subqueries. In this paper, we propose an optimization technique for general MSMD path queries that generalizes an efficient algebraic approach for solving a variety of single-source path problems. The generalization enables holistic evaluation of MSMD path queries without the need for query decomposition. We present a conceptual framework for sharing computation in the algebraic framework that is based on "suffix equivalence". Suffix equivalence amongst subqueries captures the fact that multiple subqueries with different prefixes can share a suffix and as such share the computation of shared suffixes, which allows prefix path computations to share common suffix path computations. This approach offers orders of magnitude better performance than current existing techniques as demonstrated by a comprehensive experimental evaluation over real and synthetic datasets.
doi:10.1145/2488388.2488426 dblp:conf/www/GaoA13 fatcat:2fhshkfgmjcavpz35yxf7sotx4

Implementing graph grammars for intelligence analysis in OCaml [article]

Rod Moten, Kemafor Anyanwu-Ogan, Sahibi Miranshah
2016 arXiv   pre-print
We report on implementing graph grammars for intelligence analysis in OCaml. Graph grammars are represented as elements of an algebraic data type in OCaml. In addition to algebraic data types, we use other concepts from functional programming languages to implement features of graph grammars. We use type checking to perform graph pattern matching. Graph transformations are defined as implicit coercions derived from structural subtyping proofs, subset types, lambda abstractions, and analytics.
more » ... analytic is a general-purpose OCaml function whose output is required to match a graph pattern described by an element of an algebraic data type. By using a strongly-typed language for representing graphs, we can ensure graphs produced from a graph transformation will match a specific schema. This is a high priority requirement for intelligence analysis.
arXiv:1606.01081v1 fatcat:seakru23z5bx7nq4ekt77ztxci

ρ-Queries

Kemafor Anyanwu, Amit Sheth
2003 Proceedings of the twelfth international conference on World Wide Web - WWW '03  
This paper presents the notion of Semantic Associations as complex relationships between resource entities. These relationships capture both a connectivity of entities as well as similarity of entities based on a specific notion of similarity called ρ-isomorphism. It formalizes these notions for the RDF data model, by introducing a notion of a Property Sequence as a type. In the context of a graph model such as that for RDF, Semantic Associations amount to specific certain graph signatures.
more » ... ifically, they refer to sequences (i.e. directed paths) here called Property Sequences, between entities, networks of Property Sequences (i.e. undirected paths), or subgraphs of ρ-isomorphic Property Sequences. The ability to query about the existence of such relationships is fundamental to tasks in analytical domains such as national security and business intelligence, where tasks often focus on finding complex yet meaningful and obscured relationships between entities. However, support for such queries is lacking in contemporary query systems, including those for RDF. This paper discusses how querying for Semantic Associations might be enabled on the Semantic Web, through the use of an operator ρ. It also discusses two approaches for processing ρ-queries on available persistent RDF stores and memory resident RDF data graphs, thereby building on current RDF query languages.
doi:10.1145/775152.775249 dblp:conf/www/AnyanwuS03 fatcat:nptja2lk2nc6toqlxd75rxqrqq

CoSi

Haizhou Fu, Sidan Gao, Kemafor Anyanwu
2011 Proceedings of the 20th international conference companion on World wide web - WWW '11  
The demo will present CoSi, a system that enables context-sensitive interpretation of keyword queries on RDF databases. The techniques for representing, managing and exploiting query history are central to achieving this objective. The demonstration will show the effectiveness of our approach for capturing a user's querying context from their query history. Further, it will show how context is utilized to influence the interpretation of a new query. The demonstration is based on DBPedia, the RDF representation of Wikipedia.
doi:10.1145/1963192.1963291 dblp:conf/www/FuGA11 fatcat:vhufbwcjozgzvnb4qttepygite

Scalable Ontological Query Processing over Semantically Integrated Life Science Datasets using MapReduce [article]

HyeongSik Kim, Kemafor Anyanwu
2016 arXiv   pre-print
To address the requirement of enabling a comprehensive perspective of life-sciences data, Semantic Web technologies have been adopted for standardized representations of data and linkages between data. This has resulted in data warehouses such as UniProt, Bio2RDF, and Chem2Bio2RDF, that integrate different kinds of biological and chemical data using ontologies. Unfortunately, the ability to process queries over ontologically-integrated collections remains a challenge, particularly when data is
more » ... arge. The reason is that besides the traditional challenges of processing graph-structured data, complete query answering requires inferencing to explicate implicitly represented facts. Since traditional inferencing techniques like forward chaining are difficult to scale up, and need to be repeated each time data is updated, recent focus has been on inferencing that can be supported using database technologies via query rewriting. However, due to the richness of most biomedical ontologies relative to other domain ontologies, the queries resulting from the query rewriting technique are often more complex than existing query optimization techniques can cope with. This is particularly so when using the emerging class of cloud data processing platforms for big data processing due to some additional overhead which they introduce. In this paper, we present an approach for dealing with such complex queries on big data using MapReduce, along with an evaluation on existing real-world datasets and benchmark queries.
arXiv:1602.01040v1 fatcat:ouw42v4amfcqddqjuqvwjqgxme

SemRank

Kemafor Anyanwu, Angela Maduko, Amit Sheth
2005 Proceedings of the 14th international conference on World Wide Web - WWW '05  
While the idea that querying mechanisms for complex relationships (otherwise known as Semantic Associations) should be integral to Semantic Web search technologies has recently gained some ground, the issue of how search results will be ranked remains largely unaddressed. Since it is expected that the number of relationships between entities in a knowledge base will be much larger than the number of entities themselves, the likelihood that Semantic Association searches would result in an
more » ... lming number of results for users is increased, therefore elevating the need for appropriate ranking schemes. Furthermore, it is unlikely that ranking schemes for ranking entities (documents, resources, etc.) may be applied to complex structures such as Semantic Associations. In this paper, we present an approach that ranks results based on how predictable a result might be for users. It is based on a relevance model SemRank, which is a rich blend of semantic and information-theoretic techniques with heuristics that supports the novel idea of modulative searches, where users may vary their search modes to effect changes in the ordering of results depending on their need. We also present the infrastructure used in the SSARK system to support the computation of SemRank values for resulting Semantic Associations and their ordering.
doi:10.1145/1060745.1060766 dblp:conf/www/AnyanwuMS05 fatcat:fgqycaxejbc4ddu2akb7hjfycu

SPARQ2L

Kemafor Anyanwu, Angela Maduko, Amit Sheth
2007 Proceedings of the 16th international conference on World Wide Web - WWW '07  
Many applications in analytical domains often have the need to "connect the dots" i.e., query about the structure of data. In bioinformatics for example, it is typical to want to query about interactions between proteins. The aim of such queries is to "extract" relationships between entities i.e. paths from a data graph. Often, such queries will specify certain constraints that qualifying results must satisfy e.g. paths involving a set of mandatory nodes. Unfortunately, most present day
more » ... Web query languages including the current draft of the anticipated recommendation SPARQL, lack the ability to express queries about arbitrary path structures in data. In addition, many systems that support some limited form of path queries rely on main memory graph algorithms limiting their applicability to very large scale graphs. In this paper, we present an approach for supporting Path Extraction queries. Our proposal comprises (i) a query language SPARQ2L which extends SPARQL with path variables and path variable constraint expressions, and (ii) a novel query evaluation framework based on efficient algebraic techniques for solving path problems which allows for path queries to be efficiently evaluated on disk resident RDF graphs. The effectiveness of our proposal is demonstrated by a performance evaluation of our approach on both real world and synthetic datasets.
doi:10.1145/1242572.1242680 dblp:conf/www/AnyanwuMS07 fatcat:jfgbbp4fejb7jbru6fz2q5iaua

Efficiently Evaluating Skyline Queries on RDF Databases [chapter]

Ling Chen, Sidan Gao, Kemafor Anyanwu
2011 Lecture Notes in Computer Science  
Skyline queries are a class of preference queries that compute the pareto-optimal tuples from a set of tuples and are valuable for multi-criteria decision making scenarios. While this problem has received significant attention in the context of a single relational table, skyline queries over joins of multiple tables that are typical of storage models for RDF data have received much less attention. A naïve approach such as a join-first-skyline-later strategy splits the join and skyline computation
more » ... hases which limit opportunities for optimization. Other existing techniques for multi-relational skyline queries assume storage and indexing techniques that are not typically used with RDF which would require a preprocessing step for data transformation. In this paper, we present an approach for optimizing skyline queries over RDF data stored using a vertically partitioned schema model. It is based on the concept of a "Header Point" which maintains a concise summary of the already visited regions of the data space. This summary allows some fraction of non-skyline tuples to be pruned from advancing to the skyline processing phase, thus reducing the overall cost of expensive dominance checks required in the skyline phase. We further present more aggressive pruning rules that result in the computation of near-complete skylines in significantly less time than the complete algorithm. A comprehensive performance evaluation of different algorithms is presented using datasets with different types of data distributions generated by a benchmark data generator.
doi:10.1007/978-3-642-21064-8_9 fatcat:q5pwmfufabe3jbvhiugrjavveu

Shared Execution of Clustering Tasks

Padmashree Ravindra, Rajeev Gupta, Kemafor Anyanwu
2015 Knowledge Discovery and Data Mining  
Clustering is a central problem in non-relational data analysis, with k-means being the most popular clustering technique. In various scenarios, it may be necessary to perform clustering over the same input data multiple times (with different values of k, different clustering attributes, or different initial centroids) before arriving at the final solution. In this paper, we propose algorithms for parallel execution of multiple runs of k-means clustering in a way that achieves substantial
more » ... s of IO and processing resources. Proposed algorithms can easily be implemented over Hadoop/MapReduce, Spark, etc., with savings in map and reduce phases. Extensive performance evaluation using real-world datasets shows that the proposed algorithms result in up to 40% savings in response times when compared to other optimization techniques proposed in the literature as well as open-source implementations. The algorithms scale well with increasing data sizes, values of k, and number of clustering tasks.
dblp:conf/kdd/RavindraGA15 fatcat:g5sua44psbamfhw3p3ubx63hkm

ρ-Queries

Kemafor Anyanwu, Amit Sheth
2003 Proceedings of the twelfth international conference on World Wide Web - WWW '03  
This paper presents the notion of Semantic Associations as complex relationships between resource entities. These relationships capture both a connectivity of entities as well as similarity of entities based on a specific notion of similarity called ρ-isomorphism. It formalizes these notions for the RDF data model, by introducing a notion of a Property Sequence as a type. In the context of a graph model such as that for RDF, Semantic Associations amount to specific certain graph signatures.
more » ... ifically, they refer to sequences (i.e. directed paths) here called Property Sequences, between entities, networks of Property Sequences (i.e. undirected paths), or subgraphs of ρ-isomorphic Property Sequences. The ability to query about the existence of such relationships is fundamental to tasks in analytical domains such as national security and business intelligence, where tasks often focus on finding complex yet meaningful and obscured relationships between entities. However, support for such queries is lacking in contemporary query systems, including those for RDF. This paper discusses how querying for Semantic Associations might be enabled on the Semantic Web, through the use of an operator ρ. It also discusses two approaches for processing ρ-queries on available persistent RDF stores and memory resident RDF data graphs, thereby building on current RDF query languages.
doi:10.1145/775248.775249 fatcat:ock2egsehzbojcnvy5v7bzy24a

Optimizing queries over semantically integrated datasets on MapReduce platforms

HyeongSik Kim, Kemafor Anyanwu
2013 2013 IEEE International Conference on Big Data  
Life science databases generally consist of multiple heterogeneous datasets that have been integrated using complex ontologies. Querying such databases typically involves complex graph patterns, and evaluating such patterns poses challenges when MapReduce-based platforms are used to scale up processing, translating to long execution workflows with large amount of disk and network I/O costs. In this poster, we focus on optimizing UNION queries (e.g., unions of conjunctives for inference) and
more » ... ent an algebraic interpretation of the query rewritings which are more amenable to efficient processing on MapReduce.
doi:10.1109/bigdata.2013.6691788 dblp:conf/bigdataconf/KimA13 fatcat:3ie6dd44jjdqjk4h6hy2xcj4fe

Scheduling Hadoop Jobs to Meet Deadlines

Kamal Kc, Kemafor Anyanwu
2010 2010 IEEE Second International Conference on Cloud Computing Technology and Science  
User constraints such as deadlines are important requirements that are not considered by existing cloud-based data processing environments such as Hadoop. In the current implementation, jobs are scheduled in FIFO order by default with options for other priority based schedulers. In this paper, we extend a real-time cluster scheduling approach to account for the two-phase computation style of MapReduce. We develop criteria for scheduling jobs based on user specified deadline constraints and
more » ... our implementation and preliminary evaluation of a Deadline Constraint Scheduler for Hadoop which ensures that only jobs whose deadlines can be met are scheduled for execution.
doi:10.1109/cloudcom.2010.97 dblp:conf/cloudcom/KcA10 fatcat:7rxk2ipfvnfy3oyepht6c473cm

RAPID: Enabling Scalable Ad-Hoc Analytics on the Semantic Web [chapter]

Radhika Sridhar, Padmashree Ravindra, Kemafor Anyanwu
2009 Lecture Notes in Computer Science  
As the amount of available RDF data continues to increase steadily, there is growing interest in developing efficient methods for analyzing such data. While recent efforts have focused on developing efficient methods for traditional data processing, analytical processing which typically involves more complex queries has received much less attention. The use of cost effective parallelization techniques such as Google's Map-Reduce offer significant promise for achieving Web scale analytics.
more » ... r, currently available implementations are designed for simple data processing on structured data. In this paper, we present a language, RAPID, for scalable ad-hoc analytical processing of RDF data on Map-Reduce frameworks. It builds on Yahoo's Pig Latin by introducing primitives based on a specialized join operator, the MD-join, for expressing analytical tasks in a manner that is more amenable to parallel processing, as well as primitives for coping with semi-structured nature of RDF data. Experimental evaluation results demonstrate significant performance improvements for analytical processing of RDF data over existing Map-Reduce based techniques.
doi:10.1007/978-3-642-04930-9_45 fatcat:gedb5yzr7rcfhmxsvn7kuyo334

Preserving Buyer-Privacy in Decentralized Supply Chain Marketplaces [article]

Varun Madathil, Alessandra Scafuro, Kemafor Anyanwu, Sen Qiao, Akash Pateria, Binil Starly
2022 IACR Cryptology ePrint Archive  
Technology is being used increasingly for lowering the trust barrier in domains where collaboration and cooperation are necessary, but reliability and efficiency are critical due to high stakes. An example is an industrial marketplace where many suppliers must participate in production while ensuring reliable outcomes; hence, partnerships must be pursued with care. Online marketplaces like Xometry facilitate partnership formation by vetting suppliers and mediating the marketplace. However, such
more » ... an approach requires that all trust be vested in the middleman. This centralizes control, making the system vulnerable to being biased towards specific providers. The use of blockchains is now being explored to bridge the trust gap needed to support decentralizing marketplaces, allowing suppliers and customers to interact more directly by using the information on the blockchain. A typical scenario is the need to preserve privacy in certain interactions initiated by the buyer (e.g., protecting a buyer's intellectual property during outsourcing negotiations). In this work, we initiate the formal study of matching between suppliers and buyers when buyer-privacy is required for some marketplace interactions and make the following contributions. First, we devise a formal security definition for private interactive matching in the Universally Composable (UC) Model that captures the privacy and correctness properties expected in specific supply chain marketplace interactions. Second, we provide a lean protocol based on any programmable blockchain, anonymous group signatures, and public-key encryption. Finally, we implement the protocol by instantiating some of the blockchain logic by extending the BigChainDB blockchain platform.
dblp:journals/iacr/MadathilSAQPS22 fatcat:ahuh4vqo4ndsrewi5cxnbiwo2y