4.-8. März 2019

Xiao Chen, Gabriel Campero Durand, Roman Zoun, David Broneske, Yang Li, Gunter Saake
2019 Datenbanksysteme für Business, Technologie und Web  
Recently word embedding has become a beneficial technique for diverse natural language processing tasks, especially after the successful introduction of several popular neural word embedding models, such as word2vec, GloVe, and FastText. Also entity resolution, i.e., the task of identifying digital records that refer to the same real-world entity, has been shown to benefit from word embedding. However, the use of word embeddings does not lead to a one-size-fits-all solution, because it cannot provide an accurate result for those values without any semantic meaning, such as numerical values. In this paper, we propose to use the combination of general word embedding with traditional hand-picked similarity measures for solving ER tasks, which aims to select the most suitable similarity measure for each attribute based on its property. We provide some guidelines on how to choose suitable similarity measures for different types of attributes and evaluate our proposed hybrid method on both synthetic and real datasets. Experiments show that a hybrid method reliant on correctly selecting required similarity measures can outperform the method of purely adopting traditional or word-embedding-based similarity measures.
doi:10.18420/btw2019-14 dblp:conf/btw/ChenDZBLS19 fatcat:pialvcpapneeng7dhhefcbzk7e

GridTables: A One-Size-Fits-Most H2TAP Data Store

Marcus Pinnecke, Gabriel Campero Durand, David Broneske, Roman Zoun, Gunter Saake
2020 Datenbank-Spektrum  
Heterogeneous Hybrid Transactional Analytical Processing (H2TAP) database systems have been developed to match the requirements for low latency analysis of real-time operational data. Due to technical challenges, these systems are hard to architect, non-trivial to engineer, and complex to administrate. Current research has proposed excellent solutions to many of those challenges in isolation; a unified engine that optimizes performance by combining these solutions is still missing. In this concept paper, we suggest a highly flexible and adaptive data structure (called GRIDTABLE) to physically organize sparse but structured records in the context of H2TAP. For this, we focus on the design of an efficient highly-flexible storage layout that is built from scratch for mixed query workloads. The key challenges we address are: (1) partial storage in different memory locations, and (2) the ability to optimize for mixed OLTP-/OLAP access patterns. To guarantee safe and well-specified data definition or manipulation, as well as fast querying with no compromises on performance, we propose two dedicated access paths to the storage. In this paper, we explore the architecture and internals of GRIDTABLES showing design goals, concepts and trade-offs. We close this paper with open research questions and challenges that must be addressed in order to take advantage of the flexibility of our solution.
doi:10.1007/s13222-019-00330-x fatcat:wiv4u3zbsrfrfezjfp6c77mdcy

Design Considerations Towards AI-Driven Co-Processor Accelerated Database Management

Anh Trang Le, Bala Gurumurthy, Christoph Steup, Gabriel Campero Durand, David Broneske, Gunter Saake
2021 Workshop Grundlagen von Datenbanken  
Adopting AI techniques for query optimization is an ongoing research interest in the database community. Currently, the search space for the best plan increases drastically with the growing heterogeneity of the target hardware, the novel tuning choices offered, and co-processing. Hence, the need for AI techniques to identify such a best plan in a reasonable time-frame is imminent. Though AI-based solutions for improving query processing exist, there is still a need for principled system ... able to incorporate the different innovations, leverage synergy effects, and keep with production-readiness expectations when using AI. In this paper, we propose a series of seven ideal design characteristics we envision for such systems. We then make the case for revisiting the traditional Mariposa system, to consider its market concepts as a useful starting point for new system designs to support the identified characteristics. Altogether, we expect that this short paper could be a modest contribution towards AI-driven heterogeneous processing, emphasizing the practical aspects of a supportive and principled overall design.
dblp:conf/gvd/LeGSDBS21 fatcat:7kt3yuif4jeajohxl3tybaplkm

Low-Latency Transaction Execution on Graphics Processors: Dream or Reality?

Iya Arefyeva, Gabriel Campero Durand, Marcus Pinnecke, David Broneske, Gunter Saake
2018 Very Large Data Bases Conference  
In this paper we take a close look into the role of GPUs for executing OLTP workloads, with a focus on CRUD operator-based processing, as opposed to more complex OLTP transactions. To this end we develop a prototype system supporting GPU and CPU variants of DSM and NSM processing, with a delegation-based approach that uses a single-thread scheduler to manage concurrency control, enabling reads with guaranteed bounded staleness. We evaluate our prototype using workloads from the Yahoo! Cloud Serving Benchmark. We report the impact of layout choices, batching configuration and concurrency control designs. Through our study we are able to pinpoint that the contradicting needs in GPU processing, for small batches to reduce waiting time but large batches to reduce execution time, are the essential challenge for OLTP on these processors, affecting all design choices we study. Hence, we propose two preconditions for supporting OLTP with GPUs, aiming to guide researchers in finding scenarios for extending the applicability of GPUs in supporting data management tasks.
dblp:conf/vldb/ArefyevaDPBS18 fatcat:337shspta5aarpxyg2lythp5sy

Piecing Together Large Puzzles, Efficiently: Towards Scalable Loading Into Graph Database Systems

Gabriel Campero Durand, Jingy Ma, Marcus Pinnecke, Gunter Saake
2018 Workshop Grundlagen von Datenbanken  
Many applications rely on network analysis to extract business intelligence from large datasets, requiring specialized graph tools such as processing frameworks (e.g. Apache Giraph, Gradoop), database systems (e.g. Neo4j, JanusGraph) or applications/libraries (e.g. NetworkX, nvGraph). A recent survey reports scalability, particularly for loading, as the foremost practical challenge faced by users. In this paper we consider the design space of tools for efficient and scalable graph bulk loading. For this we implement a prototypical loader for a property graph DBMS, using a distributed message bus. With our implementation we evaluate the impact and limits of basic optimizations. Our results confirm the expectation that bulk loading can be best supported as a server-side process. We also find, for our specific case, gains from batching writes (up to 64x speedups in our evaluation), uniform behavior across partitioning strategies, and the need for careful tuning to find the optimal configuration of batching and partitioning. In future work we aim to study loading into alternative physical storages with GeckoDB, an HTAP database system developed in our group.
dblp:conf/gvd/DurandMPS18 fatcat:joffwprfozf4bdpiqbwzuh2hvi

Performance Comparison of Three Spark-Based Implementations of Parallel Entity Resolution [chapter]

Xiao Chen, Kirity Rapuru, Gabriel Campero Durand, Eike Schallehn, Gunter Saake
2018 Communications in Computer and Information Science  
During the last decade, several big data processing frameworks have emerged, enabling users to analyze large-scale data with ease. With the help of those frameworks, it is easier to manage distributed programming, failures and data partitioning issues. Entity Resolution is a typical application that requires big data processing frameworks, since its time complexity increases quadratically with the input data. In recent years Apache Spark has become popular as a big data framework providing a flexible programming model that supports in-memory computation. Spark offers three APIs: RDDs, which give users core low-level data access, and high-level APIs like DataFrame and Dataset, which are part of the Spark SQL library and undergo a process of query optimization. Stemming from their different features, the choice of API can be expected to have an influence on the resulting performance of applications. However, few studies offer experimental measures to characterize the effect of such distinctions. In this paper we evaluate the performance impact of such choices for the specific application of parallel entity resolution under two different scenarios, with the goal to offer practical guidelines for developers.
doi:10.1007/978-3-319-99133-7_6 fatcat:iqgzbggmzzcnjbp5l6lbylmcbq

Backlogs and Interval Timestamps: Building Blocks for Supporting Temporal Queries in Graph Databases

Gabriel Campero Durand, Marcus Pinnecke, David Broneske, Gunter Saake
2017 International Conference on Extending Database Technology  
The analysis of networks, either at a single point in time or through their evolution, is an increasingly important task in modern data management. Graph databases are uniquely suited to improve static network analysis. However, there's still no consensus on how to best model data evolution with these databases. In our work we propose an elementary concept to support temporal analysis with property graph databases, using a single-graph model limited to structural changes. We manage the temporal aspects of items with interval timestamps and backlogs. To include backlogs in the model we examine two alternatives: (1) global indexes, and (2) using the graph as an index by resorting to timestamp denormalization. We evaluate density calculation and time slice retrieval over successive days from a SNAP dataset, on an Apache Titan prototype of our model, observing from 2x to 100x response time gains by comparing differential vs. snapshot methods; and no conclusive difference between the backlog alternatives.
dblp:conf/edbt/DurandPBS17 fatcat:45jhvsbkojhk3njluuxrmuqzde

Spread the Good Around! Information Propagation in Schema Matching and Entity Resolution for Heterogeneous Data

Gabriel Campero Durand, Anshu Daur, Vinayak Kumar, Shivalika Suman, Altaf Mohammed Aftab, Sajad Karim, Prafulla Diwesh, Chinmaya Hegde, Disha Setlur, Syed Md Ismail, David Broneske, Gunter Saake
2020 Very Large Data Bases Conference  
Durand, et al.  ...  2 3 4 # of models 2,594 2,594 4,681 4,477 # of items with models assigned 7,226 12,103 15,112 15,722 # of models with more than 10 77 303 313 319 items DI2KG 2020, August 31, 2020, Tokyo, Japan Campero  ... 
dblp:conf/vldb/DurandDKSAKDHSI20 fatcat:mgonzu3cdfgblf6vpqq4cx64km

An evaluation of deep hashing for high-dimensional similarity search on embedded data [article]

Rutuja Shivraj Pawar, Universitäts- Und Landesbibliothek Sachsen-Anhalt, Martin-Luther Universität, Gunter Saake, Gabriel Campero Durand
2019
In today's era, the rate at which data is accumulating is exponential, which makes it increasingly challenging to retrieve relevant information. In such a scenario, high-dimensional similarity search serves as a popular method to extract relevant information from large data volumes or Big Data, and it further drives different Machine Learning (ML) tasks, including Near Duplicate Detection and Location Recognition. However, Big Data, due to its characteristics, poses a variety of challenges to applications, such as high class imbalance, the need for feature engineering to support heterogeneous data and the need for efficient solutions for queries over array data. Consequently, in this thesis, we aim to optimize the data analytics pipeline for the utilization and effective management of feature engineering data (embedding data), offering it as one of the solutions in the context of high-dimensional similarity search. In doing so, we evaluate the impact of similarity-preserving hashing on helping with data blocking and skipping for ML applications of supervised entity resolution and top-k similarity search. Precisely, we make the following contributions: First, we utilize and work with embedding data, as an approach to highlight semantic similarity in the data, thus making it more manageable. In doing so, we experiment with three dataset pairs from two different domains, Bibliographic and E-commerce, with their attributes embedded using a fastText pre-trained model. Further, based on its fast query speed and low memory costs, we consider similarity-preserving hashing as the technique to manage these embedding data and efficiently support high-dimensional similarity search. Specifically, we consider two hashing techniques, Locality Sensitive Hashing (LSH) being data-independent, and Learning To Hash (L2H) being data-dependent. Second, based on well-defined metrics, we experimentally evaluate the efficiency and classification accuracy of LSH - Super-Bit, with a focus on the task of supervised entity resolution. T [...]
doi:10.25673/31719 fatcat:76okmnvxnrgyliqq3vbky3e5zq

GridTables : a One-Size-Fits-Most H2TAP data store : vision and concept

Marcus Pinnecke, Gabriel Campero Durand, David Broneske, Roman Zoun, Gunter Saake, Universitäts- Und Landesbibliothek Sachsen-Anhalt, Martin-Luther Universität
2022
Heterogeneous Hybrid Transactional Analytical Processing (H2TAP) database systems have been developed to match the requirements for low latency analysis of real-time operational data. Due to technical challenges, these systems are hard to architect, non-trivial to engineer, and complex to administrate. Current research has proposed excellent solutions to many of those challenges in isolation; a unified engine that optimizes performance by combining these solutions is still missing. In this concept paper, we suggest a highly flexible and adaptive data structure (called GRIDTABLE) to physically organize sparse but structured records in the context of H2TAP. For this, we focus on the design of an efficient highly-flexible storage layout that is built from scratch for mixed query workloads. The key challenges we address are: (1) partial storage in different memory locations, and (2) the ability to optimize for mixed OLTP-/OLAP access patterns. To guarantee safe and well-specified data definition or manipulation, as well as fast querying with no compromises on performance, we propose two dedicated access paths to the storage. In this paper, we explore the architecture and internals of GRIDTABLES showing design goals, concepts and trade-offs. We close this paper with open research questions and challenges that must be addressed in order to take advantage of the flexibility of our solution.
doi:10.25673/71707 fatcat:4kw5secjrfcvperj3mzfzyc7ne

Table of Contents

2021 2021 IEEE 37th International Conference on Data Engineering Workshops (ICDEW)  
Campero Durand (Otto von Guericke University, Magdeburg), and Gunter Saake (Otto von Guericke University, Magdeburg) 48 Bala Gurumurthy (University of Magdeburg), David Broneske (University of Magdeburg  ...  36 Harish Kumar Harihara Subramanian (Otto von Guericke University, Magdeburg), Bala Gurumurthy (Otto von Guericke University, Magdeburg), David Broneske (Otto von Guericke University, Magdeburg), Gabriel  ... 
doi:10.1109/icdew53142.2021.00004 fatcat:dest3ye37jhrjmj2stay3cnquq

Olga Acosta, John Naranjo y Adalberto Camperos. Delirantes

Muriel Laurent
2022 Anuario Colombiano de Historia Social y de la Cultura  
artística: el producto fue ideado por la historiadora del arte Olga Acosta, profesora de la Universidad de los Andes, en compañía del editor John Naranjo, especializado en cómics, y del ilustrador Adalberto Camperos  ...  La madre de Gutiérrez, la esposa de Cuervo y Madame Durand son personajes esenciales para el desenvolvimiento de los protagonistas.  ...  En los agradecimientos figuran los nombres de varios artistas del siglo xix, como Ramón Torres Méndez, José Gabriel Tatis, Henry Price, Carmelo Fernández, Manuel María Paz, François-Désiré Roulin y José  ... 
doaj:e0c85c7ec0d54a449da9d2c0b4a7a937 fatcat:2dmk2wjhz5edzmifapvqxiaqya

4.-8. März 2019

Roman Zoun, Kay Schallert, David Broneske, Wolfram Fenske, Marcus Pinnecke, Robert Heyer, Sven Brehmer, Dirk Benndorf, Gunter Saake
2019 Datenbanksysteme für Business, Technologie und Web  
Acknowledgments The authors sincerely thank Xiao Chen, Gabriel Campero Durand, Sebastian Krieter and Andreas Meister for their support and advice.  ... 
doi:10.18420/btw2019-33 dblp:conf/btw/ZounSBFPHBBS19 fatcat:qaz37njzinedboimaeivww3hfi

An Investigation of Alternatives to Transform Protein Sequence Databases to a Columnar Index Schema

Roman Zoun, Kay Schallert, David Broneske, Ivayla Trifonova, Xiao Chen, Robert Heyer, Dirk Benndorf, Gunter Saake
2021 Algorithms  
Acknowledgments: The authors sincerely thank Niya Zoun, Gabriel Campero Durand, Marcus Pinnecke, Sebastian Krieter, Sven Helmer, Sven Brehmer, and Andreas Meister for their support and advice.  ... 
doi:10.3390/a14020059 fatcat:xopdgbyrizerrm3ymza7zq43xu

Elite económica y elite política bajo la presidencia de Mauricio Macri: el caso de Ministerio de Producción (2015-2019)

Ana Castellani, Marina Dossi
2021 Estudios sociales del estado  
Esteban Guillermo CAMPERO Lic. Carolina CASTRO Artículos Secretaría de la Transformación Productiva Lic.  ...  ETCHEGOYEN Subsecretario de Servicios Tecnológicos y Productivos Sr, Carlos Gabriel PALLOTTI 81 13 Canelo y Castellani, 2017.Fuente: observatorio de las Elites del CITRA en base a Boletín Oficial de la  ... 
doi:10.35305/ese.v7i14.270 fatcat:u6tayjeg65hpdommgr2ynbyvgi