
Leveraging a scalable row store to build a distributed text index

Ning Li, Jun Rao, Eugene Shekita, Sandeep Tata
2009 Proceeding of the first international workshop on Cloud data management - CloudDB '09  
Many content-oriented applications require a scalable text index. Building such an index is challenging.  ...  We developed a distributed text index called HIndex, by judiciously exploiting the control layer of HBase, which is an open source implementation of Google's Bigtable.  ...  Conclusion In this paper, we described our experience of building a distributed text index by leveraging the scalable control layer in a row store.  ... 
doi:10.1145/1651263.1651270 dblp:conf/cikm/LiRST09 fatcat:leu2egstnvh6hhcrrphamwloxu

Experimenting lucene index on HBase in an HPC environment

Xiaoming Gao, Vaibhav Nachankar, Judy Qiu
2011 Proceedings of the first annual workshop on High performance computing meets databases - HPCDB '11  
Leveraging the distributed architecture of HBase, we expect to get high performance and availability, and excellent scalability and flexibility for our searching system.  ...  To achieve efficient search on text data, this paper proposes a searching framework based on Lucene full-text indices implemented as HBase tables.  ...  This paper proposes a distributed Lucene index solution to support interactive real-time search for text data stored in HBase.  ... 
doi:10.1145/2125636.2125646 fatcat:5e6y6rjttjcedhlvrwd6w4a4g4

Infrastructure for supporting exploration and discovery in web archives

Jimmy Lin, Milad Gholami, Jinfeng Rao
2014 Proceedings of the 23rd International Conference on World Wide Web - WWW '14 Companion  
These tools need to be scalable and responsive, and to this end we believe that modern "big data" infrastructure can provide a solid foundation.  ...  Our system provides a flexible data model for storing and managing raw content as well as metadata and extracted knowledge.  ...  Rows are lexicographically sorted, and thus an important element in HBase schema design is to leverage this property for the application's benefit.  ... 
doi:10.1145/2567948.2579045 dblp:conf/www/LinGR14 fatcat:4fdqjxjc7vbtlkxv7cpxi22unu

AnalyticDB

Chaoqun Zhan, Fang Zheng, Chengliang Chai, Maomeng Su, Chuangxian Wei, Xiaoqiang Peng, Liang Lin, Sheng Wang, Zhe Chen, Feifei Li, Yue Pan
2019 Proceedings of the VLDB Endowment  
To further reduce query latency, novel storage-aware SQL optimizer and execution engine are developed to fully utilize the advantages of the underlying storage and indexes.  ...  Moreover, these systems are expected to provide high query concurrency and write throughput, and support queries over structured and complex data types (e.g., JSON, vector and texts).  ...  We also take this opportunity to thank Yineng Chen, Xiaolong Xie, Congnan Luo, Jiye Tu, Wenjun Dai, Xiang Zhou, Shaojin Wen, Wenbo Ma, Jiannan Ji, Yu Dong, Jin Hu, Caihua Yin, Yujun Liao, Zhe Li, Ruonan  ... 
doi:10.14778/3352063.3352124 fatcat:u2oa2bbhqbgbfh5iqe5upraf4u

Rainbow: A distributed and hierarchical RDF triple store with dynamic scalability

Rong Gu, Wei Hu, Yihua Huang
2014 2014 IEEE International Conference on Big Data (Big Data)  
Further, to better support the hybrid indexing scheme, Rainbow adopts a distributed and hierarchical storage architecture that uses HBase as the scalable persistent storage and combines a distributed memory  ...  In this paper, we propose Rainbow, a scalable and efficient RDF triple store.  ...  Besides adopting a distributed storage for data persistence, we also build a distributed memory storage to store frequently-used RDF data indices for fast random access. • Rainbow is dynamically scalable  ... 
doi:10.1109/bigdata.2014.7004274 dblp:conf/bigdataconf/GuHH14 fatcat:mxi5hq657vfrpn6gbyr6hsunyi

Data management projects at Google

Michael Cafarella, Edward Chang, Andrew Fikes, Alon Halevy, Wilson Hsieh, Alberto Lerner, Jayant Madhavan, S. Muthukrishnan
2008 SIGMOD record  
As described above, Bigtable is a high-performance, distributed, row-storage system that is highly scalable, but it is not meant to provide relational query processing or sophisticated indexing  ...  Many of our clients want to store indexed data in Bigtable. Currently, they have to manage the indices themselves. We are in the process of building support for indices directly into Bigtable.  ... 
doi:10.1145/1374780.1374789 fatcat:r7n3mrbxcvdancb37chtmf3c5i

Data management projects at Google

Wilson Hsieh, Jayant Madhavan, Rob Pike
2006 Proceedings of the 2006 ACM SIGMOD international conference on Management of data - SIGMOD '06  
As described above, Bigtable is a high-performance, distributed, row-storage system that is highly scalable, but it is not meant to provide relational query processing or sophisticated indexing  ...  Many of our clients want to store indexed data in Bigtable. Currently, they have to manage the indices themselves. We are in the process of building support for indices directly into Bigtable.  ... 
doi:10.1145/1142473.1142566 dblp:conf/sigmod/HsiehMP06 fatcat:z4yi7ecmpja4liymavn24fi2k4

SAP HANA-Database: Inter Organisation Cooperations with SAP Systems Perspectives on Data Management for Business Applications

Divya M., Gayathri M., Sangeetha K., Anguraj S.
2018 Bonfring International Journal of Networking Technologies and Applications  
relational data supporting both row- and column-oriented physical representations in a hybrid engine, to graph and text processing for semi- and unstructured data management within the same system.  ...  On the technical side, the SAP HANA database consists of multiple data processing engines with a distributed query processing environment to provide the full spectrum of data processing - from classical  ...  ACKNOWLEDGMENT We would like to express our sincere thanks to all of our NewDB colleagues for making the HANA story a reality.  ... 
doi:10.9756/bijnta.8379 fatcat:gexhgkrktbbtjl4ywuncnhstpi

Scalable Visual Analytics of Massive Textual Datasets

M. Krishnan, S. Bohn, W. Cowley, V. Crow, J. Nieplocha
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
This paper describes the first scalable implementation of a text processing engine used in visual analytics tools.  ...  By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets.  ...  Scan the source documents to identify individual records and fields, compile a list of terms (i.e., a vocabulary), and build an index of terms per field (the "field-to-term index").  ... 
doi:10.1109/ipdps.2007.370232 dblp:conf/ipps/KrishnanBCCN07 fatcat:tkcnjcravza5fm4r2awbn3wcqi

GPU-Based PostgreSQL Extensions for Scalable High-Throughput Pattern Matching

Grant Scott, Matthew England, Kevin Melkowski, Zachary Fields, Derek T. Anderson
2014 2014 22nd International Conference on Pattern Recognition  
By pipelining pattern matching results into a relational expression, the power of the database can be leveraged to build result sets based on various parameterized correlations between the query pattern  ...  , spatial, or text search.  ...  In a similar fashion, other existing extensions could be leveraged such as text search capabilities or content based retrieval techniques.  ... 
doi:10.1109/icpr.2014.329 dblp:conf/icpr/ScottEMFA14 fatcat:ku2jp5knznd2rhdurhof2ft2gu

Efficient Updates for Web-Scale Indexes over the Cloud

Panagiotis Antonopoulos, Ioannis Konstantinou, Dimitrios Tsoumakos, Nectarios Koziris
2012 2012 IEEE 28th International Conference on Data Engineering Workshops  
To the best of our knowledge, this is the first open source system that creates, updates and serves large-scale indexes in a distributed fashion.  ...  In this paper, we present a distributed system which enables fast and frequent updates on web-scale Inverted Indexes.  ...  In order to speed up the compute and storage-intensive update process, we leverage the capabilities of MapReduce in combination with the horizontal scalability and loose-schema features of a distributed  ... 
doi:10.1109/icdew.2012.51 dblp:conf/icde/AntonopoulosKTK12 fatcat:wxp47xowwre7rbi2tphfeqykzi

SAP HANA database

Franz Färber, Sang Kyun Cha, Jürgen Primsch, Christof Bornhövd, Stefan Sigg, Wolfgang Lehner
2012 SIGMOD record  
relational data supporting both row- and column-oriented physical representations in a hybrid engine, to graph and text processing for semi- and unstructured data management within the same system.  ...  On the technical side, the SAP HANA database consists of multiple data processing engines with a distributed query processing environment to provide the full spectrum of data processing - from classical  ...  A system administrator specifies at definition time whether a new table is to be stored in a row- or in a column-oriented format.  ... 
doi:10.1145/2094114.2094126 fatcat:gdefn44ltfa7fl6bsfqimm4q5m

An Efficient and Scalable RDF Indexing Strategy based on B-Hashed-Bitmap Algorithm using CUDA

Sharmi Sankar, Munesh Singh, Awny Sayed, Jihad Alkhalaf Bani-Younis
2014 International Journal of Computer Applications  
Parallel implementation of indices provides a suitable option for dealing with scalable and dynamically generated data over distributed networks.  ...  In this paper, a new efficient and scalable index is proposed that uses a combination of B+ trees, hashing and sparse matrices.  ...  To take advantage of this property, Virtuoso builds bitmap indexes for each ops prefix by default, storing the various subjects.  ... 
doi:10.5120/18216-9221 fatcat:qtjlxp7dgje3nfdfk5vie47iqu

CLUO: Web-Scale Text Mining System for Open Source Intelligence Purposes

Przemyslaw Maciolek, Grzegorz Dobrowolski
2013 Computer Science  
This is especially true in case of complex algorithms, often used in text mining tasks.  ...  The amount of textual information published on the Internet is considered to be in billions of web pages, blog posts, comments, social media updates and others.  ...  Summary In this paper a practical approach to building a text mining solution for Open Source Intelligence purposes has been described.  ... 
doi:10.7494/csci.2013.14.1.45 fatcat:2bbcsvgujzb7jcwim7qusiejei

MLI: An API for Distributed Machine Learning [article]

Evan R. Sparks, Ameet Talwalkar, Virginia Smith, Jey Kottalam, Xinghao Pan, Joseph Gonzalez, Michael J. Franklin, Michael I. Jordan, Tim Kraska
2013 arXiv   pre-print
MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing.  ...  Our initial results show that, relative to existing systems, this interface can be used to build distributed implementations of a wide variety of common Machine Learning algorithms with minimal complexity  ...  CONCLUSION We have presented MLI, an API for building scalable distributed machine learning algorithms.  ... 
arXiv:1310.5426v2 fatcat:tzjolo6bubbfvkra5iddgdg4pu