
Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-and-Solve

Amir Abboud, Arturs Backurs, Karl Bringmann, Marvin Künnemann
2017 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)  
Suppose we are given a compression of size n of data that originally has size N, and we want to solve a problem with time complexity T(·).  ...  The naive strategy of "decompress-and-solve" gives time T(N), whereas "the gold standard" is time T(n): to analyze the compression as efficiently as if the original data was small.  ...  Inspired by a Dagstuhl seminar on Compressed Pattern Matching in October, and while attending a Dagstuhl seminar on Fine-Grained Complexity in November, Oren asked in the open problems session whether  ... 
doi:10.1109/focs.2017.26 dblp:conf/focs/AbboudBBK17 fatcat:mcpkcysyindwbc3oi5sn5pryvi
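The snippet contrasts decompress-and-solve, costing T(N), with the gold standard T(n). The following sketch is not from the paper. It assumes the compression is a straight-line program, a grammar whose rules are either single characters or concatenations of two earlier rules. Under that assumption, a simple query such as counting a character takes one pass over the n rules, while decompress-and-solve first materializes all N characters, and N can be exponential in n. The rule encoding and the names `decompress`, `count_char` and `length` are hypothetical.

```python
from typing import Union

# Hypothetical encoding: a rule is a single character, or a pair (i, j) meaning
# "expansion of rule i followed by expansion of rule j". Pairs refer only to
# earlier rules, and the last rule derives the whole string.
Rule = Union[str, tuple[int, int]]


def decompress(rules: list[Rule]) -> str:
    """Decompress-and-solve baseline: materialize all N characters."""
    expansions: list[str] = []
    for r in rules:
        expansions.append(r if isinstance(r, str) else expansions[r[0]] + expansions[r[1]])
    return expansions[-1]


def count_char(rules: list[Rule], c: str) -> int:
    """Count occurrences of c with one pass over the n rules, never expanding them."""
    counts: list[int] = []
    for r in rules:
        if isinstance(r, str):
            counts.append(1 if r == c else 0)
        else:
            counts.append(counts[r[0]] + counts[r[1]])
    return counts[-1]


def length(rules: list[Rule]) -> int:
    """Length N of the derived string, again with one evaluation per rule."""
    lens: list[int] = []
    for r in rules:
        lens.append(1 if isinstance(r, str) else lens[r[0]] + lens[r[1]])
    return lens[-1]


# "ab" repeated 2**20 times, described by only 23 rules (each new rule doubles the previous one).
rules: list[Rule] = ["a", "b", (0, 1)]
for _ in range(20):
    rules.append((len(rules) - 1, len(rules) - 1))

print(length(rules))           # 2097152
print(count_char(rules, "a"))  # 1048576
```

Calling `decompress(rules).count("a")` gives the same answer but touches all 2,097,152 characters. Many problems are much harder than this toy query; it only illustrates the two cost models being compared.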

Migratory compression: coarse-grained data reordering to improve compressibility

Xing Lin, Guanlin Lu, Fred Douglis, Philip Shilane, Grant Wallace
2014 USENIX Conference on File and Storage Technologies  
In MC, similar data chunks are re-located together, to improve compression factors. After decompression, migrated chunks return to their previous locations.  ...  We propose Migratory Compression (MC), a coarse-grained data transformation, to improve the effectiveness of traditional compressors in modern storage systems.  ...  Acknowledgments We acknowledge Nitin Garg for his initial suggestion of improving data compression by collocating similar content in the Data Domain File System.  ... 
dblp:conf/fast/LinLDSW14 fatcat:ei4p6ji3fbhkxgvruj76ii6ctu
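The MC snippet describes moving similar chunks next to each other before compression and returning them to their original locations after decompression. The following is a minimal sketch of that idea, not the paper's implementation. It rests on several assumptions: fixed 4 KiB chunks, grouping of only byte-identical chunks (the snippet does not say how similarity is detected), and zlib as the "traditional compressor". The names `split`, `migrate`, `restore` and `build_sample` are hypothetical.

```python
import random
import zlib

CHUNK = 4096  # assumption: fixed-size 4 KiB chunks


def split(data: bytes, size: int = CHUNK) -> list[bytes]:
    """Cut the input into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def migrate(chunks: list[bytes]) -> tuple[list[bytes], list[int]]:
    """Place chunks with identical content next to each other.

    Returns the reordered chunks and a recipe: recipe[k] is the original index
    of the chunk now stored at position k. Exact-content grouping is a
    simplification standing in for real similarity detection.
    """
    groups: dict[bytes, list[int]] = {}
    for idx, chunk in enumerate(chunks):
        groups.setdefault(chunk, []).append(idx)  # dicts keep first-seen order
    recipe = [idx for members in groups.values() for idx in members]
    return [chunks[i] for i in recipe], recipe


def restore(reordered: list[bytes], recipe: list[int]) -> list[bytes]:
    """Move every migrated chunk back to its original position."""
    out = [b""] * len(recipe)
    for pos, idx in enumerate(recipe):
        out[idx] = reordered[pos]
    return out


def build_sample(templates: int = 4, copies: int = 4, filler: int = 10) -> bytes:
    """Synthetic input: repeated chunks separated by more than zlib's 32 KiB window."""
    rng = random.Random(0)
    reps = [rng.randbytes(CHUNK) for _ in range(templates)]
    parts = []
    for _ in range(copies):
        for rep in reps:
            parts.append(rep)
            parts.extend(rng.randbytes(CHUNK) for _ in range(filler))  # unique random filler
    return b"".join(parts)


if __name__ == "__main__":
    data = build_sample()
    reordered, recipe = migrate(split(data))
    assert b"".join(restore(reordered, recipe)) == data
    print("baseline:", len(zlib.compress(data)))
    print("migrated:", len(zlib.compress(b"".join(reordered))))
```

In the original layout, copies of a repeated chunk lie about 176 KiB apart, beyond DEFLATE's 32 KiB window, so zlib cannot exploit them. After migration they are adjacent, and the migrated stream should compress noticeably smaller. The recipe must be stored alongside the compressed data, so its size counts against the gain.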

CULZSS-Bit: A Bit-Vector Algorithm for Lossless Data Compression on GPGPUs

Adnan Ozsoy
2014 2014 International Workshop on Data Intensive Scalable Computing Systems  
In this paper, we describe an algorithm to improve dictionary based lossless data compression on GPGPUs.  ...  The implementation of the new compression algorithm on GPUs improves the performance of the compression process compared to the previous attempts.  ...  Arun Chauhan and Dr. Martin Swany for their valuable insights and advice for applying bit-vector approach on lossless data compression.  ... 
doi:10.1109/discs.2014.9 dblp:conf/sc/Ozsoy14 fatcat:q2czmbd6lrgxrhfyif2t5agz2m

Optimal high-level descriptions of dynamical systems [article]

David H. Wolpert, Joshua A. Grochow, Eric Libby, Simon DeDeo
2015 arXiv pre-print
These include SSC as a measure of the complexity of a dynamical system, and as a way to quantify information flow between the scales of a system.  ...  To analyze high-dimensional systems, many fields in science and engineering rely on high-level descriptions, sometimes called "macrostates," "coarse-grainings," or "effective theories".  ...  G. and E. L. acknowledge the support of Santa Fe Institute Omidyar Fellowships.  ... 
arXiv:1409.7403v2 fatcat:tsilyydsgbeifevfscsfxxikry

Big data and extreme-scale computing

M Asch, T Moore, R Badia, M Beck, P Beckman, T Bidot, F Bodin, F Cappello, A Choudhary, B de Supinski, E Deelman, J Dongarra (+27 others)
2018 The International Journal of High Performance Computing Applications  
methods for analyzing and using that data are radically reshaping the landscape of scientific computing.  ...  Over the past four years, the Big Data and Exascale Computing (BDEC) project organized a series of five international workshops that aimed to explore the ways in which the new forms of data-centric discovery  ...  ; Industry Sponsors: Intel, Cray, Data Direct Networks, Fujitsu, Hitachi, Lenovo, and NEC.  ... 
doi:10.1177/1094342018778123 fatcat:vwrrxmad4rhtppq6ioaz4h5q7a

LCQS: an efficient lossless compression tool of quality scores with random access functionality

Jiabing Fu, Bixin Ke, Shoubin Dong
2020 BMC Bioinformatics  
Meanwhile, some compressors attempt to construct a fine-grained index structure to solve the problem of slow random access decompression speed.  ...  Existing lossless compressors of quality scores mainly utilize specific patterns generated by specific sequencer and complex context modeling techniques to solve the problem of low compression ratio.  ...  Acknowledgments We would like to thank the Editor and the Reviewers for their precious comments on this work which helped improve the quality of this paper. We  ... 
doi:10.1186/s12859-020-3428-7 pmid:32183707 fatcat:msy2uzpph5e4jgumzw35ph7fne

Achieving Portable Performance For Wavelet Compression Using Data Parallel Primitives [article]

Shaomeng Li, Nicole Marsaglia, Vincent Chen, Christopher Sewell, John Clyne, Hank Childs
2017 Eurographics Symposium on Parallel Graphics and Visualization  
We consider the problem of wavelet compression in the context of portable performance over multiple architectures.  ...  Because of the data parallel primitives approach, our algorithm is hardware-agnostic and yet can run on many-core architectures.  ...  While these fine-grained tunings are very effective in making the most out of the hardware, they usually require a good amount of GPU programming skills, and the performance gains are not guaranteed to  ... 
doi:10.2312/pgv.20171095 dblp:conf/egpgv/LiMCSCC17 fatcat:uokc2nxn5rhdhhsemrz42vevhu

Challenges of Interactive Multimedia Data Mining in Social and Behavioral Studies for latest Computing &Communication of an Ideal Applications

Mohammed Waseem Ashfaque, Abdul Samad Shaikh, Sumegh Tharewal, Sayyada Sara Banu, Mohammed Ali Sohail
2014 IOSR Journal of Computer Engineering  
a computing challenges of data mining techniques can often only search and extract pre-defined patterns or knowledge from complex heterogeneous data.  ...  And from the raw data ideology will come out the current status of analysis can be helpful and fruitful in further analysis.  ...  New applications are to be researched .The compression and decompression with high compression ratio & lossless compression with less storage space. .The query for audio & video entries in database should  ... 
doi:10.9790/0661-16672131 fatcat:fywxnwe3wfgtdmof6y5xgfq2gq

Effective Data Versioning for Collaborative Data Analytics

Silu Huang
2020 Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data  
With the massive proliferation of datasets in a variety of sectors, data science teams in these sectors spend vast amounts of time collaboratively constructing, curating, and analyzing these datasets.  ...  Versions of datasets are routinely generated during this data science process, via various data processing operations like data transformation and cleaning, feature engineering and normalization, among  ...  However, cell-level instructions can be very verbose due to the fine-grained nature of cells.  ... 
doi:10.1145/3318464.3394027 dblp:conf/sigmod/Huang20 fatcat:cmqanfq5fvdbrjwdlo6dyq7uzy

The Unified Logging Infrastructure for Data Analytics at Twitter [article]

George Lee, Jimmy Lin, Chuang Liu, Andrew Lorek, Dmitriy Ryaboy
2012 arXiv pre-print
The development of this infrastructure has streamlined log collection and data analysis, thereby improving our ability to rapidly experiment and iterate on various aspects of the service.  ...  A less-explored topic is how those data, dominated by application logs, are collected and structured to begin with.  ...  A custom Pig loader abstracts over details of the physical layout of session sequences, transparently parsing each field in the tuple and handling decompression.  ... 
arXiv:1208.4171v1 fatcat:cdae6wrn5vfudhwcccgmrsarpa

The unified logging infrastructure for data analytics at Twitter

George Lee, Jimmy Lin, Chuang Liu, Andrew Lorek, Dmitriy Ryaboy
2012 Proceedings of the VLDB Endowment  
The development of this infrastructure has streamlined log collection and data analysis, thereby improving our ability to rapidly experiment and iterate on various aspects of the service.  ...  A less-explored topic is how those data, dominated by application logs, are collected and structured to begin with.  ...  A custom Pig loader abstracts over details of the physical layout of session sequences, transparently parsing each field in the tuple and handling decompression.  ... 
doi:10.14778/2367502.2367516 fatcat:xbru7xjusfdpbd4wwzpvtfpg4a

Column-oriented storage techniques for MapReduce

Avrilia Floratou, Jignesh M. Patel, Eugene J. Shekita, Sandeep Tata
2011 Proceedings of the VLDB Endowment  
Many of the problems they encounter can be overcome by applying techniques learned from over three decades of research on parallel DBMSs.  ...  We also show that dealing with complex column types such as arrays, maps, and nested records, which are common in MapReduce jobs, can incur significant CPU overhead.  ...  ACKNOWLEDGEMENTS We would like to thank the reviewers of this paper for their constructive comments. This research was supported in part by the National Science Foundation under grant IIS-0963993.  ... 
doi:10.14778/1988776.1988778 fatcat:rp5avslwtzctlapk3jyyhl232q

An Introduction to Sensor Data Analytics [chapter]

Charu C. Aggarwal
2012 Managing and Mining Sensor Data  
transmit their data over time.  ...  This index can be used for improving the performance of query processing.  ...  The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory  ... 
doi:10.1007/978-1-4614-6309-2_1 fatcat:pfbx566yfzgqpnjcuzonmxr23q

Data Service Outsourcing and Privacy Protection in Mobile Internet [chapter]

Zhen Qin, Erqiang Zhou, Yi Ding, Yang Zhao, Fuhu Deng, Hu Xiong
2018 Data Service Outsourcing and Privacy Protection in Mobile Internet  
This monograph focuses on key technologies of data service outsourcing and privacy protection in mobile Internet, including the existing methods of data analysis and processing, the fine-grained data access  ...  Preface The data of mobile Internet have the characteristics of large scale, variety of patterns, complex association and so on.  ...  However, the work of achieving fine-grained access control of data in the MI and promoting the sharing of data in the MI is still worthy to investigate.  ... 
doi:10.5772/intechopen.79903 fatcat:kvdisoudirgdhd7tvscnhsb6gm

Binary RDF for scalable publishing, exchanging and consumption in the web of data

Javier D. Fernández
2012 Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion  
In the first case, the complexities of compression and decompression denote the behavior of a technique.  ...  It has to deal with the complexities of scientific data creation or capture, sharing these data with other scientists, and finally processing and analyzing such data.  ...  That is, it retrieves the data of all the compressed dictionary partitions in D_comp, and loads them in the appropriated succinct data structures.  ... 
doi:10.1145/2187980.2187997 dblp:conf/www/Fernandez12 fatcat:j2rclthkdnanfbbnwym4smlno4
Showing results 1 to 15 of 560