Boosting textual compression in optimal linear time

Paolo Ferragina, Raffaele Giancarlo, Giovanni Manzini, Marinella Sciortino
2005 Journal of the ACM  
of time; and (c) it admits a decompression algorithm again optimal in time.  ...  We provide a general boosting technique for Textual Data Compression.  ...  The authors are deeply indebted to the referees, and in particular one of them, for a very careful reading of the article that led to very useful, punctual and constructive comments.  ... 
doi:10.1145/1082036.1082043 fatcat:ljhmm5ehanc4lnaoasuaykbasu

What, where, and when

Sergey Nepomnyachiy, Bluma Gelley, Wei Jiang, Tehila Minkus
2014 Proceedings of the 8th Workshop on Geographic Information Retrieval - GIR '14  
It exploits the structure of time-stamped data to dramatically shrink the temporal search space and uses a shallow tree based on the spatial distribution of tweets to allow speedy search over the spatial  ...  With the adoption of timestamps and geotags on Web data, search engines are increasingly being asked questions of "where" and "when" in addition to the classic "what."  ...  Acknowledgements This work was supported in part by the NSF (under grants 0966187 and 0904246) and by GAANN Grant P200A090157 from the US Department of Education.  ... 
doi:10.1145/2675354.2675358 dblp:conf/gir/NepomnyachiyGJM14 fatcat:e7mrj576uvdennsmxajmjq6wh4

On Optimally Partitioning a Text to Improve Its Compression [chapter]

Paolo Ferragina, Igor Nitto, Rossano Venturini
2009 Lecture Notes in Computer Science  
In this paper we investigate the problem of partitioning an input string T in such a way that compressing individually its parts via a base-compressor C gets a compressed output that is shorter than applying  ...  ACM 50(6):825-851, 2003) in the context of table compression, and then further elaborated and extended to strings and trees by Ferragina et al. (J.  ...  showing that our algorithmic solution to the text partitioning problem could be used as a tool for approximating efficiently the interesting class of Dynamic-Programming Recurrences we have dealt with in  ... 
doi:10.1007/978-3-642-04128-0_38 fatcat:nsl7gu5y6fh4boojrtiypnnddu

On Optimally Partitioning a Text to Improve Its Compression

Paolo Ferragina, Igor Nitto, Rossano Venturini
2010 Algorithmica  
In this paper we investigate the problem of partitioning an input string T in such a way that compressing individually its parts via a base-compressor C gets a compressed output that is shorter than applying  ...  ACM 50(6):825-851, 2003) in the context of table compression, and then further elaborated and extended to strings and trees by Ferragina et al. (J.  ...  showing that our algorithmic solution to the text partitioning problem could be used as a tool for approximating efficiently the interesting class of Dynamic-Programming Recurrences we have dealt with in  ... 
doi:10.1007/s00453-010-9437-6 fatcat:4rmr2kxc4zct3b2trrl37pstpa

Optimally Partitioning a Text to Improve Its Compression [chapter]

Rossano Venturini
2013 Atlantis Studies in Computing  
In this paper we investigate the problem of partitioning an input string T in such a way that compressing individually its parts via a base-compressor C gets a compressed output that is shorter than applying  ...  ACM 50(6):825-851, 2003) in the context of table compression, and then further elaborated and extended to strings and trees by Ferragina et al. (J.  ...  showing that our algorithmic solution to the text partitioning problem could be used as a tool for approximating efficiently the interesting class of Dynamic-Programming Recurrences we have dealt with in  ... 
doi:10.2991/978-94-6239-033-1_3 fatcat:oobkz6d7rfapjlh64t4kgsglyq

From first principles to the Burrows and Wheeler transform and beyond, via combinatorial optimization

R. Giancarlo, A. Restivo, M. Sciortino
2007 Theoretical Computer Science  
Sciortino, Boosting textual compression in optimal linear time, Journal of the ACM 52 (2005) 688-713]. Therefore, they are all highly compressible.  ...  We also show that the class of optimal word permutations defined here is identical to the one identified by Ferragina et al. for compression boosting [P. Ferragina, R. Giancarlo, G. Manzini, M.  ...  In fact, they defined a class of word permutations well suited for compression boosting, i.e., bwt is not the only word permutation that is useful for boosting.  ... 
doi:10.1016/j.tcs.2007.07.019 fatcat:vcpnsui7fzf4rdtfqp2rieyz7i

Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation [article]

Xuandong Zhao, Zhiguo Yu, Ming Wu, Lei Li
2022 arXiv   pre-print
In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings.  ...  We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks.  ...  Conclusion and Discussion In this paper, we propose an effective method to compress sentence representation using homomorphic projective distillation.  ... 
arXiv:2203.07687v1 fatcat:f6pxlypyjjgshctgvh4skjm26q

WIDIT in TREC-2003 Web Track

Kiduk Yang, Dan E. Albertson
2003 Text Retrieval Conference  
in real time.  ...  Reranking Module In order to optimize retrieval performance in top ranks, fusion results were reranked based on combinations of site compression technique and content-link evidence ranking heuristic.  ... 
dblp:conf/trec/YangA03 fatcat:qbevhxaup5bqhipsoj3emw2t5y

A Survey on CDPCF: Concise Discriminative Patterns Based Classification Framework

Ashwini Shahpurkar, Prof. S B Chaudhari
2018 IJARCCE  
Simple models such as generalized linear models have ordinary performance but strong interpretability on a set of simple features.  ...  There are different series which includes tree-based models, organize numerical, categorical and high dimensional features into a comprehensive structure with rich interpretable information in the data  ...  Their two step approach [3], which combines random forest and a stepwise selection, provides a realistic approach for selecting an optimal set of features within a reasonable computational time.  ... 
doi:10.17148/ijarcce.2018.7104 fatcat:kzfjryb6cveilg7npqzq4q5t64

Text vs. space

Maria Christoforaki, Jinru He, Constantinos Dimopoulos, Alexander Markowetz, Torsten Suel
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
We feel that previous work has often focused on the spatial aspect at the expense of performance considerations in text processing, such as inverted index access, compression, and caching.  ...  In this paper, we take a fresh look at this problem.  ...  We executed queries on our own document-at-a-time (DAAT) query processor, optimized through block-wise compression and forward skips in the inverted lists.  ... 
doi:10.1145/2063576.2063641 dblp:conf/cikm/ChristoforakiHDMS11 fatcat:f5kaghcwnzcrlgxkcvjlp344zy

A Comparative Assessment of Data Mining Algorithms to Predict Fraudulent Firms

Harshit Monish, Avinash Chandra Pandey
2020 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)  
The process of data mining is helpful in discovering meaningful patterns in historical or unstructured data in order to make better business decisions.  ...  We have implemented Decision Trees, Linear Support Vector Machines, RBF Kernel Support Vector Machines, K-Nearest Neighbor, Artificial Neural Network and logistic regression classification models.  ...  PCA has many benefits, from compression to reduction in computational complexity and noise reduction.  ... 
doi:10.1109/confluence47617.2020.9057968 fatcat:yxruvis7qzfqjgww6zz2gmxuae

Discovering gis sources on the web using summaries

Ramaswamy Hariharan, Bijit Hore, Sharad Mehrotra
2008 Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries - JCDL '08  
Existing techniques simply rely on textual metadata accompanying such datasets to compute relevance to user-queries.  ...  Such approaches result in poor search results, often missing the most relevant sources on the web.  ...  In practice the MinSkew algorithm runs very fast and the time taken is almost linear in n and |B|.  ... 
doi:10.1145/1378889.1378907 dblp:conf/jcdl/HariharanHM08 fatcat:qtdn6d7jmbdvxe6ncjj35aecyq

ITI-CERTH participation to TRECVID 2009 HLFE and Search

Anastasia Moumtzidou, Anastasios Dimou, Paul King, Stefanos Vrochidis, Angeliki Angeletou, Vasileios Mezaris, Spiros Nikolopoulos, Ioannis Kompatsiaris, Lambros Makris
2009 TREC Video Retrieval Evaluation  
In a separate run, the use of compressed video information to form a Bag-of-Words model for shot representation is studied.  ...  The search task is based on an interactive retrieval application combining retrieval functionalities in various modalities (i.e. textual, visual and concept search) with a user interface supporting interactive  ...  In this run, the use of compressed video information for BoW model generation was examined for the first time.  ... 
dblp:conf/trecvid/MoumtzidouDKVAM09 fatcat:w2dzfw6csnccbbr5q474dfefru

Page 3210 of Mathematical Reviews Vol. , Issue 2004d [page]

2004 Mathematical Reviews  
for compression and analysis of very large remote sensing data sets (429-441); Juan K.  ...  Schapire, The boosting approach to machine learning: an overview (149-171); Dragos D. Margineantu and Thomas G.  ... 

Neural Markovian Predictive Compression: An Algorithm for Online Lossless Data Compression

Erez Shermer, Mireille Avigal, Dana Shapira
2010 2010 Data Compression Conference  
The result is an interesting combination of properties: Linear processing time, constant memory storage performance and great adaptability to parallelism.  ...  This work proposes a novel practical and general-purpose lossless compression algorithm named Neural Markovian Predictive Compression (NMPC), based on a novel combination of Bayesian Neural Networks (BNNs  ...  LZW is well-suited to online compression, as it does not require the input stream to be divided into blocks. Using a fixed size dictionary, LZW can be implemented in linear time.  ... 
doi:10.1109/dcc.2010.26 dblp:conf/dcc/ShermerAS10 fatcat:f4ovjsi5xjf7bi62dkp6zmjtiq