Memory-Restricted and Secure Algorithms For Large Datasets [thesis]

Alireza Farhadi
With the rise of big data, there is a growing need to solve optimization tasks on massive datasets. Running optimization tasks on large-scale datasets can be quite challenging as classical algorithms often have unrealistic assumptions about the data. For example, classical algorithms oftentimes assume that the computing machines have enough memory to store all of the data. However, in the era of big data, we often face massive datasets that are too large to fit into the computer's memory. In
more » ... s thesis we study fundamental problems in computer science such as maximum matching, sorting, edit distance and longest common subsequence in massive datasets. We consider two popular models to deal with massive inputs. The first model is the streaming model in which the input is represented as a sequence of items and the algorithm has limited space and is only allowed to store a sub-linear portion of the input. In this model, we show the existence of single-pass approximation algorithms for the maximum matching, longest common subsequence, and edit distance problems. The other model that we consider for dealing with large datasets is the external memory model. In this model, the input is stored on external storage (such as cloud storage, hard drive, etc.), and the size of the input can be significantly larger than the main memory of the computing machine. In this thesis, we present a tight conditional lower bound on the complexity of external memory sorting of integers. We also show that our techniques can be used to derive a tight lower bound for the sorting of integers in the Oblivious RAM model which is a secure model of computation.
doi:10.13016/owez-vvvz fatcat:qpk4ex6u3ba5dhczjbwwjrnj6i