Latency Tradeoffs in Distributed Storage Access
[article]
(:Unkn) Unknown, University, My, Krishna Kant
2020
The performance of storage systems is central to handling the huge amount of data being generated from a variety of sources including scientific experiments, social media, crowdsourcing, and from an increasing variety of cyber-physical systems. The emerging high-speed storage technologies enable the ingestion of and access to such large volumes of data efficiently. However, the combination of high data volume requirements of new applications that largely generate unstructured and semistructured
more »
... streams of data combined with the emerging high-speed storage technologies pose a number of new challenges, including the low latency handling of such data and ensuring that the network providing access to the data does not become the bottleneck. The traditional relational model is not well suited for efficiently storing and retrieving unstructured and semi-structured data. An alternate mechanism, popularly known as Key-Value Store (KVS) has been investigated over the last decade to handle such data. A KVS store only needs a 'key' to uniquely identify the data record, which may be of variable length and may or may not have further structure in the form of predefined fields. Most of the KVS in existence have been designed for hard-disk based storage (before the SSDs gain popularity) where avoiding random accesses is crucial for good performance. Unfortunately, as the modern solid-state drives become the norm as the data center storage, the HDD-based KV structures result in high read, write, and space amplifications which becomes detrimental to both the SSD's performance and endurance. Also note that regardless of how the storage systems are deployed, access to large amounts of storage by many nodes must necessarily go over the network. At the same time, the emerging storage technologies such as Flash, 3D-crosspoint, phase change memory (PCM), etc. coupled with highly efficient access protocols such as NVMe are capable of ingesting and reading data at rates that challenge even the leading edge networking technologies such as [...]
doi:10.34944/dspace/2201
fatcat:epm7bymjkrg63m73ng4e24krv4