Enabling What-if Explorations in Systems (CMU-PDL-07-103)

Eno Thereska
With a large percentage of total system cost going to system administration tasks, ease of system management remains a difficult and important goal. As a step towards that goal, this dissertation presents a success story on building systems that are self-predicting. Self-predicting systems continuously monitor themselves and provide quantitative answers to What-if questions about hypothetical workload or resource changes. Self-prediction has the potential to simplify administrators' decision making, such as acquisition planning and performance tuning, by reducing the detailed workload and internal system knowledge required. The primary building block of self-prediction is mathematical models that, once built into the system, analyze past behavior and predict future behavior. Because of the traditional disconnect between systems researchers and theoretical researchers, however, there are fundamental difficulties in enabling existing mathematical models to make meaningful predictions in real systems. In part, this dissertation serves as a bridge between research in theory (e.g., queuing theory and statistical theory) and research in systems (e.g., database and storage systems). It identifies ways to build systems that support the use of mathematical models and addresses fundamental show-stoppers that keep models from being useful in practice. For example, we explore many opportunities to deeply understand workload-system interactions by having models be first-class system components, rather than developing and deploying them separately from the system, as is traditionally done. As another example, lack of good measurement information in a distributed system can be a show-stopper for models based on queuing analysis. This dissertation introduces a measurement framework that replaces performance counters with end-to-end activity tracing. End-to-end tracing allows contextual information to be propagated with requests so that queuing models can attribute resource demands to the correct workloads. In addition, this dissertation [...]
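To make the link between per-request tracing and What-if prediction concrete, here is a minimal sketch. It is not the dissertation's framework: the record layout, the workload names, the numbers, and the function names are all hypothetical, and the response-time formula is a textbook open-queue approximation (R = D / (1 - U)) chosen for illustration. It also assumes each request visits a resource once, so one trace record corresponds to one request.

```python
from collections import defaultdict

# Hypothetical trace records: (workload_id, resource, busy_seconds).
# The workload id travels with the request, so work done at a shared
# resource can be charged to the workload that caused it.
TRACE = [
    ("oltp", "disk", 0.004),
    ("oltp", "disk", 0.006),
    ("scan", "disk", 0.020),
    ("oltp", "cpu", 0.001),
    ("scan", "cpu", 0.002),
]


def mean_demand(trace):
    """Average service demand per (workload, resource), in seconds per record."""
    total = defaultdict(float)
    count = defaultdict(int)
    for workload, resource, busy in trace:
        total[(workload, resource)] += busy
        count[(workload, resource)] += 1
    return {key: total[key] / count[key] for key in total}


def utilization(demand, rates, resource):
    """Utilization law: U = sum over workloads of (arrival rate * demand)."""
    return sum(rates[w] * d for (w, r), d in demand.items() if r == resource)


def predicted_response(demand, rates, workload, resource):
    """Approximate response time R = D / (1 - U); None if the resource saturates."""
    u = utilization(demand, rates, resource)
    if u >= 1.0:
        return None
    return demand[(workload, resource)] / (1.0 - u)


if __name__ == "__main__":
    demand = mean_demand(TRACE)
    current = {"oltp": 50.0, "scan": 10.0}   # assumed requests per second
    what_if = {"oltp": 100.0, "scan": 10.0}  # hypothetical: OLTP load doubles
    print("now:    ", predicted_response(demand, current, "oltp", "disk"))
    print("what-if:", predicted_response(demand, what_if, "oltp", "disk"))
```

With aggregate performance counters alone, only the total disk busy time would be visible. The per-workload split computed by `mean_demand` is what lets the model answer a question about changing one workload's rate.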
doi:10.1184/r1/6619574.v1 fatcat:gc4rkbqqyfh5flsjk4zirtzaaq