Capturing, indexing, clustering, and retrieving system history

Ira Cohen, Steve Zhang, Moises Goldszmidt, Julie Symons, Terence Kelly, Armando Fox
2005 Proceedings of the twentieth ACM symposium on Operating systems principles - SOSP '05  
In operating today's complex systems, the lack of a systematic way to capture and query the essential system state characterizing an incident of performance failure or unavailability makes it difficult  ...  We validate our approach on both synthetic traces and several weeks of production traces from a customer-facing geoplexed 24 × 7 system; in the latter case, our approach identified a recurring problem  ...  We say the system is in violation of its SLO if the metric(s) exceed the policy threshold, and in compliance with its SLO otherwise.  ... 
doi:10.1145/1095810.1095821 dblp:conf/sosp/CohenZGSKF05 fatcat:tou6rzwnlbhfnfd6qlwoixbnga

Capturing, indexing, clustering, and retrieving system history

Ira Cohen, Steve Zhang, Moises Goldszmidt, Julie Symons, Terence Kelly, Armando Fox
2005 ACM SIGOPS Operating Systems Review  
In operating today's complex systems, the lack of a systematic way to capture and query the essential system state characterizing an incident of performance failure or unavailability makes it difficult  ...  We validate our approach on both synthetic traces and several weeks of production traces from a customer-facing geoplexed 24 × 7 system; in the latter case, our approach identified a recurring problem  ...  We say the system is in violation of its SLO if the metric(s) exceed the policy threshold, and in compliance with its SLO otherwise.  ... 
doi:10.1145/1095809.1095821 fatcat:37fcapllqfhcxddvdioqme7hom

Short term performance forecasting in enterprise systems

Rob Powers, Moises Goldszmidt, Ira Cohen
2005 Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05  
We use data mining and machine learning techniques to predict upcoming periods of high utilization or poor performance in enterprise systems.  ...  Second, it quantifies the variations in accuracy when using different classes of system and workload features.  ...  in enterprise systems.  ... 
doi:10.1145/1081870.1081976 dblp:conf/kdd/PowersGC05 fatcat:liqtfzhfmncqha75gxxqg4s53e

Dependency-Driven Analytics: A Compass for Uncharted Data Oceans

Ruslan Mavlyutov, Carlo Curino, Boris Asipov, Philippe Cudré-Mauroux
2017 Conference on Innovative Data Systems Research  
We qualitatively discuss the improvement over the brute-force analytics our users used to perform by considering a series of practical applications, including: job auditing and compliance, automated SLO  ...  producing petabytes of system logs daily.  ...  Once deployed, the automated DDA-based Compliance Monitoring system detects compliance violations on a continuous basis, providing improved coverage and latency.  ... 
dblp:conf/cidr/MavlyutovCAC17 fatcat:kkdxhau55bei5bqycwjq7tehym

Mining Performance Regression Testing Repositories for Automated Performance Analysis

King Chun Foo, Zhen Ming Jiang, Bram Adams, Ahmed E. Hassan, Ying Zou, Parminder Flora
2010 2010 10th International Conference on Quality Software  
In this paper, we present an automated approach to detect potential performance regressions in a performance regression test.  ...  Performance regression testing detects performance regressions in a system under load.  ...  ACKNOWLEDGMENT We are grateful to Research In Motion (RIM) for providing access to the enterprise application used in our case study.  ... 
doi:10.1109/qsic.2010.35 dblp:conf/qsic/FooJAHZF10 fatcat:o2w2kpddwvff7hfwiefz3upw7m

DeCaf: Diagnosing and Triaging Performance Issues in Large-Scale Cloud Services [article]

Chetan Bansal, Sundararajan Renganathan, Ashima Asudani, Olivier Midy, Mathru Janakiraman
2020 arXiv   pre-print
In this paper, we present the design, implementation and experience from building and deploying DeCaf, a system for automated diagnosis and triaging of KPI issues using service logs.  ...  Large volume of logs and mixed type of attributes (categorical, continuous) in the logs makes diagnosis of regressions non-trivial.  ...  ACKNOWLEDGMENTS We would like to acknowledge the invaluable contributions and support of B.  ... 
arXiv:1910.05339v4 fatcat:fnjjnsebsvaxjoalwj5uwph4ea

Software-defined Cloud Computing: A Systematic Review on Latest Trends and Developments

Aaqif Afzaal Abbasi, Almas Abbasi, Shahaboddin Shamshirband, Anthony Theodore Chronopoulos, Valerio Persico, Antonio Pescapè
2019 IEEE Access  
SDCC-related concepts change the previous state of affairs by promoting the centralized control of networking functions in a data center.  ...  We also explore the potential of SDCC in two domains, namely, resource orchestration and application development, as case studies of specific interest.  ...  We present a simplified overview of Frenetic functions in Table 7. Orchestration is often known as the automated configuration and management of computing systems.  ... 
doi:10.1109/access.2019.2927822 fatcat:eb2tntkpzngilouvtrw76emx7y

Initial Service Provider DevOps concept, capabilities and proposed tools [article]

Wolfgang John, Catalin Meirosu, Pontus Sköldström, Felician Nemeth, Andras Gulyas, Mario Kind, Sachin Sharma, Ioanna Papafili, George Agapiou, Guido Marchetto, Riccardo Sisto, Rebecca Steinert (+3 others)
2015 arXiv   pre-print
The sketch is based on lessons learned from a study of management and operational practices in the industry and recent related work with respect to management of SDN and cloud.  ...  This report presents a first sketch of the Service Provider DevOps concept including four major management processes to support the roles of both service and VNF developers as well as the operator in a  ...  The progress of these discussions, along with the progress made by partners in designing monitoring, verification and troubleshooting capabilities will be reported in MS4.1.  ... 
arXiv:1510.02220v2 fatcat:djlkhh2dtrbxbo7iinrnlu33kq

Who Do You Call? Problem Resolution through Social Compute Units [chapter]

Bikram Sengupta, Anshu Jain, Kamal Bhattacharya, Hong-Linh Truong, Schahram Dustdar
2012 Lecture Notes in Computer Science  
time per incident and number of SLO violations being at times as low as 52.7% and 27.3% respectively of the corresponding values for pure workflow based incident management.  ...  In this paper we use IT services management, specifically, incident management for large scale systems, to investigate the interplay of workflow systems and social computing.  ...  From the business impact perspective, the most compelling case for the SCU comes from the dramatically improved SLO performance that results from its faster resolution of tickets, with Number of SLO Violations  ... 
doi:10.1007/978-3-642-34321-6_4 fatcat:ogprxeez3jb7njzalv4bkbdjiu

Failure Diagnosis of Complex Systems [chapter]

Soila P. Kavulya, Kaustubh Joshi, Felicita Di Giandomenico, Priya Narasimhan
2012 Resilience Assessment and Evaluation of Computing Systems  
While diagnosis has historically been a largely manual process requiring significant human input, techniques to automate as much of the process as possible have significantly grown in importance in many  ...  This chapter presents a survey of automated failure diagnosis techniques including both model-based and model-free approaches.  ...  They use Service-Level Objective (SLO) violations to identify periods of time where the system was behaving abnormally and use tree augmented Bayesian networks (TANs) to determine which metrics are most  ... 
doi:10.1007/978-3-642-29032-9_12 fatcat:dyxufulyhfgpfbjruwizaj7eia

Self-awareness of Cloud Applications [chapter]

Alex Iosup, Xiaoyun Zhu, Arif Merchant, Eva Kalyvianaki, Martina Maggio, Simon Spinner, Tarek Abdelzaher, Ole Mengshoel, Sara Bouchenak
2017 Self-Aware Computing Systems  
A much shorter, revised version of this material will be available in print, as part of a Springer book on "Self-Aware Computing". The book is due to appear in 2017.  ...  In this chapter, we propose a conceptual framework for analyzing state-of-the-art self-awareness approaches used in the context of cloud applications.  ...  of Systems, by the Swedish Research Council (VR) for the projects "Cloud Control" and "Power and temperature control for large-scale computing infrastructures", and through the LCCC Linnaeus and ELLIIT  ... 
doi:10.1007/978-3-319-47474-8_20 fatcat:ckcxfmjmvvas5bhg7wp5qfmehy

Operationalizing Machine Learning: An Interview Study [article]

Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein, Aditya G. Parameswaran
2022 arXiv   pre-print
deployment process, and (iv) monitoring of performance drops in production.  ...  ., deploy and maintain ML pipelines in production.  ...  We are also grateful to Sarah Catanzaro for connecting us to some of the interviewees, and Alex Tamkin and Preetum Nakkiran for helpful suggestions.  ... 
arXiv:2209.09125v1 fatcat:dlqmvgehbfc67k5q2al3gtwiqa

CLUEBOX: A Performance Log Analyzer for Automated Troubleshooting

S. Ratna Sandeep, M. Swapna, Thirumale Niranjan, Sai Susarla, Siddhartha Nandi
2008 USENIX Symposium on Operating Systems Design and Implementation  
By identifying the most relevant anomalies to focus on, CLUEBOX automates the most onerous aspects of performance troubleshooting.  ...  Performance problems in complex systems are often caused by underprovisioning, workload interference, incorrect expectations or bugs.  ...  They report that models serve well to identify SLO violations in web services.  ... 
dblp:conf/osdi/SandeepSNSN08 fatcat:b5bfowd5afbz7mqe324juipdf4

VM-Flow [chapter]

Ivo J. G. dos Santos, Edmundo R. M. Madeira
2004 IFIP International Federation for Information Processing  
For the avoidance of doubt, the content of this report reflects only the opinion of the project consortium members and its authors.  ...  The European Commission is not responsible for its contents, or liable for the possible effects of any usage of the information contained therein.  ...  ) and the appropriate actions to be taken if a violation of these SLOs has been detected.  ... 
doi:10.1007/1-4020-8155-3_15 fatcat:tdnlcms72rbznjvwphoztdx37y

A Comprehensive Feature Comparison Study of Open-Source Container Orchestration Frameworks

Eddy Truyen, Dimitri Van Landuyt, Davy Preuveneers, Bert Lagaisse, Wouter Joosen
2019 Applied Sciences  
This study aims at (i) identifying the common and unique features of all frameworks, (ii) comparing these frameworks qualitatively and quantitatively with respect to genericity in terms of supported features  ...  (ii) Kubernetes supports the highest number of accumulated common and unique features for all 9 functional aspects; however, no evidence has been found for significant differences in genericity with Docker  ...  The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.  ... 
doi:10.3390/app9050931 fatcat:csmwlkzqdjeptbot3gmef2czzi