58 Hits in 4.7 sec

Reliable Computing Service in Massive-Scale Systems through Rapid Low-Cost Failover

Renyu Yang, Yang Zhang, Peter Garraghan, Yihui Feng, Jin Ouyang, Jie Xu, Zhuo Zhang, Chao Li
2017 IEEE Transactions on Services Computing  
The proposed approach was implemented, deployed and evaluated within Fuxi system, the underlying resource management system used within Alibaba Cloud.  ...  Large-scale distributed systems deployed as Cloud datacenters are capable of provisioning service to consumers with diverse business requirements.  ...  ACKNOWLEDGMENTS Special thanks must go to the overall Fuxi distributed resource scheduling team in Alibaba Cloud Inc. and the SIGRS group from Beihang University for their supports and collaborative contributions  ... 
doi:10.1109/tsc.2016.2544313 fatcat:ppxfp4hq6fd7zoyf5sfdhgry2i

AntMan: Dynamic Scaling on GPU Clusters for Deep Learning

Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, Yangqing Jia
2020 USENIX Symposium on Operating Systems Design and Implementation  
This paper presents AntMan, a deep learning infrastructure that co-designs cluster schedulers with deep learning frameworks and has been deployed in production at Alibaba to manage tens of thousands of  ...  daily deep learning jobs across thousands of GPUs.  ...  We would also like to thank Chen Xing, Jin Ouyang, Xinyuan Li, Lixue Xia for their help in improving quality of writing.  ... 
dblp:conf/osdi/XiaoRLZHLFLJ20 fatcat:lfa2xrj7zveulikmjxzi66e7fm

Big Data and cloud computing: innovation opportunities and challenges

Chaowei Yang, Qunying Huang, Zhenlong Li, Kai Liu, Fei Hu
2016 International Journal of Digital Earth  
This paper surveys the two frontiers, Big Data and cloud computing, and reviews the advantages and consequences of utilizing cloud computing to tackling Big Data in the digital earth and relevant science  ...  cloud computing and processing Big Data; (v) open availability of Big Data and processing capability pose social challenges of geospatial significance and (vi) a weave of innovations is transforming Big  ...  Acknowledgements We thank the anonymous reviewers for their insightful comments and reviews. Dr George Taylor reviewed a previous version of this manuscript.  ... 
doi:10.1080/17538947.2016.1239771 fatcat:qbcgqj2pcvbgja6dnnakoj2saa

Toward "On‐Demand" Materials Synthesis and Scientific Discovery through Intelligent Robots

Jiagen Li, Yuxiao Tu, Rulin Liu, Yihua Lu, Xi Zhu
2020 Advanced Science  
This work provides a walking example for the "On-Demand" materials synthesis system, and demonstrates how artificial intelligence technology can reshape traditional materials science research in the future  ...  MAOS frees the experimental researchers out of the tedious labor as well as the extensive exploration of optimal reaction conditions.  ...  MySQL on the cloud server is used to store the data.  ... 
doi:10.1002/advs.201901957 pmid:32274293 pmcid:PMC7141037 fatcat:srutloooszbe7dlq7gnbq32csm

Rationalizing Resource Utilization in Cloud Computing Using Coalition Formation Strategy

Hend Fakhri Noureldin, Mai Fadel
2021 Journal of Computer Science  
In conclusion, it is observed that improvements depend on accuracy of the prediction of usage pattern of the user.  ...  resources and loose enough to prevent any degradation in Quality of Service (QoS) that may lead to the violation of the Service Level Agreement (SLA) between the service provider and the cloud user.  ...  Fathy Eassa, the head of our research team, for his continuous guidance. In addition, we would like to thank Dr. Lamiaa Elrfaei, Prof. Hanan Elazhari and Dr. Etimad Fadel for their valuable input.  ... 
doi:10.3844/jcssp.2021.539.555 fatcat:yee5jzqelvdplh6ytjjaqy77bu

Performance-aware Speculative Resource Oversubscription for Large-scale Clusters

Renyu Yang, Chunming Hu, Xiaoyang Sun, Peter Garraghan, Tianyu Wo, Zhenyu Wen, Hao Peng, Jie Xu, Chao Li
2020 IEEE Transactions on Parallel and Distributed Systems  
Experiments show that in case of mixed co-location of batch jobs and latency-sensitive LRAs, the CPU utilization and the disk utilization can reach 56.34% and 43.49%, respectively, but the 95th percentile  ...  Instead of waiting for resource allocation to be confirmed by the centralized scheduler, job managers in ROSE can independently request to launch speculative tasks within specific machines according to  ...  This work is supported by National Key R&D Program of China (2016YFB1000503), NSFC (61421003), the EPSRC (EP/T01461X/1) and Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC).  ... 
doi:10.1109/tpds.2020.2970013 fatcat:2tftu6ehofcqflyfhumaytry7u

Ant Colony Algorithm for Multi-Objective Optimization of Container-based Microservice Scheduling in Cloud

Miao Lin, Jianqing Xi, Weihua Bai, Jiayin Wu
2019 IEEE Access  
In cloud architectures, the microservice model divides an application into a set of loosely coupled and collaborative fine-grained services.  ...  Despite a large number of solutions and implementations, there remain open issues that have not been completely addressed in the deployment and management of the microservice containers.  ...  EXPERIMENT SETUP The first step of experimental evaluation is to select test data set and set up the parameters of our model. 1) TEST DATA Based on the analysis of the real data from Alibaba Cluster  ... 
doi:10.1109/access.2019.2924414 fatcat:3t7wpywqhvgk7apohcksl2jvku

A Helpful Analysis of the Technological Mix based on HPC and Artificial Intelligence for Maintaining Competitiveness in the Business Environment

Vasile Mazilescu, Dunarea de Jos University of Galati, Romania
2020 Annals of Dunarea de Jos University. Fascicle I : Economics and Applied Informatics  
, engineering or analytical tasks that are particularly computationally intensive, utilized memory or data management, with increasing use in the business field.  ...  Retailers run AI models to analyze customer data and provide targeted purchasing recommendations.  ...  A critical issue for the efficient operation of AI applications in the cloud is the characterization of the calculation and data transfer processes of these work tasks.  ... 
doi:10.35219/eai1584040976 fatcat:vfyssthnozey7p5n2rchpsreyy

Big Data: A Survey

Min Chen, Shiwen Mao, Yunhao Liu
2014 Journal on special topics in mobile networks and applications  
Alibaba, generates data of tens of Terabyte (TB) for online trading per day.  ...  Relationship between cloud computing and big data Cloud computing is closely related to big data. The key components of cloud computing are shown in Fig. 3.  ... 
doi:10.1007/s11036-013-0489-0 fatcat:lvnmkesqbngc3lgccms3hmm7jq

Resource Scheduling in Edge Computing: A Survey [article]

Quyuan Luo, Shihong Hu, Changle Li, Guanghui Li, Weisong Shi
2021 arXiv   pre-print
By moving the services and functions located in the cloud to the proximity of users, edge computing can provide powerful communication, storage, networking, and communication capacity.  ...  With the proliferation of the Internet of Things (IoT) and the wide penetration of wireless networks, the surging demand for data communications and computing calls for the emerging edge computing paradigm  ...  At present, the difficulty and trend of this subject are how to place tasks with data dependencies when the service or application is composed of multiple dependent tasks.  ... 
arXiv:2108.08059v1 fatcat:oo4lepcn3rhdfafefw5lkq2lia

Massivizing Computer Systems: a Vision to Understand, Design, and Engineer Computer Ecosystems through and beyond Modern Distributed Systems [article]

Alexandru Iosup, Alexandru Uta, Laurens Versluis, Georgios Andreadis, Erwin van Eyk, Tim Hegeman, Sacheendra Talluri, Vincent van Beek, Lucian Toader
2018 arXiv   pre-print
Our society is digital: industry, science, governance, and individuals depend, often transparently, on the inter-operation of large numbers of distributed computer systems.  ...  Beyond establishing and growing a body of knowledge about computer ecosystems and their constituent systems, the community in this domain should also aim to educate many about design and engineering for  ...  Acknowledgments This work is supported by the Dutch projects Vidi Magna-Data and KIEM KIESA, by the Dutch Commit and the Commit project Commissioner, and by generous donations from Oracle Labs, USA.  ... 
arXiv:1802.05465v2 fatcat:kkxitbrvdndsfknisdeqnjwp2q

Is this artificial intelligence?

Vladan Devedzic
2020 Facta universitatis - series Electronics and Energetics  
Recent advancements in the field of AI have certainly contributed to the AI hype, and so have numerous applications and results of using AI technology in practice.  ...  Artificial Intelligence (AI) has become one of the most frequently used terms in the technical jargon (and often in not-so-technical jargon).  ...  them in the cloud as microservices [68].  ... 
doi:10.2298/fuee2004499d fatcat:kis6mxw6lrdcfhjdrmndxpnjke

Cloud Computing and Privacy Risks in the Information/Knowledge/Digital Risk Society and Economy: An Overview

Sompurna Bhadra
2020 International Journal for Research in Applied Science and Engineering Technology  
jobs into the cloud.  ...  Most of the value in this new technology stack is in the applications and services that can be created by using IoT, cloud computing, big data analytics and artificial intelligence ... in a wide range  ...  and tasked with the job of both monitoring for privacy violations and creating accountability through enforcement.  ... 
doi:10.22214/ijraset.2020.6362 fatcat:jhpjpe4uundtbmk4jodptvhwqa

Challenges and Solution Directions of Microservice Architectures: A Systematic Literature Review

Mehmet Söylemez, Bedir Tekinerdogan, Ayça Kolukısa Tarhan
2022 Applied Sciences  
On the other hand, the adoption of MSA for a specific software system is not trivial and a number of challenges have been reported in the literature.  ...  This article aims at identifying the state of the art of MSA and describing the challenges in applying MSA together with the identified solution directions.  ...  Study [AV] investigates the task scheduling and auto-scaling challenges in clouds.  ... 
doi:10.3390/app12115507 fatcat:qfu64p2nbvffzpc5tgvlfl7unu

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability  ...  Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields.  ...  thousands of tasks in parallel.  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4
Showing results 1 to 15 out of 58 results