K. R. Jayaram, Archit Verma, Falk Pollok, Rania Khalaf, Vinod Muthusamy, Parijat Dube, Vatche Ishakian, Chen Wang, Benjamin Herta, Scott Boag, Diana Arroyo, Asser Tantawi
2019 Proceedings of the 20th International Middleware Conference (Middleware '19)
As a result, large scale on-premise and cloud-hosted deep learning platforms have become essential infrastructure in many organizations.  ...  the overheads introduced by the platform for various deep learning models, the load and performance observed in a real case study using FfDL within our organization, the frequency of various faults observed  ...  : A Flexible Multi-tenant Deep Learning Platform. Middleware '19, December 09-13, 2019, Davis, CA, USA  ... 
doi:10.1145/3361525.3361538 dblp:conf/middleware/JayaramMDIWHBAT19 fatcat:vhwii2hpjrcbtjb4qffenmdjna

Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools [article]

Ruben Mayer, Hans-Arno Jacobsen
2019 arXiv pre-print
This incorporates infrastructures for DL, methods for parallel DL training, multi-tenant resource scheduling and the management of training and model data.  ...  Deep Learning (DL) has had an immense success in the recent past, leading to state-of-the-art results in various domains such as image recognition and natural language processing.  ...  IBM Fabric for Deep Learning [18] (FfDL) is a cloud-based deep learning stack used at IBM by AI researchers.  ... 
arXiv:1903.11314v2 fatcat:y62z7mteyzeq5kenb7srwtlg7q

D1.1 - State of the Art Analysis

Danilo Ardagna
2021 Zenodo  
Then, the deliverable provides a background on AI applications design, also considering some advanced design trends (e.g., Network Architecture Search, Federated Learning, Deep Neural Networks partitioning  ...  The aim of the AI-SPRINT "Artificial intelligence in Secure PRIvacy-preserving computing coNTinuum" project is to develop a platform composed of design and runtime management tools to seamlessly design  ...  Furthermore, FfDL is developed as a middleware between the deep learning frameworks and the hardware resources and supports scalability and fault tolerance of applications.  ... 
doi:10.5281/zenodo.6372377 fatcat:f6ldfuwivbcltew4smiiwphfty