
fv3gfs-wrapper: a Python wrapper of the FV3GFS atmospheric model

Jeremy McGibbon, Noah D. Brenowitz, Mark Cheeseman, Spencer K. Clark, Johann P. S. Dahm, Eddie C. Davis, Oliver D. Elbert, Rhea C. George, Lucas M. Harris, Brian Henn, Anna Kwa, W. Andre Perkins (+4 others)
2021 Geoscientific Model Development  
Model performance is identical to the fully compiled Fortran model, unless routines to copy the state in and out of the model are used.  ...  This copy overhead is well within an acceptable range of performance and could be avoided with modifications to the Fortran source code.  ...  UFS community for publicly hosting source code for the FV3GFS model (, last access: 21 May 2021) and NOAA-EMC for providing the necessary forcing data to  ... 
doi:10.5194/gmd-14-4401-2021 fatcat:r43y2tk6r5h3vcldbei6b5peo4

Making a case for distributed file systems at Exascale

Ioan Raicu, Ian T. Foster, Pete Beckman
2011 Proceedings of the third international workshop on Large-scale system and application performance - LSAP '11  
At exascale, basic functionality at high concurrency levels will suffer poor performance, and combined with system mean-time-to-failure in hours, will lead to a performance collapse for large-scale heroic  ...  This approach will not scale several orders of magnitude in terms of concurrency and throughput, and will thus prevent the move from petascale to exascale.  ...  We also want to thank the anonymous reviewers whose feedback was invaluable to improving the clarity of the paper.  ... 
doi:10.1145/1996029.1996034 fatcat:bon3bizokzckhl5ajrzsqt7tqq

MapReduce: Simplified Data Analysis of Big Data

Seema Maitrey, C.K. Jha
2015 Procedia Computer Science  
Efficient parallel/concurrent algorithms and implementation techniques are the key to meeting the scalability and performance requirements entailed in such large scale data mining analyses.  ...  A big problem has been encountered in various fields for making the full use of these large scale data which support decision making.  ...  Experiment Writing a Hadoop MapReduce application The best way to understand and get familiar with the working of Hadoop is to walk through the process of writing a Hadoop MapReduce application.  ... 
doi:10.1016/j.procs.2015.07.392 fatcat:whtpro3grzbpphvfzbptlun744

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems [article]

Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela (+31 others)
2021 arXiv pre-print
There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases.  ...  To overcome the data-parallel scalability challenge at large batch sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems.  ...  ACKNOWLEDGMENT This research was funded in part by the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.  ... 
arXiv:2110.11466v2 fatcat:qb6qfyklefb4bcufuj3eozjzcm


Michael Albrecht, Patrick Donnelly, Peter Bui, Douglas Thain
2012 Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies - SWEET '12  
In recent years, there has been a renewed interest in languages and systems for large scale distributed computing.  ...  We evaluate Workbench on two physical architectures: the first a storage cluster with local disks and a slower network and the second a high performance computing cluster with a central parallel filesystem  ...  Even compared to SGE, which is supported by a large, fast, parallel filesystem, Work Queue still performs the best at small filesizes.  ... 
doi:10.1145/2443416.2443417 dblp:conf/sigmod/AlbrechtDBT12 fatcat:xxkqit5xsnggpbehfe5v5ktpv4


K. Ashwin Kumar, Jonathan Gluck, Amol Deshpande, Jimmy Lin
2013 Proceedings of the VLDB Endowment  
., machine learning, which generally operates over smaller and more refined datasets. To address these trends, we propose "scaling down" Hadoop to run on shared-memory machines.  ...  This allows us to take existing Hadoop algorithms and find the most suitable runtime environment for execution on datasets of varying sizes.  ...  There has been a wealth of activity in applying Hadoop to problems in data management as well as data mining and machine learning; the community has learned much about how to recast algorithms in terms  ... 
doi:10.14778/2536274.2536314 fatcat:fuj7i33ltzh7dfaqb73r3khama

Methodology and Application of HPC: I/O Characterization with MPIProf and IOT

Yan-Tyng Sherry Chang, Henry Jin, John Bauer
2016 2016 5th Workshop on Extreme-Scale Programming Tools (ESPT)  
The SGI MPT library, the prevailing MPI library for our systems, was found to gather small writes from a large number of ranks to perform larger writes by a small subset of collective buffering ranks.  ...  This method is applied to answer four I/O questions in this paper.  ...  the Lustre filesystem in order to minimize the I/O time for better scaling.  ... 
doi:10.1109/espt.2016.005 dblp:conf/sc/ChangJB16 fatcat:hysv7akoljawjowdxv7lu7izie

Large-scale Predictive Analytics in Vertica

Shreya Prasad, Arash Fard, Vishrut Gupta, Jorge Martinez, Jeff LeFevre, Vincent Xu, Meichun Hsu, Indrajit Roy
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
Building models on single-threaded R does not scale. Finally, it is nearly impossible to use R or other common tools, to apply models on terabytes of newly arriving data.  ...  A typical predictive analytics workflow will pre-process data in a database, transfer the resulting data to an external statistical tool such as R, create machine learning models in R, and then apply the  ...  Second, even if machine learning models are created in R, it is ill-suited for applying the model on large amounts of data in a timely manner.  ... 
doi:10.1145/2723372.2742789 dblp:conf/sigmod/PrasadFGMLXHR15 fatcat:7c6el34hsfedhnzqh2fqgyyjj4

Efficient Distributed Preprocessing Model for Machine Learning-Based Anomaly Detection over Large-Scale Cybersecurity Datasets

Xavier Larriva-Novo, Mario Vega-Barbas, Víctor A. Villagrá, Diego Rivera, Manuel Álvarez-Campana, Julio Berrocal
2020 Applied Sciences  
In addition, the paper analyzes the use of machine learning techniques in order to improve the response and efficiency of the proposed preprocessing model.  ...  Finally, the proposal shows the adequateness of decision tree algorithms for training a machine learning model by using a large dataset when compared with a multilayer perceptron neural network.  ...  Sections 6 and 7 provide the obtained results after applying our proposal to large-scale datasets.  ... 
doi:10.3390/app10103430 fatcat:rz3u3txwobdg3dqtwkn67gnuvi

All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids

Christopher Moretti, Hoang Bui, Karen Hollingsworth, Brandon Rich, Patrick Flynn, Douglas Thain
2010 IEEE Transactions on Parallel and Distributed Systems  
A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance.  ...  We demonstrate that an optimized All-Pairs abstraction is both easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and is both  ...  We thank David Cieslak, Tim Faltemier, Tanya Peters, and Robert McKeon for testing early versions of this work.  ... 
doi:10.1109/tpds.2009.49 fatcat:txpj65f3ubaj3j3v2xjsee3hse

Evaluating the Price of Consistency in Distributed File Storage Services [chapter]

José Valerio, Pierre Sutra, Étienne Rivière, Pascal Felber
2013 Lecture Notes in Computer Science  
., linearizability without close-to-open semantics) harm performance; and (ii) when close-to-open semantics is in use, linearizability delivers performance similar to sequential or eventual consistency  ...  Distributed file storage services (DFSS) such as Dropbox, iCloud, SkyDrive, or Google Drive, offer a filesystem interface to a distributed data store.  ...  Enabling further research on DFSS to scale and break the petabyte barrier requires developers to understand and be able to compare systematically the multiple components of a design.  ... 
doi:10.1007/978-3-642-38541-4_11 fatcat:w6u3xxzezvezfeje2j3n4eyw3u

Performance Characterization and Modeling of Serverless and HPC Streaming Applications [article]

Andre Luckow, Shantenu Jha
2019 arXiv pre-print
Understanding of the performance and scaling characteristics of streaming applications and infrastructure presents another challenge for EILC.  ...  Using experiments on HPC and AWS Lambda, we demonstrate that StreamInsight provides an accurate model for a variety of application characteristics, e.g., machine learning model sizes and resource configurations  ...  We evaluated StreamInsight using different complex machine learning tasks and showed that the USL approach is well suited to predict the scaling properties of streaming applications requiring only small  ... 
arXiv:1909.06055v1 fatcat:srn4aojierffzlzlp2hmwm6io4


Eric Van Hensbergen, Pravin Shinde, Noah Evans
2011 Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers - ROSS '11  
visualization of more traditional high performance computing simulations.  ...  Together with our dataflow shell, named PUSH, it is intended to be used for the management of non-traditional super computing applications as well as provide a mechanism to manage in-situ analysis and  ...  ACKNOWLEDGEMENTS This work was supported in part by the Department of Energy Office of Science under award number DE-FG02-08ER25851.  ... 
doi:10.1145/1988796.1988807 fatcat:tijy7lj54zg5lhebwkjbyhvpni

A Survey on Vertical and Horizontal Scaling Platforms for Big Data Analytics

Ahmed Hussein Ali (ICCI, Informatics Institute for Postgraduate Studies, Baghdad, Iraq), Mahmood Zaki Abdullah (Department of Computer Engineering, Al-Mustansiriyah University, Baghdad, Iraq)
2019 International Journal of Integrated Engineering  
Special thanks to the anonymous reviewers for their valuable suggestions and constructive comments.  ...  Acknowledgement The authors would like to thank ICCI, Informatics Institute for Postgraduate Studies (IIPS_IRAQ) for their moral support.  ...  Although MOA is ideal for machine learning, it is not supported on a large scale. It can only be executed on a single machine and could not be scaled to multiple machines when necessary [51].  ... 
doi:10.30880/ijie.2019.11.06.015 fatcat:qbtbeq6ukbe5pmpmgkld3r33fe

Remote I/O

Ian Foster, David Kohr, Rakesh Krishnaiyer, Jace Mogill
1997 Proceedings of the fifth workshop on I/O in parallel and distributed systems - IOPADS '97  
We argue instead for a remote I/O paradigm in which programs use familiar parallel I/O interfaces to access remote filesystems.  ...  However, remote I/O also introduces new technical challenges in the areas of portability, performance, and integration with distributed computing systems.  ...  Department of Energy, under Contract W-31-109-Eng-38, in support of the multiagency Scalable I/O project.  ... 
doi:10.1145/266220.266222 dblp:conf/iopads/FosterKKM97 fatcat:4z4kja2yvraqfc6h2wlluv7yby