Extending MPI to Better Support Multi-application Interaction [chapter]

Jay Lofstead, Jai Dayal
2012 Lecture Notes in Computer Science  
Current scientific workflows generally consist of several components, either integrated in situ or run as completely independent, asynchronous components using centralized storage as an interface. Neither of these approaches is likely to scale well into Exascale. Instead, separate applications and services will be launched using online communication to link these components of the scientific discovery process. Our experiences with coupling multiple, independent MPI applications, each with separate
more » ... rocessing phases, expose limitations preventing use of some of the optimized mechanisms within the MPI standard. In this regard, we have identified two shortcomings with current MPI implementations. First, MPI intercommunicators offer a mechanism to communicate across application boundaries, but do not address the impact this operating mode has on possible programming models for each separate application. Second, MPI_Probe offers a way to interleave both local messaging and remote messages, but has limitations, as MPI_Bcast and other collective calls are not supported by MPI_Probe, thus limiting use of optimized collective calls in this operating mode.
doi:10.1007/978-3-642-33518-1_32 fatcat:vfgs523kenf27htorm635jr4z4

D2T: Doubly Distributed Transactions for High Performance and Distributed Computing

Jay Lofstead, Jai Dayal, Karsten Schwan, Ron Oldfield
2012 2012 IEEE International Conference on Cluster Computing  
The Doubly Distributed Transaction (D2T) protocol offers a mechanism for a collection of clients and a separate collection of servers to orchestrate an action with semantics inspired by database ACID-transactions. Our previous work showed good potential, but suffered from known limitations due to the client and server side split coordination. The initial performance of this approach was acceptable and showed potential for good scalability. However, the communication bottleneck at scale between
more » ... he client side coordination and the server side coordination would be unworkable. In this poster, we address these limitations and introduce three new technologies: 1) a client-side coordinator only optimization; 2) data storage requirements and evaluation for an example transaction aware system; and 3) metadata system requirements and evaluation for supporting transaction control over entries. Additionally, we were able to find many redundant or unneeded messages in the first version of the protocol that we removed without reducing the offered resilience. Experimental results show that with our refinements, we can attain the same level of transactional guarantees with a measurable decrease in overhead costs.
doi:10.1109/cluster.2012.79 dblp:conf/cluster/LofsteadDSO12 fatcat:vfi66xcw2bekvkifeu6kw66iz4

Efficient transactions for parallel data movement

Jay Lofstead, Jai Dayal, Ivo Jimenez, Carlos Maltzahn
2013 Proceedings of the 8th Parallel Data Storage Workshop on - PDSW '13  
doi:10.1145/2538542.2538567 dblp:conf/sc/LofsteadDJM13 fatcat:uln3gdy4uveolebqpgtmpxe6x4

Efficient, Failure Resilient Transactions for Parallel and Distributed Computing

Jay Lofstead, Jai Dayal, Ivo Jimenez, Carlos Maltzahn
2014 2014 International Workshop on Data Intensive Scalable Computing Systems  
Scientific simulations are moving away from using centralized persistent storage for intermediate data between workflow steps towards an all online model. This shift is motivated by the relatively slow IO bandwidth growth compared with compute speed increases. The challenges presented by this shift to Integrated Application Workflows are motivated by the loss of persistent storage semantics for node-to-node communication. One step towards addressing this semantics gap is using transactions to
more » ... gically delineate a data set from 100,000s of processes to 1000s of servers as an atomic unit. Our previously demonstrated Doubly Distributed Transactions (D2T) protocol showed a high-performance solution, but had not explored how to detect and recover from faults. Instead, the focus was on demonstrating high-performance typical case performance. The research presented here addresses fault detection and recovery based on the enhanced protocol design. The total overhead for a full transaction with multiple operations at 65,536 processes is on average 0.055 seconds. Fault detection and recovery mechanisms demonstrate similar performance to the success case with only the addition of appropriate timeouts for the system. This paper explores the challenges in designing a recoverable protocol for doubly distributed transactions, particularly for parallel computing environments.
doi:10.1109/discs.2014.13 dblp:conf/sc/LofsteadDJM14 fatcat:3avk5z7ranetfmnpkxyv3pt3r4

eMOLST: a documentation flow for distributed health informatics

Gregor von Laszewski, Jai Dayal, Lizhe Wang
2011 Concurrency and Computation  
Electronic Health Records (EHRs) have many potential advantages over traditional paper records, such as wide scale access, error checking, and protection from physical damage to a record. As with any medical record, paper or electronic, both the patient's privacy and the document's integrity must be guaranteed. With initiatives such as Integrating the Healthcare Enterprise (IHE), computerized healthcare systems are able to share EHRs on a large scale, while protecting the patient's privacy
more » ... s. However, IHE does not yet meet the needs for all healthcare systems, as we will show with the eMOLST project. The eMOLST project delivers software in support of Medical Order for Life Sustaining Treatment (MOLST) forms and uses IHE specifications for cross enterprise document storage and sharing, patient identification, and user authentication & authorization. The Web based system provides secure access to electronic MOLST documents regardless of the patient's or healthcare provider's location. The eMOLST project allows a user to have Single Sign On (SSO) access to the system from either the user's associated enterprise, or through a Web portal shared amongst all users across all enterprises. In this paper, we show a security solution to allow SSO from multiple access points for IHE compliant systems.
doi:10.1002/cpe.1745 fatcat:w5cg7d7vyfgkfi4fqte5bu2z4i

Using Kansei Engineering with new JIT to accomplish cost advantage

Jay Rajasekera, Shantanu Dayal
2010 International Journal of Biometrics (IJBM)  
action where the process of binding together these concepts also is part of the Kansei (Shimizu, 2004). Combining the above three definitions we can say that: Kansei = F (Emotions) Rajasekera and Dayal  ...  Rajasekera and Dayal We can have an idea how IT can be useful in capturing the customer requirements by referring to the Figure 3: The system shown in the Figure 3, when applied to manufacturing, can  ... 
doi:10.1504/ijbm.2010.031795 fatcat:7y4jeoq4obaktgntlqaiprmdxy

Provide Virtual Distributed Environments for Grid computing on demand

Lizhe Wang, Gregor von Laszewski, Marcel Kunze, Jie Tao, Jai Dayal
2010 Advances in Engineering Software  
Grid users always expect to meet some challenges when employing Grid resources, such as customized computing environments and QoS support. In this paper, we propose a new methodology for Grid computing: to use virtual machines as computing resources and provide Virtual Distributed Environments (VDE) for Grid users. It is declared that employing virtual environments for Grid computing can bring various advantages, for instance, computing environment customization, QoS guarantee and easy management. A
more » ... ght weight Grid middleware, Grid Virtualization Engine, is developed accordingly to provide functions for building virtual environments for Grids. We also present a typical use case, building a virtual e-Science infrastructure on demand, to justify the methodology.
doi:10.1016/j.advengsoft.2009.09.002 fatcat:u7cuc2fzlne25bqjtctbcndbwy

I/O Containers: Managing the Data Analytics and Visualization Pipelines of High End Codes

Jai Dayal, Jianting Cao, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Fang Zheng, Hasan Abbasi, Scott Klasky, Norbert Podhorszki, Jay Lofstead
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
Lack of I/O scalability is known to cause measurable slowdowns for large-scale scientific applications running on high end machines. This is prompting researchers to devise 'I/O staging' methods in which outputs are processed via online analysis and visualization methods to support desired science outcomes. Organized as online workflows and carried out in I/O pipelines, these analysis components run concurrently with science simulations, often using a smaller set of nodes on the high end
more » ... termed 'staging areas'. This paper presents a new approach to dealing with several challenges arising for such online analytics, including: how to efficiently run multiple analytics components on staging area resources providing them with the levels of end-to-end performance they need and how to manage staging resources when analytics actions change due to user or data-dependent behavior. Our approach designs and implements middleware constructs that delineate and manage I/O pipeline resources called 'I/O Containers'. Experimental evaluations of containers with realistic scientific applications demonstrate the feasibility and utility of the approach.
doi:10.1109/ipdpsw.2013.198 dblp:conf/ipps/DayalCESWZAKPL13 fatcat:pk2jnw4dkvhghkacrcue5kdmdm

System-Level Support for Composition of Applications

Brian Kocoloski, Kevin Pedretti, Patrick G. Bridges, John Lange, Hasan Abbasi, David E. Bernholdt, Terry R. Jones, Jai Dayal, Noah Evans, Michael Lang, Jay Lofstead
2015 Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers - ROSS '15  
Current HPC system software lacks support for emerging application deployment scenarios that combine one or more simulations with in situ analytics, sometimes called multi-component or multi-enclave applications. This paper presents an initial design study, implementation, and evaluation of mechanisms supporting composite multi-enclave applications in the Hobbes exascale operating system. These mechanisms include virtualization techniques isolating application custom enclaves while using the
more » ... dor-supplied host operating system and high-performance inter-VM communication mechanisms. Our initial single-node performance evaluation of these mechanisms on multi-enclave science applications, both real and proxy, demonstrates the ability to support multi-enclave HPC job composition with minimal performance overhead.
doi:10.1145/2768405.2768412 dblp:conf/hpdc/KocoloskiLABJDE15 fatcat:vih2uxybcjg5npxqplyztple3a

Thermal aware workload scheduling with backfilling for green data centers

Lizhe Wang, Gregor von Laszewski, Jai Dayal, Thomas R. Furlani
2009 2009 IEEE 28th International Performance Computing and Communications Conference  
Data centers now play an important role in modern IT infrastructures. Related research has shown that the energy consumption for data center cooling systems has recently increased significantly. There is also strong evidence to show that high temperatures within a data center will lead to higher hardware failure rates and thus an increase in maintenance costs. This paper addresses thermal aware resource management for data centers. This paper proposes an analytical model,
more » ... hich describes data center resources with heat transfer properties and workloads with thermal features. Then a thermal aware task scheduling algorithm with backfilling is presented which aims to reduce power consumption and temperatures in a data center. A simulation study is carried out to evaluate the performance of the algorithm. Simulation results show that our algorithm can significantly reduce temperatures in data centers by introducing an endurable decline in performance.
doi:10.1109/pccc.2009.5403821 dblp:conf/ipccc/WangLDF09 fatcat:nvt2djoy3rebjixseefxfm5glu

In-situ I/O processing

Fang Zheng, Hasan Abbasi, Jianting Cao, Jai Dayal, Karsten Schwan, Matthew Wolf, Scott Klasky, Norbert Podhorszki
2011 Proceedings of the sixth workshop on Parallel Data Storage - PDSW '11  
Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process output data during simulation time, "in-situ", and before placing data on disks. This paper argues for flexibility in the implementation of such in-situ data analytics, using measurements and a performance model that demonstrate the potential advantages and limitations of performing analytics at different levels of the I/O hierarchy, including on a machine's compute nodes vs. on separate
more » ... ng" nodes dedicated to analysis tasks. Model and measurement results are guided by realistic large-scale applications running on leadership class machines, and I/O and analytics actions are described as computational dataflow graphs, termed I/O graphs, that combine data movement with 'in transit' operations on data as it is being moved across the I/O hierarchy. Results demonstrate the importance of flexibility in analytics placement and characterize the attributes of analytics operations that lead to different placement decisions.
doi:10.1145/2159352.2159362 fatcat:2644earqanf3rixptfmgjampxa

Task scheduling with ANN-based temperature prediction in a data center: a simulation-based study

Lizhe Wang, Gregor von Laszewski, Fang Huang, Jai Dayal, Tom Furlani, Geoffrey Fox
2011 Engineering with Computers  
High temperatures within a data center can cause a number of problems, such as increased cooling costs and increased hardware failure rates. To overcome this problem, researchers have shown that workload management, focused on a data center's thermal properties, effectively reduces temperatures within a data center. In this paper, we propose a method to predict a workload's thermal effect on a data center, which will be suitable for real-time scenarios. We use machine learning techniques, such
more » ... s artificial neural networks (ANN) as our prediction methodology. We use real data taken from a data center's normal operation to conduct our experiments. To reduce the data's complexity, we introduce a thermal impact matrix to capture the spatial relationship between the data center's heat sources, such as the compute nodes. Our results show that machine learning techniques can predict the workload's thermal effects in a timely manner, thus making them well suited for real-time scenarios. Based on the temperature prediction techniques, we developed a thermal-aware workload scheduling algorithm for data centers, which aims to reduce power consumption and temperatures in a data center. A simulation study is carried out to evaluate the performance of the algorithm. Simulation results show that our algorithm can significantly reduce temperatures in data centers by introducing an endurable decline in performance.
doi:10.1007/s00366-011-0211-4 fatcat:pyalssklbfekpg5pfgve2jdrkq

Does GD 356 have a terrestrial planetary companion?

Dayal T. Wickramasinghe, Jay Farihi, Christopher A. Tout, Lilia Ferrario, Richard J. Stancliffe
2010 Monthly notices of the Royal Astronomical Society  
ACKNOWLEDGMENTS CAT thanks Churchill College for his fellowship and Profs Dayal Wickramasinghe and John Lattanzio for invitations to work in Australia.  ... 
doi:10.1111/j.1365-2966.2010.16417.x fatcat:cjduccq2mzgsjbkw4mymwpunvm

Towards Thermal Aware Workload Scheduling in a Data Center

Lizhe Wang, Gregor von Laszewski, Jai Dayal, Xi He, Andrew J. Younge, Thomas R. Furlani
2009 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks  
High density blade servers are a popular technology for data centers; however, the heat dissipation density of data centers increases exponentially. There is strong evidence to support that high temperatures of such data centers will lead to higher hardware failure rates and thus an increase in maintenance costs. Improperly designed or operated data centers may either suffer from overheated servers and potential system failures, or from overcooled systems, causing extraneous utilities cost.
more » ... mizing the cost of operation (utilities, maintenance, device upgrade and replacement) of data centers is one of the key issues involved with both optimizing computing resources and maximizing business outcome. This paper proposes an analytical model, which describes data center resources with heat transfer properties and workloads with thermal features. Then a thermal aware task scheduling algorithm is presented which aims to reduce power consumption and temperatures in a data center. A simulation study is carried out to evaluate the performance of the algorithm. Simulation results show that our algorithm can significantly reduce temperatures in data centers by introducing an endurable decline in performance.
doi:10.1109/i-span.2009.22 dblp:conf/ispan/WangLDHYF09 fatcat:3sxrjvl4yjdfznfdtbosgv3mg4

Towards Energy Aware Scheduling for Precedence Constrained Parallel Tasks in a Cluster with DVFS

Lizhe Wang, Gregor von Laszewski, Jay Dayal, Fugang Wang
2010 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing  
Reducing energy consumption for high end computing can bring various benefits, such as reduced operating costs, increased system reliability, and respect for the environment. This paper aims to develop scheduling heuristics and to present application experience for reducing power consumption of parallel tasks in a cluster with the Dynamic Voltage Frequency Scaling (DVFS) technique. In this paper, formal models are presented for precedence-constrained parallel tasks, DVFS enabled clusters, and energy
more » ... mption. This paper studies the slack time for non-critical jobs, extends their execution time and reduces the energy consumption without increasing the task's execution time as a whole. Additionally, a Green Service Level Agreement is also considered in this paper. By increasing task execution time within an affordable limit, this paper develops scheduling heuristics to reduce energy consumption of a task's execution and discusses the relationship between energy consumption and task execution time. Models and scheduling heuristics are examined with a simulation study. Test results justify the design and implementation of the proposed energy aware scheduling heuristics in the paper.
doi:10.1109/ccgrid.2010.19 dblp:conf/ccgrid/WangLDW10 fatcat:kiu5oibsjjaphedihykmeeuiwa