An efficient uniform run-time scheme for mixed regular-irregular applications

Dhruva R. Chakrabarti, Nagaraj Shenoy, Alok Choudhary, Prithviraj Banerjee
1998 Proceedings of the 12th international conference on Supercomputing - ICS '98  
Furthermore, we also show that code generated for regular accesses using compile-time schemes are not always compatible to code generated for irregular accesses using run-time schemes.  ...  This study presents a uniform scheme to handle both regular and irregular accesses in a mixed regular-irregular application.  ...  Solution for Mixed Regular-Irregular Applications We advocate using a uniform run-time scheme for handling both regular and irregular accesses in mixed regular-irregular applications.  ... 
doi:10.1145/277830.277848 dblp:conf/ics/ChakrabartiSCB98 fatcat:gwj7kg77srg2vekhx2qooczlfq

RISO

Hang Lu, Guihai Yan, Yinhe Han, Binzhang Fu, Xiaowei Li
2013 Proceedings of the 50th Annual Design Automation Conference on - DAC '13  
It permits underutilized links to be shared by multiple applications, at the same time keeps the aggregated traffic in check to enforce performance isolation.  ...  The experimental results show that the consolidation density is improved more than 12% in comparison with previous strict isolation scheme, meanwhile reducing network latency by 38.4% on average.  ...  running time.  ... 
doi:10.1145/2463209.2488781 dblp:conf/dac/LuYHF013 fatcat:s37syaejvzd77opiafnnykaww4

Multigrid on GPU: Tackling Power Grid Analysis on parallel SIMT platforms

Zhuo Feng, Peng Li
2008 2008 IEEE/ACM International Conference on Computer-Aided Design  
For the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT) based graphics processing unit (GPU) platforms to tackle power grid analysis with promising  ...  Different from the standard CPU based CAD development, care must be taken to balance between computing and memory access, reduce random memory access patterns and simplify flow control to achieve efficiency  ...  Then, by examining the pitches in the collapsed 2D irregular grid, a fixed uniform pitch is chosen for the X and Y directions for the final 2D regular grid, on which all the circuit elements are mapped  ... 
doi:10.1109/iccad.2008.4681645 dblp:conf/iccad/FengL08 fatcat:pomfnp2rrffz3jvfskpxhwtisa

Efficient GPU rendering of subdivision surfaces using adaptive quadtrees

Wade Brainerd, Tim Foley, Manuel Kraemer, Henry Moreton, Matthias Nießner
2016 ACM Transactions on Graphics  
By traversing the quadtree for each post-tessellation vertex, we are able to accurately and efficiently evaluate the limit surface.  ...  In addition, our streaming formulation makes it easier to integrate subdivision surfaces into applications and shader code written for polygonal models.  ...  for their valuable feedback.  ... 
doi:10.1145/2897824.2925874 fatcat:sdf3vn5t2jaldfxjqziirobkga

Control of turbulence in oscillatory reaction-diffusion systems through a combination of global and local feedback

Michael Stich, Alfonso C. Casal, Jesús Ildefonso Díaz
2007 Physical Review E  
Numerical simulations show that while a purely local control is unsuitable to produce uniform oscillations, a mixed local and global control can be efficient and also able to create other patterns such  ...  can be used to suppress turbulence by inducing uniform oscillations.  ...  For time integration, we use an explicit Euler scheme with Δt = 0.002 (Δt = 0.001 for single simulations to assure convergence).  ... 
doi:10.1103/physreve.76.036209 pmid:17930325 fatcat:o6foh46e3jhyrf37ttijob34dy

Irregular Coarse-Grain Data Parallelism under LPARX

Scott R. Kohn, Scott B. Baden
1996 Scientific Programming  
LPARX is a software development tool for implementing dynamic, irregular scientific applications, such as multilevel finite difference and particle methods, on high-performance multiple instruction multiple  ...  LPARX, implemented as a C++ class library, is currently running on diverse MIMD platforms, including the Intel Paragon, Cray C-90, IBM SP2, and networks of workstations running under PVM.  ...  Intel Paragon and Cray C-90 time were provided by a UCSD School of Engineering Block Grant.  ... 
doi:10.1155/1996/701628 fatcat:ln7pks2jxvcglbszomu2dvtbri

Adaptive heterogeneous scheduling for integrated GPUs

Rashid Kaleem, Rajkishore Barik, Tatiana Shpeisman, Brian T. Lewis, Chunling Hu, Keshav Pingali
2014 Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14  
We evaluate our asymmetric scheduling algorithm on a desktop system with an Intel 4th Generation Core Processor using a set of sixteen regular and irregular workloads from diverse application areas.  ...  As a consequence, programmers can effectively utilize both the CPU and GPU to execute a single application. This paper presents novel adaptive scheduling techniques for integrated CPU-GPU processors.  ...  We present an experimental evaluation of our algorithms on an Intel 4th Generation Core Processor using an extensive set of sixteen benchmarks which comprise of a mix of regular and irregular code.  ... 
doi:10.1145/2628071.2628088 dblp:conf/IEEEpact/KaleemBSLHP14 fatcat:u6245bme6nbqtodioaalmn6mcu

Real-Time Virtual Resource: A Timely Abstraction for Embedded Systems [chapter]

Aloysius K. Mok, Alex Xiang Feng
2002 Lecture Notes in Computer Science  
necessary for schedulability analysis.  ...  The real-time virtual resource abstraction allows tasks with wide-ranging timing criticality to be programmed as if they run on dedicated but slower CPUs such that global knowledge of the tasks is not  ...  Introduction As embedded systems become more complex, a typical embedded system will likely involve a mix of soft and hard real-time applications that share the same embedded run-time platform.  ... 
doi:10.1007/3-540-45828-x_14 fatcat:2l5vppe6trevtlx3e6yni32ibm

Mesh type tradeoffs in 2D hydrodynamic modeling of flooding with a Godunov-based flow solver

Byunghyun Kim, Brett F. Sanders, Jochen E. Schubert, James S. Famiglietti
2014 Advances in Water Resources  
times and L1 norms for uniform flow in a trapezoidal channel.  ...  Bed elevation Table 3 Properties of meshes, run times and L1 norms for dam-break flow with an uneven bottom.  ...  Numerical scheme A.1. Topographic and water storage model Eq. (4) are discretized on an unstructured mesh of N_v vertices, N_c triangular and/or quadrilateral cells, and N_e edges.  ... 
doi:10.1016/j.advwatres.2014.02.013 fatcat:apbkq2pttfbu7pw6wxdxhtf5ia

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Yunsup Lee, Rimas Avizienis, Alex Bishara, Richard Xia, Derek Lockhart, Christopher Batten, Krste Asanović
2011 Proceeding of the 38th annual international symposium on Computer architecture - ISCA '11  
We find the vector cores provide greater efficiency than the MIMD cores, even on fairly irregular kernels.  ...  Using an extensive design-space exploration of full VLSI implementations of many accelerator design points, we evaluate the varying tradeoffs between programmability and implementation efficiency among  ...  ACKNOWLEDGMENTS The authors acknowledge and thank Jiongjia Fang and Ji Kim for their help writing application kernels, Christopher Celio for his help writing Maven software and developing the vector-SIMD  ... 
doi:10.1145/2000064.2000080 dblp:conf/isca/LeeABXLBA11 fatcat:wwaa7pxzhffkdpkxtglil3lu3y

Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems

Long Chen, Oreste Villa, Guang R. Gao
2011 2011 IEEE International Conference on Cluster Computing  
Experiments with a molecular dynamics application show that, for nonuniform distributed workload, the solutions based on our framework achieve good load balance, and considerable performance improvement  ...  solving the above issues and efficiently utilizing multi-GPU systems.  ...  Introduction How to efficiently utilize single-GPU systems for general purpose scientific computing has been investigated for many applications.  ... 
doi:10.1109/cluster.2011.50 dblp:conf/cluster/ChenVG11 fatcat:nk3qod6lengutiveiihw55zkjq

Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators

Yunsup Lee, Rimas Avizienis, Alex Bishara, Richard Xia, Derek Lockhart, Christopher Batten, Krste Asanović
2013 ACM Transactions on Computer Systems  
We find the vector cores provide greater efficiency than the MIMD cores, even on fairly irregular kernels.  ...  Using an extensive design-space exploration of full VLSI implementations of many accelerator design points, we evaluate the varying tradeoffs between programmability and implementation efficiency among  ...  ACKNOWLEDGMENTS The authors acknowledge and thank Jiongjia Fang and Ji Kim for their help writing application kernels, Christopher Celio for his help writing Maven software and developing the vector-SIMD  ... 
doi:10.1145/2518037.2491464 fatcat:o4rpknhgijdrrkra6deyuft76m

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Yunsup Lee, Rimas Avizienis, Alex Bishara, Richard Xia, Derek Lockhart, Christopher Batten, Krste Asanović
2011 SIGARCH Computer Architecture News  
We find the vector cores provide greater efficiency than the MIMD cores, even on fairly irregular kernels.  ...  Using an extensive design-space exploration of full VLSI implementations of many accelerator design points, we evaluate the varying tradeoffs between programmability and implementation efficiency among  ...  ACKNOWLEDGMENTS The authors acknowledge and thank Jiongjia Fang and Ji Kim for their help writing application kernels, Christopher Celio for his help writing Maven software and developing the vector-SIMD  ... 
doi:10.1145/2024723.2000080 fatcat:yfs2b427mbd3pbj32peqgr3lfy

Supporting computational data model representation with high-performance I/O in parallel netCDF

Kui Gao, Chen Jin, Alok Choudhary, Wei-keng Liao
2011 2011 18th International Conference on High Performance Computing  
This scheme also allows concurrent metadata construction for different data objects from multiple groups of application processes, an important feature in obtaining a high degree of I/O parallelism for  ...  Using an example of adaptive mesh refinement data model, we demonstrate the proposed scheme can produce scalable performance results for both data and metadata creation and access.  ...  Last but not the least, the total run time of applications will be reduced.  ... 
doi:10.1109/hipc.2011.6152746 dblp:conf/hipc/GaoJCL11 fatcat:vsxxaubivzd5pf6rvo2l4y62gu

Performance dependence of GPU accelerated sparse linear system solvers on the finite element mesh structure in micromagnetic simulations

Friedrich Zahn, Wolfgang Nagel, Jeronimo Castrillon, Holger Brunst, Attila Kákay
2020 Zenodo  
In the same way, this discriminator can also be applied to choose an appropriate sparse matrix vector multiplication routine for a specific matrix.  ...  All performance metrics taken are both logged for each run, as well as added to an average for each metric, where applicable.  ...  As highly regular meshes the µMAG standard problem #4 thin film strip and a regular cuboid were chosen. Mixed-regularity meshes were produced for a moebius strip and a disk.  ... 
doi:10.5281/zenodo.4479180 fatcat:hatggsupj5d4xhowfyksy2677a