FLEXclusion

Jaewoong Sim, Jaekyu Lee, Moinuddin K. Qureshi, Hyesoon Kim
2012 SIGARCH Computer Architecture News  
Exclusive last-level caches (LLCs) reduce memory accesses by effectively utilizing cache capacity. However, they require excessive on-chip bandwidth to support frequent insertions of cache lines on eviction from upper-level caches. Non-inclusive caches, on the other hand, have the advantage of using the on-chip bandwidth more effectively but suffer from a higher miss rate. Traditionally, the decision to use the cache as exclusive or non-inclusive is made at design time. However, the best option
for a cache organization depends on application characteristics, such as working set size and the amount of traffic consumed by LLC insertions. This paper proposes FLEXclusion, a design that dynamically selects between exclusion and non-inclusion depending on workload behavior. With FLEXclusion, the cache behaves like an exclusive cache when the application benefits from extra cache capacity, and it acts as a non-inclusive cache when additional cache capacity is not useful, so that it can reduce on-chip bandwidth. FLEXclusion leverages the observation that both non-inclusion and exclusion rely on similar hardware support, so our proposal can be implemented with negligible hardware changes. Our evaluations show that a FLEXclusive cache reduces the on-chip LLC insertion traffic by 72.6% compared to an exclusive design and improves performance by 5.9% compared to a non-inclusive design.
doi:10.1145/2366231.2337196 fatcat:msn7gafeerbd7pzcsbctctwjni

Supporting CUDA for an extended RISC-V GPU architecture [article]

Ruobing Han, Blaise Tine, Jaewon Lee, Jaewoong Sim, Hyesoon Kim
2021 arXiv   pre-print
Sim, and Hyesoon Kim https://github.com/ROCm-Developer-Tools/HIPIFY https://github.com/vortexgpgpu/pocl 5 https://vortex.cc.gatech.edu/  ...  texture no Table 2 https://github.com/gthparch/NVPTX-SPIRV-Translator arXiv:2109.00673v1 [cs.PL] 2 Sep 2021 Conference'17, July 2017, Washington, DC, USA Ruobing Han, Blaise Tine, Jaewon Lee, Jaewoong  ... 
arXiv:2109.00673v1 fatcat:c5hebaydfrdf7lphewbxcdyeoa

COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs [article]

Ruobing Han, Jaewon Lee, Jaewoong Sim, Hyesoon Kim
2021 arXiv   pre-print
As CUDA programs become the de facto program among data-parallel applications such as high-performance computing or machine learning applications, running CUDA on other platforms has become a compelling option. Although several efforts have attempted to support CUDA on devices other than NVIDIA GPUs, the extra steps in the translation mean that this support always lags a few years behind CUDA's latest features. Examples are DPC and Hipfy, where CUDA source code has to be translated to
their native supporting language and then they are supported. In particular, the new CUDA programming model exposes the warp concept in the programming language, which greatly changes the way the CUDA code should be mapped to CPU programs. In this paper, hierarchical collapsing that correctly supports CUDA warp-level functions on CPUs is proposed. Based on hierarchical collapsing, a framework, COX, is developed that allows CUDA programs with the latest features to be executed efficiently on CPU platforms. COX consists of a compiler IR transformation (new LLVM pass) and a runtime system to execute the transformed programs on CPU devices. COX can support the most recent CUDA features, and the application coverage is much higher (90  ...  also show that the warp-level functions in CUDA can be efficiently executed by utilizing CPU SIMD (AVX) instructions.
arXiv:2112.10034v1 fatcat:l7pgnpsjgzfrviptgblu3f5r5u

A performance analysis framework for identifying potential benefits in GPGPU applications

Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, Richard Vuduc
2012 SIGPLAN notices  
Tuning code for GPGPU and other emerging many-core platforms is a challenge because few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this paper, we present a performance analysis framework that can help shed light on such bottlenecks for GPGPU applications. Although a handful of GPGPU profiling tools exist, most of the traditional tools, unfortunately, simply provide programmers with a variety of measurements and metrics obtained by running applications,
and it is often difficult to map these metrics to understand the root causes of slowdowns, much less decide what next optimization step to take to alleviate the bottleneck. In our approach, we first develop an analytical performance model that can precisely predict performance and aims to provide programmer-interpretable metrics. Then, we apply static and dynamic profiling to instantiate our performance model for a particular input code and show how the model can predict the potential performance benefits. We demonstrate our framework on a suite of micro-benchmarks as well as a variety of computations extracted from real codes.
doi:10.1145/2370036.2145819 fatcat:763qhlvghvfo7k6mxu73jbgpg4

Resilient die-stacked DRAM caches

Jaewoong Sim, Gabriel H. Loh, Vilas Sridharan, Mike O'Connor
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
Part of this work was conducted while Jaewoong Sim was on an internship at AMD Research. Jaewoong Sim is also supported by NSF award number 1139083.  ... 
doi:10.1145/2485922.2485958 dblp:conf/isca/SimLSO13 fatcat:n6uix5stxnag7dambcoqhtoy4a

FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion

Jaewoong Sim, Jaekyu Lee, Moinuddin K. Qureshi, Hyesoon Kim
2012 2012 39th Annual International Symposium on Computer Architecture (ISCA)  
Exclusive last-level caches (LLCs) reduce memory accesses by effectively utilizing cache capacity. However, they require excessive on-chip bandwidth to support frequent insertions of cache lines on eviction from upper-level caches. Non-inclusive caches, on the other hand, have the advantage of using the on-chip bandwidth more effectively but suffer from a higher miss rate. Traditionally, the decision to use the cache as exclusive or non-inclusive is made at design time. However, the best option
for a cache organization depends on application characteristics, such as working set size and the amount of traffic consumed by LLC insertions. This paper proposes FLEXclusion, a design that dynamically selects between exclusion and non-inclusion depending on workload behavior. With FLEXclusion, the cache behaves like an exclusive cache when the application benefits from extra cache capacity, and it acts as a non-inclusive cache when additional cache capacity is not useful, so that it can reduce on-chip bandwidth. FLEXclusion leverages the observation that both non-inclusion and exclusion rely on similar hardware support, so our proposal can be implemented with negligible hardware changes. Our evaluations show that a FLEXclusive cache reduces the on-chip LLC insertion traffic by 72.6% compared to an exclusive design and improves performance by 5.9% compared to a non-inclusive design.
doi:10.1109/isca.2012.6237028 dblp:conf/isca/SimLQK12 fatcat:noqu2fegxjhfxdlg3n7bf4cy34

Transparent Hardware Management of Stacked DRAM as Part of Memory

Jaewoong Sim, Alaa R. Alameldeen, Zeshan Chishti, Chris Wilkerson, Hyesoon Kim
2014 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture  
Part of this work was conducted while Jaewoong Sim was on an internship at Intel Labs. We acknowledge the support of Intel, Sandia National Laboratories, and NSF CAREER award 1054830.  ... 
doi:10.1109/micro.2014.56 dblp:conf/micro/SimACWK14 fatcat:ad3avdcvqfg7telvtr7vwficty

BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models

Joo Hwan Lee, Jaewoong Sim, Hyesoon Kim
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
Parallel machine learning workloads have become prevalent in numerous application domains. Many of these workloads are iterative convergent, allowing different threads to compute in an asynchronous manner, relaxing certain read-after-write data dependencies to use stale values. While considerable effort has been devoted to reducing the communication latency between nodes by utilizing asynchronous parallelism, inefficient utilization of relaxed consistency models within a single node has caused
parallel implementations to have low execution efficiency. The long latency and serialization caused by atomic operations have a significant impact on performance. The data communication is not overlapped with the main computation, which reduces execution efficiency. The inefficiency comes from the data movement between where they are stored and where they are processed. In this work, we propose Bounded Staled Sync (BSSync), a hardware support for the bounded staleness consistency model, which accompanies simple logic layers in the memory hierarchy. BSSync overlaps the long latency atomic operation with the main computation, targeting iterative convergent machine learning workloads. Compared to previous work that allows staleness for read operations, BSSync utilizes staleness for write operations, allowing stale-writes. We demonstrate the benefit of the proposed scheme for representative machine learning workloads. On average, our approach outperforms the baseline asynchronous parallel implementation by 1.33x.
doi:10.1109/pact.2015.42 dblp:conf/IEEEpact/LeeSK15 fatcat:ey5nkob5uvgq3imfydkz5ist44

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch

Jaewoong Sim, Gabriel H. Loh, Hyesoon Kim, Mike O'Connor, Mithuna Thottethodi
2012 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture  
Part of this work was conducted while Jaewoong Sim was on an internship and Mithuna Thottethodi was on sabbatical leave at AMD Research.  ... 
doi:10.1109/micro.2012.31 dblp:conf/micro/SimLKOT12 fatcat:gcmnluolizda7hiwrho2j4pek4

A performance analysis framework for identifying potential benefits in GPGPU applications

Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, Richard Vuduc
2012 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12  
Tuning code for GPGPU and other emerging many-core platforms is a challenge because few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this paper, we present a performance analysis framework that can help shed light on such bottlenecks for GPGPU applications. Although a handful of GPGPU profiling tools exist, most of the traditional tools, unfortunately, simply provide programmers with a variety of measurements and metrics obtained by running applications,
and it is often difficult to map these metrics to understand the root causes of slowdowns, much less decide what next optimization step to take to alleviate the bottleneck. In our approach, we first develop an analytical performance model that can precisely predict performance and aims to provide programmer-interpretable metrics. Then, we apply static and dynamic profiling to instantiate our performance model for a particular input code and show how the model can predict the potential performance benefits. We demonstrate our framework on a suite of micro-benchmarks as well as a variety of computations extracted from real codes.
doi:10.1145/2145816.2145819 dblp:conf/ppopp/SimDKV12 fatcat:fo4fdxdlo5ebzm2pr3lw7v77cy

Wnt-C59 inhibits proinflammatory cytokine expression by reducing the interaction between β-catenin and NF-κB in LPS-stimulated epithelial and macrophage cells

Jaewoong Jang, Jaewon Song, Inae Sim, Yoosik Yoon
2021 Korean Journal of Physiology and Pharmacology  
Dysregulation of the Wnt pathway causes various diseases including cancer, Parkinson's disease, Alzheimer's disease, schizophrenia, osteoporosis, obesity and chronic kidney diseases. The modulation of dysregulated Wnt pathway is absolutely necessary. In the present study, we evaluated the anti-inflammatory effect and the mechanism of action of Wnt-C59, a Wnt signaling inhibitor, in lipopolysaccharide (LPS)-stimulated epithelial cells and macrophage cells. Wnt-C59 showed a dose-dependent
anti-inflammatory effect by suppressing the expression of proinflammatory cytokines including IL6, CCL2, IL1A, IL1B, and TNF in LPS-stimulated cells. The dysregulation of the Wnt/β-catenin pathway in LPS-stimulated cells was suppressed by Wnt-C59 treatment. The level of β-catenin, the executor protein of Wnt/β-catenin pathway, was elevated by LPS and suppressed by Wnt-C59. Overexpression of β-catenin rescued the suppressive effect of Wnt-C59 on proinflammatory cytokine expression and nuclear factor-kappa B (NF-κB) activity. We found that the interaction between β-catenin and NF-κB, measured by co-immunoprecipitation assay, was elevated by LPS and suppressed by Wnt-C59 treatment. Both NF-κB activity for its target DNA binding and the reporter activity of NF-κB-responsive promoter showed identical patterns with the interaction between β-catenin and NF-κB. Altogether, our findings suggest that the anti-inflammatory effect of Wnt-C59 is mediated by the reduction of the cellular level of β-catenin and the interaction between β-catenin and NF-κB, which results in the suppressions of the NF-κB activity and proinflammatory cytokine expression.
doi:10.4196/kjpp.2021.25.4.307 fatcat:enna4h33qbbxjhozt7s424qkxa

LGK974 suppresses lipopolysaccharide-induced endotoxemia in mice by modulating the crosstalk between the Wnt/β-catenin and NF-κB pathways

Jaewoong Jang, Jaewon Song, Hyunji Lee, Inae Sim, Young V. Kwon, Eek-hoon Jho, Yoosik Yoon
2021 Experimental and Molecular Medicine  
Endotoxemia, a type of sepsis caused by gram-negative bacterial endotoxin [i.e., lipopolysaccharide (LPS)], is associated with manifestations such as cytokine storm; failure of multiple organs, including the liver; and a high mortality rate. We investigated the effect and mechanism of action of LGK974, a Wnt signaling inhibitor, in mice with LPS-induced endotoxemia, an animal model of sepsis. LGK974 significantly and dose-dependently increased the survival rate and reduced plasma
cytokine levels in mice with LPS-induced endotoxemia. Transcriptome analysis of liver tissues revealed significant changes in the expression of genes associated with the Wnt pathway as well as cytokine and NF-κB signaling during endotoxemia. LGK974 treatment suppressed the activation of NF-κB signaling and cytokine expression as well as the Wnt/β-catenin pathway in the livers of endotoxemic mice. Coimmunoprecipitation of phospho-IκB and β-transducin repeat-containing protein (β-TrCP) was increased in the livers of endotoxemic mice but was reduced by LGK974 treatment. Moreover, LGK974 treatment decreased the coimmunoprecipitation and colocalization of β-catenin and NF-κB, which were elevated in the livers of endotoxemic mice. Our results reveal crosstalk between the Wnt/β-catenin and NF-κB pathways via interactions between β-TrCP and phospho-IκB and between β-catenin and NF-κB during endotoxemia. The results of this study strongly suggest that the crosstalk between the Wnt/β-catenin and NF-κB pathways contributes to the mutual activation of these two pathways during endotoxemia, which results in amplified cytokine production, liver damage and death, and that LGK974 suppresses this vicious amplification cycle by reducing the crosstalk between these two pathways.
doi:10.1038/s12276-021-00577-z pmid:33692475 pmcid:PMC8080716 fatcat:ochsntrasrhpbfcrxkvcwzntne

Wnt-Signaling Inhibitor Wnt-C59 Suppresses the Cytokine Upregulation in Multiple Organs of Lipopolysaccharide-Induced Endotoxemic Mice via Reducing the Interaction between β-Catenin and NF-κB

Jaewoong Jang, Jaewon Song, Inae Sim, Young V. Kwon, Yoosik Yoon
2021 International Journal of Molecular Sciences  
Sepsis is characterized by multiple-organ dysfunction caused by the dysregulated host response to infection. Until now, however, the role of the Wnt signaling has not been fully characterized in multiple organs during sepsis. This study assessed the suppressive effect of a Wnt signaling inhibitor, Wnt-C59, in the kidney, lung, and liver of lipopolysaccharide-induced endotoxemic mice, serving as an animal model of sepsis. We found that Wnt-C59 elevated the survival rate of these mice and
decreased their plasma levels of proinflammatory cytokines and organ-damage biomarkers, such as BUN, ALT, and AST. The Wnt/β-catenin and NF-κB pathways were stimulated and proinflammatory cytokines were upregulated in the kidney, lung, and liver of endotoxemic mice. Wnt-C59, as a Wnt signaling inhibitor, inhibited the Wnt/β-catenin pathway, and its interaction with the NF-κB pathway, which resulted in the inhibition of NF-κB activity and proinflammatory cytokine expression. In multiple organs of endotoxemic mice, Wnt-C59 significantly reduced the β-catenin level and interaction with NF-κB. Our findings suggest that the anti-endotoxemic effect of Wnt-C59 is mediated via reducing the interaction between β-catenin and NF-κB, consequently suppressing the associated cytokine upregulation in multiple organs. Thus, Wnt-C59 may be useful for the suppression of the multiple-organ dysfunction during sepsis.
doi:10.3390/ijms22126249 fatcat:g3spjfkojfgwxgxl6cnxcugl6i

Table of contents

2019 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)  
Sim (Intel Corporation), Phillip Tomson (Intel Corporation), Huseyin Sumbul (Intel Corporation), Gregory Chen (Intel Corporation), Phil Knag (Intel Corporation), Raghavan Kumar (Intel Corporation),  ...  Together: FPGA-ASIC Integration for Persistent RNNs 199 Eriko Nurvitadhi (Intel Corporation), Dongup Kwon (Intel Corporation), Ali Jafari (Intel Corporation), Andrew Boutros (Intel Corporation), Jaewoong  ... 
doi:10.1109/fccm.2019.00004 fatcat:qku57w2j2vfs3kluykjmqfbzya

Program/Erase Characteristics of Amorphous Gallium Indium Zinc Oxide Nonvolatile Memory

Huaxiang Yin, Sunil Kim, Hyuck Lim, Yosep Min, Chang Jung Kim, Ihun Song, Jaechul Park, Sang-Wook Kim, Alexander Tikhonovsky, Jaewoong Hyun, Youngsoo Park
2008 IEEE Transactions on Electron Devices  
Jaewoong Hyun is currently working toward the Ph.D. degree at the Harvard University, MA. He was with the Semiconductor Device Laboratory, Samsung Advanced Institute of Technology, Yongin, Korea.  ...  The cross-sectional view of the transmission electron microscopy (TEM) image, high-resolution TEM (HRTEM) image, and secondary ion mass spectrometry (SIMS) depth profile analysis data is shown in Fig.  ... 
doi:10.1109/ted.2008.926727 fatcat:x4c2blw6djdxtncmt7zlbjx4iu