Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels [article]

Jan Laukemann, Julian Hammer, Georg Hager, Gerhard Wellein
2019 arXiv   pre-print
While an instruction throughput prediction sets a lower bound to the kernel runtime, the critical path defines an upper bound.  ...  Useful models of loop kernel runtimes on out-of-order architectures require an analysis of the in-core performance behavior of instructions and their dependencies.  ...  Summary We have shown that automatic extraction, throughput, and critical path analysis of assembly loop kernels is feasible using our cross-platform tool OSACA.  ... 
arXiv:1910.00214v2 fatcat:yiowdsjml5hh5lx5jxk4januzq

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

Jan Laukemann, Julian Hammer, Johannes Hofmann, Georg Hager, Gerhard Wellein
2018 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)  
An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor  ...  To validate the constructed models, we apply them to several assembly kernels and compare runtime predictions with actual measurements.  ...  Summary Using our Open-Source Architecture Code Analyzer (OS-ACA) we have shown that a partially automatic machine model construction and fully automatic throughput analysis of loop kernels based on benchmarking  ... 
doi:10.1109/pmbs.2018.8641578 dblp:conf/sc/LaukemannHHHW18 fatcat:ypd7fgflorcofi4klzg3knh5ru

Undo Workarounds for Kernel Bugs

Seyed Mohammadjavad Seyed Talebi, Zhihao Yao, Ardalan Amiri Sani, Zhiyun Qian, Daniel Austin
2021 USENIX Security Symposium  
We also present a static analysis tool, called Hecaton, that generates bowknots automatically and inserts them into the kernel.  ...  Through extensive evaluations on the kernel of Android devices as well as x86 upstream kernels, we demonstrate that bowknots are effective in mitigating kernel bugs and vulnerabilities.  ...  Acknowledgments The work was supported in part by NSF Awards #1953932, #1953933, #1846230, #1617481, and #1617513. We thank the anonymous reviewers for their insightful comments.  ... 
dblp:conf/uss/TalebiYSQA21 fatcat:kvviid3zrbh53kswzyzdcp4yku

Edge computing

Antonio Barbalace, Mohamed L. Karaoui, Wei Wang, Tong Xing, Pierre Olivier, Binoy Ravindran
2020 Proceedings of the 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments  
Furthermore, we show the benefits of H-Container in real scenarios, proving for example up to 94% increase in Redis throughput when unlocking heterogeneity.  ...  H-Container targets Linux, adopts LLVM, extends CRIU, and integrates with Docker.  ...  This work was supported in part by US Office of Naval Research under grants N00014-16-1-2104, N00014-16-1-2711, and N00014-19-1-2493.  ... 
doi:10.1145/3381052.3381321 dblp:conf/vee/BarbalaceKWXOR20 fatcat:zewcev3d2nhgfgyz7q6eo3yo4i

Silentium! Run-Analyse-Eradicate the Noise out of the DB/OS Stack [article]

Wolfgang Mauerer, Ralf Ramsauer, Edson R. F. Lucas, Stefanie Scherzinger
2021 arXiv   pre-print
We then critically discuss these findings in the light of a broader family of database systems (e.g., including disk-based), and how to extend the approach of this paper accordingly.  ...  We discuss these results in the context of ongoing efforts to build custom operating systems for database workloads, and point out that for certain use cases, the margin for improvement is rather narrow  ...  The information and results set out in this publication are those of the authors and do not necessarily reflect the opinion of the ECSEL Joint Undertaking.  ... 
arXiv:2102.06219v2 fatcat:2vjttspji5hifhfveaiwtme65e

sysfilter: Automated System Call Filtering for Commodity Software

Nicholas DeMarinis, Kent Williams-King, Di Jin, Rodrigo Fonseca, Vasileios P. Kemerlis
2020 International Symposium on Recent Advances in Intrusion Detection  
We implement sysfilter for x86-64 Linux, and present a set of program analyses for constructing system call sets statically, and in a scalable, precise, and complete (safe over-approximation) manner.  ...  To tackle this problem, we present sysfilter: a binary analysis-based framework that automatically (1) limits what OS services attackers can (ab)use, by enforcing the principle of least privilege with  ...  Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the US government, ONR, or DARPA.  ... 
dblp:conf/raid/DeMarinisWJFK20 fatcat:4bon5uqqcbfdpfjgzeflncflqu

Fuzzing of Embedded Systems: A Survey

Joobeom Yun, Fayozbek Rustamov, Juhwan Kim, Youngjoo Shin
2022 ACM Computing Surveys  
Fuzzing is an efficient method to identify vulnerabilities automatically, and many publications have been released to date.  ...  Finally, future directions for fuzzing research of embedded systems are predicted and discussed.  ...  ACKNOWLEDGMENTS We thank the anonymous referees for their valuable comments and helpful suggestions.  ... 
doi:10.1145/3538644 fatcat:rhu6jhxkjvagre2dfoigqrt7hi

XuanYuan: An AI-Native Database

Guoliang Li, Xuanhe Zhou, Sihao Li
2019 IEEE Data Engineering Bulletin  
In this paper, we introduce five levels of AI-native databases and provide several open challenges of designing an AI-native database.  ...  ., ARM, GPU, AI chips). Moreover, besides relational model, we can utilize tensor model to accelerate AI operations. Thus, we need to design new techniques to make full use of new hardware.  ...  self-assembling, but also provides in-database AI capabilities to lower the burden of using AI.  ... 
dblp:journals/debu/0001ZL19 fatcat:pdulbqa33jhjxeeqsisqwol6hu

kGuard: Lightweight Kernel Protection against Return-to-User Attacks

Vasileios P. Kemerlis, Georgios Portokalidis, Angelos D. Keromytis
2012 USENIX Security Symposium  
an overhead of 11.4% on system call and I/O latency on x86 OSs, and 10.3% on x86-64.  ...  Return-to-user (ret2usr) attacks exploit the operating system kernel, enabling local users to hijack privileged execution paths and execute arbitrary code with elevated privileges.  ...  Any opinions, findings, conclusions, or recommendations expressed herein are those of the authors, and do not necessarily reflect those of the US Government, DARPA, or the Air Force.  ... 
dblp:conf/uss/KemerlisPK12 fatcat:lx3iu3w7encbhmgwqdna67gwva

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning

Xiuxia Zhang, Guangming Tan, Shuangbai Xue, Jiajia Li, Keren Zhou, Mingyu Chen
2017 Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '17  
The toolchain is an attempt to automatically crack different GPU ISA encodings and build an assembler adaptively for the purpose of performance enhancements to applications on GPUs.  ...  The performance boost is achieved by tuning FFMA throughput by activating dual-issue, eliminating register bank conflicts, adding non-FFMA instructions with little penalty, and choosing proper width of  ...  Mary Hall and other reviewers for the very useful comments and suggestions which help us improve the quality of our paper.  ... 
doi:10.1145/3018743.3018755 fatcat:pueil5biffgtlplus57wybzysa

From Library Portability to Para-rehosting: Natively Executing Microcontroller Software on Commodity Hardware [article]

Wenqiang Li, Le Guan, Jingqiang Lin, Jiameng Shi, Fengjun Li
2021 arXiv   pre-print
To demonstrate the superiority of our approach in terms of security testing, we used off-the-shelf dynamic analysis tools (AFL and ASAN) against the rehosted programs and discovered 28 previously-unknown  ...  However, ad-hoc re-hosting is a daunting and tedious task and subject to many issues (library-dependence, kernel-dependence and hardware-dependence).  ...  The work reported in this paper was supported in part by JFSG from the University of Georgia Research Foundation, Inc., NSF IIS-2014552, DGE-1565570, NSA Science of Security Initiative H98230-18-D-0009  ... 
arXiv:2107.12867v1 fatcat:y2xpjkggyvfdjfbt4epsu2knmq

DTrace: fine-grained and efficient data integrity checking with hardware instruction tracing

Xiayang Wang, Fuqian Huang, Haibo Chen
2019 Cybersecurity  
Recently released Intel processors have been equipped with hardware instruction tracing facilities to securely and efficiently record the program execution path.  ...  In this paper, we study a case for data integrity checking based on Intel Processor Trace (Intel PT), the instruction tracing facility on x86 processors.  ...  Funding This work is supported in part by National Key Research and Development Program of China and a research grant from Huawei Technologies, Inc.  ... 
doi:10.1186/s42400-018-0018-3 fatcat:q5u56ugi4nedhlm24a7q5jfmjm

xTag: Mitigating Use-After-Free Vulnerabilities via Software-Based Pointer Tagging on Intel x86-64 [article]

Lukas Bernhard, Michael Rodler, Thorsten Holz, Lucas Davi
2022 arXiv   pre-print
Our approach is highly compatible, allowing pointers to be passed back and forth between instrumented and non-instrumented code without losing metadata, and it is even compatible with inline assembly.  ...  In this paper, we present the design and implementation of an efficient, software-only pointer tagging scheme for Intel x86-64 based on a novel metadata embedding scheme.  ...  However, a recent analysis of serious security bugs in Chrome [57] revealed that about 36 % of the analyzed 912 high or critical severity security bugs since 2015 were related to use-after-free.  ... 
arXiv:2203.04117v1 fatcat:t4s63gr4bbdylm2ul5d2dyozvm

Vector Extensions in COTS Processors to Increase Guaranteed Performance in Real-Time Systems

Roger Pujol, Josep Jorba, Hamid Tabani, Leonidas Kosmidis, Enrico Mezzetti, Jaume Abella, Francisco Cazorla
2022 ACM Transactions on Embedded Computing Systems  
of existing code coverage and timing analysis tools.  ...  We develop vectorized versions of neural network kernels and show that the NVIDIA Xavier VExt provide a reasonable increase in guaranteed application performance of up to 2.7x.  ...  ACKNOWLEDGEMENTS This work has received funding from the European Research Council (ERC) grant agreement No. 772773 (SuPerCom) and the Spanish Ministry of Science and Innovation (AEI/10.13039/501100011033  ... 
doi:10.1145/3561054 fatcat:3jrkylaimfdadnmgj5cldom3va

Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips

Johannes Hofmann, Jan Treibig, Georg Hager, Gerhard Wellein
2014 Proceedings of the 2014 Workshop on Workshop on programming models for SIMD/Vector processing - WPMVP '14  
We analyze the performance of SSE (128 bit), AVX (256 bit), AVX2 (256 bit), and IMCI (512 bit) implementations on recent Intel x86 systems.  ...  throughput.  ...  It is known that the AVX kernel suffers from critical path dependencies [10], which is confirmed by the large benefit gained with SMT.  ... 
doi:10.1145/2568058.2568068 dblp:conf/ppopp/HofmannTHW14 fatcat:w66hhteubrep3ekonywb4d6c2q