Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture [article]

Christian Knauth, Boran Adas, Daniel Whitfield, Xuesong Wang, Lydia Ickler, Tim Conrad, Oliver Serang
2017 arXiv   pre-print
Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive  ...  This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes  ...  Acknowledgements We are grateful to Thimo Wellner and Guy Ling for their contributions. This paper grew out of the master's course in Scientific Computing taught by Oliver Serang.  ... 
arXiv:1708.01873v1 fatcat:vx3zpajytrcf7o3hyyk6weozum
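For readers unfamiliar with the problem, the sketch below shows the simplest of the baselines named in the snippet, "naive bitwise swapping". It is an illustrative C++11 version written for this listing, not the authors' optimized code. The helper names `reverse_bits` and `bit_reverse_permute` are hypothetical. The sketch assumes the array length is exactly 2^bits. Each index i is paired with the index whose low `bits` bits are i's bits reversed, and each pair is swapped once.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: reverse the lowest `bits` bits of x, one bit per iteration.
std::size_t reverse_bits(std::size_t x, unsigned bits) {
    std::size_t r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (x & 1u);  // shift the current low bit of x into r
        x >>= 1;
    }
    return r;
}

// Hypothetical helper: in-place bit-reversed permutation.
// Assumes v.size() == 2^bits.
template <typename T>
void bit_reverse_permute(std::vector<T>& v, unsigned bits) {
    const std::size_t n = std::size_t{1} << bits;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t j = reverse_bits(i, bits);
        if (i < j) std::swap(v[i], v[j]);  // i < j ensures each pair is swapped only once
    }
}
```

With eight elements 0 to 7 (bits = 3), the result is 0, 4, 2, 6, 1, 5, 3, 7. The per-index bit loop is what the faster methods in the paper, such as byte tables and XOR-based induction, aim to avoid.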

Harnessing ISA diversity: Design of a heterogeneous-ISA chip multiprocessor

Ashish Venkat, Dean M. Tullsen
2014 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)  
Heterogeneous multicore architectures have the potential for high performance and energy efficiency.  ...  This work exploits the diversity offered by three modern ISAs: Thumb, x86-64, and Alpha.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their helpful insights. This research was supported in part by NSF Grants CCF-1219059 and CCF-1302682.  ... 
doi:10.1109/isca.2014.6853218 dblp:conf/isca/VenkatT14 fatcat:ochijsbccvdrhfofpyqnmo2zyy

Harnessing ISA diversity

Ashish Venkat, Dean M. Tullsen
2014 SIGARCH Computer Architecture News  
Heterogeneous multicore architectures have the potential for high performance and energy efficiency.  ...  This work exploits the diversity offered by three modern ISAs: Thumb, x86-64, and Alpha.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their helpful insights. This research was supported in part by NSF Grants CCF-1219059 and CCF-1302682.  ... 
doi:10.1145/2678373.2665692 fatcat:s2iurlomofbfficeone5inbz3q

The Software Performance of Authenticated-Encryption Modes [chapter]

Ted Krovetz, Phillip Rogaway
2011 Lecture Notes in Computer Science  
Still we find room for algorithmic improvements to OCB, showing how to trim one blockcipher call (most of the time, assuming a counter-based nonce) and reduce latency.  ...  Our findings contrast with those of McGrew and Viega (2004), who claimed similar performance for GCM and OCB.  ...  Acknowledgments Phil Rogaway had interesting discussions with Tariq Ahmad (University of Massachusetts) on hardware aspects of GCM and OCB3. The authors appreciate the support of NSF CNS 0904380.  ... 
doi:10.1007/978-3-642-21702-9_18 fatcat:rxh5ghghgjhx3hlzxk4bojbhty

Reverse Engineering x86 Processor Microcode [article]

Philipp Koppe and Benjamin Kollenda and Marc Fyrbiak and Christian Kison and Robert Gawlik and Christof Paar and Thorsten Holz
2019 arXiv   pre-print
In this paper, we reverse engineer the microcode semantics and the inner workings of the microcode update mechanism of conventional COTS CPUs, using AMD's K8 and K10 microarchitectures as examples.  ...  Microcode is an abstraction layer on top of the physical components of a CPU and is present in most general-purpose CPUs today.  ...  Acknowledgement We thank the reviewers for their valuable feedback.  ... 
arXiv:1910.00948v1 fatcat:lajfppfs55f2vd3mj4wcat77ly

Speeding up decimal multiplication [article]

Viktor Krapivensky
2020 arXiv   pre-print
We also present a simple cache-efficient algorithm for in-place 2n × n or n × 2n matrix transposition, the need for which arises in the "six-step algorithm" variation of the matrix Fourier algorithm, and  ...  Decimal multiplication is the task of multiplying two numbers in base 10^N. Specifically, we focus on the number-theoretic transform (NTT) family of algorithms.  ...  Algorithm for bit-reversal permutation: See [17] for an overview of algorithms for bit-reversal permutation.  ... 
arXiv:2011.11524v4 fatcat:smi7l3qehbfwhi673pyinegu64

LPCP: An efficient Privacy-Preserving Protocol for Polynomial Calculation Based on CRT

Jiajian Tang, Zhenfu Cao, Jiachen Shen, Xiaolei Dong
2022 Applied Sciences  
For practical purposes, we describe a distance measurement application for mobile devices based on LPCP.  ...  To solve this problem, we propose an efficient two-party computation protocol, secure against semi-honest adversaries, based on the Chinese remainder theorem (CRT).  ...  Table 3 shows the practical performance of the two-party LPCP protocol on additive and multiplicative arithmetic operations in both the ARM and the x86 architecture environments.  ... 
doi:10.3390/app12063117 fatcat:b6ltvmnvmfhtrdd6dr6rtd4rfq
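The snippet says only that LPCP is "based on the Chinese remainder theorem". The sketch below is therefore not the LPCP protocol. It illustrates just the underlying CRT step: recovering an integer x modulo M = m1·m2·…·mk from its residues x mod mi. The names `ext_gcd` and `crt_reconstruct` are hypothetical. The moduli are assumed to be pairwise coprime, and M is assumed to be small enough that 64-bit arithmetic does not overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: extended Euclid. Returns g = gcd(a, b) and sets x, y so that a*x + b*y = g.
std::int64_t ext_gcd(std::int64_t a, std::int64_t b, std::int64_t& x, std::int64_t& y) {
    if (b == 0) { x = 1; y = 0; return a; }
    std::int64_t x1 = 0, y1 = 0;
    const std::int64_t g = ext_gcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Hypothetical helper: recover x mod M from residues r[i] = x mod m[i].
// Assumes pairwise-coprime moduli and a product M small enough to avoid overflow.
std::int64_t crt_reconstruct(const std::vector<std::int64_t>& r,
                             const std::vector<std::int64_t>& m) {
    std::int64_t M = 1;
    for (std::int64_t mi : m) M *= mi;
    std::int64_t x = 0;
    for (std::size_t i = 0; i < m.size(); ++i) {
        const std::int64_t Mi = M / m[i];           // product of all other moduli
        std::int64_t inv = 0, unused = 0;
        ext_gcd(Mi % m[i], m[i], inv, unused);      // inv * Mi == 1 (mod m[i])
        inv = ((inv % m[i]) + m[i]) % m[i];         // normalize to [0, m[i])
        x = (x + (r[i] * Mi % M) * inv) % M;        // add r[i] * Mi * Mi^{-1}
    }
    return x;
}
```

For example, residues 2, 3, 2 with moduli 3, 5, 7 reconstruct to 23. How LPCP distributes or masks such residues between the two parties is not described in the snippet and is not modeled here.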

TIVA: Trusted Integrity Verification Architecture [chapter]

Mahadevan Gomathisankaran, Akhilesh Tyagi
2006 Lecture Notes in Computer Science  
We propose one such novel solution, TIVA, in this paper.  ...  Verifying the integrity of the software running on these devices in such a scenario is an interesting and difficult problem.  ...  Flexibility of TIVA We explained TIVA for 10-bit permutation functions in a 32-bit architecture producing a 64-bit checksum, thus handling a memory size of 4KB.  ... 
doi:10.1007/11787952_2 fatcat:zqtyk4dxtrf7ldkv6i5epmweq4

Gadge me if you can

Lucas Vincenzo Davi, Alexandra Dmitrienko, Stefan Nürnberger, Ahmad-Reza Sadeghi
2013 Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security - ASIA CCS '13  
Our prototype implementation supports the Linux ELF file format and covers both mainstream processor architectures, x86 and ARM.  ...  Our evaluation demonstrates that XIFER performs efficiently at load time and during run time (1.2% overhead).  ...  To evaluate the efficiency, we used the SPEC CPU2006 integer benchmark suite for the x86 version. Runtime Overhead on Intel x86.  ... 
doi:10.1145/2484313.2484351 dblp:conf/ccs/DaviDNS13 fatcat:wqm3u4vlibh5ndzkrbb3ilhqrm

Vectorized and performance-portable Quicksort [article]

Mark Blacher, Joachim Giesen, Peter Sanders, Jan Wassenberg
2022 arXiv   pre-print
This paper focuses on the practical engineering aspects enabling the speed and portability, which we have not yet seen demonstrated for a Quicksort implementation.  ...  To the best of our knowledge, this is the fastest sort for non-tuple keys on CPUs, up to 20 times as fast as the sorting algorithms implemented in standard libraries.  ...  In contrast to previous works, which can be seen as proofs of concept, we have focused on practical usability (support for 32/64-bit floating-point and 16/32/64/128-bit integer keys in ascending or descending  ... 
arXiv:2205.05982v1 fatcat:zuykzjuq75hl5fwqzdfft4eokm

Record Setting Software Implementation of DES Using CUDA

Giovanni Agosta, Alessandro Barenghi, Fabrizio De Santis, Gerardo Pelosi
2010 2010 Seventh International Conference on Information Technology: New Generations  
improvement in the cost efficiency of the attack.  ...  This results in a better cost-availability tradeoff and minimizes the required setup time for such an attack to be mounted.  ...  Acknowledgements This work was partially supported by MIUR in the framework of the PRIN SESAME project.  ... 
doi:10.1109/itng.2010.43 dblp:conf/itng/AgostaBSP10 fatcat:put2eqy7lzeglczkipdlkbgs7m

Scalable validation of binary lifters

Sandeep Dasgupta, Sushant Dinesh, Deepan Venkatesh, Vikram S. Adve, Christopher W. Fletcher
2020 Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation  
My work is the first to do translation validation of single instructions on an architecture as extensive as x86-64, uses the most precise formal semantics available, and has the widest coverage in terms  ...  Second, I show that formal translation validation of single instructions for a complex ISA like x86-64 is not only practical but can be used as a building block for scalable full-program validation.  ...  x86-64 Instruction Set Architecture x86-64 is the 64-bit extension of x86, a family of backward-compatible ISAs.  ... 
doi:10.1145/3385412.3385964 dblp:conf/pldi/DasguptaDVAF20 fatcat:3khjl5gbmnetjay23fk3sc2ktu

Security through amnesia

Patrick Simmons
2011 Proceedings of the 27th Annual Computer Security Applications Conference on - ACSAC '11  
Loop-Amnesia is written for x86-64, but our technique is applicable to other register-based architectures.  ...  We offer theoretical justification of Loop-Amnesia's invulnerability to the attack, verify that our implementation is not vulnerable in practice, and present measurements showing our impact on I/O accesses  ...  Acknowledgements We thank Andrew Lenharth of the University of Texas at Austin for his invaluable inspiration and advice in the early stages of this work.  ... 
doi:10.1145/2076732.2076743 dblp:conf/acsac/Simmons11 fatcat:igxzta2rgfhk5i7o4jokmhvic4

Security Through Amnesia: A Software-Based Solution to the Cold Boot Attack on Disk Encryption [article]

Patrick Simmons
2011 arXiv   pre-print
Loop-Amnesia is written for x86-64, but our technique is applicable to other register-based architectures.  ...  We offer theoretical justification of Loop-Amnesia's invulnerability to the attack, verify that our implementation is not vulnerable in practice, and present measurements showing our impact on I/O accesses  ...  Acknowledgements We thank Andrew Lenharth of the University of Texas at Austin for his invaluable inspiration and advice in the early stages of this work.  ... 
arXiv:1104.4843v1 fatcat:aekgqlim2fevrmgp3zx3maxybq

Studying Security Issues in HPC (Super Computer) Environment

Anirban Mitra, Ramanuja Nayak
2012 International journal of computer and communication technology  
It is the purpose of this paper to present some practical security issues related to the High Performance Computing environment.  ...  Due to the cluster-type architecture and high processing speed, we have experienced that it works far better and handles loads in a much more efficient manner than a series of desktops with a normal configuration  ...  Using Gluster-HPC ver 1.3 x86-64 bit edition, the complete cluster, along with the Intel, gcc and gcc4 compilers, makes it one of the most stable HPC environments.  ... 
doi:10.47893/ijcct.2012.1134 fatcat:ideuy3tlpngopf3pnnlzfsf3qu