Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
[article]
2017
arXiv
pre-print
Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive ...
This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes ...
Acknowledgements We are grateful to Thimo Wellner and Guy Ling for their contributions. This paper grew out of the master's course in Scientific Computing taught by Oliver Serang. ...
arXiv:1708.01873v1
fatcat:vx3zpajytrcf7o3hyyk6weozum
Harnessing ISA diversity: Design of a heterogeneous-ISA chip multiprocessor
2014
2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
Heterogeneous multicore architectures have the potential for high performance and energy efficiency. ...
This work exploits the diversity offered by three modern ISAs: Thumb, x86-64, and Alpha. ...
Acknowledgements The authors would like to thank the anonymous reviewers for their helpful insights. This research was supported in part by NSF Grants CCF-1219059 and CCF-1302682. ...
doi:10.1109/isca.2014.6853218
dblp:conf/isca/VenkatT14
fatcat:ochijsbccvdrhfofpyqnmo2zyy
Harnessing ISA diversity
2014
SIGARCH Computer Architecture News
Heterogeneous multicore architectures have the potential for high performance and energy efficiency. ...
This work exploits the diversity offered by three modern ISAs: Thumb, x86-64, and Alpha. ...
Acknowledgements The authors would like to thank the anonymous reviewers for their helpful insights. This research was supported in part by NSF Grants CCF-1219059 and CCF-1302682. ...
doi:10.1145/2678373.2665692
fatcat:s2iurlomofbfficeone5inbz3q
The Software Performance of Authenticated-Encryption Modes
[chapter]
2011
Lecture Notes in Computer Science
Still we find room for algorithmic improvements to OCB, showing how to trim one blockcipher call (most of the time, assuming a counter-based nonce) and reduce latency. ...
Our findings contrast with those of McGrew and Viega (2004), who claimed similar performance for GCM and OCB. ...
Acknowledgments Phil Rogaway had interesting discussions with Tariq Ahmad (University of Massachusetts) on hardware aspects of GCM and OCB3. The authors appreciate the support of NSF CNS 0904380. ...
doi:10.1007/978-3-642-21702-9_18
fatcat:rxh5ghghgjhx3hlzxk4bojbhty
Reverse Engineering x86 Processor Microcode
[article]
2019
arXiv
pre-print
In this paper, we reverse engineer the microcode semantics and inner workings of its update mechanism of conventional COTS CPUs on the example of AMD's K8 and K10 microarchitectures. ...
Microcode is an abstraction layer on top of the physical components of a CPU and present in most general-purpose CPUs today. ...
Acknowledgement We thank the reviewers for their valuable feedback. ...
arXiv:1910.00948v1
fatcat:lajfppfs55f2vd3mj4wcat77ly
Speeding up decimal multiplication
[article]
2020
arXiv
pre-print
We also present a simple cache-efficient algorithm for in-place 2n × n or n × 2n matrix transposition, the need for which arises in the "six-step algorithm" variation of the matrix Fourier algorithm, and ...
Decimal multiplication is the task of multiplying two numbers in base 10^N. Specifically, we focus on the number-theoretic transform (NTT) family of algorithms. ...
Algorithm for bit-reversal permutation See [17] for overview of algorithms for bit-reversal permutation. ...
arXiv:2011.11524v4
fatcat:smi7l3qehbfwhi673pyinegu64
LPCP: An efficient Privacy-Preserving Protocol for Polynomial Calculation Based on CRT
2022
Applied Sciences
For practical purpose, we describe a distance measurement application for mobile devices based on LPCP. ...
To solve this problem, we propose an efficient two-party computation protocol secure against semi-honest adversary based on the Chinese remainder theorem (CRT). ...
Table 3 shows the practical performance of the two-party LPCP protocol on additive and multiplicative arithmetic operations in both the ARM architecture and the X86 architecture environment. ...
doi:10.3390/app12063117
fatcat:b6ltvmnvmfhtrdd6dr6rtd4rfq
TIVA: Trusted Integrity Verification Architecture
[chapter]
2006
Lecture Notes in Computer Science
We propose one such novel solution, TIVA, in this paper. ...
Verifying the integrity of the software running on these devices in such a scenario is an interesting and difficult problem. ...
Flexibility of TIVA We explained TIVA for 10-bit permutation functions in a 32-bit architecture producing a 64-bit checksum, thus handling a memory size of 4KB. ...
doi:10.1007/11787952_2
fatcat:zqtyk4dxtrf7ldkv6i5epmweq4
Gadge me if you can
2013
Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security - ASIA CCS '13
Our prototype implementation supports the Linux ELF file format and covers both mainstream processor architectures x86 and ARM. ...
Our evaluation demonstrates that XIFER performs efficiently at load- and during run-time (1.2% overhead). ...
To evaluate the efficiency, we used the SPEC CPU2006 integer benchmark suite for the x86 version.
Runtime Overhead on Intel x86. ...
doi:10.1145/2484313.2484351
dblp:conf/ccs/DaviDNS13
fatcat:wqm3u4vlibh5ndzkrbb3ilhqrm
Vectorized and performance-portable Quicksort
[article]
2022
arXiv
pre-print
This paper focuses on the practical engineering aspects enabling the speed and portability, which we have not yet seen demonstrated for a Quicksort implementation. ...
To the best of our knowledge, this is the fastest sort for non-tuple keys on CPUs, up to 20 times as fast as the sorting algorithms implemented in standard libraries. ...
In contrast to previous works, which can be seen as proofs of concept, we have focused on practical usability (support for 32/64-bit floating-point and 16/32/64/128-bit integer keys in ascending or descending ...
arXiv:2205.05982v1
fatcat:zuykzjuq75hl5fwqzdfft4eokm
Record Setting Software Implementation of DES Using CUDA
2010
2010 Seventh International Conference on Information Technology: New Generations
improvement in the cost efficiency of the attack. ...
This turns out in a better cost-availability tradeoff and minimizes the required setup time for such an attack to be mounted. ...
Acknowledgements This work was partially supported by MIUR in the framework of the PRIN SESAME project. ...
doi:10.1109/itng.2010.43
dblp:conf/itng/AgostaBSP10
fatcat:put2eqy7lzeglczkipdlkbgs7m
Scalable validation of binary lifters
2020
Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
My work is the first to do translation validation of single instructions on an architecture as extensive as x86-64, uses the most precise formal semantics available, and has the widest coverage in terms ...
Second, I show that formal translation validation of single instructions for a complex ISA like x86-64 is not only practical but can be used as a building block for scalable full-program validation. ...
x86-64 Instruction Set Architecture x86-64 is the 64-bit extension of x86, a family of backward-compatible ISAs. ...
doi:10.1145/3385412.3385964
dblp:conf/pldi/DasguptaDVAF20
fatcat:3khjl5gbmnetjay23fk3sc2ktu
Security through amnesia
2011
Proceedings of the 27th Annual Computer Security Applications Conference on - ACSAC '11
Loop-Amnesia is written for x86-64, but our technique is applicable to other register-based architectures. ...
We offer theoretical justification of Loop-Amnesia's invulnerability to the attack, verify that our implementation is not vulnerable in practice, and present measurements showing our impact on I/O accesses ...
Acknowledgements We thank Andrew Lenharth of the University of Texas at Austin for his invaluable inspiration and advice in the early stages of this work. ...
doi:10.1145/2076732.2076743
dblp:conf/acsac/Simmons11
fatcat:igxzta2rgfhk5i7o4jokmhvic4
Security Through Amnesia: A Software-Based Solution to the Cold Boot Attack on Disk Encryption
[article]
2011
arXiv
pre-print
Loop-Amnesia is written for x86-64, but our technique is applicable to other register-based architectures. ...
We offer theoretical justification of Loop-Amnesia's invulnerability to the attack, verify that our implementation is not vulnerable in practice, and present measurements showing our impact on I/O accesses ...
Acknowledgements We thank Andrew Lenharth of the University of Texas at Austin for his invaluable inspiration and advice in the early stages of this work. ...
arXiv:1104.4843v1
fatcat:aekgqlim2fevrmgp3zx3maxybq
Studying Security Issues in HPC (Super Computer) Environment
2012
International journal of computer and communication technology
It is the purpose of this paper to present some practical security issues related to High Performance Computing Environment. ...
Due to cluster-type architecture and high processing speed, we have experienced that it works far better and handles the loads in a much more efficient manner than a series of desktops with normal configuration ...
Using Gluster-HPC ver 1.3 x86-64 bit edition the complete cluster along with Intel and gcc and gcc4 compiler makes it one of the most stable HPC. ...
doi:10.47893/ijcct.2012.1134
fatcat:ideuy3tlpngopf3pnnlzfsf3qu