Flexible Deep Neural Network Processing
[article]
2018
arXiv
pre-print
The recent success of Deep Neural Networks (DNNs) has drastically improved the state of the art for many application domains. While achieving high accuracy, deploying state-of-the-art DNNs is a challenge, since they typically require billions of expensive arithmetic operations. In addition, DNNs are often deployed in ensembles to boost accuracy, which further exacerbates the system requirements. This computational overhead is an issue for many platforms, e.g. data centers and embedded systems, with tight latency and energy budgets. In this article, we introduce a flexible DNN ensemble processing technique, which achieves a large reduction in average inference latency while incurring a small to negligible accuracy drop. Our technique is flexible in that it allows for dynamic adaptation between quality of results (QoR) and execution runtime. We demonstrate the effectiveness of the technique on AlexNet and ResNet-50 using the ImageNet dataset. The technique can also easily handle other types of networks.
arXiv:1801.07353v1
fatcat:ycwiiwjeure2bepieqbknw4yza
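The abstract leaves the mechanism implicit, but a common way to trade QoR for latency in an ensemble is to evaluate members one at a time and stop once the running prediction is confident. A minimal sketch of that idea in Python; the function name and the confidence threshold are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def flexible_ensemble_predict(models, x, threshold=0.9):
    """Evaluate ensemble members sequentially and stop early once the
    running average of their softmax outputs is confident enough.
    `models` is any list of callables mapping an input to a probability
    vector; `threshold` is the hypothetical QoR/latency knob."""
    avg, used = None, 0
    for model in models:
        used += 1
        probs = model(x)
        avg = probs if avg is None else avg + (probs - avg) / used  # running mean
        if avg.max() >= threshold:      # confident enough: skip the rest
            break
    return int(np.argmax(avg)), used   # prediction and members actually run
```

Lowering the threshold shortens average latency; raising it recovers the full ensemble's accuracy, which matches the dynamic QoR/runtime adaptation the abstract describes.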
Noninvasive Blockade of Action Potential by Electromagnetic Induction
[article]
2018
arXiv
pre-print
Conventional anesthesia methods, such as injected anesthetic agents, may cause various side effects such as injuries, allergies, and infections. We aim to investigate a noninvasive scheme in which an electromagnetic radiator system blocks the action potential (AP) in neuron fibers. We achieved a high-gradient, unipolar tangential electric field by designing circular geometric coils on an electric rectifier filter layer. An asymmetric sawtooth pulse shape supplied the coils in order to create an effective blockage. The entire setup was placed 5 cm above 50 motor and sensory neurons of the spinal cord. A validated time-domain full-wave analysis code, based on the cable model of the neurons and the electric and magnetic potentials, was used to simulate and investigate the proposed scheme. We observed action potential blockage on both motor and sensory neurons. In addition, the introduced approach shows promising potential for AP manipulation in the spinal cord.
arXiv:1809.06199v1
fatcat:tevo5rof4vfk5c2whyfegk4bd4
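As background for why the sawtooth shape matters: the electric field induced by a coil is proportional to -dI/dt, so a strongly asymmetric sawtooth current concentrates the induced field into a near-unipolar pulse during the fast edge. A toy illustration only; the coil geometry, 5 cm standoff, and rectifier layer from the paper are not modeled, and all values are placeholders:

```python
import numpy as np

t = np.linspace(0.0, 4e-3, 4000)          # 4 ms of time samples
period, rise_frac = 1e-3, 0.95            # 1 kHz sawtooth, 95% slow rise
phase = (t % period) / period
current = np.where(phase < rise_frac,
                   phase / rise_frac,                              # slow ramp up
                   1.0 - (phase - rise_frac) / (1.0 - rise_frac))  # fast reset
e_field = -np.gradient(current, t)        # induced E is proportional to -dI/dt
ratio = e_field.max() / abs(e_field.min())
print(f"fast-edge field peak is about {ratio:.0f}x the slow-ramp field")
```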
Look into details
2010
SIGPLAN notices
Stream programming model: an infinite sequence of data items; at any given time, the program operates on a small window of this sequence and moves forward in the data space. An example actor, a 3D rotation by 53° around the z axis:

    // 53° around the z axis
    const R[3][3] = { {0.6, -0.8, 0.0},
                      {0.8,  0.6, 0.0},
                      {0.0,  0.0, 1.0} };
    Rotation3D {
        for (i = 0; i < 3; i++)
            for (j = 0; j < 3; j++)
                B[i] += R[i][j] * A[j];
    }

Application model: a data flow graph whose vertices (actors) are functions or computations, and whose edges are data dependencies, i.e. communication between actors. Execution model: any actor can perform its computation whenever all necessary input data are available on its incoming edges. An example data flow graph is a vocoder built from duplicate and round-robin splitters/joiners, DFT stages, FIR smoothing, identity, deconvolution, linear interpolators, decimators, a phase unwrapper, and constant multipliers.
doi:10.1145/1755951.1755894
fatcat:2jtxbiluzverpepwm7oxnwqjlm
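The firing rule stated above is easy to make concrete: an actor may fire whenever every incoming edge holds a token. A toy executor, written in Python rather than a streaming language, with hypothetical actor and edge names:

```python
from collections import deque

def run_dataflow(actors, edges, sources):
    """actors: {name: fn(list of input tokens) -> output token}
       edges:  (src, dst) pairs; sources preloads tokens onto edges."""
    queues = {e: deque() for e in edges}
    for e, tokens in sources.items():
        queues[e].extend(tokens)
    ins  = {a: [e for e in edges if e[1] == a] for a in actors}
    outs = {a: [e for e in edges if e[0] == a] for a in actors}
    progress = True
    while progress:                 # keep firing until no actor is enabled
        progress = False
        for a, fn in actors.items():
            while ins[a] and all(queues[e] for e in ins[a]):
                out = fn([queues[e].popleft() for e in ins[a]])
                for e in outs[a]:
                    queues[e].append(out)
                progress = True
    return queues

# A two-actor chain: square each item of the stream, then negate it.
acts = {"sq": lambda xs: xs[0] ** 2, "neg": lambda xs: -xs[0]}
out = run_dataflow(acts, [("in", "sq"), ("sq", "neg"), ("neg", "out")],
                   {("in", "sq"): [1, 2, 3]})
print(list(out[("neg", "out")]))    # [-1, -4, -9]
```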
New Heuristics for Rooted Triplet Consistency
2013
Algorithms
Rooted triplets are becoming one of the most important types of input for reconstructing rooted phylogenies. A rooted triplet is a phylogenetic tree on three leaves, showing the evolutionary relationship of the corresponding three species. In this paper, we investigate the problem of inferring the maximum consensus evolutionary tree from a set of rooted triplets. This problem is known to be APX-hard. We present two new heuristic algorithms. For a given set of m triplets on n species, the FastTree algorithm runs in O(m + α(n)n²) time, where α(n) is the functional inverse of Ackermann's function. This is faster than any previously known algorithm, although the outcome is less satisfactory. The Best Pair Merge with Total Reconstruction (BPMTR) algorithm runs in O(mn³) time and, on average, performs better than any previously known algorithm for this problem.
doi:10.3390/a6030396
fatcat:r47k6ajk4zhojbood6fjy4rfji
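For readers new to the input model: a triplet xy|z is consistent with a rooted tree exactly when the lowest common ancestor (LCA) of x and y lies strictly below the LCA of x and z. A small checker for that condition, independent of the paper's FastTree and BPMTR heuristics:

```python
def ancestors(parent, v):
    """Path from v up to the root, inclusive; tree given as {child: parent}."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def lca(parent, a, b):
    anc_a = set(ancestors(parent, a))
    for v in ancestors(parent, b):
        if v in anc_a:
            return v
    raise ValueError("nodes lie in different trees")

def displays_triplet(parent, x, y, z):
    """True iff the rooted triplet xy|z is displayed by the tree: lca(x, y)
    must be a proper descendant of lca(x, z). Both lie on x's root path,
    so the one with the longer ancestor path is the descendant."""
    l_xy, l_xz = lca(parent, x, y), lca(parent, x, z)
    return (l_xy != l_xz
            and len(ancestors(parent, l_xz)) < len(ancestors(parent, l_xy)))

# Tree ((x,y),z): leaves x, y under node a; a and z under root r.
tree = {"x": "a", "y": "a", "a": "r", "z": "r"}
print(displays_triplet(tree, "x", "y", "z"))   # True
print(displays_triplet(tree, "x", "z", "y"))   # False
```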
Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks
[article]
2017
arXiv
pre-print
On the other hand, Hashemi et al. [9] provide a broad evaluation of different precisions and quantizations in terms of both hardware metrics and network accuracy. ...
arXiv:1705.04288v1
fatcat:wd4z3lzxubht7b3nyluigymwne
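The entry's snippet only shows related-work context, but the "multiplier-free" idea in the title is typically realized by constraining weights to signed powers of two, so each multiplication becomes a bit shift. A hedged sketch of such a quantizer; the exponent range and flush rule are illustrative, not necessarily the paper's scheme:

```python
import numpy as np

def quantize_pow2(w, min_exp=-8, max_exp=0):
    """Round each weight to a nearby signed power of two (or to zero),
    so that w * x reduces to a shift plus a sign flip in hardware."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.clip(np.round(np.log2(np.where(mag > 0, mag, 2.0 ** min_exp))),
                  min_exp, max_exp)
    q = sign * 2.0 ** exp
    return np.where(mag < 2.0 ** (min_exp - 1), 0.0, q)  # flush tiny weights

w = np.array([0.3, -0.7, 0.05, 1.1])
print(quantize_pow2(w))   # [ 0.25  -0.5    0.0625  1.    ]
```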
MODELLING DISPERSIVE BEHAVIOR OF EXCITABLE CELLS
2017
Progress In Electromagnetics Research M
Most materials have nearly constant electromagnetic characteristics at low frequencies. Biological tissues, however, are not the same; they are highly dispersive, even at low frequencies. Cable theory is the most famous method for analyzing nerves, though a common mistake when studying the model is to treat its parameters as constant versus frequency. This issue is discussed in the present article, and an analysis of how to model the dispersion in the cable model is proposed. The proposed dispersive model can predict the behavior of excitable cells under stimulation with single-frequency or wide-band signals. In this article, nondestructive external stimulation by a coil is modeled and computed by the finite difference method to survey the impact of dispersion. A 5% to 80% difference is shown between the results of the dispersive and nondispersive models over the 5 Hz to 4 kHz range investigated; this disagreement underlines the significance of the dispersion. The proposed dispersive method assists in accurate device design and signal-form optimization. Noise analysis, which is essential in the analysis of single neurons, the central nervous system, and EEG and MEG records, is also achieved by this model, unlike the conventional models.
doi:10.2528/pierm17030102
fatcat:xjmdnhnrg5b6lk6ikietbldgky
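To make the dispersion point concrete: in the classic cable model, the membrane admittance per unit area is Y(ω) = g_m + jωc_m with constant parameters, whereas a dispersive membrane lets the capacitance vary with frequency. A toy comparison over the article's 5 Hz to 4 kHz band, using a Cole-Cole-style capacitance with placeholder values rather than the paper's fitted tissue data:

```python
import numpy as np

f = np.logspace(np.log10(5), np.log10(4000), 200)   # 5 Hz .. 4 kHz
w = 2 * np.pi * f
g_m, c_m0 = 1.0, 1e-2          # toy membrane conductance and capacitance
alpha, tau = 0.2, 1e-3         # toy Cole-Cole dispersion parameters

y_const = g_m + 1j * w * c_m0                         # constant-parameter model
c_disp = c_m0 / (1 + (1j * w * tau) ** (1 - alpha))   # dispersive capacitance
y_disp = g_m + 1j * w * c_disp

mismatch = 100 * np.abs(y_disp - y_const) / np.abs(y_const)
print(f"admittance mismatch across the band: "
      f"{mismatch.min():.0f}% to {mismatch.max():.0f}%")
```

The mismatch growing with frequency is the qualitative effect behind the 5% to 80% disagreement reported above.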
DRiLLS: Deep Reinforcement Learning for Logic Synthesis
[article]
2019
arXiv
pre-print
Logic synthesis requires extensive tuning of the synthesis optimization flow, where the quality of results (QoR) depends on the sequence of optimizations used. Efficient design space exploration is challenging due to the exponential number of possible optimization permutations. Therefore, automating the optimization process is necessary. In this work, we propose a novel reinforcement learning-based methodology that navigates the optimization space without human intervention. We demonstrate the training of an Advantage Actor Critic (A2C) agent that seeks to minimize area subject to a timing constraint. Using the proposed methodology, designs can be optimized autonomously with no humans in the loop. Evaluation on the comprehensive EPFL benchmark suite shows that the agent outperforms existing exploration methodologies and improves QoR by an average of 13%.
arXiv:1911.04021v2
fatcat:m5yvlqzkwrfo5przeacvs6taaa
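Schematically, the methodology is a standard RL loop: the state is a vector of design metrics, each action applies one synthesis transformation, and the reward tracks area improvement under the timing constraint. The sketch below uses a toy environment in place of a real synthesis flow and a random policy in place of the trained A2C agent; the action names are typical ABC transforms, not an exact list from the paper:

```python
import random

ACTIONS = ["rewrite", "refactor", "resub", "balance"]  # typical ABC transforms

class ToyEnv:
    """Stand-in for a synthesis environment: state is [area], each step
    applies one transform and rewards the resulting area reduction."""
    def reset(self):
        self.area = 1000.0
        return [self.area]
    def step(self, action):
        new_area = self.area * random.uniform(0.90, 1.02)  # fake QoR change
        reward, self.area = self.area - new_area, new_area
        return [self.area], reward, False

def run_episode(env, policy, horizon=25):
    state, total = env.reset(), 0.0
    for _ in range(horizon):
        state, reward, _ = env.step(policy(state))
        total += reward
    return total

random_policy = lambda state: random.randrange(len(ACTIONS))
print(f"total area reduction: {run_episode(ToyEnv(), random_policy):.1f}")
```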
gIM: GPU Accelerated RIS-based Influence Maximization Algorithm
[article]
2021
arXiv
pre-print
Given a social network modeled as a weighted graph G, the influence maximization problem seeks k vertices to become initially influenced so as to maximize the expected number of influenced nodes under a particular diffusion model. The influence maximization problem has been proven to be NP-hard, and most proposed solutions are approximate greedy algorithms, which can guarantee a tunable approximation ratio for their results with respect to the optimal solution. The state-of-the-art algorithms are based on the Reverse Influence Sampling (RIS) technique, which offers both computational efficiency and a non-trivial (1 - 1/e - ϵ)-approximation guarantee for any ϵ > 0. RIS-based algorithms, despite their lower computational cost compared to other methods, still require long running times to solve the problem on large-scale graphs with low values of ϵ. In this paper, we present a novel and efficient parallel implementation of a RIS-based algorithm, namely IMM, on the GPU. The proposed GPU-accelerated influence maximization algorithm, named gIM, can significantly reduce the running time on large-scale graphs with low values of ϵ. Furthermore, we show that the gIM algorithm can solve other variations of the IM problem with only minor modifications. Experimental results show that the proposed solution reduces the runtime by a factor of up to 220×. The source code of gIM is publicly available online.
arXiv:2009.07325v2
fatcat:fjvj7mszpjgs7beuer2ownjkwu
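For orientation, the RIS core that gIM parallelizes can be stated in a few lines of sequential code: sample many random reverse-reachable (RR) sets, then greedily pick the k nodes covering the most sets. A CPU reference sketch under the independent cascade model; the sampling count and graph encoding are illustrative:

```python
import random
from collections import defaultdict

def rr_set(graph_rev, n):
    """One reverse-reachable set under the independent cascade model:
    reverse-BFS from a random node, keeping each incoming edge u -> v
    (with propagation probability p) alive with probability p.
    graph_rev[v] is a list of (u, p) pairs."""
    root = random.randrange(n)
    seen, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in graph_rev.get(v, ()):
            if u not in seen and random.random() < p:
                seen.add(u)
                stack.append(u)
    return seen

def ris_seeds(graph_rev, n, k, num_samples=10_000):
    rr = [rr_set(graph_rev, n) for _ in range(num_samples)]
    covers = defaultdict(set)            # node -> indices of RR sets it hits
    for i, s in enumerate(rr):
        for v in s:
            covers[v].add(i)
    seeds, covered = [], set()
    for _ in range(min(k, len(covers))): # greedy maximum coverage
        best = max((v for v in covers if v not in seeds),
                   key=lambda v: len(covers[v] - covered))
        seeds.append(best)
        covered |= covers[best]
    return seeds
```

gIM's contribution is mapping both the RR-set sampling and the coverage bookkeeping of IMM onto the GPU; the sequential logic stays as above.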
Zero reflection from metamaterial sphere
2014
AEU - International Journal of Electronics and Communications
doi:10.1016/j.aeue.2013.07.003
fatcat:nxwmoqvzizdxdh22ec5t74xynu
New Algorithms on Rooted Triplet Consistency
[article]
2013
arXiv
pre-print
An evolutionary tree (phylogenetic tree) is a binary, rooted, unordered tree that models the evolutionary history of currently living species, in which the leaves are labeled by species. In this paper, we investigate the problem of finding the maximum consensus evolutionary tree from a set of given rooted triplets. A rooted triplet is a phylogenetic tree on three leaves, showing the evolutionary relationship of the corresponding three species. The mentioned problem is known to be APX-hard. We present two new heuristic algorithms. For a given set of m triplets on n species, the FastTree algorithm runs in O(mn^2), which is faster than any previously known algorithm, although the outcome is less satisfactory. The BPMTR algorithm runs in O(mn^3) and on average performs better than any previously known approximation algorithm for this problem.
arXiv:1205.3532v3
fatcat:nersd3gejfenbbohhkhla5r4ee
Approximate computing is an emerging paradigm where design accuracy can be traded off for benefits in design metrics such as design area, power consumption, or circuit complexity. In this work, we present a novel paradigm for synthesizing approximate circuits using Boolean matrix factorization (BMF). In our methodology, the truth table of a subcircuit of the design is approximated using BMF to a controllable approximation degree, and the results of the factorization are used to synthesize a less complex subcircuit. To scale our technique to large circuits, we devise a circuit decomposition method and a subcircuit design-space exploration technique to identify the best order for subcircuit approximations. Our method leads to a smooth trade-off between accuracy and full-circuit complexity as measured by design area and power consumption. Using an industrial-strength design flow, we extensively evaluate our methodology on a number of test cases, where we demonstrate that it can achieve up to 63% in power savings while introducing an average relative error of 5%. We also compare our work to previous works in Boolean circuit synthesis and demonstrate significant improvements in design metrics for the same accuracy targets.
doi:10.1145/3195970.3196001
dblp:conf/dac/HashemiTR18
fatcat:fbrbxusnijbuzgfpemp3gic66i
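To make the idea tangible: the truth table of a k-output subcircuit is a 0/1 matrix M (one row per input pattern), and BMF approximates M ≈ B ∘ C under the Boolean (OR-of-ANDs) product with f < k factor columns, which maps back to a smaller two-stage circuit. A brute-force toy factorizer for tiny tables; real flows use scalable heuristics instead:

```python
import itertools
import numpy as np

def bool_product(B, C):
    """Boolean matrix product: ordinary product with OR instead of +."""
    return (B @ C > 0).astype(int)

def bmf_bruteforce(M, f):
    """Best rank-f Boolean factorization M ~ B o C of a tiny 0/1 matrix,
    found by exhaustive search; returns (B, C, hamming_error)."""
    r, c = M.shape
    best = None
    for B_bits in itertools.product((0, 1), repeat=r * f):
        B = np.array(B_bits).reshape(r, f)
        for C_bits in itertools.product((0, 1), repeat=f * c):
            C = np.array(C_bits).reshape(f, c)
            err = int(np.sum(bool_product(B, C) != M))
            if best is None or err < best[2]:
                best = (B, C, err)
    return best

M = np.array([[1, 1, 0], [1, 1, 1], [0, 0, 1]])   # toy 3-output truth table
B, C, err = bmf_bruteforce(M, f=2)
print("hamming error:", err)   # 0: this M happens to factor exactly at rank 2
```

Raising or lowering f is the "controllable approximation degree" mentioned above: fewer factor columns mean a simpler subcircuit but a larger Hamming error.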
Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks
[article]
2016
arXiv
pre-print
Deep neural networks are gaining in popularity as they are used to generate state-of-the-art results for a variety of computer vision and machine learning applications. At the same time, these networks have grown in depth and complexity in order to solve harder problems. Given the limitations in the power budgets dedicated to these networks, the importance of low-power, low-memory solutions has been stressed in recent years. While a large number of dedicated hardware designs using different precisions have recently been proposed, there exists no comprehensive study of different bit precisions and arithmetic in both inputs and network parameters. In this work, we address this issue and perform a study of different bit precisions in neural networks (from floating point to fixed point, powers of two, and binary). In our evaluation, we consider and analyze the effect of precision scaling on both network accuracy and hardware metrics, including memory footprint, power and energy consumption, and design area. We also investigate training-time methodologies to compensate for the reduction in accuracy due to limited bit precision, and demonstrate that in most cases precision scaling can deliver significant benefits in design metrics at the cost of very modest decreases in network accuracy. In addition, we propose that a small portion of the benefits achieved when using lower precisions can be forfeited to increase the network size and therefore the accuracy. We evaluate our experiments using three well-recognized networks and datasets to show the generality of our findings. We investigate the trade-offs and highlight the benefits of using lower precisions in terms of energy and memory footprint.
arXiv:1612.03940v1
fatcat:c432ganlkjdpjjghhi3cddqncm
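A miniature version of the kind of measurement such a study involves: quantize a weight tensor to a signed fixed-point format and track the error as the fractional bit width shrinks. The format and tensor below are illustrative only:

```python
import numpy as np

def to_fixed_point(x, int_bits=2, frac_bits=6):
    """Quantize to a signed fixed-point grid: steps of 1/2**frac_bits,
    saturated to the range representable with int_bits integer bits."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2.0 ** int_bits), 2.0 ** int_bits - 1.0 / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, 10_000)           # stand-in for a weight tensor
for frac_bits in (2, 4, 6, 8):
    q = to_fixed_point(w, frac_bits=frac_bits)
    rms = np.sqrt(np.mean((w - q) ** 2))
    print(f"{frac_bits} fractional bits -> RMS quantization error {rms:.5f}")
```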
Exact and Approximate Task Assignment Algorithms for Pipelined Software Synthesis
2008
2008 Design, Automation and Test in Europe
Pipelined execution of streaming applications enables processing of high-throughput data under performance constraints. We present an integrated approach to synthesizing pipelined software for dual-core architectures. We target streaming applications modeled as task graphs that are amenable to static analysis. We develop a versatile task assignment algorithm that considers the combined effect of workload imbalance between processors and inter-processor communication. Our technique, which runs in pseudo-linear time, provably maximizes application throughput. Furthermore, we develop an approximation algorithm for task assignment whose complexity is strictly polynomial. It provides the designer with an adjustable knob to controllably trade solution quality for algorithm runtime and memory requirements. Empirical throughput measurements using an FPGA-based dual-core system validate our theoretical results. Our exact algorithm consistently outperforms a recent competitor. Compared to exact task assignment, the approximate method runs about 3 times faster, requires about 20 times less memory, and results in only a 1% to 5% throughput loss.
doi:10.1109/date.2008.4484768
dblp:conf/date/HashemiG08
fatcat:23qaqwqqtjcpnkugdgm5wp2tzi
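The flavor of the assignment problem is easiest to see on a task chain, where a two-core pipelined assignment reduces to choosing one cut point: the pipeline period is bounded by the heavier stage and by the traffic crossing the cut. A toy scan over cut points under that simplified period model; the paper's algorithm handles general task graphs and a more detailed cost model:

```python
def best_chain_cut(work, comm):
    """work[i]: cost of task i; comm[i]: traffic between tasks i and i+1.
    Returns (index of last task on core 0, achieved pipeline period),
    modeling the period as max(stage-0 work, cut traffic, stage-1 work)."""
    total = sum(work)
    prefix, best = 0.0, None
    for cut in range(len(work) - 1):
        prefix += work[cut]
        period = max(prefix, comm[cut], total - prefix)
        if best is None or period < best[1]:
            best = (cut, period)
    return best

print(best_chain_cut([4, 2, 7, 3], [1, 5, 2]))   # (1, 10): cut after task 1
```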
Puzzle solver accelerators make excellent capstone design projects
2011
2011 IEEE International Conference on Microelectronic Systems Education
We present a computer engineering capstone design project course focused on accelerating intensive computations via the integration of application-specific co-processors with digital processor systems. We propose the use of puzzle solvers as attractive, scalable, and simple-to-understand applications that engage students in practicing a number of fundamental concepts in algorithm design, HW-SW co-design, computer architecture, and beyond. While advocating a contest setup for the course, we discuss several well-specified milestones that balance students' creativity and freedom in design choices with ensuring timely progress towards the end goal of the class. We report our observations from the only offering of the class so far, which resulted in successful project completion by all students, and their supportive feedback.
doi:10.1109/mse.2011.5937082
dblp:conf/mse/GhiasiHKK11
fatcat:vxf3c6r625gynkvkdiq6rz4dbe
System-Level Performance Estimation for Application-Specific MPSoC Interconnect Synthesis
2008
2008 Symposium on Application Specific Processors
We present a framework for developing streaming applications as concurrent software modules running on multi-processor systems-on-chip (MPSoCs). We propose an iterative design space exploration mechanism to customize the MPSoC architecture for given applications. Central to the exploration engine is our system-level performance estimation methodology, which both quickly and accurately determines the quality of candidate architectures. We implemented a number of streaming applications on candidate architectures that were emulated on an FPGA. Hardware measurements show that our system-level performance estimation method incurs only 15% error in predicting application throughput. More importantly, it always correctly guides design space exploration by achieving 100% fidelity in quality-ranking candidate architectures. Compared to behavioral simulation of compiled code, our system-level estimator runs more than 12 times faster and requires 7 times less memory.
doi:10.1109/sasp.2008.4570792
dblp:conf/sasp/HuangHG08
fatcat:gh2mwdfghjgp7je4n6l42tvgge
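The 100% fidelity claim has a precise meaning: for every pair of candidate architectures, the estimator orders them the same way the hardware measurements do. The metric itself is a few lines:

```python
from itertools import combinations

def fidelity(estimated, measured):
    """Fraction of candidate pairs whose better/worse ordering agrees
    between estimated and measured throughput values."""
    pairs = list(combinations(range(len(estimated)), 2))
    agree = sum(
        (estimated[i] - estimated[j]) * (measured[i] - measured[j]) > 0
        or (estimated[i] == estimated[j] and measured[i] == measured[j])
        for i, j in pairs)
    return agree / len(pairs)

print(fidelity([100, 90, 80], [210, 180, 150]))   # 1.0: ranking fully preserved
```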
Showing results 1 — 15 out of 76 results