Flexible Deep Neural Network Processing [article]

Hokchhay Tann, Soheil Hashemi, Sherief Reda
2018 arXiv   pre-print
The recent success of Deep Neural Networks (DNNs) has drastically improved the state of the art for many application domains. While achieving high accuracy performance, deploying state-of-the-art DNNs is a challenge since they typically require billions of expensive arithmetic computations. In addition, DNNs are typically deployed in ensemble to boost accuracy performance, which further exacerbates the system requirements. This computational overhead is an issue for many platforms, e.g. data
more » ... ters and embedded systems, with tight latency and energy budgets. In this article, we introduce a flexible DNN ensemble processing technique, which achieves a large reduction in average inference latency while incurring a small to negligible accuracy drop. Our technique is flexible in that it allows for dynamic adaptation between quality of results (QoR) and execution runtime. We demonstrate the effectiveness of the technique on AlexNet and ResNet-50 using the ImageNet dataset. This technique can also easily handle other types of networks.
arXiv:1801.07353v1 fatcat:ycwiiwjeure2bepieqbknw4yza

Noninvasive Blockade of Action Potential by Electromagnetic Induction [article]

Soheil Hashemi, Amirhossein Hajiaghajani, Ali Abdolali
2018 arXiv   pre-print
Conventional anesthesia methods such as injective anesthetic agents may cause various side effects such as injuries, allergies, and infections. We aim to investigate a noninvasive scheme of an electromagnetic radiator system to block action potential (AP) in neuron fibers. We achieved a high-gradient and unipolar tangential electric field by designing circular geometric coils on an electric rectifier filter layer. An asymmetric sawtooth pulse shape supplied the coils in order to create an
more » ... ive blockage. The entire setup was placed 5 cm above 50 motor and sensory neurons of the spinal cord. A validated time-domain full-wave analysis code based on the cable model of the neurons and the electric and magnetic potentials is used to simulate and investigate the proposed scheme. We observed action potential blockage on both motor and sensory neurons. In addition, the introduced approach shows promising potential for AP manipulation in the spinal cord.
arXiv:1809.06199v1 fatcat:tevo5rof4vfk5c2whyfegk4bd4

Look into details

Mohammad H. Foroozannejad, Matin Hashemi, Trevor L. Hodges, Soheil Ghiasi
2010 SIGPLAN notices  
- Infinite sequence of data items
- At any given time, operates on a small window of this sequence
- Moves forward in data space
[Figure: sample input and output data streams]
// 53° around the z axis
const R[3][3] = { {0.6, -0.8, 0.0}, {0.8, 0.6, 0.0}, {0.0, 0.0, 1.0} }
Rotation3D { for (i=0; i<3; i++) for (j=0; j<3; j++) B[i] += R[i][j] * A[j] }
Application Model
- Data Flow Graph
- Vertices or Actors → functions, computations
- Edges → data dependency, communication between actors
- Execution more » ... l → any actor can perform its computation whenever all necessary input data are available on incoming edges.
Application Model
- An example Data Flow Graph: Vocoder
[Figure: Vocoder data flow graph, with actors including duplicate splitters, round robin splitters and joiners, DFT, FIR smoothing, identity, deconvolve, linear interpolators, decimators, multipliers, and a phase unwrapper]
doi:10.1145/1755951.1755894 fatcat:2jtxbiluzverpepwm7oxnwqjlm
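The slide fragment above pairs a streaming view of data (a small window moving over an unbounded sequence) with a 3D rotation kernel. A minimal Python sketch of that combination follows. The matrix values come from the slide; the names `rotate3d` and `rotation_actor` and the choice of a non-overlapping window of three items are illustrative assumptions, not taken from the paper.

```python
from typing import Iterable, Iterator, List

# Rotation matrix from the slide: about 53 degrees around the z axis
# (cos is roughly 0.6, sin is roughly 0.8).
R = [
    [0.6, -0.8, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]


def rotate3d(a: List[float]) -> List[float]:
    """Compute B = R * A with the same double loop as the slide."""
    b = [0.0, 0.0, 0.0]
    for i in range(3):
        for j in range(3):
            b[i] += R[i][j] * a[j]
    return b


def rotation_actor(stream: Iterable[float]) -> Iterator[float]:
    """Hypothetical actor: fires once three input items are available."""
    window: List[float] = []
    for item in stream:
        window.append(item)
        if len(window) == 3:          # firing rule: all inputs present
            yield from rotate3d(window)
            window = []               # move forward in the data space


if __name__ == "__main__":
    print(list(rotation_actor([1.0, 0.0, 0.0, 0.0, 0.0, 2.0])))
```

Because the actor is a generator, it never needs the whole sequence in memory, which is the property the "infinite sequence" bullet points at. A trailing partial window simply never fires.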

New Heuristics for Rooted Triplet Consistency

Soheil Jahangiri, Seyed Hashemi, Hadi Poormohammadi
2013 Algorithms  
Rooted triplets are becoming one of the most important types of input for reconstructing rooted phylogenies. A rooted triplet is a phylogenetic tree on three leaves and shows the evolutionary relationship of the corresponding three species. In this paper, we investigate the problem of inferring the maximum consensus evolutionary tree from a set of rooted triplets. This problem is known to be APX-hard. We present two new heuristic algorithms. For a given set of m triplets on n species, the
more » ... ee algorithm runs in O(m + α(n)n^2) time, where α(n) is the functional inverse of Ackermann's function. This is faster than any other previously known algorithms, although the outcome is less satisfactory. The Best Pair Merge with Total Reconstruction (BPMTR) algorithm runs in O(mn^3) time and, on average, performs better than any other previously known algorithms for this problem.
doi:10.3390/a6030396 fatcat:r47k6ajk4zhojbood6fjy4rfji
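The abstract above scores a candidate tree by how many input rooted triplets it agrees with. As a hedged illustration of that objective only (not of the paper's heuristics), the sketch below checks whether a rooted tree is consistent with a triplet ab|c, meaning a and b share a more recent common ancestor than either shares with c. The parent-map tree encoding and all function names are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Parent = Dict[str, Optional[str]]   # node -> parent; the root maps to None
Triplet = Tuple[str, str, str]      # (a, b, c) encodes the triplet ab|c


def ancestors(parent: Parent, node: str) -> List[str]:
    """Path from node up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca(parent: Parent, x: str, y: str) -> str:
    """Lowest common ancestor of x and y."""
    seen = set(ancestors(parent, x))
    for node in ancestors(parent, y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def depth(parent: Parent, node: str) -> int:
    return len(ancestors(parent, node)) - 1


def is_consistent(parent: Parent, triplet: Triplet) -> bool:
    """True if lca(a, b) lies strictly below lca(a, c) in the tree."""
    a, b, c = triplet
    return depth(parent, lca(parent, a, b)) > depth(parent, lca(parent, a, c))


def count_consistent(parent: Parent, triplets: Iterable[Triplet]) -> int:
    """Number of input triplets the tree agrees with."""
    return sum(is_consistent(parent, t) for t in triplets)


if __name__ == "__main__":
    # The tree ((a, b), c): internal node u joins a and b, root r joins u and c.
    tree = {"r": None, "u": "r", "a": "u", "b": "u", "c": "r"}
    print(count_consistent(tree, [("a", "b", "c"), ("a", "c", "b")]))
```

Comparing depths is enough here because both common ancestors lie on the path from a to the root, so the deeper one is a proper descendant of the other.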

Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks [article]

Hokchhay Tann, Soheil Hashemi, Iris Bahar, Sherief Reda
2017 arXiv   pre-print
On the other hand, Hashemi et al. [9] provide a broad evaluation of different precisions and quantizations on both the hardware metrics and network accuracy.  ... 
arXiv:1705.04288v1 fatcat:wd4z3lzxubht7b3nyluigymwne


Soheil Hashemi, Ali Abdolali
2017 Progress In Electromagnetics Research M  
Most of the materials have nearly constant electromagnetic characteristics at low frequencies. Nonetheless, biological tissues are not the same; they are highly dispersive, even at low frequencies. Cable theory is the most famous method for analyzing nerves though a common mistake when studying the model is to consider a constant parameter versus frequency. This issue is discussed in the present article, and the analysis of how to model the dispersion in the cable model is proposed and
more » ... . The proposed dispersive model can predict the behavior of excitable cells versus stimulations with single frequency or wide-band signals. In this article, the nondestructive external stimulation by a coil is modeled and computed by finite difference method to survey the dispersion impact. Also, 5% to 80% difference is shown between the results of dispersive and nondispersive models in the 5 Hz to 4 kHz investigation. The disagreement expresses the dispersion notability. The proposed dispersive method assists in accurate device design and signal form optimization. Noise analysis is also achieved by this model, unlike the conventional models, which is essential in the analysis of single neurons or central nervous system, EEG and MEG records.
doi:10.2528/pierm17030102 fatcat:xjmdnhnrg5b6lk6ikietbldgky

DRiLLS: Deep Reinforcement Learning for Logic Synthesis [article]

Abdelrahman Hosny, Soheil Hashemi, Mohamed Shalan, Sherief Reda
2019 arXiv   pre-print
Logic synthesis requires extensive tuning of the synthesis optimization flow where the quality of results (QoR) depends on the sequence of optimizations used. Efficient design space exploration is challenging due to the exponential number of possible optimization permutations. Therefore, automating the optimization process is necessary. In this work, we propose a novel reinforcement learning-based methodology that navigates the optimization space without human intervention. We demonstrate the
more » ... aining of an Advantage Actor Critic (A2C) agent that seeks to minimize area subject to a timing constraint. Using the proposed methodology, designs can be optimized autonomously with no humans in the loop. Evaluation on the comprehensive EPFL benchmark suite shows that the agent outperforms existing exploration methodologies and improves QoRs by an average of 13%.
arXiv:1911.04021v2 fatcat:m5yvlqzkwrfo5przeacvs6taaa

gIM: GPU Accelerated RIS-based Influence Maximization Algorithm [article]

Soheil Shahrouz, Saber Salehkaleybar, Matin Hashemi
2021 arXiv   pre-print
Given a social network modeled as a weighted graph G, the influence maximization problem seeks k vertices to become initially influenced, to maximize the expected number of influenced nodes under a particular diffusion model. The influence maximization problem has been proven to be NP-hard, and most proposed solutions to the problem are approximate greedy algorithms, which can guarantee a tunable approximation ratio for their results with respect to the optimal solution. The state-of-the-art
more » ... orithms are based on Reverse Influence Sampling (RIS) technique, which can offer both computational efficiency and non-trivial (1-1/e-ϵ)-approximation ratio guarantee for any ϵ >0. RIS-based algorithms, despite their lower computational cost compared to other methods, still require long running times to solve the problem in large-scale graphs with low values of ϵ. In this paper, we present a novel and efficient parallel implementation of a RIS-based algorithm, namely IMM, on GPU. The proposed GPU-accelerated influence maximization algorithm, named gIM, can significantly reduce the running time on large-scale graphs with low values of ϵ. Furthermore, we show that gIM algorithm can solve other variations of the IM problem, only by applying minor modifications. Experimental results show that the proposed solution reduces the runtime by a factor up to 220 ×. The source code of gIM is publicly available online.
arXiv:2009.07325v2 fatcat:fjvj7mszpjgs7beuer2ownjkwu
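For readers unfamiliar with Reverse Influence Sampling, the following is a minimal sequential sketch of the basic idea under the independent cascade model: sample reverse-reachable (RR) sets, then greedily pick the k nodes that cover the most of them. It is not the IMM algorithm (which also derives how many samples are needed) and not the gIM GPU implementation; the graph encoding and function names are assumptions for illustration.

```python
import random
from typing import Dict, List, Set, Tuple

# Reverse adjacency: node -> list of (in-neighbor, edge probability).
# Every node must appear as a key, even if it has no in-neighbors.
ReverseGraph = Dict[int, List[Tuple[int, float]]]


def sample_rr_set(rev: ReverseGraph, rng: random.Random) -> Set[int]:
    """Sample one reverse-reachable set from a uniformly random root."""
    root = rng.choice(sorted(rev))
    rr, frontier = {root}, [root]
    while frontier:
        v = frontier.pop()
        for u, p in rev[v]:
            # Keep each incoming edge independently with probability p.
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr


def greedy_seeds(rr_sets: List[Set[int]], k: int) -> List[int]:
    """Greedy maximum coverage: choose up to k nodes covering most RR sets."""
    remaining = list(rr_sets)
    seeds: List[int] = []
    for _ in range(k):
        counts: Dict[int, int] = {}
        for rr in remaining:
            for node in rr:
                counts[node] = counts.get(node, 0) + 1
        if not counts:
            break
        best = max(sorted(counts), key=counts.get)   # ties go to the smallest id
        seeds.append(best)
        remaining = [rr for rr in remaining if best not in rr]
    return seeds


if __name__ == "__main__":
    # Chain 0 -> 1 -> 2 with certain edges: node 0 reaches every node.
    rev = {0: [], 1: [(0, 1.0)], 2: [(1, 1.0)]}
    rng = random.Random(0)
    samples = [sample_rr_set(rev, rng) for _ in range(20)]
    print(greedy_seeds(samples, 1))
```

Each RR set is sampled independently of the others, which is what makes this stage a natural candidate for the kind of parallelization the abstract describes.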

Zero reflection from metamaterial sphere

Ali Abdolali, Soheil Hashemi, Noushin Vaseghi
2014 AEU - International Journal of Electronics and Communications  
1434-8411/$ - see front matter © 2013 Elsevier GmbH.  ... 
doi:10.1016/j.aeue.2013.07.003 fatcat:nxwmoqvzizdxdh22ec5t74xynu

New Algorithms on Rooted Triplet Consistency [article]

Soheil Jahangiri Tazehkand, Seyed Naser Hashemi, Hadi Poormohammadi
2013 arXiv   pre-print
An evolutionary tree (phylogenetic tree) is a binary, rooted, unordered tree that models the evolutionary history of currently living species in which leaves are labeled by species. In this paper, we investigate the problem of finding the maximum consensus evolutionary tree from a set of given rooted triplets. A rooted triplet is a phylogenetic tree on three leaves and shows the evolutionary relationship of the corresponding three species. The mentioned problem is known to be APX-hard. We
more » ... t two new heuristic algorithms. For a given set of m triplets on n species, the FastTree algorithm runs in O(mn^2), which is faster than any previously known algorithm, although the outcome is less satisfactory. The BPMTR algorithm runs in O(mn^3) and on average performs better than any other previously known approximation algorithm for this problem.
arXiv:1205.3532v3 fatcat:nersd3gejfenbbohhkhla5r4ee


Soheil Hashemi, Hokchhay Tann, Sherief Reda
2018 Proceedings of the 55th Annual Design Automation Conference on - DAC '18  
Approximate computing is an emerging paradigm where design accuracy can be traded off for benefits in design metrics such as design area, power consumption or circuit complexity. In this work, we present a novel paradigm to synthesize approximate circuits using Boolean matrix factorization (BMF). In our methodology the truth table of a sub-circuit of the design is approximated using BMF to a controllable approximation degree, and the results of the factorization are used to synthesize a less
more » ... plex subcircuit. To scale our technique to large circuits, we devise a circuit decomposition method and a subcircuit design-space exploration technique to identify the best order for subcircuit approximations. Our method leads to a smooth trade-off between accuracy and full circuit complexity as measured by design area and power consumption. Using an industrial strength design flow, we extensively evaluate our methodology on a number of testcases, where we demonstrate that the proposed methodology can achieve up to 63% in power savings, while introducing an average relative error of 5%. We also compare our work to previous works in Boolean circuit synthesis and demonstrate significant improvements in design metrics for same accuracy targets.
doi:10.1145/3195970.3196001 dblp:conf/dac/HashemiTR18 fatcat:fbrbxusnijbuzgfpemp3gic66i
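The abstract above approximates a subcircuit's truth table by a Boolean matrix factorization of controllable degree. The sketch below shows only the two ingredients needed to read that statement: the Boolean matrix product (OR of ANDs) and a Hamming-distance error between the original table and its approximation. The factor matrices are hand-picked for illustration; how the paper actually computes factors is not shown here, and all names are assumptions.

```python
from typing import List

Matrix = List[List[int]]   # entries are 0 or 1


def boolean_product(b: Matrix, c: Matrix) -> Matrix:
    """(B o C)[i][j] = OR over f of (B[i][f] AND C[f][j])."""
    inner, cols = len(c), len(c[0])
    return [
        [int(any(b[i][f] and c[f][j] for f in range(inner))) for j in range(cols)]
        for i in range(len(b))
    ]


def hamming_error(m: Matrix, approx: Matrix) -> int:
    """Number of truth-table entries where the approximation differs."""
    return sum(
        x != y
        for row_m, row_a in zip(m, approx)
        for x, y in zip(row_m, row_a)
    )


if __name__ == "__main__":
    # Truth table of a 2-input, 3-output subcircuit (rows = input patterns).
    M = [[0, 0, 0], [0, 1, 1], [1, 1, 0], [1, 1, 1]]

    # Degree-2 factorization that reproduces M exactly.
    B2 = [[0, 0], [0, 1], [1, 0], [1, 1]]
    C2 = [[1, 1, 0], [0, 1, 1]]
    print(hamming_error(M, boolean_product(B2, C2)))

    # Degree-1 factorization: smaller factors, some entries now wrong.
    B1 = [[0], [1], [1], [1]]
    C1 = [[1, 1, 1]]
    print(hamming_error(M, boolean_product(B1, C1)))
```

Lowering the factorization degree shrinks the factor matrices while raising the error count, which mirrors the accuracy versus complexity trade-off the abstract describes.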

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks [article]

Soheil Hashemi, Nicholas Anthony, Hokchhay Tann, R. Iris Bahar, Sherief Reda
2016 arXiv   pre-print
Deep neural networks are gaining in popularity as they are used to generate state-of-the-art results for a variety of computer vision and machine learning applications. At the same time, these networks have grown in depth and complexity in order to solve harder problems. Given the limitations in power budgets dedicated to these networks, the importance of low-power, low-memory solutions has been stressed in recent years. While a large number of dedicated hardware using different precisions has
more » ... ecently been proposed, there exists no comprehensive study of different bit precisions and arithmetic in both inputs and network parameters. In this work, we address this issue and perform a study of different bit-precisions in neural networks (from floating-point to fixed-point, powers of two, and binary). In our evaluation, we consider and analyze the effect of precision scaling on both network accuracy and hardware metrics including memory footprint, power and energy consumption, and design area. We also investigate training-time methodologies to compensate for the reduction in accuracy due to limited bit precision and demonstrate that in most cases, precision scaling can deliver significant benefits in design metrics at the cost of very modest decreases in network accuracy. In addition, we propose that a small portion of the benefits achieved when using lower precisions can be forfeited to increase the network size and therefore the accuracy. We evaluate our experiments, using three well-recognized networks and datasets to show its generality. We investigate the trade-offs and highlight the benefits of using lower precisions in terms of energy and memory footprint.
arXiv:1612.03940v1 fatcat:c432ganlkjdpjjghhi3cddqncm
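The abstract above compares floating-point, fixed-point, power-of-two, and binary representations. As a rough sketch of what those representations do to a single value, the functions below apply one plausible rounding rule for each. The exact rounding and saturation schemes used in the paper are not given in the abstract, so the function names and parameters here are assumptions.

```python
import math


def quantize_fixed(x: float, total_bits: int, frac_bits: int) -> float:
    """Round to signed fixed-point with saturation (assumed scheme)."""
    scale = 1 << frac_bits
    q = round(x * scale)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, q))            # saturate to the representable range
    return q / scale


def quantize_pow2(x: float) -> float:
    """Round the magnitude to the nearest power of two in the log domain."""
    if x == 0.0:
        return 0.0
    exponent = round(math.log2(abs(x)))
    return math.copysign(2.0 ** exponent, x)


def binarize(x: float) -> float:
    """Keep only the sign: +1 for non-negative values, -1 otherwise."""
    return 1.0 if x >= 0 else -1.0


if __name__ == "__main__":
    w = 0.3
    print(quantize_fixed(w, total_bits=8, frac_bits=2))
    print(quantize_pow2(w))
    print(binarize(w))
```

Power-of-two weights are attractive in hardware because multiplying by them reduces to a bit shift, which is one reason such representations appear in low-power designs.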

Exact and Approximate Task Assignment Algorithms for Pipelined Software Synthesis

Matin Hashemi, Soheil Ghiasi
2008 2008 Design, Automation and Test in Europe  
Pipelined execution of streaming applications enables processing of high-throughput data under performance constraints. We present an integrated approach to synthesizing pipelined software for dual-core architectures. We target streaming applications modeled as task graphs that are amenable to static analysis. We develop a versatile task assignment algorithm that considers the combined effect of workload imbalance between processors and inter-processor communication. Our technique, which runs in
more » ... seudo-linear time, provably maximizes application throughput. Furthermore, we develop an approximation algorithm for task assignment whose complexity is strictly polynomial. It provides the designer with an adjustable knob to controllably trade solution quality with algorithm runtime and memory requirement. Empirical throughput measurements using an FPGA-based dual-core system validate our theoretical results. Our exact algorithm consistently outperforms a recent competitor. Compared to exact task assignment, the approximate method runs about 3 times faster, requires about 20 times less memory, and results in only 1% to 5% throughput loss.
doi:10.1109/date.2008.4484768 dblp:conf/date/HashemiG08 fatcat:23qaqwqqtjcpnkugdgm5wp2tzi

Puzzle solver accelerators make excellent capstone design projects

Soheil Ghiasi, Matin Hashemi, Volodymyr Khibin, Faisal Khan
2011 2011 IEEE International Conference on Microelectronic Systems Education  
We present a computer engineering capstone design project course focused on accelerating intensive computations via integration of application-specific co-processors with digital processor systems. We propose utilization of puzzle solvers as attractive, scalable and simple-to-understand applications to engage students with practicing a number of fundamental concepts in algorithm design, HW-SW co-design, computer architecture, and beyond. While advocating a contest setup for the course, we
more » ... s several well-specified milestones that enable balancing students' creativity and freedom in design choices with ensuring timely progress towards the end goal of the class. We report our observations with the only offering of the class so far, which resulted in successful project completion by all students, and their supportive feedback.
doi:10.1109/mse.2011.5937082 dblp:conf/mse/GhiasiHKK11 fatcat:vxf3c6r625gynkvkdiq6rz4dbe

System-Level Performance Estimation for Application-Specific MPSoC Interconnect Synthesis

Po-Kuan Huang, Matin Hashemi, Soheil Ghiasi
2008 2008 Symposium on Application Specific Processors  
We present a framework for development of streaming applications as concurrent software modules running on multiprocessor systems-on-chip (MPSoC). We propose an iterative design space exploration mechanism to customize MPSoC architecture for given applications. Central to the exploration engine is our system-level performance estimation methodology, which both quickly and accurately determines the quality of candidate architectures. We implemented a number of streaming applications on candidate
more » ... itectures that were emulated on an FPGA. Hardware measurements show that our system-level performance estimation method incurs only 15% error in predicting application throughput. More importantly, it always correctly guides design space exploration by achieving 100% fidelity in quality-ranking candidate architectures. Compared to behavioral simulation of compiled code, our system-level estimator runs more than 12 times faster, and requires 7 times less memory.
doi:10.1109/sasp.2008.4570792 dblp:conf/sasp/HuangHG08 fatcat:gh2mwdfghjgp7je4n6l42tvgge
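The abstract above reports two different quality measures for an estimator: prediction error and ranking fidelity. The sketch below shows one common way to compute both, assuming fidelity means the fraction of candidate pairs that the estimates order the same way as the measurements. That definition, the function names, and the sample numbers are assumptions for illustration, not taken from the paper.

```python
from itertools import combinations
from typing import List


def relative_errors(estimated: List[float], measured: List[float]) -> List[float]:
    """Per-candidate relative error of the estimate against the measurement."""
    return [abs(e - m) / m for e, m in zip(estimated, measured)]


def ranking_fidelity(estimated: List[float], measured: List[float]) -> float:
    """Fraction of candidate pairs ranked in the same order by both lists."""
    pairs = list(combinations(range(len(measured)), 2))
    if not pairs:
        raise ValueError("need at least two candidates")
    agree = sum(
        (estimated[i] - estimated[j]) * (measured[i] - measured[j]) > 0
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical throughput numbers for three candidate architectures.
    measured = [10.0, 20.0, 40.0]
    estimated = [11.5, 18.0, 44.0]
    print(max(relative_errors(estimated, measured)))
    print(ranking_fidelity(estimated, measured))
```

The example illustrates why the two numbers can diverge: every estimate is off by 10% or more, yet the ranking of candidates is preserved, which is what matters when the estimator is only used to steer design space exploration.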