
A New Routing Algorithm: Mirad

Amir Gholami Pastaki, Ali Reza Sahab, Seyed Mehdi Sadeghi
2011 Zenodo  
Fig. 1: MIRA topology [4].  ...  Fig. 3: Mean value of path length of MIRAD algorithm conducted on MIRA topology.  ... 
doi:10.5281/zenodo.1077558 fatcat:hinzogoxobbnpekz4pzwdo7kou

Distributed-memory large deformation diffeomorphic 3D image registration [article]

Andreas Mang and Amir Gholami and George Biros
2016 arXiv   pre-print
We present a parallel distributed-memory algorithm for large deformation diffeomorphic registration of volumetric images that produces large isochoric deformations (locally volume preserving). Image registration is a key technology in medical image analysis. Our algorithm uses a partial differential equation constrained optimal control formulation. Finding the optimal deformation map requires the solution of a highly nonlinear problem that involves pseudo-differential operators, biharmonic operators, and pure advection operators both forward and backward in time. A key issue is the time to solution, which poses the demand for efficient optimization methods as well as an effective utilization of high performance computing resources. To address this problem we use a preconditioned, inexact, Gauss-Newton-Krylov solver. Our algorithm integrates several components: a spectral discretization in space, a semi-Lagrangian formulation in time, analytic adjoints, different regularization functionals (including volume-preserving ones), a spectral preconditioner, a highly optimized distributed Fast Fourier Transform, and a cubic interpolation scheme for the semi-Lagrangian time-stepping. We demonstrate the scalability of our algorithm on images with resolution of up to 1024^3 on the "Maverick" and "Stampede" systems at the Texas Advanced Computing Center (TACC). The critical problem in the medical imaging application domain is strong scaling, that is, solving registration problems of a moderate size of 256^3, a typical resolution for medical images. We are able to solve the registration problem for images of this size in less than five seconds on 64 x86 nodes of TACC's "Maverick" system.
arXiv:1608.03630v1 fatcat:c3tt47epzzac3ozp7cujhwu7gi

PDE-constrained optimization in medical image analysis [article]

Andreas Mang and Amir Gholami and Christos Davatzikos and George Biros
2018 arXiv   pre-print
bottleneck of our solver.  ...  p of m I in (4b). A more complete picture can be found in [70].  ...  with Neumann boundary conditions on ∂Ω_B. For the Gauss-Newton approximation to the true Hessian, λ in (15c) needs to be dropped.  ... 
arXiv:1803.00058v1 fatcat:eqg2e76acfa2tl5eiknkqlyl3m

ZeroQ: A Novel Zero Shot Quantization Framework [article]

Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
2020 arXiv   pre-print
Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we propose ZeroQ, a novel zero-shot quantization framework to address this. ZeroQ enables mixed-precision quantization without any access to the training or validation data. This is achieved by optimizing for a Distilled Dataset, which is engineered to match the statistics of batch normalization across different layers of the network. ZeroQ supports both uniform and mixed-precision quantization. For the latter, we introduce a novel Pareto frontier based method to automatically determine the mixed-precision bit setting for all layers, with no manual search involved. We extensively test our proposed method on a diverse set of models, including ResNet18/50/152, MobileNetV2, ShuffleNet, SqueezeNext, and InceptionV3 on ImageNet, as well as RetinaNet-ResNet50 on the Microsoft COCO dataset. In particular, we show that ZeroQ can achieve 1.71% higher accuracy on MobileNetV2, as compared to the recently proposed DFQ method. Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30s (0.5% of one epoch training time of ResNet50 on ImageNet). We have open-sourced the ZeroQ framework.
arXiv:2001.00281v1 fatcat:iyheue7fybfyxpqhrou2hkdpsq

Bone-added periodontal plastic surgery: a new approach in esthetic dentistry

Gholam Ali Gholami, Hadi Gholami, Reza Amid, Mahdi Kadkhodazadeh, Amir Reza Mehdizadeh, Navid Youssefi
2015 Annals of Surgical Innovation and Research  
This article proposes a combined technique including bone grafting, connective tissue graft, and coronally advanced flap to create some space for simultaneous bone regrowth and root coverage. A 23-year-old female was referred to our private clinic with a severe class II Miller recession and lack of attached gingiva. The suggested treatment plan comprised root coverage combined with xenograft bone particles. The grafted area healed well and full coverage was achieved at the 12-month follow-up visit. Bone-added periodontal plastic surgery can be considered as a practical procedure for management of deep gingival recession without buccal bone plate.
doi:10.1186/s13022-015-0010-5 pmid:25763099 pmcid:PMC4355546 fatcat:cffesny7efb2fk2xfdogt2tttq

Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [article]

Shashank Subramanian, Robert M. Kirby, Michael W. Mahoney, Amir Gholami
2022 arXiv   pre-print
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adapting the location of the collocation points as training proceeds. Specifically, we propose a novel adaptive collocation scheme which progressively allocates more collocation points (without increasing their number) to areas where the model is making higher errors (based on the gradient of the loss function in the domain). This, coupled with a judicious restarting of the training during any optimization stalls (by simply resampling the collocation points in order to adjust the loss landscape) leads to better estimates for the prediction error. We present results for several problems, including a 2D Poisson and diffusion-advection system with different forcing functions. We find that training vanilla PINNs for these problems can result in up to 70% prediction error in the solution, especially in the regime of low collocation points. In contrast, our adaptive schemes can achieve up to an order of magnitude smaller error, with similar computational complexity as the baseline. Furthermore, we find that the adaptive methods consistently perform on-par or slightly better than vanilla PINN method, even for large collocation point regimes. The code for all the experiments has been open sourced.
arXiv:2207.04084v1 fatcat:wrrgfoqfnfb7dpxohtjqk6b4uq
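
The adaptive collocation idea in the abstract above (same number of points, more of them placed where the error is larger) can be illustrated with a minimal sketch. This is not the authors' code: the function name `resample_collocation`, the `uniform_frac` knob, and the use of a generic per-point score as the error proxy are all assumptions made for illustration. The abstract only says the scheme is based on the gradient of the loss function in the domain.

```python
import random


def resample_collocation(candidates, scores, n_points, uniform_frac=0.2, seed=0):
    """Draw n_points collocation points from a pool of candidates.

    Hypothetical sketch: a fraction `uniform_frac` of the points is drawn
    uniformly to keep some coverage of the whole domain, and the rest is
    drawn with probability proportional to `scores`, a non-negative
    per-candidate error proxy (for example, a loss-gradient magnitude).
    The total number of points never changes.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    rng = random.Random(seed)
    n_uniform = int(round(uniform_frac * n_points))
    n_adaptive = n_points - n_uniform
    uniform_part = rng.choices(candidates, k=n_uniform)
    if sum(scores) > 0:
        adaptive_part = rng.choices(candidates, weights=scores, k=n_adaptive)
    else:
        # No error signal yet: fall back to uniform sampling.
        adaptive_part = rng.choices(candidates, k=n_adaptive)
    return uniform_part + adaptive_part


# Usage: a 1D pool where the (made-up) error proxy is nonzero only for x >= 0.5.
pool = [i / 100 for i in range(100)]
proxy = [0.0 if x < 0.5 else 1.0 for x in pool]
points = resample_collocation(pool, proxy, n_points=50, uniform_frac=0.0)
```

With `uniform_frac=0.0` every sampled point falls in the high-error half of the pool. Calling the function again with refreshed scores corresponds to the resampling step that the abstract describes as a restart.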

A Fast Post-Training Pruning Framework for Transformers [article]

Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
2022 arXiv   pre-print
Pruning is an effective way to reduce the huge inference cost of large Transformer models. However, prior work on model pruning requires retraining the model. This can add high cost and complexity to model deployment, making it difficult to use in many practical situations. To address this, we propose a fast post-training pruning framework for Transformers that does not require any retraining. Given a resource constraint and a sample dataset, our framework automatically prunes the Transformer model using structured sparsity methods. To retain high accuracy without retraining, we introduce three novel techniques: (i) a lightweight mask search algorithm that finds which heads and filters to prune based on the Fisher information; (ii) mask rearrangement that complements the search algorithm; and (iii) mask tuning that reconstructs the output activations for each layer. We apply our method to BERT-BASE and DistilBERT, and we evaluate its effectiveness on GLUE and SQuAD benchmarks. Our framework achieves up to 2.0x reduction in FLOPs and 1.56x speedup in inference latency, while maintaining < 1% loss in accuracy. Importantly, our framework prunes Transformers in less than 3 minutes on a single GPU, which is over two orders of magnitude faster than existing pruning approaches that retrain. Our code is publicly available.
arXiv:2204.09656v1 fatcat:72n2sfb7nfhpxogw2t3rlojlbe

PowerNorm: Rethinking Batch Normalization in Transformers [article]

Sheng Shen, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
2020 arXiv   pre-print
... Gholami, A., Keutzer, K., and Mahoney, M. W. PyHessian: Neural networks through the lens of the Hessian. arXiv preprint arXiv:1912.07145, 2019. Zagoruyko, S. and Komodakis, N.  ... 
arXiv:2003.07845v2 fatcat:sm4xdud32vaefbv7tgwccxnty4

Trust Region Based Adversarial Attack on Neural Networks [article]

Zhewei Yao and Amir Gholami and Peng Xu and Kurt Keutzer and Michael Mahoney
2018 arXiv   pre-print
Deep Neural Networks are quite vulnerable to adversarial perturbations. Current state-of-the-art adversarial attack methods typically require very time consuming hyper-parameter tuning, or require many iterations to solve an optimization based adversarial attack. To address this problem, we present a new family of trust region based adversarial attacks, with the goal of computing adversarial perturbations efficiently. We propose several attacks based on variants of the trust region optimization method. We test the proposed methods on Cifar-10 and ImageNet datasets using several different models including AlexNet, ResNet-50, VGG-16, and DenseNet-121 models. Our methods achieve comparable results with the Carlini-Wagner (CW) attack, but with significant speed up of up to 37×, for the VGG-16 model on a Titan Xp GPU. For the case of ResNet-50 on ImageNet, we can bring down its classification accuracy to less than 0.1% with at most 1.5% relative L_∞ (or L_2) perturbation requiring only 1.02 seconds as compared to 27.04 seconds for the CW attack. We have open sourced our method which can be accessed at [1].
arXiv:1812.06371v1 fatcat:fewoe6odfzg6vgxeoiwtnedcie

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision [article]

Zhen Dong, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer
2019 arXiv   pre-print
Model size and inference speed/power have become a major challenge in the deployment of Neural Networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow lower precision as compared to other layers. However, there is no systematic way to determine the precision of different layers. A brute force approach is not feasible for deep networks, as the search space for mixed-precision is exponential in the number of layers. Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers, based on second-order information. We show the results of our method on Cifar-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext models. Comparing HAWQ with state-of-the-art shows that we can achieve similar/better accuracy with 8× activation compression ratio on ResNet20, as compared to DNAS wu2018mixed, and up to 1% higher accuracy with up to 14% smaller models on ResNet50 and Inception-V3, compared to recently proposed methods of RVQuant park2018value and HAQ wang2018haq. Furthermore, we show that we can quantize SqueezeNext to just 1MB model size while achieving above 68% top1 accuracy on ImageNet.
arXiv:1905.03696v1 fatcat:t34dzcwgbbcbxfhim4pwkkygdy

Parameter Re-Initialization through Cyclical Batch Size Schedules [article]

Norman Mu and Zhewei Yao and Amir Gholami and Kurt Keutzer and Michael Mahoney
2018 arXiv   pre-print
Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may take longer to train and/or converge to sub-optimal solutions. Here, we propose a method of weight re-initialization by repeated annealing and injection of noise in the training process. We implement this through a cyclical batch size schedule motivated by a Bayesian perspective of neural network training. We evaluate our methods through extensive experiments on tasks in language modeling, natural language inference, and image classification. We demonstrate the ability of our method to improve language modeling performance by up to 7.91 perplexity and reduce training iterations by up to 61%, in addition to its flexibility in enabling snapshot ensembling and use with adversarial training.
arXiv:1812.01216v1 fatcat:mkk5auxhunerdhd6u5zrrgbjmq
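
To make "cyclical batch size schedule" in the abstract above concrete, here is a minimal sketch of one possible shape: a triangular wave between a small and a large batch size. The abstract does not specify the schedule, so the function name `cyclical_batch_size`, the triangular form, and the default values for `base`, `peak`, and `cycle_len` are all assumptions chosen for illustration, not the paper's settings.

```python
def cyclical_batch_size(epoch, base=128, peak=1024, cycle_len=8):
    """Return the batch size for `epoch` under a triangular cyclical schedule.

    Hypothetical sketch: the batch size rises linearly from `base` to `peak`
    over the first half of each cycle and falls back to `base` over the
    second half. Small batches add gradient noise; large batches reduce it.
    """
    if cycle_len < 2:
        raise ValueError("cycle_len must be at least 2")
    pos = epoch % cycle_len
    half = cycle_len / 2
    # Fraction of the way from base to peak, in [0, 1].
    frac = pos / half if pos <= half else (cycle_len - pos) / half
    return int(round(base + frac * (peak - base)))


# Usage: batch sizes for the first two cycles.
schedule = [cyclical_batch_size(e) for e in range(16)]
```

Each return to the small batch size re-injects gradient noise, which is one way to read the abstract's "repeated annealing and injection of noise".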

Optimal Controller With Backstepping And Belbic For Single-Link Flexible Manipulator

Ali Reza Sahab, Amir Gholami Pastaki
2011 Zenodo  
Amir Gholami Pastaki is a staff member of Electrical Group, Department of Engineering, Islamic Azad University, Lahijan Branch, Lahijan, Guilan, Iran. (e-mail:  ... 
doi:10.5281/zenodo.1086238 fatcat:u2afrl4rdver3dkrci2r37e354

An Observer-Based Direct Adaptive Fuzzy Sliding Control With Adjustable Membership Functions

Alireza Gholami, Amir H. D. Markazi
2018 Zenodo  
Generally, direct AFSM algorithms are more effective than the indirect ones, because Alireza Gholami PhD graduated from Iran University of Science and Technology and now is a researcher in electromechanical  ... 
doi:10.5281/zenodo.1340432 fatcat:h6wnhexbvnbrtaeubhhickirqi

The Used Of Environmental Ethics In Methods And Techniques Of Environmental Management

Amir Hossein Davami, Ali Gholami, Ebrahim Panahpour
2011 Zenodo  
Index, Humanities and Social Sciences Vol:5, No:9, 2011 Amir Hossein Davami, Ali Gholami, Ebrahim Panahpour The Used of Environmental Ethics in Methods and Techniques of Environmental  ... 
doi:10.5281/zenodo.1079334 fatcat:4jpz3rncqjaufo3lfynoqtsv5u

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [article]

Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
2022 arXiv   pre-print
Amir Gholami was supported through funding from Samsung SAIT. Michael W. Mahoney would also like to acknowledge the UC Berkeley CLTC, ARO, NSF, and ONR.  ... 
arXiv:2206.00888v1 fatcat:yodjj7po3rafdp4p3ka6vijc2a