
Professor Gholam-Abbas Dehghani: A Never-to-Be-Forgotten Teacher

Seyed Mostafa Moosavi
2022 Shiraz E Medical Journal  
Professor Gholam-Abbas Dehghani (January 1, 1949 - February 21, 2022).  ...  Dr Dehghani arrived at the Johns Hopkins University, Baltimore, USA, in 1975 and received his PhD degree in Physiology in 1980.  ... 
doi:10.5812/semj.124105 fatcat:j2him5mzyra6pa4kdip34edm3y

Intracoronary Versus Intravenous Administration of Eptifibatide During Percutaneous Coronary Intervention in Patients With Acute Coronary Syndromes

Soroush Dehghani Dashtabi, Homa Falsoleiman, Mostafa Dastani, Mohsen Mouhebati, Mostafa Ahmadi, Ramin Khameneh Bagheri, Amin Dehghan, Mashallah Dehghani Dashtabi
2020 Acta Medica Iranica  
Platelet activation and aggregation play a major role in thrombus formation in the coronary arteries of patients with Acute Coronary Syndrome (ACS) and are responsible for most ischemic complications during percutaneous coronary intervention (PCI). There is little information on the benefits and side effects of intracoronary and intravenous injection of Eptifibatide, a potent antiplatelet agent; therefore, this study aimed to compare coronary blood flow velocity, measured by TIMI frame count, in intravenous versus intracoronary bolus administration of Eptifibatide during PCI in ACS patients. This non-randomized clinical trial was performed on 103 patients with acute coronary syndromes who were referred to the cardiac emergency ward of Ghaem hospital, Mashhad University of Medical Sciences, and were candidates for urgent coronary angiography and PCI. Forty-eight patients received intracoronary bolus Eptifibatide and 55 received intravenous Eptifibatide. TIMI Frame Count and Corrected TIMI Frame Count were used to compare the effect of these two methods on coronary blood flow velocity. Data were analyzed with SPSS software (version 22). Quantitative variables were compared between the two groups with the t-test when they were normally distributed and with the Mann-Whitney test otherwise. A Chi-square test was used to compare qualitative variables between the two groups. P<0.05 was considered statistically significant. Mean age, gender, and cardiovascular risk factors were similar in the two groups. There was no significant difference between the two groups in serum Creatine Kinase MB (CKMB) level, Left Ventricular Ejection Fraction (LVEF), coronary artery lesion length, coronary artery diameter, coronary thrombosis, or coronary artery thrombectomy. Based on Student's t-test, there was no significant difference in mean TIMI Frame Count in the individual coronary arteries between the intracoronary and intravenous injection groups (LAD, P=0.518; LCX, P=0.576; RCA, P=0.964). Complications were observed in 11 patients (22.9%) in the intracoronary injection group and 9 (16.4%) in the intravenous injection group; the difference was not significant (P=0.402). The effects and complications of Eptifibatide did not differ significantly between intracoronary and intravenous administration in ACS patients during PCI and during hospitalization.
doi:10.18502/acta.v58i7.4421 fatcat:zbjn42bytjc2nordkjm6tjyu6u

Learning to Rank from Samples of Variable Quality [article]

Mostafa Dehghani, Jaap Kamps
2018 arXiv   pre-print
Training deep neural networks requires many training samples, but in practice, training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label-quality into account when learning the data representation, we could get the best of both worlds. To this end, we introduce "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label-quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on document ranking where we outperform state-of-the-art alternative semi-supervised methods.
arXiv:1806.08694v1 fatcat:qn7njxdfyvct7ceh3yub7jewg4
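
The per-sample modulation described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the student here is a plain linear model, `weighted_sgd_step` is a hypothetical helper, and the confidence scores are assumed to be given as numbers in [0, 1] (how the teacher estimates them is not shown in the abstract).

```python
def weighted_sgd_step(weights, samples, confidences, lr=0.1):
    """One pass of SGD on squared error for a linear model.

    Each sample's update is scaled by its confidence score, so weakly
    trusted labels move the weights less than trusted ones.
    """
    new_weights = list(weights)
    for (features, label), confidence in zip(samples, confidences):
        prediction = sum(w * x for w, x in zip(new_weights, features))
        error = prediction - label
        # Gradient of 0.5 * error**2 w.r.t. each weight is error * x;
        # the confidence acts as a per-sample learning-rate multiplier.
        new_weights = [w - lr * confidence * error * x
                       for w, x in zip(new_weights, features)]
    return new_weights


# Hypothetical weakly-labeled samples: (features, label) pairs.
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
trusted = weighted_sgd_step([0.0, 0.0], samples, confidences=[1.0, 1.0])
doubted = weighted_sgd_step([0.0, 0.0], samples, confidences=[1.0, 0.0])
print(trusted, doubted)
```

Setting a sample's confidence to zero removes its influence entirely, while a confidence of one recovers an ordinary SGD step.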

Fidelity-Weighted Learning [article]

Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf
2018 arXiv   pre-print
The student follows the architecture proposed in (Dehghani et al., 2017d).  ...  B DETAILED ARCHITECTURE OF THE STUDENTS B.1 RANKING TASK For the ranking task, the employed student is proposed in (Dehghani et al., 2017d).  ... 
arXiv:1711.02799v2 fatcat:a7noewnkwzgbrcmcozmttbsfhu

Universal Transformers [article]

Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser
2019 arXiv   pre-print
Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherently sequential computation makes them slow to train. Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine translation, with the added advantage that they concurrently process all inputs in the sequence, leading to easy parallelization and faster training times. Despite these successes, however, popular feed-forward sequence models like the Transformer fail to generalize in many simple tasks that recurrent models handle with ease, e.g. copying strings or even simple logical inference when the string or formula lengths exceed those observed at training time. We propose the Universal Transformer (UT), a parallel-in-time self-attentive recurrent sequence model which can be cast as a generalization of the Transformer model and which addresses these issues. UTs combine the parallelizability and global receptive field of feed-forward sequence models like the Transformer with the recurrent inductive bias of RNNs. We also add a dynamic per-position halting mechanism and find that it improves accuracy on several tasks. In contrast to the standard Transformer, under certain assumptions, UTs can be shown to be Turing-complete. Our experiments show that UTs outperform standard Transformers on a wide range of algorithmic and language understanding tasks, including the challenging LAMBADA language modeling task where UTs achieve a new state of the art, and machine translation where UTs achieve a 0.9 BLEU improvement over Transformers on the WMT14 En-De dataset.
arXiv:1807.03819v3 fatcat:dqfdfwlevbezlhjfoik437fzoa

The Benchmark Lottery [article]

Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals
2021 arXiv   pre-print
., 2018, Dehghani et al., 2019, Abnar et al., 2020. Active and in-progress. This stage may take a long time depending on how much progress is being made on the proposed task.  ...  functions, normalization and parameter initialization schemes, and also architectural extensions (e.g., Evolved Transformers [So et al., 2019], Synthesizers [Tay et al., 2020a], Universal Transformer [Dehghani  ... 
arXiv:2107.07002v1 fatcat:vxskgohsfzhjtalmvlkullic34

Intersection of Parallels as an Early Stopping Criterion [article]

Ali Vardasbi, Maarten de Rijke, Mostafa Dehghani
2022 arXiv   pre-print
A common way to avoid overfitting in supervised learning is early stopping, where a held-out set is used for iterative evaluation during training to find a sweet spot in the number of training steps that gives maximum generalization. However, such a method requires a disjoint validation set, thus part of the labeled data from the training set is usually left out for this purpose, which is not ideal when training data is scarce. Furthermore, when the training labels are noisy, the performance of the model over a validation set may not be an accurate proxy for generalization. In this paper, we propose a method to spot an early stopping point in the training iterations without the need for a validation set. We first show that in the overparameterized regime the randomly initialized weights of a linear model converge to the same direction during training. Using this result, we propose to train two parallel instances of a linear model, initialized with different random seeds, and use their intersection as a signal to detect overfitting. In order to detect intersection, we use the cosine distance between the weights of the parallel models during training iterations. Noticing that the final layer of a NN is a linear map of pre-last layer activations to output logits, we build on our criterion for linear models and propose an extension to multi-layer networks, using the new notion of counterfactual weights. We conduct experiments on two areas in which early stopping has a noticeable impact on preventing overfitting of a NN: (i) learning from noisy labels; and (ii) learning to rank in IR. Our experiments on four widely used datasets confirm the effectiveness of our method for generalization. For a wide range of learning rates, our method, called Cosine-Distance Criterion (CDC), leads to better generalization on average than all the methods that we compare against in almost all of the tested cases.
arXiv:2208.09529v1 fatcat:t2p3htiberaurmswzwaxdesocm
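
The signal this abstract describes, the cosine distance between two linear models trained in parallel from different seeds, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's code: the toy data, the squared-error loss, the hyperparameters, and the helper names are all hypothetical, and the actual stopping rule (how the "intersection" is detected from the distance curve) is not specified in the abstract and is not implemented here.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def gradient_step(w, X, y, lr):
    """One full-batch gradient-descent step on mean squared error."""
    n = len(X)
    residuals = [sum(wj * xj for wj, xj in zip(w, row)) - target
                 for row, target in zip(X, y)]
    grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n
            for j in range(len(w))]
    return [wj - lr * gj for wj, gj in zip(w, grad)]


def track_parallel_models(X, y, steps=500, lr=0.05, init_scale=0.01, seeds=(1, 2)):
    """Train two linear models from different seeds; record their cosine distance."""
    dim = len(X[0])
    models = []
    for seed in seeds:
        rng = random.Random(seed)
        models.append([rng.gauss(0.0, init_scale) for _ in range(dim)])
    distances = [cosine_distance(models[0], models[1])]
    for _ in range(steps):
        models = [gradient_step(w, X, y, lr) for w in models]
        distances.append(cosine_distance(models[0], models[1]))
    return distances


# Hypothetical overparameterized toy problem: 20 samples, 50 features, noisy labels.
data_rng = random.Random(0)
true_w = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
X = [[data_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
y = [sum(a * b for a, b in zip(row, true_w)) + data_rng.gauss(0.0, 0.5) for row in X]

distances = track_parallel_models(X, y)
print(f"cosine distance: start={distances[0]:.3f}, end={distances[-1]:.3f}")
```

In this toy setting the two weight vectors start in unrelated random directions and their distance shrinks as training proceeds, which is the kind of curve a stopping criterion would monitor.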

Efficient Transformers: A Survey

Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
2022 ACM Computing Surveys  
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains.
doi:10.1145/3530811 fatcat:sil36wgiz5c23hnr5iwzu26pwi

Neural Networks for Information Retrieval [article]

Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra
2017 arXiv   pre-print
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions.
arXiv:1707.04242v1 fatcat:4idscmq26fa5bjupldwuyghq4m

The effectiveness of home-based cardiac rehabilitation program on cardiovascular stress indices in men and women with myocardial infarction: a randomised controlled clinical trial

Mostafa Dehghani, Mostafa Cheragi, Mehrdad Namdari, Valiollah Dabidi Roshan, Morteza Dehghani
2021 Revista Colombiana de Cardiología  
Background: cardiac rehabilitation is a structured program to prevent secondary cardiovascular diseases. Objective: to investigate and compare the effectiveness of a home-based cardiac rehabilitation program (HBCRP) on improving cardiovascular stress indices in men and women who had suffered myocardial infarction (MI). Methods: in this randomized controlled clinical trial, 80 patients with MI were divided into intervention and control groups (n = 40 per group). Analyses were performed separately for females and males in both groups. The HBCRP included receiving routine medications along with walking for 8 weeks. The control group received only routine care along with counseling about adequate physical activity. Cardiovascular stress indicators, including heart rate at rest (HRrest), maximum heart rate (HRmax), recovery heart rate (RHR) at 1 and 2 minutes after the exercise test (i.e. RHR1 and RHR2), systolic and diastolic blood pressures at rest (SBPR and DBPR), and rate pressure product (RPP), were measured by a researcher blinded to the intervention before and after the test. Results: the results showed significant reductions in RHR1 (p<0.001), RHR2 (p<0.01), SBPR (p<0.01), DBPR (p<0.01), and RPP (p<0.001) in both males and females in the intervention group. A significant increase was also observed in HRmax (p<0.001) in the intervention group. However, there were no significant differences in HRmax or the other variables between pre- and post-experiment values in the control group. Conclusion: our results showed that 8 weeks of HBCRP reduced cardiovascular stress indices in both men and women with MI, independently of sex.
doi:10.24875/rccar.m21000025 fatcat:li25hwox35hn5owotmz4runnti

Effectiveness of Social Skills Training on Tendency to Addiction in High School Female Students

Yousef Dehghani, Mostafa Dehghani
2014 Jentashapir Journal of Health Research  
doi:10.17795/jjhr-23223 fatcat:nanjctox6zdsth3qlvfti2njky

Confident Adaptive Language Modeling [article]

Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler
2022 arXiv   pre-print
Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks. These gains come with a drastic increase in the models' size, potentially leading to slow and costly use at inference time. In practice, however, the series of generations made by LLMs is composed of varying levels of difficulty. While certain predictions truly benefit from the models' full capacity, other continuations are more trivial and can be solved with reduced compute. In this work, we introduce Confident Adaptive Language Modeling (CALM), a framework for dynamically allocating different amounts of compute per input and generation timestep. Early exit decoding involves several challenges that we address here, such as: (1) what confidence measure to use; (2) connecting sequence-level constraints to local per-token exit decisions; and (3) attending back to missing hidden representations due to early exits in previous tokens. Through theoretical analysis and empirical experiments on three diverse text generation tasks, we demonstrate the efficacy of our framework in reducing compute (potential speedup of up to ×3) while provably maintaining high performance.
arXiv:2207.07061v1 fatcat:4klh7mnvovas7fyz5rporg2rbi
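
The general idea of a per-token early exit can be sketched as below. This is an illustrative toy, not the CALM framework: the abstract lists the choice of confidence measure as an open design question, so the top softmax probability used here is an assumption, as are the function names and the threshold. A real decoder would compute layers lazily; the per-layer logits are precomputed here only to keep the example self-contained.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(per_layer_logits, threshold=0.9):
    """Return (token_index, layers_used) for one generation step.

    Exits at the first layer whose top softmax probability reaches the
    threshold; otherwise falls back to the prediction of the last layer.
    """
    if not per_layer_logits:
        raise ValueError("need logits from at least one layer")
    for depth, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            return best, depth
    return best, depth


# Hypothetical logits from three successive layers over a 3-token vocabulary.
example_logits = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
print(early_exit_predict(example_logits, threshold=0.9))
```

Raising the threshold trades compute for caution: the same token is predicted, but more layers are consulted before committing to it.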

The Efficiency Misnomer [article]

Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay
2022 arXiv   pre-print
An example is the comparison of the Universal Transformer (UT) (Dehghani et al., 2018) with vanilla Transformer (Vaswani et al., 2017) Figure 1 .  ...  ., 2019; Dehghani et al., 2018) . While the number of trainable parameters can often be insightful to decide if a model fits in memory, it is unlikely to be useful as a standalone cost indicator.  ... 
arXiv:2110.12894v2 fatcat:2u57ehjyufdwjawpdk3fizssoy

Matricial Radius: A Relation of Numerical radius with Matricial Range [article]

Mohsen Kian, Mahdi Dehghani, Mostafa Sattari
2019 arXiv   pre-print
It has been shown that if $T$ is a complex matrix, then $\omega(T) = \frac{1}{n}\sup\{|\mathrm{Tr}\, X| ;\ X \in W^n(T)\} = \frac{1}{n}\sup\{\|X\|_1 ;\ X \in W^n(T)\} = \sup\{\omega(X) ;\ X \in W^n(T)\}$, where $n$ is a positive integer, $\omega(T)$ is the numerical radius and $W^n(T)$ is the $n$th matricial range of $T$.
arXiv:1911.10748v1 fatcat:v65wsjmt3zhuhffomppr5tzymy

Learning to Learn from Weak Supervision by Full Supervision [article]

Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps
2017 arXiv   pre-print
This is usually done by pre-training the network on weak data and fine tuning it with true labels [Dehghani et al., 2017b, Severyn and Moschitti, 2015a].  ...  Introduction Using weak or noisy supervision is a straightforward approach to increase the size of the training data [Dehghani et al., 2017b, Patrini et al., 2016, Beigman and Klebanov, 2009, Zeng  ... 
arXiv:1711.11383v1 fatcat:pcqvc2tw5vfclkeg3hemyary6u