
TEASEL: A Transformer-Based Speech-Prefixed Language Model [article]

Mehdi Arjmand, Mohammad Javad Dousti, Hadi Moradi
2021 arXiv   pre-print
Multimodal language analysis is a burgeoning field of NLP that aims to simultaneously model a speaker's words, acoustical annotations, and facial expressions. In this area, lexicon features usually outperform other modalities because they are pre-trained on large corpora via Transformer-based models. Despite their strong performance, training a new self-supervised learning (SSL) Transformer on any modality is usually not attainable due to insufficient data, which is the case in multimodal language learning. This work proposes a Transformer-Based Speech-Prefixed Language Model called TEASEL to address these constraints without training a complete Transformer model. Compared to a conventional language model, TEASEL includes the speech modality as a dynamic prefix in addition to the textual modality. This method exploits a conventional pre-trained language model as a cross-modal Transformer model. We evaluated TEASEL on the multimodal sentiment analysis task defined by the CMU-MOSI dataset. Extensive experiments show that our model outperforms unimodal baseline language models by 4% and the current multimodal state-of-the-art (SoTA) model by 1% in F1-score. Additionally, our proposed method is 72% smaller than the SoTA model.
arXiv:2109.05522v1 fatcat:52em4kfwujgjlao5cxtuhdmory
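The speech-prefix idea can be pictured with a minimal sketch, assuming that speech features are linearly projected into the language model's embedding space and placed in front of the token embeddings, so an unmodified Transformer sees one longer sequence. The prefix is "dynamic" in the sense that it is computed from each utterance rather than being a fixed learned vector. The function names, projection weights, and dimensions below are hypothetical; this is not TEASEL's released code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output (embedding) dimension


def project(frame: Vector, weight: Matrix) -> Vector:
    """Hypothetical linear map of one speech frame into the LM embedding space."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weight]


def build_prefixed_input(
    speech_frames: List[Vector], weight: Matrix, token_embeddings: List[Vector]
) -> List[Vector]:
    """Prepend the projected speech vectors (the per-utterance prefix) to the token embeddings."""
    prefix = [project(frame, weight) for frame in speech_frames]
    return prefix + token_embeddings


if __name__ == "__main__":
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # maps 3-dim speech features to 2-dim embeddings
    frames = [[0.5, 1.0, 2.0], [1.0, 0.0, 0.0]]
    tokens = [[0.1, 0.2], [0.3, 0.4]]
    seq = build_prefixed_input(frames, w, tokens)
    print(len(seq), seq[0])  # 4 [0.5, 3.0]
```

The language model then attends over speech and text positions jointly, which is how a text-only pre-trained model can act as a cross-modal one.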

LEQA

Mohammad Javad Dousti, Massoud Pedram
2013 Proceedings of the 50th Annual Design Automation Conference on - DAC '13  
This paper presents LEQA, a fast latency estimation tool for evaluating the performance of a quantum algorithm mapped to a quantum fabric. The actual quantum algorithm latency can be computed by performing detailed scheduling, placement and routing of the quantum instructions and qubits in a quantum operation dependency graph on a quantum circuit fabric. This is, however, a very expensive proposition that requires large amounts of processing time. Instead, LEQA, which is based on computing the neighborhood population counts of qubits, can produce estimates of the circuit latency with good accuracy (i.e., an average error of less than 3%) with up to two orders of magnitude speedup for mid-size benchmarks. This speedup is expected to increase superlinearly as a function of circuit size (operation count).
doi:10.1145/2463209.2488786 dblp:conf/dac/DoustiP13 fatcat:vjhgmeep2vaw5mzrnayhc4pndi

SimulEval: An Evaluation Toolkit for Simultaneous Translation [article]

Xutai Ma, Mohammad Javad Dousti, Changhan Wang, Jiatao Gu, Juan Pino
2020 arXiv   pre-print
Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario where the model starts translating before reading the complete source input. Evaluating simultaneous translation models is more complex than evaluating offline models because latency is another factor to consider in addition to translation quality. The research community, despite its growing focus on novel modeling approaches to simultaneous translation, currently lacks a universal evaluation procedure. Therefore, we present SimulEval, an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation. A server-client scheme is introduced to create a simultaneous translation scenario, where the server sends source input and receives predictions for evaluation, and the client executes customized policies. Given a policy, it automatically performs simultaneous decoding and collectively reports several popular latency metrics. We also adapt latency metrics from text simultaneous translation to the speech task. Additionally, SimulEval is equipped with a visualization interface to provide a better understanding of the simultaneous decoding process of a system. SimulEval has already been used extensively for the IWSLT 2020 shared task on simultaneous speech translation. Code will be released upon publication.
arXiv:2007.16193v1 fatcat:j7zkmzmqxvei5gx5ect3f6pucm
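One widely used latency metric in simultaneous translation is Average Lagging (AL), introduced alongside the wait-k policy. The sketch below computes AL from its standard definition, AL = (1/τ) Σ_{i=1..τ} [g(i) − (i−1)/γ], where g(i) is the number of source tokens read before emitting target token i, γ = |y|/|x|, and τ is the first target step at which the whole source has been read. A wait-k policy supplies g(i) for illustration. The function names are hypothetical and this is not SimulEval's API or its exact implementation.

```python
from typing import List


def wait_k_delays(src_len: int, tgt_len: int, k: int) -> List[int]:
    """g(i) for a wait-k policy: read k tokens first, then alternate one read per write."""
    return [min(k + i - 1, src_len) for i in range(1, tgt_len + 1)]


def average_lagging(delays: List[int], src_len: int, tgt_len: int) -> float:
    """Average Lagging: mean of g(i) - (i-1)/gamma up to the first step that saw the full source."""
    gamma = tgt_len / src_len
    # tau: first (1-indexed) target step whose delay covers the whole source
    tau = next((i for i, g in enumerate(delays, start=1) if g >= src_len), len(delays))
    return sum(delays[i - 1] - (i - 1) / gamma for i in range(1, tau + 1)) / tau


if __name__ == "__main__":
    g = wait_k_delays(src_len=5, tgt_len=5, k=3)
    print(g)                             # [3, 4, 5, 5, 5]
    print(average_lagging(g, 5, 5))      # 3.0
```

For equal source and target lengths, a wait-k policy lags by exactly k tokens, which is a handy sanity check for any AL implementation.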

Streaming Simultaneous Speech Translation with Augmented Memory Transformer [article]

Xutai Ma, Yongqiang Wang, Mohammad Javad Dousti, Philipp Koehn, Juan Pino
2020 arXiv   pre-print
Transformer-based models have achieved state-of-the-art performance on speech translation tasks. However, the model architecture is not efficient enough for streaming scenarios since self-attention is computed over an entire input sequence and the computational cost grows quadratically with the length of the input sequence. Moreover, most previous work on simultaneous speech translation, the task of generating translations from partial audio input, ignores the time spent generating the translation when analyzing latency. Under this assumption, a system may have good latency-quality trade-offs but be inapplicable in real-time scenarios. In this paper, we focus on the task of streaming simultaneous speech translation, where the systems are not only capable of translating with partial input but are also able to handle very long or continuous input. We propose an end-to-end Transformer-based sequence-to-sequence model, equipped with an augmented memory Transformer encoder, which has shown great success on the streaming automatic speech recognition task with hybrid or transducer-based models. We conduct an empirical evaluation of the proposed model on segment, context, and memory sizes, and we compare our approach to a Transformer with a unidirectional mask.
arXiv:2011.00033v1 fatcat:i5jzmqvhmnhpthl62u4tok4c4m
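The quadratic-cost argument can be made concrete by counting query-key scores. The sketch below assumes, roughly in the spirit of an augmented memory encoder, that the input is split into fixed-size segments, each segment attends only to itself plus optional left and right context, and its keys also include one memory slot per earlier segment, capped at a maximum. This accounting is a hypothetical simplification for illustration, not the paper's exact architecture.

```python
import math


def full_attention_cost(seq_len: int) -> int:
    """Number of query-key scores for full self-attention over the whole input."""
    return seq_len * seq_len


def segmented_attention_cost(
    seq_len: int, segment: int, left: int, right: int, max_memory: int
) -> int:
    """Hypothetical score count for segment-wise attention with context and a memory bank.

    Queries: the segment plus its left/right context frames.
    Keys: the same window plus one memory slot per earlier segment (capped at max_memory).
    """
    total = 0
    for s in range(math.ceil(seq_len / segment)):
        start = s * segment
        end = min(start + segment, seq_len)
        window = (end - start) + min(left, start) + min(right, seq_len - end)
        memory = min(s, max_memory)
        total += window * (window + memory)
    return total


if __name__ == "__main__":
    print(full_attention_cost(100))                        # 10000
    print(segmented_attention_cost(100, 10, 0, 0, 0))      # 1000
    print(segmented_attention_cost(100, 10, 0, 0, 3))      # 1240
```

For fixed segment, context, and memory sizes, the segmented count grows linearly with the input length, while full self-attention grows quadratically, which is what makes very long or continuous input tractable.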

Squash 2: A Hierarchical Scalable Quantum Mapper Considering Ancilla Sharing [article]

Mohammad Javad Dousti, Alireza Shafaei, Massoud Pedram
2015 arXiv   pre-print
We present a multi-core reconfigurable quantum processor architecture, called Requp, which supports a hierarchical approach to mapping a quantum algorithm while sharing physical and logical ancilla qubits. Each core is capable of performing any quantum instruction. Moreover, we introduce a scalable quantum mapper, called Squash 2, which divides a given quantum circuit into a number of quantum modules; each module is divided into k parts such that each part will run on one of k available cores. Experimental results demonstrate that Squash 2 can handle large-scale quantum algorithms while providing an effective mechanism for sharing ancilla qubits.
arXiv:1512.07402v1 fatcat:jjowg6fturcs7g4v2cqgz5sjve

Therminator

Qing Xie, Mohammad Javad Dousti, Massoud Pedram
2014 Proceedings of the 2014 international symposium on Low power electronics and design - ISLPED '14  
Maintaining safe chip and device skin temperatures in small form-factor mobile devices (such as smartphones and tablets) while continuing to add new functionalities and provide higher performance has emerged as a key challenge. This paper presents Therminator, an early stage, fast, full-device thermal analyzer, which generates accurate steady-state temperature maps of the entire smartphone starting from the Application Processor and other key device components, extending to the skin of the device itself. The thermal analysis is sensitive to detailed device specifications (including its material composition and 3-D layout) as well as different use cases (each case specifying the set of active device components and their activity levels). Therminator considers all major components within the device, builds a corresponding compact thermal model for each component and the whole device, and produces their steady-state temperature maps. Temperature results obtained by using Therminator have been validated against a commercial computational fluid dynamics-based tool, i.e., Autodesk Simulation CFD, and thermocouple measurements on a Qualcomm Mobile Developer Platform. A case study on a Samsung Galaxy S4 using Therminator is provided to relate the device performance to the skin temperature and investigate the thermal path design.
doi:10.1145/2627369.2627641 dblp:conf/islped/XieDP14 fatcat:szwfirsqobfyzgehf4lxb7qoja
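A common form of compact thermal model is a network of thermal conductances between nodes, with heat sources at the active components; at steady state the node temperatures satisfy a linear system G·T = P + G_amb·T_amb. The sketch below builds and solves that system for a toy network. The two-node example (processor and skin), the conductances, and the power values are hypothetical, and Therminator's actual models (material composition, 3-D layout) are far more detailed.

```python
from typing import Dict, List, Tuple


def steady_state_temperatures(
    n_nodes: int,
    links: List[Tuple[int, int, float]],  # (node_a, node_b, conductance in W/K)
    to_ambient: Dict[int, float],         # node -> conductance to ambient in W/K
    power: List[float],                   # heat injected at each node in W
    t_ambient: float,
) -> List[float]:
    """Solve G T = P + G_amb * T_amb for a resistive thermal network (every node needs a path to ambient)."""
    g = [[0.0] * n_nodes for _ in range(n_nodes)]
    b = list(power)
    for a, c, cond in links:
        g[a][a] += cond
        g[c][c] += cond
        g[a][c] -= cond
        g[c][a] -= cond
    for node, cond in to_ambient.items():
        g[node][node] += cond
        b[node] += cond * t_ambient
    # Gaussian elimination with partial pivoting
    n = n_nodes
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(g[r][col]))
        g[col], g[pivot] = g[pivot], g[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = g[r][col] / g[col][col]
            for k in range(col, n):
                g[r][k] -= f * g[col][k]
            b[r] -= f * b[col]
    # Back substitution
    t = [0.0] * n
    for r in range(n - 1, -1, -1):
        t[r] = (b[r] - sum(g[r][k] * t[k] for k in range(r + 1, n))) / g[r][r]
    return t


if __name__ == "__main__":
    # Node 0: processor dissipating 2 W; node 1: skin, 0.5 W/K to 25 C ambient.
    print(steady_state_temperatures(2, [(0, 1, 1.0)], {1: 0.5}, [2.0, 0.0], 25.0))  # [31.0, 29.0]
```

A "use case" in this picture is simply a different power vector P over the same conductance matrix, so many cases can be evaluated quickly once the network is built.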

Self-Training for End-to-End Speech Translation

Juan Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang
2020 Interspeech 2020  
One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. The effect of the quality of the pseudo-labels is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.
doi:10.21437/interspeech.2020-2938 dblp:conf/interspeech/PinoXMDT20 fatcat:yvxp2r2zsbceth76zv344yn5ay
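The data side of self-training can be sketched as follows, assuming a teacher that maps an audio identifier to a translation; the teacher is either a cascade (speech recognition followed by text translation) or an end-to-end model. The callables and toy data here are hypothetical stand-ins, and training the student model on the combined set is not shown.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (audio id, target-language text)


def cascade(asr: Callable[[str], str], mt: Callable[[str], str]) -> Callable[[str], str]:
    """Hypothetical cascade teacher: transcribe the audio, then translate the transcript."""
    return lambda audio: mt(asr(audio))


def build_self_training_set(
    labeled: List[Example],
    unlabeled_audio: List[str],
    teacher: Callable[[str], str],
) -> List[Example]:
    """Pseudo-label the unlabeled audio with the teacher and append it to the labeled data."""
    pseudo = [(audio, teacher(audio)) for audio in unlabeled_audio]
    return labeled + pseudo


if __name__ == "__main__":
    toy_teacher = cascade(lambda audio: "hello " + audio, str.upper)
    print(build_self_training_set([("a0", "bonjour")], ["a1"], toy_teacher))
    # [('a0', 'bonjour'), ('a1', 'HELLO A1')]
```

Swapping `cascade(...)` for a single end-to-end callable corresponds to the final variant described in the abstract.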

Self-Training for End-to-End Speech Translation [article]

Juan Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang
2020 arXiv   pre-print
One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. The effect of the quality of the pseudo-labels is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.
arXiv:2006.02490v2 fatcat:opq3fnq6hfbs7ohvs66hekrwui

SESOS: A Verifiable Searchable Outsourcing Scheme for Ordered Structured Data in Cloud Computing

Javad Ghareh Chamani, Mohammad Sadeq Dousti, Rasool Jalili, Dimitrios Papadopoulos
2019 Isecure  
Mohammad Sadeq Dousti got his Ph.D. from Sharif University of Technology in software engineering, and his M.S. and B.S. from Sharif University of Technology in IT engineering.  ...  Javad Ghareh Chamani received his B.S. degree in software engineering from University of Tehran, Tehran, Iran, in 2012, and received his M.S. degree in software engineering from Sharif University of Technology  ... 
doi:10.22042/isecure.2019.148637.430 dblp:journals/isecure/ChamaniDJP19 fatcat:zckyq5ktxba3biod4aevor4m4q

Squash

Mohammad Javad Dousti, Alireza Shafaei, Massoud Pedram
2014 Proceedings of the 24th edition of the great lakes symposium on VLSI - GLSVLSI '14  
Quantum algorithms for solving problems of interesting size often result in circuits with a very large number of qubits and quantum gates. Fortunately, these algorithms also tend to contain a small number of repetitively-used quantum kernels. Identifying the quantum logic blocks that implement such quantum kernels is critical to the complexity management for realizing the corresponding quantum circuit. Moreover, quantum computation requires some type of quantum error correction coding to combat decoherence, which in turn results in a large number of ancilla qubits in the circuit. Sharing the ancilla qubits among quantum operations (even though this sharing can increase the overall circuit latency) is important in order to curb the resource demand of the quantum algorithm. This paper presents a multi-core reconfigurable quantum processor architecture, called Requp, which supports a layered approach to mapping a quantum algorithm and ancilla sharing. More precisely, a scalable quantum mapper, called Squash, is introduced, which divides a given quantum circuit into a number of quantum kernels; each kernel comprises k parts such that each part will run on exactly one of k available cores. Experimental results demonstrate that Squash can handle large-scale quantum algorithms while providing an effective mechanism for sharing ancilla qubits.
doi:10.1145/2591513.2591523 dblp:conf/glvlsi/DoustiSP14 fatcat:7oicroa6ajgobm6iyn2tc4fu4q

Power-Aware Deployment and Control of Forced-Convection and Thermoelectric Coolers

Mohammad Javad Dousti, Massoud Pedram
2014 Proceedings of the 51st Annual Design Automation Conference on Design Automation Conference - DAC '14  
Advances in thermoelectric cooling technology have made it one of the promising solutions for spot cooling in VLSI circuits. Thermoelectric coolers (TECs) generate heat during their operation. This heat plus the heat generated in the circuit should be transferred to the ambient environment in order to avoid high die temperatures. This paper describes a hybrid cooling solution in which TECs are augmented with forced-convection coolers (fans). More precisely, an optimization framework called OFTEC is presented which finds the optimum TEC driving current and fan speed to minimize the overall power consumption of the cooling system while maintaining safe die temperatures. Simulation results on a set of eight benchmarks show the benefits of the proposed approach. In particular, a baseline system without TECs but with a fan could meet the thermal constraint for only three of the benchmarks, whereas the OFTEC solution satisfied thermal constraints for all benchmarks. In addition, OFTEC resulted in 5.4% less average power consumption for the aforesaid three benchmarks while lowering the maximum die temperature by an average of 3.7℃.
doi:10.1145/2593069.2593186 dblp:conf/dac/DoustiP14 fatcat:gwg4dzk2b5ewhors7zffqm6t7a
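The optimization problem described above (minimize cooling power over TEC current and fan speed, subject to a die-temperature limit) can be illustrated with a plain exhaustive grid search. This is a swapped-in technique for illustration, not OFTEC's solver, which the abstract does not describe. The power and temperature models are caller-supplied; the ones in the usage example are toy functions, not physical models.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def best_operating_point(
    currents: Iterable[float],
    fan_speeds: Iterable[float],
    cooling_power: Callable[[float, float], float],
    max_die_temp: Callable[[float, float], float],
    t_limit: float,
) -> Optional[Tuple[float, float, float]]:
    """Return (current, fan speed, power) with the lowest cooling power whose predicted
    maximum die temperature stays within t_limit, or None if no grid point is feasible."""
    best = None
    for i, s in product(currents, fan_speeds):
        if max_die_temp(i, s) > t_limit:
            continue  # violates the thermal constraint
        p = cooling_power(i, s)
        if best is None or p < best[2]:
            best = (i, s, p)
    return best


if __name__ == "__main__":
    toy_power = lambda i, s: i * i + 2 * s          # toy: TEC Joule power + fan power
    toy_temp = lambda i, s: 95 - 5 * i - 10 * s     # toy: both knobs lower the die temperature
    print(best_operating_point([0, 1, 2, 3], [0, 1, 2], toy_power, toy_temp, 80))  # (1, 1, 3)
```

Even in this toy setting the cheapest feasible point uses both knobs at moderate levels rather than either one alone, which mirrors the motivation for a hybrid TEC plus fan solution.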

Platform-dependent, leakage-aware control of the driving current of embedded thermoelectric coolers

Mohammad Javad Dousti, Massoud Pedram
2013 International Symposium on Low Power Electronics and Design (ISLPED)  
One of the biggest stumbling blocks for the successful continuation of Moore's law is the substrate temperature of VLSI circuits. Thermoelectric cooling is one of the promising cooling methods to combat high die temperatures. This method provides key benefits such as compactness, high reliability, and exceptionally high heat-pumping capability. On the other hand, even with the recent advances in fabrication techniques, thermoelectric coolers (TECs) suffer from a rather low coefficient of performance (COP), which denotes the ratio of heat removed per second to the power needed to drive the TEC. In this paper, different techniques to improve the performance of a TEC embedded inside a processor package are investigated. In particular, the COP of TECs is first reformulated to consider the leakage power, which is exponentially dependent on the die temperature. Next, it is demonstrated that the TEC driving current that yields the maximum decrease in the die temperature is quite different from the one that runs the TEC in its highest-COP state. Based on these observations, a platform-dependent, leakage-aware cooling policy is proposed, in which the TEC driving current is set based on the target specs (high-performance vs. low-power) and the actual conditions of the chip (emergency vs. preventive thermal management). Experimental results show that, with this policy, one can reduce the temperature of chip hotspots while achieving a high COP.
doi:10.1109/islped.2013.6629315 dblp:conf/islped/DoustiP13 fatcat:x74venxy6jhavhxqg3qxnl3akq
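The gap between the current that pumps the most heat and the current that maximizes COP can be seen from the textbook single-stage TEC equations: heat absorbed at the cold side Q_c = αIT_c − ½I²R − KΔT, input power P = αIΔT + I²R, and COP = Q_c / P, with Seebeck coefficient α, electrical resistance R, thermal conductance K, and ΔT = T_h − T_c. The sketch below scans the driving current with hypothetical parameter values and fixed side temperatures; unlike the paper, it does not model leakage power or its temperature dependence.

```python
def tec_heat_pumped(i: float, alpha: float, r: float, k: float, t_cold: float, t_hot: float) -> float:
    """Heat absorbed at the cold side (W): Peltier term minus half the Joule heat minus back-conduction."""
    return alpha * i * t_cold - 0.5 * i * i * r - k * (t_hot - t_cold)


def tec_input_power(i: float, alpha: float, r: float, t_cold: float, t_hot: float) -> float:
    """Electrical power (W) to drive the TEC: Seebeck back-voltage term plus Joule heating."""
    return alpha * i * (t_hot - t_cold) + i * i * r


def tec_cop(i: float, alpha: float, r: float, k: float, t_cold: float, t_hot: float) -> float:
    """Coefficient of performance: heat removed per second divided by driving power."""
    p = tec_input_power(i, alpha, r, t_cold, t_hot)
    return tec_heat_pumped(i, alpha, r, k, t_cold, t_hot) / p if p > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical module: alpha in V/K, R in ohms, K in W/K, temperatures in kelvin.
    params = dict(alpha=0.05, r=1.0, k=0.5, t_cold=300.0, t_hot=310.0)
    currents = [n / 100 for n in range(1, 3001)]  # 0.01 A .. 30 A
    best_qc = max(currents, key=lambda i: tec_heat_pumped(i, **params))
    best_cop = max(currents, key=lambda i: tec_cop(i, **params))
    print(best_qc, best_cop)  # maximum heat pumping at 15.0 A; maximum COP below 1 A
```

With these values the heat-pumping optimum sits at αT_c/R = 15 A, more than an order of magnitude above the COP-optimal current, which is why a cooling policy has to choose between the two depending on the situation.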

ThermTap: An online power analyzer and thermal simulator for Android devices

Mohammad Javad Dousti, Majid Ghasemi-Gol, Mahdi Nazemi, Massoud Pedram
2015 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)  
This paper introduces ThermTap, which enables system and software developers to monitor the power consumption and temperature of various hardware components in an Android device as a function of running applications and processes. ThermTap comprises a power analyzer, called PowerTap, and an online thermal simulator, called Therminator 2. With accurate power macro-models, PowerTap collates activity profiles of major components of a portable device from the OS kernel device drivers in an event-driven manner to generate power traces. In turn, Therminator 2 reads these traces and, using a compact thermal model of the device, generates various temperature maps, including those for the device components and device skin. Fast thermal simulation techniques enable Therminator 2 to be executed in real time. With the precise per-process and per-application temperature maps that ThermTap produces, software and system developers can find thermal bugs in their software. A case study is presented on identifying a thermal bug in the software running on an Android device.
doi:10.1109/islped.2015.7273537 dblp:conf/islped/DoustiGNP15 fatcat:uzdhhzpmobgzzcwsu44asy54nm
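Event-driven power tracing can be sketched as follows, assuming each event records a timestamp, a component, and that component's new power state, and that a per-state power table stands in for a power macro-model. Power is then piecewise constant between a component's state changes. The component names, states, and wattages are hypothetical; PowerTap's actual macro-models and kernel hooks are not reproduced here.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str, str]            # (timestamp in s, component, new power state)
Segment = Tuple[float, float, str, float]  # (start s, end s, component, power W)


def power_trace(
    events: List[Event], power_model: Dict[str, Dict[str, float]], t_end: float
) -> List[Segment]:
    """Turn state-change events into piecewise-constant power segments per component."""
    segments: List[Segment] = []
    current: Dict[str, Tuple[float, str]] = {}  # component -> (since, state)
    for t, comp, state in sorted(events):
        if comp in current:
            since, old = current[comp]
            segments.append((since, t, comp, power_model[comp][old]))
        current[comp] = (t, state)
    for comp, (since, state) in current.items():
        segments.append((since, t_end, comp, power_model[comp][state]))
    return segments


def energy(segments: List[Segment], comp: str) -> float:
    """Energy in joules consumed by one component over the trace."""
    return sum((end - start) * p for start, end, c, p in segments if c == comp)


if __name__ == "__main__":
    model = {"cpu": {"high": 2.0, "low": 0.5}, "display": {"on": 1.0}}
    events = [(0.0, "cpu", "high"), (2.0, "cpu", "low"), (0.0, "display", "on")]
    trace = power_trace(events, model, t_end=5.0)
    print(energy(trace, "cpu"), energy(trace, "display"))  # 5.5 5.0
```

Segments of this shape are the kind of input a thermal simulator could consume as a time-varying power vector.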

Performance Comparisons Between 7-nm FinFET and Conventional Bulk CMOS Standard Cell Libraries

Qing Xie, Xue Lin, Yanzhi Wang, Shuang Chen, Mohammad Javad Dousti, Massoud Pedram
2015 IEEE Transactions on Circuits and Systems - II - Express Briefs  
Performance Comparisons between 7nm FinFET and Conventional Bulk CMOS Standard Cell Libraries Qing Xie, Student, IEEE, Xue Lin, Student, IEEE, Yanzhi Wang, Student, IEEE, Shuang Chen, Student, IEEE, Mohammad Javad Dousti, Student, IEEE, and Massoud Pedram, Fellow, IEEE E format [21], which is widely used for logic synthesis and static timing analysis.  ... 
doi:10.1109/tcsii.2015.2391632 fatcat:n4ec75pwubgkncol2edljifz64

5nm FinFET Standard Cell Library Optimization and Circuit Synthesis in Near- and Super-Threshold Voltage Regimes

Qing Xie, Xue Lin, Yanzhi Wang, Mohammad Javad Dousti, Alireza Shafaei, Majid Ghasemi-Gol, Massoud Pedram
2014 2014 IEEE Computer Society Annual Symposium on VLSI  
The FinFET device has been proposed as a promising substitute for the traditional bulk CMOS-based device at the nanoscale, due to its extraordinary properties such as improved channel controllability, high ON/OFF current ratio, reduced short-channel effects, and relative immunity to gate line-edge roughness. In addition, its near-ideal subthreshold behavior indicates the potential of FinFET circuits in the near-threshold supply voltage regime, which consumes an order of magnitude less energy than regular strong-inversion circuits operating in the super-threshold supply voltage regime. This paper presents a design flow for creating standard cells in the 5nm FinFET technology node, covering both near-threshold and super-threshold operation, and for building a Liberty-format standard cell library. The circuit synthesis results of various combinational and sequential circuits based on the 5nm FinFET standard cell library show up to 40X circuit speed improvement and three orders of magnitude energy reduction compared to those of 45nm bulk CMOS technology.
doi:10.1109/isvlsi.2014.101 dblp:conf/isvlsi/XieLWDSGP14 fatcat:akj4npap7rc73mooho3uw7p6ry
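Liberty-format libraries commonly characterize each cell timing arc as a two-dimensional lookup table, typically indexed by input transition time and output load capacitance, and timing tools estimate values between characterized points by interpolation. The sketch below performs bilinear interpolation inside such a table. The axis values and delays are hypothetical and not taken from the 5nm library; real tools may also extrapolate outside the characterized range, which this sketch rejects instead.

```python
from bisect import bisect_right
from typing import List


def bilinear_lookup(
    index_1: List[float], index_2: List[float], values: List[List[float]], x: float, y: float
) -> float:
    """Interpolate values[i][j], defined at (index_1[i], index_2[j]), at the point (x, y).

    Both axes must be ascending with at least two points; (x, y) must lie inside the grid.
    """
    if not (index_1[0] <= x <= index_1[-1] and index_2[0] <= y <= index_2[-1]):
        raise ValueError("point outside characterized range")
    # Locate the grid cell containing (x, y); clamp so the upper edge maps to the last cell.
    i = min(bisect_right(index_1, x), len(index_1) - 1) - 1
    j = min(bisect_right(index_2, y), len(index_2) - 1) - 1
    tx = (x - index_1[i]) / (index_1[i + 1] - index_1[i])
    ty = (y - index_2[j]) / (index_2[j + 1] - index_2[j])
    v00, v01 = values[i][j], values[i][j + 1]
    v10, v11 = values[i + 1][j], values[i + 1][j + 1]
    return v00 * (1 - tx) * (1 - ty) + v10 * tx * (1 - ty) + v01 * (1 - tx) * ty + v11 * tx * ty


if __name__ == "__main__":
    transition_ps = [10.0, 20.0]        # hypothetical input transition axis
    load_ff = [1.0, 2.0]                # hypothetical output load axis
    delay_ps = [[5.0, 7.0], [6.0, 8.0]]  # hypothetical delays
    print(bilinear_lookup(transition_ps, load_ff, delay_ps, 15.0, 1.5))  # 6.5
```

Characterizing separate tables for near-threshold and super-threshold supply voltages would simply mean separate libraries queried the same way.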