
Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging

Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun, Chao Wang
2020 Interspeech 2020  
This novel KD method, Intra-Utterance Similarity Preserving KD (IUSP), shows promising results for the audio tagging task.  ...  Knowledge Distillation (KD) is a popular area of research for reducing the size of large models while still maintaining good performance.  ...  The rest of the paper will be as follows: Section 2, an explanation of both Similarity Preserving KD (SP) and Intra-Utterance Similarity Preserving KD (IUSP); Section 3, a description of the audio tagging  ... 
doi:10.21437/interspeech.2020-2835 dblp:conf/interspeech/ChangKSW20 fatcat:ejgfudhgxnae3hl6iodlfs36nm

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias [article]

Mufan Sang, Wei Xia, John H.L. Hansen
2020 arXiv   pre-print
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning, which is applied for short utterance forensic speaker verification  ...  The objective function collectively considers speaker classification loss, Kullback-Leibler divergence, and similarity of embeddings.  ...  We process the raw data with the following steps: (1) extract audio at 16kHz sample rate; (2) using human manual annotation, perform diarization on the audio stream (tag speaker identity); (3) tag audio  ... 
arXiv:2009.09556v1 fatcat:ftfbb4pgw5ehfcwxx3n7ldy7jm

Open-Set Short Utterance Forensic Speaker Verification Using Teacher-Student Network with Explicit Inductive Bias

Mufan Sang, Wei Xia, John H.L. Hansen
2020 Interspeech 2020  
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning, which is applied for short utterance forensic speaker verification  ...  The objective function collectively considers speaker classification loss, Kullback-Leibler divergence, and similarity of embeddings.  ...  We process the raw data with the following steps: (1) extract audio at 16kHz sample rate; (2) using human manual annotation, perform diarization on the audio stream (tag speaker identity); (3) tag audio  ... 
doi:10.21437/interspeech.2020-2868 dblp:conf/interspeech/SangXH20 fatcat:zyhbs2x6e5aephdojohhi6dwuu

Joint Weakly Supervised AT and AED Using Deep Feature Distillation and Adaptive Focal Loss [article]

Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang
2021 arXiv   pre-print
A good joint training framework is very helpful to improve the performances of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously.  ...  In this study, we propose three methods to improve the best teacher-student framework of DCASE2019 Task 4 for both AT and AED tasks.  ...  Most related works choose to distill the intermediate information between teacher and student model using the whole feature maps directly, including the intra-utterance similarity preserving KD that proposed  ... 
arXiv:2103.12388v1 fatcat:v7ibnginh5b53hpn3cudyj2lsy

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions [article]

Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
2021 arXiv   pre-print
The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer of knowledge between modalities, including their representations and predictive  ...  Our final goal is to discuss challenges and perspectives along with the important ideas and directions for future work that we hope to be beneficial for the entire research community focusing on this exciting  ...  The issue of preserving intra-modality similarity or dissimilarity structure is addressed using a discriminative algorithm for each modality.  ... 
arXiv:2107.13782v2 fatcat:s4spofwxjndb7leqbcqnwbifq4

Paper Titles

2019 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE)  
V2 Based Real-Time Motion Comparison with Re-targeting and Color Code Feedback; Knowledge Distillation Using Soft and Hard Labels and Annealing for Acoustic Model Training  ...  for UWB Systems; Detailed Evaluation of a Wind Noise Reduction Method Using DNN for 3D Audio Navigation System; Audio Augmented Reality for Bicycles; Detection of Bending Motion at Waist of Kitchen Workers  ... 
doi:10.1109/gcce46687.2019.9015409 fatcat:6k3r6jixrvglrkrkzek636gb54

A methodology for audio ingestion, restoration and analysis in the sound archiving field

Enric Giné, Jordi Janer
2013 Zenodo  
Reviewed knowledge has been applied to a specific patrimonial collection, allowing thorough documentation, mechanical restoration and preservation, signal extraction and digital processing for relevant  ...  It also aims at evaluating some current tools available for the analysis and restoration of degraded audio signals.  ...  , audio inpainting, (blind) source separation, audio enhancement and similar approaches in the world of audio archiving?  ... 
doi:10.5281/zenodo.3754227 fatcat:tyahsfkfivcfjgxrjub7no2kee

Compression of Deep Learning Models for Text: A Survey [article]

Manish Gupta, Puneet Agrawal
2021 arXiv   pre-print
In this survey, we discuss six different types of methods (Pruning, Quantization, Knowledge Distillation, Parameter Sharing, Tensor Decomposition, and Sub-quadratic Transformer based methods) for compression  ...  building applications with efficient and small models, and the large amount of recently published work in this area, we believe that this survey organizes the plethora of work done by the 'deep learning for  ...  First, they investigate the efficacy of various Knowledge Distillation techniques to significantly reduce the size of the models with respect to the depth and hidden state sizes while preserving the accuracy  ... 
arXiv:2008.05221v4 fatcat:6frf2wzi7zganaqgkuvy4szgmq

A Metaverse: taxonomy, components, applications, and open challenges

Sang-Min Park, Young-Gab Kim
2022 IEEE Access  
Finally, we summarize the limitations and directions for implementing the immersive Metaverse as social influences, constraints, and open challenges.  ...  The integration of enhanced social activities and neural-net methods requires a new definition of Metaverse suitable for the present, different from the previous Metaverse.  ...  It is a new attention-based regularization for encoders and an online knowledge distillation method to improve knowledge transfer.  ... 
doi:10.1109/access.2021.3140175 fatcat:fnraeaz74vh33knfvhzrynesli

Survey of Generative Methods for Social Media Analysis [article]

Stan Matwin, Aristides Milios, Paweł Prałat, Amilcar Soares, François Théberge
2021 arXiv   pre-print
This survey draws a broad-stroke, panoramic picture of the State of the Art (SoTA) of the research in generative methods for the analysis of social media data.  ...  Social dynamics are important for understanding the spreading of influence or diseases, formation of friendships, the productivity of teams, etc.  ...  DistilBERT uses a technique known as knowledge distillation to transfer the learned knowledge from the full BERT network onto a smaller model, seeking to preserve as much accuracy as possible.  ... 
arXiv:2112.07041v1 fatcat:xgmduwctpbddfo67y6ack5s2um

Speech-gesture driven multimodal interfaces for crisis management

R. Sharma, M. Yeasin, N. Krahntoever, I. Rauschert, Guoray Cai, I. Brewer, A.M. Maceachren, K. Sengupta
2003 Proceedings of the IEEE  
In particular, it describes the evolution and implementation details of two representative systems, called crisis management (XISM) and Dialog Assisted Visual Environment for Geoinformation (DAVE_G).  ...  This paper establishes the importance of multimodal interfaces in various aspects of crisis management and explores many issues in realizing successful speech-gesture driven, dialog-enabled interfaces for  ...  , such as for head tracking [68]) and because of intra-individual shape variability.  ... 
doi:10.1109/jproc.2003.817145 fatcat:flbaisvreresla7wufztzpnvfq

Sentiment analysis using deep learning approaches: an overview

Olivier Habimana, Yuhua Li, Ruixuan Li, Xiwu Gu, Ge Yu
2019 Science China Information Sciences  
Moreover, based on knowledge learned from previous studies, the future work subsection shows the suggestions that can be incorporated into new deep learning models to yield better performance.  ...  Suggestions include the use of bidirectional encoder representations from transformers (BERT), sentiment-specific word embedding models, cognition-based attention models, common sense knowledge, reinforcement  ...  [127] proposed a graph memory fusion network (graph-MFN) that learns different features of utterances by applying three parallel LSTMs for visual, audio and acoustic modalities.  ... 
doi:10.1007/s11432-018-9941-6 fatcat:nbevrfiyybhszirol2af26c6ve

A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions [article]

Shulei Ji, Jing Luo, Xinyu Yang
2020 arXiv   pre-print
generation converts scores with performance characteristics into audio by assigning timbre or generates music in audio format directly.  ...  In addition, we summarize the datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several  ...  It provides full-length and high-quality audio, pre-calculated features, together with track-and user-level metadata, tags, and free-form text (as biographies).  ... 
arXiv:2011.06801v1 fatcat:cixou3d2jzertlcpb7kb5x5ery

Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review [article]

Irene Li, Jessica Pan, Jeremy Goldwasser, Neha Verma, Wai Pan Wong, Muhammed Yavuz Nuzumlalı, Benjamin Rosand, Yixin Li, Matthew Zhang, David Chang, R. Andrew Taylor, Harlan M. Krumholz (+1 others)
2021 arXiv   pre-print
In this survey paper, we summarize current neural NLP methods for EHR applications.  ...  We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics such as question answering, phenotyping, knowledge graphs, medical dialogue  ...  [29] , who introduced a knowledge-distillation approach called interpretable mimic learning.  ... 
arXiv:2107.02975v1 fatcat:nayhw7gadfdzrovycdkvzy75pi

Deep Learning in Information Security [article]

Stefan Thaler, Vlado Menkovski, Milan Petkovic
2018 arXiv   pre-print
If DL-methods succeed to solve problems on a data type in one domain, they most likely will also succeed on similar data from another domain.  ...  They validate their approach on their own dataset with 101 persons and 4584 utterances and their method achieves an utterance classification accuracy of 63.5%.  ...  They can be used to learn functions that map sequences to other sequences (many-to-many), for example, to tag sequences with specific labels or for natural language translation [29].  ... 
arXiv:1809.04332v1 fatcat:xfb7lgrkw5cirdl3qvmg3ssnbi