A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2020. File type: application/pdf.
Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging
2020
Interspeech 2020
This novel KD method, Intra-Utterance Similarity Preserving KD (IUSP), shows promising results for the audio tagging task. ...
Knowledge Distillation (KD) is a popular area of research for reducing the size of large models while still maintaining good performance. ...
The rest of the paper is organized as follows: Section 2 explains both Similarity Preserving KD (SP) and Intra-Utterance Similarity Preserving KD (IUSP); Section 3 describes the audio tagging ...
doi:10.21437/interspeech.2020-2835
dblp:conf/interspeech/ChangKSW20
fatcat:ejgfudhgxnae3hl6iodlfs36nm
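For context on the record above: SP-style distillation (which IUSP extends to operate within an utterance) matches pairwise activation similarities between teacher and student. A minimal NumPy sketch of the batch-wise similarity-preserving loss, with illustrative names rather than the authors' code:

```python
import numpy as np

def sp_kd_loss(feat_t, feat_s):
    """Similarity-preserving KD loss (Tung & Mori style sketch).

    feat_t, feat_s: (batch, dim) teacher/student feature matrices.
    Builds row-normalized batch similarity matrices for each model
    and penalizes their squared Frobenius distance.
    """
    g_t = feat_t @ feat_t.T
    g_t = g_t / np.linalg.norm(g_t, axis=1, keepdims=True)
    g_s = feat_s @ feat_s.T
    g_s = g_s / np.linalg.norm(g_s, axis=1, keepdims=True)
    b = feat_t.shape[0]
    return np.sum((g_t - g_s) ** 2) / b ** 2

rng = np.random.default_rng(0)
t = rng.standard_normal((4, 8))
s = rng.standard_normal((4, 8))
print(sp_kd_loss(t, t))  # identical features -> 0.0
print(sp_kd_loss(t, s))  # mismatched features -> positive
```

The loss depends only on within-batch similarity structure, so the student is never forced to copy the teacher's feature dimensionality, which is what makes this family of losses attractive for compressing audio taggers.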
Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias
[article]
2020
arXiv
pre-print
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning, which is applied for short utterance forensic speaker verification ...
The objective function collectively considers speaker classification loss, Kullback-Leibler divergence, and similarity of embeddings. ...
We process the raw data with the following steps: (1) extract audio at 16kHz sample rate; (2) using human manual annotation, perform diarization on the audio stream (tag speaker identity); (3) tag audio ...
arXiv:2009.09556v1
fatcat:ftfbb4pgw5ehfcwxx3n7ldy7jm
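The snippet above lists three terms (speaker classification loss, Kullback-Leibler divergence, embedding similarity). A hedged sketch of how such a combined teacher-student objective is commonly assembled, with hypothetical weights `alpha`/`beta` and temperature `T` — not the paper's exact formulation:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ts_objective(logits_s, logits_t, label, emb_s, emb_t,
                 T=2.0, alpha=0.5, beta=0.5):
    """Teacher-student objective sketch: cross-entropy on the hard
    label, KL(teacher || student) on temperature-softened logits,
    and a cosine-distance penalty between embeddings."""
    ce = -np.log(softmax(logits_s)[label])          # classification loss
    p_t = softmax(logits_t, T)
    p_s = softmax(logits_s, T)
    kl = np.sum(p_t * np.log(p_t / p_s))            # KL divergence
    cos = emb_s @ emb_t / (np.linalg.norm(emb_s) * np.linalg.norm(emb_t))
    return ce + alpha * kl + beta * (1.0 - cos)     # embedding similarity

loss = ts_objective(
    logits_s=[1.0, 0.2, -0.5], logits_t=[2.0, 0.1, -1.0],
    label=0, emb_s=np.ones(4), emb_t=np.ones(4))
print(loss)  # non-negative scalar
```

All three terms are non-negative, so the total is as well; the weights and temperature would be tuned per task.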
Open-Set Short Utterance Forensic Speaker Verification Using Teacher-Student Network with Explicit Inductive Bias
2020
Interspeech 2020
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning, which is applied for short utterance forensic speaker verification ...
The objective function collectively considers speaker classification loss, Kullback-Leibler divergence, and similarity of embeddings. ...
We process the raw data with the following steps: (1) extract audio at 16kHz sample rate; (2) using human manual annotation, perform diarization on the audio stream (tag speaker identity); (3) tag audio ...
doi:10.21437/interspeech.2020-2868
dblp:conf/interspeech/SangXH20
fatcat:zyhbs2x6e5aephdojohhi6dwuu
Joint Weakly Supervised AT and AED Using Deep Feature Distillation and Adaptive Focal Loss
[article]
2021
arXiv
pre-print
A good joint training framework is very helpful to improve the performances of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously. ...
In this study, we propose three methods to improve the best teacher-student framework of DCASE2019 Task 4 for both AT and AED tasks. ...
Most related works choose to distill the intermediate information between teacher and student model using the whole feature maps directly, including the intra-utterance similarity preserving KD that was proposed ...
arXiv:2103.12388v1
fatcat:v7ibnginh5b53hpn3cudyj2lsy
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
[article]
2021
arXiv
pre-print
The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer of knowledge between modalities, including their representations and predictive ...
Our final goal is to discuss challenges and perspectives along with the important ideas and directions for future work that we hope to be beneficial for the entire research community focusing on this exciting ...
The issue of preserving intra-modality similarity or dissimilarity structure is addressed using a discriminative algorithm for each modality. ...
arXiv:2107.13782v2
fatcat:s4spofwxjndb7leqbcqnwbifq4
Paper Titles
2019
2019 IEEE 8th Global Conference on Consumer Electronics (GCCE)
V2 Based Real-Time Motion Comparison with Re-targeting and Color Code Feedback; Knowledge Distillation Using Soft and Hard Labels and Annealing for Acoustic Model Training
... for UWB Systems; Detailed Evaluation of a Wind Noise Reduction Method Using DNN; 3D Audio Navigation System; Audio Augmented Reality for Bicycles; Detection of Bending Motion at Waist of Kitchen Workers ...
doi:10.1109/gcce46687.2019.9015409
fatcat:6k3r6jixrvglrkrkzek636gb54
A methodology for audio ingestion, restoration and analysis in the sound archiving field
2013
Zenodo
Reviewed knowledge has been applied to a specific patrimonial collection, allowing thorough documentation, mechanical restoration and preservation, signal extraction and digital processing for relevant ...
It also aims at evaluating some current tools available for the analysis and restoration of degraded audio signals. ...
..., audio inpainting [75], (blind) source separation, audio enhancement and similar approaches in the world of audio archiving? ...
doi:10.5281/zenodo.3754227
fatcat:tyahsfkfivcfjgxrjub7no2kee
Compression of Deep Learning Models for Text: A Survey
[article]
2021
arXiv
pre-print
In this survey, we discuss six different types of methods (Pruning, Quantization, Knowledge Distillation, Parameter Sharing, Tensor Decomposition, and Sub-quadratic Transformer based methods) for compression ...
building applications with efficient and small models, and the large amount of recently published work in this area, we believe that this survey organizes the plethora of work done by the 'deep learning for ...
First, they investigate the efficacy of various Knowledge Distillation techniques to significantly reduce the size of the models with respect to the depth and hidden state sizes while preserving the accuracy ...
arXiv:2008.05221v4
fatcat:6frf2wzi7zganaqgkuvy4szgmq
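Of the six compression families the survey above names, pruning is the simplest to illustrate. A minimal magnitude-pruning sketch (illustrative, not code from the survey): weights whose absolute value falls in the smallest fraction `sparsity` are zeroed, leaving the rest untouched.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    flat = np.abs(w).ravel()
    k = int(sparsity * flat.size)  # number of weights to drop
    if k == 0:
        return w.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(w) <= threshold, 0.0, w)

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 8))
pruned = magnitude_prune(w, 0.5)
print((pruned == 0).mean())  # fraction of zeros: 0.5
```

In practice the zeroed mask is kept fixed and the surviving weights are fine-tuned; combining pruning with distillation, as several surveyed works do, recovers much of the lost accuracy.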
A Metaverse: taxonomy, components, applications, and open challenges
2022
IEEE Access
Finally, we summarize the limitations and directions for implementing the immersive Metaverse as social influences, constraints, and open challenges. ...
The integration of enhanced social activities and neural-net methods requires a new definition of Metaverse suitable for the present, different from the previous Metaverse. ...
It is a new attention-based regularization for encoders and an online knowledge distillation method to improve knowledge transfer. ...
doi:10.1109/access.2021.3140175
fatcat:fnraeaz74vh33knfvhzrynesli
Survey of Generative Methods for Social Media Analysis
[article]
2021
arXiv
pre-print
This survey draws a broad-stroke, panoramic picture of the State of the Art (SoTA) of the research in generative methods for the analysis of social media data. ...
Social dynamics are important for understanding the spreading of influence or diseases, formation of friendships, the productivity of teams, etc. ...
DistilBERT uses a technique known as knowledge distillation to transfer the learned knowledge from the full BERT network onto a smaller model, seeking to preserve as much accuracy as possible. ...
arXiv:2112.07041v1
fatcat:xgmduwctpbddfo67y6ack5s2um
Speech-gesture driven multimodal interfaces for crisis management
2003
Proceedings of the IEEE
In particular it describes, the evolution and implementation details of two representative systems, called crisis management (XISM) and Dialog Assisted Visual Environment for Geoinformation (DAVE_G). ...
This paper establishes the importance of multimodal interfaces in various aspects of crisis management and explores many issues in realizing successful speech-gesture driven, dialog-enabled interfaces for ...
, such as for head tracking [68] ) and because of intra-individual shape variability. ...
doi:10.1109/jproc.2003.817145
fatcat:flbaisvreresla7wufztzpnvfq
Sentiment analysis using deep learning approaches: an overview
2019
Science China Information Sciences
Moreover, based on knowledge learned from previous studies, the future work subsection shows the suggestions that can be incorporated into new deep learning models to yield better performance. ...
Suggestions include the use of bidirectional encoder representations from transformers (BERT), sentiment-specific word embedding models, cognition-based attention models, common sense knowledge, reinforcement ...
[127] proposed a graph memory fusion network (graph-MFN) that learns different features of utterances by applying three parallel LSTMs for the visual, acoustic, and textual modalities. ...
doi:10.1007/s11432-018-9941-6
fatcat:nbevrfiyybhszirol2af26c6ve
A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions
[article]
2020
arXiv
pre-print
generation converts scores with performance characteristics into audio by assigning timbre or generates music in audio format directly. ...
In addition, we summarize the datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several ...
It provides full-length and high-quality audio, pre-calculated features, together with track-and user-level metadata, tags, and free-form text (as biographies). ...
arXiv:2011.06801v1
fatcat:cixou3d2jzertlcpb7kb5x5ery
Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review
[article]
2021
arXiv
pre-print
In this survey paper, we summarize current neural NLP methods for EHR applications. ...
We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics such as question answering, phenotyping, knowledge graphs, medical dialogue ...
[29] , who introduced a knowledge-distillation approach called interpretable mimic learning. ...
arXiv:2107.02975v1
fatcat:nayhw7gadfdzrovycdkvzy75pi
Deep Learning in Information Security
[article]
2018
arXiv
pre-print
If DL-methods succeed to solve problems on a data type in one domain, they most likely will also succeed on similar data from another domain. ...
They validate their approach on their own dataset with 101 persons and 4584 utterances, and their method achieves an utterance classification accuracy of 63.5%. ...
They can be used to learn functions that map sequences to other sequences (many-to-many), for example, to tag sequences with specific labels or for natural language translation [29]. ...
arXiv:1809.04332v1
fatcat:xfb7lgrkw5cirdl3qvmg3ssnbi
Showing results 1 — 15 out of 177 results