3,211 Hits in 5.3 sec

Learning Representations from Audio-Visual Spatial Alignment [article]

Pedro Morgado, Yi Li, Nuno Vasconcelos
2020 arXiv pre-print
To learn from these spatial cues, we tasked a network to perform contrastive audio-visual spatial alignment of 360 video and spatial audio.  ...  Prior work on audio-visual representation learning leverages correspondences at the video level.  ...  Broader Impact Self-supervision reduces the need for human labeling, which is in some sense less affected by human biases. However, deep learning systems are trained from data.  ... 
arXiv:2011.01819v1 fatcat:mjof6zfkrffgnprsll3y5mg75a

Listen to Look: Action Recognition by Previewing Audio [article]

Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani
2020 arXiv pre-print
First, we devise an ImgAud2Vid framework that hallucinates clip-level features by distilling from lighter modalities---a single frame and its accompanying audio---reducing short-term temporal redundancy  ...  We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies.  ...  We propose to distill the knowledge from an expensive clip-based model to a lightweight image-audio based model.  ... 
arXiv:1912.04487v3 fatcat:w3smjfakfze4pcg7wc2iobvmja

Speech2Video: Cross-Modal Distillation for Speech to Video Generation [article]

Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu, Jing Xiao
2021 arXiv pre-print
The challenge mainly lies in disentangling the distinct visual attributes from audio signals.  ...  The extracted features are then integrated by a generative adversarial network into talking face video clips.  ...  By "recollecting" a face from learned audio features, the proposed framework produces a video with a face observed in the training process.  ... 
arXiv:2107.04806v1 fatcat:lbx74ctptvdtdjiwyrvbq4cnhu

Construction of a Soundscape-Based Media Art Exhibition to Improve User Appreciation Experience by Using Deep Neural Networks

Youngjun Kim, Hayoung Jeong, Jun-Dong Cho, Jitae Shin
2021 Electronics  
The objective of this study was to improve user experience when appreciating visual artworks with soundscape music chosen by a deep neural network based on weakly supervised learning.  ...  Our proposed method can also help spread soundscape-based media art by supporting traditional soundscape design.  ...  Knowledge Distillation Knowledge distillation began from mimic model [54], and is a method to learn differences among distributions for model compression.  ...
doi:10.3390/electronics10101170 fatcat:euyp6wqp6rblnmwfegbsbfmc7i

Adaptive Knowledge Visualization Systems: A Proposal and Implementation

Xiaoyan Bai
2011 International Journal of e-Education, e-Business, e-Management and e-Learning  
First, KMS tend to provide weak support for leveraging visualization to accomplish transformation, discovery and learning of knowledge.  ...  Visualizations are crucial to the creation, transfer and sharing of knowledge.  ...  of visual compositions.  ... 
doi:10.7763/ijeeee.2011.v1.30 fatcat:ka5ohc7uuzaytnnxpwyhxvkhky

Human Action Recognition from Various Data Modalities: A Review [article]

Zehua Sun, Qiuhong Ke, Hossein Rahmani, Mohammed Bennamoun, Gang Wang, Jun Liu
2021 arXiv pre-print
Specifically, we review the current mainstream deep learning methods for single data modalities and multiple data modalities, including the fusion-based and the co-learning-based frameworks.  ...  Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signal, which encode different sources  ...  The learned model was then fine-tuned on the HAR datasets for audio-visual HAR. Inspired by TSN [50], Kazakos et al.  ...
arXiv:2012.11866v4 fatcat:twjnaur2jzahznci6clkadylay

Lines of net music

Golo Föllmer
2005 Contemporary Music Review  
This article describes major lines of networked music as observed in the course of extended scholarship on the subject done by the author.  ...  The Internet protocols' condition of hosting systems for two-way communication is considered to mark the central difference from earlier media of audio diffusion.  ...  They function as the visible and audible surface of algorithmic sound and image compositions and of the type of audio-visual software art (i.e., of works that reflect the anthropological and socio-political  ... 
doi:10.1080/07494460500296102 fatcat:rwzbi6o7hrg63gmynykf2rphmy

AudioCLIP: Extending CLIP to Image, Text and Audio [article]

Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel
2021 arXiv pre-print
In this work, we present an extension of the CLIP model that handles audio in addition to text and images.  ...  Our proposed model incorporates the ESResNeXt audio-model into the CLIP framework using the AudioSet dataset.  ...  distillation setup.  ... 
arXiv:2106.13043v1 fatcat:4nlu5mbjzzfhbp65yghriqbqpi

Contrastive Representation Learning: A Framework and Review

Phuc H. Le-Khac, Graham Healy, Alan F. Smeaton
2020 IEEE Access  
Science Foundation Ireland through the SFI Centre for Research Training in Machine Learning (18/CRT/6183) and the Insight Centre for Data Analytics (SFI/12/RC/2289_P2).  ...  Not limited to learning representations, contrastive learning can also be applied to distill knowledge from a large pre-trained teacher network to a smaller student network, as demonstrated in Contrastive  ...  The Audio-Visual Embedding Network (AVE-Net) [4] is an example where contrastive learning is applied to this problem.  ... 
doi:10.1109/access.2020.3031549 fatcat:qohhn2f2tray5ha3iafxbnwp74

Contrastive Representation Learning: A Framework and Review [article]

Phuc H. Le-Khac, Graham Healy, Alan F. Smeaton
2020 arXiv pre-print
Examples of how contrastive learning has been applied in computer vision, natural language processing, audio processing, and others, as well as in Reinforcement Learning are also presented.  ...  In this paper we provide a comprehensive literature review and we propose a general Contrastive Representation Learning framework that simplifies and unifies many different contrastive learning methods  ...  Not limited to learning representations, contrastive learning can also be applied to distill knowledge from a large pre-trained teacher network to a smaller student network, as demonstrated in Contrastive  ... 
arXiv:2010.05113v1 fatcat:xdegcaoarvevdfl4r22pyzqr4e

Techniques for Symbol Grounding with SATNet [article]

Sever Topan, David Rolnick, Xujie Si
2021 arXiv pre-print
Many experts argue that the future of artificial intelligence is limited by the field's ability to integrate symbolic logical reasoning into deep learning architectures.  ...  For instance, it can learn the rules of Sudoku purely from image examples.  ...  Knowledge Distillation Knowledge Distillation is a technique for training machine learning models to reach comparable performance at inference time to a larger reference model, or an ensemble of models  ... 
arXiv:2106.11072v1 fatcat:y4di2d4abjh4xhki6bb6mdqww4

Empowering Knowledge Distillation via Open Set Recognition for Robust 3D Point Cloud Classification [article]

Ayush Bhardwaj, Sakshee Pimpale, Saurabh Kumar, Biplab Banerjee
2020 arXiv pre-print
We propose a joint Knowledge Distillation and Open Set recognition training methodology for three-dimensional object recognition.  ...  Deeper models provide better performance, but are challenging to deploy and knowledge distillation allows us to train smaller models with minimal loss in performance.  ...  Distilling knowledge from a large trained machine learning model to a smaller model was first introduced by [5].  ...
arXiv:2010.13114v1 fatcat:zuroei4slrgibiiec6wccumqiq

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning [article]

Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
2017 arXiv pre-print
In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models.  ...  Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.  ...  It was also supported by Shell Research, and by a donation of GPUs from NVIDIA.  ... 
arXiv:1712.07271v1 fatcat:qr7kpstgpjhhtb3ypprj3kybnu

Conversations with Expert Users in Music Retrieval and Research Challenges for Creative MIR

Kristina Andersen, Peter Knees
2016 Zenodo  
The Wekinator [22] by Fiebrink is a real-time, interactive machine learning toolkit that can be used in the processes of music composition and performance, as well as to build new musical interfaces  ...  In these systems, the visual aspect is the spatial arrangement of sounds, however, this does not reflect the mental models but rather requires the user to learn the mapping provided by the system.  ... 
doi:10.5281/zenodo.1418323 fatcat:jivskzuvu5ftdjjwcrvsws5hsi

Content-based Recommendations for Radio Stations with Deep Learned Audio Fingerprints [article]

Stefan Langer, Liza Obermeier, André Ebert, Markus Friedrich, Emma Munisamy, Claudia Linnhoff-Popien
2020 arXiv pre-print
We show that the proposed fingerprints are especially useful for characterizing radio stations by their audio content and thus are an excellent representation for meaningful and reliable radio station  ...  Therefore, we propose a new pipeline for the generation of audio-based radio station fingerprints relying on audio stream crawling and a Deep Autoencoder.  ...  Therefore, we propose a Deep Learning-based audio crawling and fingerprint extraction pipeline for the characterization of radio stations and show visual results for numerous stations.  ... 
arXiv:2007.07486v1 fatcat:rhwrjyohffa6zhie4fjypq6vhi
Showing results 1 to 15 of 3,211 results