2,420 Hits in 9.5 sec

Large-Scale Music Annotation and Retrieval: Learning to Rank in Joint Semantic Spaces [article]

Jason Weston, Samy Bengio, Philippe Hamel
2011 arXiv   pre-print
In this work, we propose a method that scales to such datasets which attempts to capture the semantic similarities between the database items by modeling audio, artist names, and tags in a single low-dimensional  ...  semantic space.  ...  Acknowledgements We thank Doug Eck, Ryan Rifkin and Tom Walters for providing us with the Big-data set and extracting the relevant features on it.  ... 
arXiv:1105.5196v1 fatcat:fkyq6gujzfg63fcex6ajkuhhdq
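The entry above describes modeling audio, artist names, and tags in a single low-dimensional space. The sketch below shows how such a shared space can be queried in both directions. It assumes a linear projection for audio features, fixed tag vectors, and dot-product scores. The names `embed_audio`, `rank_tags`, and `retrieve_tracks` and all numbers are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (d x n) matrix by an n-dimensional feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def embed_audio(projection: Matrix, features: Sequence[float]) -> Vector:
    """Assumed linear map from raw audio features into the shared d-dim space."""
    return matvec(projection, features)


def rank_tags(projection: Matrix, features: Sequence[float],
              tag_embeddings: Dict[str, Vector]) -> List[str]:
    """Annotation: order tags by dot-product score with the embedded clip."""
    a = embed_audio(projection, features)
    return sorted(tag_embeddings, key=lambda t: dot(tag_embeddings[t], a), reverse=True)


def retrieve_tracks(projection: Matrix, tag_vector: Sequence[float],
                    tracks: Dict[str, Sequence[float]]) -> List[str]:
    """Retrieval: because tags and audio share one space, the same score ranks tracks for a tag."""
    return sorted(tracks,
                  key=lambda name: dot(tag_vector, embed_audio(projection, tracks[name])),
                  reverse=True)
```

For example, with a projection that keeps the first two features and tags `rock = [1, 0]`, `jazz = [0, 1]`, a clip with features `[0.9, 0.2, 0.5]` ranks `rock` above `jazz`. Artist vectors could be added to the same dictionary of embeddings and ranked the same way.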

Multi-Tasking with Joint Semantic Spaces for Large-Scale Music Annotation and Retrieval

Jason Weston, Samy Bengio, Philippe Hamel
2011 Journal of New Music Research  
In this work, we propose a method that scales to such datasets which attempts to capture the semantic similarities between the database items by modeling audio, artist names, and tags in a single low-dimensional  ...  That is, we are interested in every semantic relationship between the different musical concepts in our database.  ...  Music Annotation and Retrieval Tasks Task Definitions: In this work, we focus on being able to solve the following annotation and retrieval tasks: 1.  ... 
doi:10.1080/09298215.2011.603834 fatcat:3cuczpbgdrenlntu3wngvyn6ry

Query by Video: Cross-modal Music Retrieval

Bochen Li, Aparna Kumar
2019 Zenodo  
To retrieve music for an input video, the trained model ranks tracks in the music database by cross-modal distances to the query video.  ...  Cross-modal retrieval learns the relationship between the two types of data in a common space so that an input from one modality can retrieve data from a different modality.  ...  Learning cross-modal embeddings end-to-end using cross-modal pairs could result in deep representations of the relationships and improved performance at scale in a music retrieval setting.  ... 
doi:10.5281/zenodo.3527881 fatcat:cwwcc6objbca7puhyt5rbxln6u
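The ranking step described in this entry (tracks ordered by cross-modal distance to a query video) can be sketched as below. This assumes the video and track encoders are already trained and have produced vectors in the common space. Cosine distance is an assumed choice here, and `rank_tracks` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the same direction in the common space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / norm


def rank_tracks(video_embedding: Sequence[float],
                track_embeddings: Dict[str, Sequence[float]],
                k: int = 10) -> List[Tuple[str, float]]:
    """Return the k tracks closest to the query video, nearest first."""
    scored = [(name, cosine_distance(video_embedding, emb))
              for name, emb in track_embeddings.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

Swapping `cosine_distance` for a Euclidean distance leaves the ranking logic unchanged.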

Image retrieval

Ritendra Datta, Dhiraj Joshi, Jia Li, James Z. Wang
2008 ACM Computing Surveys  
In this paper, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and discuss the spawning of related sub-fields  ...  While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly  ...  Once the joint word-blob probabilities have been learned, the annotation problem for a given image is reduced to a likelihood problem relating blobs and words.  ... 
doi:10.1145/1348246.1348248 fatcat:5jbcrsxkkbac5cya3zb7eb22ea
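The last snippet in this entry reduces annotation to a likelihood problem over learned word-blob probabilities. The following is a minimal sketch of one way to score words that way. It assumes the joint probabilities P(word, blob) are already estimated and that an image's word score is the sum over its blobs. `annotate` and the example values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def annotate(blobs: Sequence[str],
             joint: Dict[Tuple[str, str], float],
             n_words: int = 3) -> List[str]:
    """Rank words for an image from learned joint probabilities P(word, blob).

    Each word's score sums P(word, blob) over the blobs found in the image
    (a blob seen twice contributes twice); the top n_words are returned.
    """
    blob_counts = Counter(blobs)
    score: Dict[str, float] = defaultdict(float)
    for (word, blob), p in joint.items():
        score[word] += p * blob_counts[blob]
    return sorted(score, key=score.get, reverse=True)[:n_words]
```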

A new approach to cross-modal multimedia retrieval

Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert R.G. Lanckriet, Roger Levy, Nuno Vasconcelos
2010 Proceedings of the international conference on Multimedia - MM '10  
Two hypotheses are investigated: that 1) there is a benefit to explicitly modeling correlations between the two components, and 2) this modeling is more effective in feature spaces with higher levels of  ...  It is shown that accounting for crossmodal correlations and semantic abstraction both improve retrieval accuracy.  ...  Left) Mapping of the text and image from their respective natural spaces to a CCA space, Semantic Space and a Semantic space learned using CCA representation.  ... 
doi:10.1145/1873951.1873987 dblp:conf/mm/RasiwasiaPCDLLV10 fatcat:2qph2zemvfhz3n3jxpkeod3o5a

Semantic Annotation and Retrieval of Music and Sound Effects

Douglas Turnbull, Luke Barrington, David Torres, Gert Lanckriet
2008 IEEE Transactions on Audio, Speech, and Language Processing  
Index Terms: Audio annotation and retrieval, music information retrieval, semantic music analysis.  ...  We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words.  ...  ACKNOWLEDGMENT The authors would like to thank A. Chan, A. Cont, G. W. Cottrell, S. Dubnov, C. Elkan, L. Saul, N. Vasconcelos, and our anonymous reviewers for their helpful comments.  ... 
doi:10.1109/tasl.2007.913750 fatcat:6iqzxwufg5fp3eus6ghhm424hi

Canonical contextual distance for large-scale image annotation and retrieval

Hideki Nakayama, Tatsuya Harada, Yasuo Kuniyoshi
2009 Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining - LS-MMRM '09  
Because our learning method is highly scalable, it is effective even on a large web-scale dataset. Therefore, our similarity measure will be helpful to many other search-based methods.  ...  In this paper, we propose a method of image annotation and retrieval based on the new similarity measure, Canonical Contextual Distance.  ...  Also, it takes 285 minutes to fit PCCA on Flickr3.5M. Even on a large-scale web dataset such as Flickr3.5M, the learning process can be done in a practical amount of time.  ... 
doi:10.1145/1631058.1631062 dblp:conf/mm/NakayamaHK09 fatcat:c3fvflp5hzasjpsfcdbxdl6h4u

Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval [article]

Yi Yu, Suhua Tang, Francisco Raposo, Lei Chen
2017 arXiv   pre-print
Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.  ...  Stemming from the characteristic of temporal structures of music in nature, we are motivated to learn the deep sequential correlation between audio and lyrics.  ...  Cross-modal ranking analysis is suggested to learn semantic similarity between music and image, with the aim of obtaining the optimal embedding spaces for music and image.  ... 
arXiv:1711.08976v2 fatcat:m5uk6lbadrcanpb3prxfv7lueu

Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer
2018 Transactions of the International Society for Music Information Retrieval  
(2) audio-to-sheet music alignment using Dynamic Time Warping (DTW) in the learned joint embedding space.  ...  In particular, we aim to learn a joint embedding space of the two modalities in which to perform nearest-neighbour search.  ... 
doi:10.5334/tismir.12 fatcat:4rrvpmp3l5brvhfetjhn7k732u
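The first snippet mentions aligning audio to sheet music with Dynamic Time Warping in the learned joint embedding space. The sketch below shows classic DTW over two sequences of embedding vectors. It assumes both sequences are already embedded and uses Euclidean frame-to-frame cost. The function names are hypothetical, and the step pattern is the basic symmetric one, not necessarily the authors'.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw(audio: Sequence[Sequence[float]],
        sheet: Sequence[Sequence[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two sequences of joint-space embeddings with classic DTW.

    Returns the total alignment cost and the warping path as
    (audio_index, sheet_index) pairs.
    """
    n, m = len(audio), len(sheet)
    inf = float("inf")
    # acc[i][j] = minimal cost of aligning audio[:i] with sheet[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(audio[i - 1], sheet[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1][j - 1],
                 (i - 1, j): acc[i - 1][j],
                 (i, j - 1): acc[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return acc[n][m], path
```

The quadratic table is fine for short excerpts. Longer pieces usually call for a banded or multiscale DTW variant.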

Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer
2018 Transactions of the International Society for Music Information Retrieval  
We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks.  ...  All retrieval models are trained and evaluated on a new, large scale multimodal audio-sheet music dataset which is made publicly available along with this article.  ...  Jan Hajič Jr. wishes to acknowledge support by the Czech Science Foundation grant no. P103/12/G084 and Charles University Grant Agency grant no. 1444217.  ... 
doi:10.5334/timsir.12 fatcat:6ke27ryhkvhdhas6e6zoto2vni

Deep Multimodal Learning for Affective Analysis and Retrieval

Lei Pang, Shiai Zhu, Chong-Wah Ngo
2015 IEEE transactions on multimedia  
Extensive experiments on web videos and images show that the learnt joint representation could be very compact and be complementary to hand-crafted features, leading to performance improvement in both  ...  While the model learns a joint representation over multimodal inputs, training samples in absence of certain modalities can also be leveraged.  ...  In [22], a ListNet layer is added on top of the RBF layer for ranking the music in valence and arousal in Cartesian coordinates.  ... 
doi:10.1109/tmm.2015.2482228 fatcat:7tozmatnhvbj7hjjohkofngecq
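The last snippet refers to a ListNet layer used for ranking. Below is a minimal sketch of the ListNet top-one loss: the cross-entropy between the softmax of ground-truth relevance scores and the softmax of predicted scores. It assumes plain lists of scores for one query. How the cited work wires this into its network is not shown in the snippet and is not reproduced here.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted: Sequence[float], relevance: Sequence[float]) -> float:
    """ListNet top-one loss: cross-entropy between the top-one probability
    distributions induced by the ground-truth relevance and the predicted scores."""
    p_true = softmax(relevance)
    p_pred = softmax(predicted)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))
```

The loss is smallest when the predicted scores induce the same top-one distribution as the relevance labels, so a correctly ordered prediction scores lower than a reversed one.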

Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-modal Retrieval [article]

Donghuo Zeng, Yi Yu, Keizo Oyama
2021 arXiv   pre-print
in the shared subspace. ii) positive examples and negative examples are used in the learning stage to improve the capability of embedding learning between audio and video.  ...  datasets and semantic information.  ...  This dataset, a small subset of the large-scale video dataset YouTube-8M, contains 10,000 (10k) videos with a "music video" label, each ranging from 213 to 219 seconds.  ... 
arXiv:1908.03737v3 fatcat:qgldi32rrng27gltfbefqay4rq
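The first snippet says positive and negative examples are used when learning the audio-video embedding. A common way to use them is a triplet margin loss. The sketch below assumes an audio clip as anchor, its matching video as positive, and a non-matching video as negative, all already embedded. The margin value of 0.2 and the helper names are hypothetical, and the Cluster-CCA part of the method is not shown.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Hinge on the distance gap: the matching item should be closer to the
    anchor than the non-matching item by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(audio: Sequence[Sequence[float]],
                       video_pos: Sequence[Sequence[float]],
                       video_neg: Sequence[Sequence[float]],
                       margin: float = 0.2) -> float:
    """Mean triplet loss over aligned (audio, matching video, other video) batches."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in zip(audio, video_pos, video_neg)]
    return sum(losses) / len(losses)
```

Triplets that already satisfy the margin contribute zero, so training effort concentrates on pairs the embedding still confuses.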

Audio-Visual Embedding for Cross-Modal MusicVideo Retrieval through Supervised Deep CCA [article]

Donghuo Zeng, Yi Yu, Keizo Oyama
2019 arXiv   pre-print
Music video retrieval by given musical audio is a natural way to search and interact with music contents. In this work, we study cross-modal music video retrieval in terms of emotion similarity.  ...  Deep learning has successfully shown excellent performance in learning joint representations between different data modalities.  ...  The first Author would like to thank Francisco Raposo for discussing how to implement CCA.  ... 
arXiv:1908.03744v1 fatcat:2l2vdm7a7zatvdbk6ja2tfdmqi

Content Based Video Retrieval Systems

B V Patel
2012 International Journal of UbiComp  
Good feature selection also allows the time and space costs of the retrieval process to be reduced.  ...  These features are intended for selecting, indexing and ranking according to their potential interest to the user.  ...  Dimension reduction is a popular technique to overcome this problem and support efficient retrieval in large-scale databases.  ... 
doi:10.5121/iju.2012.3202 fatcat:rrtofvbjfbakhacn565qre5enq

Latent Collaborative Retrieval [article]

Jason Weston, Ron Weiss
2012 arXiv   pre-print
Retrieval tasks typically require a ranking of items given a query. Collaborative filtering tasks, on the other hand, learn to model users' preferences over items.  ...  In this paper we study the joint problem of recommending items to a user with respect to a given query, which is a surprisingly common task.  ...  Training To Optimize Retrieval For The Top k We are interested in learning a ranking function where the top k retrieved items are of particular interest as they will be presented to the user.  ... 
arXiv:1206.4603v1 fatcat:ihjs2caelfgb3auvd2ibfatgqi
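This entry stresses that only the top k retrieved items matter because they are the ones presented to the user. The sketch below shows top-k selection and the matching precision@k measure. It assumes the learned ranking function is available as a score callable for a fixed (query, user) pair. The paper's actual model is not reproduced, and `top_k` and `precision_at_k` are hypothetical helpers.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Set


def top_k(items: Iterable[Hashable], score: Callable[[Hashable], float],
          k: int) -> List[Hashable]:
    """Return the k highest-scoring items, best first.

    `score` stands in for a learned function of (query, user, item) with the
    query and user already fixed.
    """
    return heapq.nlargest(k, items, key=score)


def precision_at_k(ranked: List[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k
```

`heapq.nlargest` avoids sorting the whole catalogue when k is much smaller than the number of items.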
Showing results 1 to 15 out of 2,420 results