Fine-Grained Image Analysis with Deep Learning: A Survey [article]

Xiu-Shen Wei and Yi-Zhe Song and Oisin Mac Aodha and Jianxin Wu and Yuxin Peng and Jinhui Tang and Jian Yang and Serge Belongie
2021 arXiv   pre-print
., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem.  ...  Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA.  ...  ACKNOWLEDGMENTS The authors would like to thank the editor and the anonymous reviewers for their constructive comments.  ... 
arXiv:2111.06119v2 fatcat:ninawxsjtnf4lndtqquuwl3weq

Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text [article]

Alexander Schindler, Sergiu Gordea, Peter Knees
2020 arXiv   pre-print
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.  ...  This LSI topic modelling facilitates fine-grained selection of similar and dissimilar audio-track pairs to learn the audio representation using a Convolution Recurrent Neural Network (CRNN).  ...  Representation learning using Deep Neural Networks (DNN) has been actively explored in recent years [27, 28] as an alternative to feature engineering.  ... 
arXiv:2003.12265v1 fatcat:s7hw33hhk5ho3jsvpgqhi4hfvm

Learning Neural Textual Representations for Citation Recommendation

Binh Thanh Kieu, Inigo Jauregi Unanue, Son Bao Pham, Hieu Xuan Phan, Massimo Piccardi
2021 2020 25th International Conference on Pattern Recognition (ICPR)  
Yap, Deep Yap; Chiang, Cheng-Ming 486 Answer-Checking in Context: A Multi-Modal Fully Attention Network for Visual Question Answering DAY 2 -Jan 13, 2021 Song, Hang; Song, Yonghong; Zhang, Yuanlin  ...  for Musical Onset Detection DAY 2 -Jan 13, 2021 Hou, Zejiang; Kung, SY 2636 A Discriminant Information Approach to Deep Neural Network Pruning DAY 2 -Jan 13, 2021 Lee, Wei-Han; Millman, Steve  ... 
doi:10.1109/icpr48806.2021.9412725 fatcat:3vge2tpd2zf7jcv5btcixnaikm

Toward a Computational Neuroethology of Vocal Communication: From Bioacoustics to Neurophysiology, Emerging Tools and Future Directions

Tim Sainburg, Timothy Q. Gentner
2021 Frontiers in Behavioral Neuroscience  
This review describes emerging techniques that can be applied to acoustic and vocal communication signals with the goal of enabling study beyond a small number of model species.  ...  Along with a discussion of recent advances and techniques, we include challenges and broader goals in establishing a framework for the computational neuroethology of vocal communication.  ...  Both authors contributed to the article and approved the submitted version.  ... 
doi:10.3389/fnbeh.2021.811737 pmid:34987365 pmcid:PMC8721140 fatcat:7f6kdcuuxneapcuxfrqbvzdcfi

Multimodal Image Synthesis and Editing: A Survey [article]

Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu
2021 arXiv   pre-print
We start with an introduction to different types of guidance modalities in image synthesis and editing.  ...  vision and deep learning research.  ...  new paradigm for multi-modal image synthesis and editing.  ...  CUB-200 Birds, COCO) are collected from the related litera  ... 
arXiv:2112.13592v1 fatcat:hxkfyxbtbfgltju323os3xompe

Selective and Efficient Neural Coding of Communication Signals Depends on Early Acoustic and Social Environment

Noopur Amin, Michael Gastpar, Frédéric E. Theunissen, Ehsan Arabzadeh
2013 PLoS ONE  
and for shaping neural spiking precision in superficial and deep cortical laminae, and for creating efficient neural representations of song and a less redundant ensemble code in all the laminae.  ...  Here we examined the effects of noise-rearing and social isolation on the neural processing of communication sounds such as species-specific song, in the primary auditory cortex analog of adult zebra finches  ...  Example neural responses from a control bird and a wn-reared bird to a subset of the selectivity stimuli.  ... 
doi:10.1371/journal.pone.0061417 pmid:23630587 pmcid:PMC3632581 fatcat:us66ubpivzaadim6cgtw7miuke

A Framework to Enhance Generalization of Deep Metric Learning methods using General Discriminative Feature Learning and Class Adversarial Neural Networks [article]

Karrar Al-Kaabi, Reza Monsefi, Davood Zabihzadeh
2021 arXiv   pre-print
and employing a class adversarial neural network.  ...  To learn a more general representation, we propose to employ feature maps of intermediate layers in a deep neural network and enhance their discrimination power through an attention mechanism.  ...  Acknowledgment We would like to acknowledge the Machine Learning Lab in the Engineering Faculty of FUM for their kind and technical support.  ... 
arXiv:2106.06420v1 fatcat:z54jt6itkfgtzpmaofyn2unmzu

Overview of LifeCLEF 2018: A Large-Scale Evaluation of Species Identification and Recommendation Algorithms in the Era of AI [chapter]

Alexis Joly, Hervé Goëau, Christophe Botella, Hervé Glotin, Pierre Bonnet, Willem-Pier Vellinga, Robert Planqué, Henning Müller
2018 Lecture Notes in Computer Science  
We believe this is the beginning of a new integrative approach to environmental modelling, involving multi-task deep learning models trained on very big multi-modal datasets.  ...  The baseline package offers tools and a workflow to assist the participants in the development of their system: spectrograms extraction, deep neural network training, audio classification task, local  ... 
doi:10.1007/978-3-319-98932-7_24 fatcat:sszdqtnh3zctra4rzdppe4iyhm

Multimodal One-shot Learning of Speech and Images

Ryan Eloff, Herman A. Engelbrecht, Herman Kamper
2019 ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
This model outperforms our other approaches on our most difficult benchmark with a cross-modal matching accuracy of 40.3% for 10-way 5-shot learning.  ...  We hope to achieve this by defining a standard problem setting with tasks which may be used to benchmark other approaches.  ...  A Siamese neural network consists of two identical neural network branches with shared parameters, a set of twin networks, hence the name "Siamese".  ... 
doi:10.1109/icassp.2019.8683587 dblp:conf/icassp/EloffEK19 fatcat:47yfbmhsg5bbbdeiivcglj3vtu

An Ensemble of Convolutional Neural Networks for Audio Classification [article]

Loris Nanni, Gianluca Maguolo, Sheryl Brahnam, Michelangelo Paci
2021 arXiv   pre-print
In this paper, ensembles of classifiers that exploit several data augmentation techniques and four signal representations for training Convolutional Neural Networks (CNNs) for audio classification are  ...  The approach proposed here obtains state-of-the-art results in the widely used ESC-50 dataset.  ...  Their system, which trained deep learners on each modality, was shown to outperform the best unimodal methods.  ... 
arXiv:2007.07966v2 fatcat:bq37jv3qsrbhtkyk44i7bnwzfm

LifeCLEF 2015: Multimedia Life Species Identification Challenges [chapter]

Alexis Joly, Hervé Goëau, Hervé Glotin, Concetto Spampinato, Pierre Bonnet, Willem-Pier Vellinga, Robert Planqué, Andreas Rauber, Simone Palazzo, Bob Fisher, Henning Müller
2015 Lecture Notes in Computer Science  
As a final comment on this evaluation study, it is worth noting that none of the participants attempted to evaluate deep learning approaches such as using deep convolutional neural networks (CNN) that  ...  Convolutional Neural Network.  ... 
doi:10.1007/978-3-319-24027-5_46 fatcat:lq6ug6mrhbh3lpatr34mvuyxwy

Motion-Attentive Transition for Zero-Shot Video Object Segmentation [article]

Tianfei Zhou, Shunzhou Wang, Yi Zhou, Yazhou Yao, Jianwu Li, Ling Shao
2020 arXiv   pre-print
Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation  ...  In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal  ...  As illustrated in Fig. 1, MATNet is an end-to-end deep neural network for ZVOS, consisting of three concatenated networks, i.e. an interleaved encoder, a bridge network and a decoder.  ... 
arXiv:2003.04253v3 fatcat:brqwhydguvg2dmp2fqxnlsyr6m

Motion-Attentive Transition for Zero-Shot Video Object Segmentation

Tianfei Zhou, Shunzhou Wang, Yi Zhou, Yazhou Yao, Jianwu Li, Ling Shao
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation  ...  In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal  ...  As illustrated in Fig. 1, MATNet is an end-to-end deep neural network for ZVOS, consisting of three concatenated networks, i.e. an interleaved encoder, a bridge network  ... 
doi:10.1609/aaai.v34i07.7008 fatcat:j2v6foceerdkhorx25ptg2pkfa

Deep Image Synthesis from Intuitive User Input: A Review and Perspectives [article]

Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang
2021 arXiv   pre-print
In many applications of computer graphics, art and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph or layout, and have a computer system automatically  ...  While classic works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial  ...  Deep Learning based Approaches. In recent years, deep convolutional neural networks (CNNs) have achieved significant progress in image-related tasks.  ... 
arXiv:2107.04240v2 fatcat:ticrsi27nzhozmw7dp7wwja2ni

Deep image synthesis from intuitive user input: A review and perspectives

Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang
2021 Computational Visual Media  
networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches.  ...  In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer  ...  Zhang were supported by the National Natural Science Foundation of China (Project Nos. 61521002 and 61772298), a Research Grant of Beijing Higher Institution Engineering Research Center, and the Tsinghua-Tencent  ... 
doi:10.1007/s41095-021-0234-8 fatcat:ot6dyrrrsnakxob4jzw4zld7zu