Fine-Grained Image Analysis with Deep Learning: A Survey
[article]
2021
arXiv
pre-print
e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. ...
Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. ...
ACKNOWLEDGMENTS The authors would like to thank the editor and the anonymous reviewers for their constructive comments. ...
arXiv:2111.06119v2
fatcat:ninawxsjtnf4lndtqquuwl3weq
Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text
[article]
2020
arXiv
pre-print
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness. ...
This LSI topic modelling facilitates fine-grained selection of similar and dissimilar audio-track pairs to learn the audio representation using a Convolution Recurrent Neural Network (CRNN). ...
Representation learning using Deep Neural Networks (DNN) has been actively explored in recent years [27, 28] as an alternative to feature engineering. ...
arXiv:2003.12265v1
fatcat:s7hw33hhk5ho3jsvpgqhi4hfvm
Learning Neural Textual Representations for Citation Recommendation
2021
2020 25th International Conference on Pattern Recognition (ICPR)
Yap, Deep Yap; Chiang, Cheng-Ming 486 Answer-Checking in Context: A Multi-Modal Fully Attention Network for Visual Question Answering DAY 2 - Jan 13, 2021 Song, Hang; Song, Yonghong; Zhang, Yuanlin ...
for Musical Onset Detection DAY 2 - Jan 13, 2021 Hou, Zejiang; Kung, SY 2636 A Discriminant Information Approach to Deep Neural Network Pruning DAY 2 - Jan 13, 2021 Lee, Wei-Han; Millman, Steve ...
doi:10.1109/icpr48806.2021.9412725
fatcat:3vge2tpd2zf7jcv5btcixnaikm
Toward a Computational Neuroethology of Vocal Communication: From Bioacoustics to Neurophysiology, Emerging Tools and Future Directions
2021
Frontiers in Behavioral Neuroscience
This review describes emerging techniques that can be applied to acoustic and vocal communication signals with the goal of enabling study beyond a small number of model species. ...
Along with a discussion of recent advances and techniques, we include challenges and broader goals in establishing a framework for the computational neuroethology of vocal communication. ...
Both authors contributed to the article and approved the submitted version. ...
doi:10.3389/fnbeh.2021.811737
pmid:34987365
pmcid:PMC8721140
fatcat:7f6kdcuuxneapcuxfrqbvzdcfi
Multimodal Image Synthesis and Editing: A Survey
[article]
2021
arXiv
pre-print
We start with an introduction to different types of guidance modalities in image synthesis and editing. ...
vision and deep learning research. ...
new paradigm for multi-modal image synthesis and editing. ...
CUB-200 Birds, COCO) are collected from the related literature ...
arXiv:2112.13592v1
fatcat:hxkfyxbtbfgltju323os3xompe
Selective and Efficient Neural Coding of Communication Signals Depends on Early Acoustic and Social Environment
2013
PLoS ONE
and for shaping neural spiking precision in superficial and deep cortical laminae, and for creating efficient neural representations of song and a less redundant ensemble code in all the laminae. ...
Here we examined the effects of noise-rearing and social isolation on the neural processing of communication sounds such as species-specific song, in the primary auditory cortex analog of adult zebra finches ...
Example neural responses from a control bird and a wn-reared bird to a subset of the selectivity stimuli. ...
doi:10.1371/journal.pone.0061417
pmid:23630587
pmcid:PMC3632581
fatcat:us66ubpivzaadim6cgtw7miuke
A Framework to Enhance Generalization of Deep Metric Learning methods using General Discriminative Feature Learning and Class Adversarial Neural Networks
[article]
2021
arXiv
pre-print
and employing a class adversarial neural network. ...
To learn a more general representation, we propose to employ feature maps of intermediate layers in a deep neural network and enhance their discrimination power through an attention mechanism. ...
Acknowledgment We would like to acknowledge the Machine Learning Lab in the Engineering Faculty of FUM for their kind and technical support. ...
arXiv:2106.06420v1
fatcat:z54jt6itkfgtzpmaofyn2unmzu
Overview of LifeCLEF 2018: A Large-Scale Evaluation of Species Identification and Recommendation Algorithms in the Era of AI
[chapter]
2018
Lecture Notes in Computer Science
We believe this is the beginning of a new integrative approach to environmental modelling, involving multi-task deep learning models trained on very big multi-modal datasets. ...
The baseline package offers tools and a workflow to assist the participants in the development of their system: spectrograms extraction, deep neural network training, audio classification task, local ...
doi:10.1007/978-3-319-98932-7_24
fatcat:sszdqtnh3zctra4rzdppe4iyhm
Multimodal One-shot Learning of Speech and Images
2019
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
This model outperforms our other approaches on our most difficult benchmark with a cross-modal matching accuracy of 40.3% for 10-way 5-shot learning. ...
We hope to achieve this by defining a standard problem setting with tasks which may be used to benchmark other approaches. ...
A Siamese neural network consists of two identical neural network branches with shared parameters, a set of twin networks, hence the name "Siamese". ...
doi:10.1109/icassp.2019.8683587
dblp:conf/icassp/EloffEK19
fatcat:47yfbmhsg5bbbdeiivcglj3vtu
An Ensemble of Convolutional Neural Networks for Audio Classification
[article]
2021
arXiv
pre-print
In this paper, ensembles of classifiers that exploit several data augmentation techniques and four signal representations for training Convolutional Neural Networks (CNNs) for audio classification are ...
The approach proposed here obtains state-of-the-art results in the widely used ESC-50 dataset. ...
Their system, which trained deep learners on each modality, was shown to outperform the best unimodal methods. ...
arXiv:2007.07966v2
fatcat:bq37jv3qsrbhtkyk44i7bnwzfm
LifeCLEF 2015: Multimedia Life Species Identification Challenges
[chapter]
2015
Lecture Notes in Computer Science
As a final comment on this evaluation study, it is worth noting that none of the participants attempted to evaluate deep learning approaches such as using deep convolutional neural networks (CNN) that ...
Convolutional Neural Network. ...
doi:10.1007/978-3-319-24027-5_46
fatcat:lq6ug6mrhbh3lpatr34mvuyxwy
Motion-Attentive Transition for Zero-Shot Video Object Segmentation
[article]
2020
arXiv
pre-print
Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation ...
In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal ...
As illustrated in Fig. 1, MATNet is an end-to-end deep neural network for ZVOS, consisting of three concatenated networks, i.e. an interleaved encoder, a bridge network and a decoder. ...
arXiv:2003.04253v3
fatcat:brqwhydguvg2dmp2fqxnlsyr6m
Motion-Attentive Transition for Zero-Shot Video Object Segmentation
2020
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation ...
In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal ...
Proposed Method. Network Overview. As illustrated in Fig. 1, MATNet is an end-to-end deep neural network for ZVOS, consisting of three concatenated networks, i.e. an interleaved encoder, a bridge network ...
doi:10.1609/aaai.v34i07.7008
fatcat:j2v6foceerdkhorx25ptg2pkfa
Deep Image Synthesis from Intuitive User Input: A Review and Perspectives
[article]
2021
arXiv
pre-print
In many applications of computer graphics, art and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph or layout, and have a computer system automatically ...
While classic works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial ...
Deep Learning based Approaches. In recent years, deep convolutional neural networks (CNNs) have achieved significant progress in image-related tasks. ...
arXiv:2107.04240v2
fatcat:ticrsi27nzhozmw7dp7wwja2ni
Deep image synthesis from intuitive user input: A review and perspectives
2021
Computational Visual Media
networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. ...
Abstract: In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer ...
Zhang were supported by the National Natural Science Foundation of China (Project Nos. 61521002 and 61772298), a Research Grant of Beijing Higher Institution Engineering Research Center, and the Tsinghua-Tencent ...
doi:10.1007/s41095-021-0234-8
fatcat:ot6dyrrrsnakxob4jzw4zld7zu