Layered Hypernetwork Models for Cross-Modal Associative Text and Image Keyword Generation in Multimodal Information Retrieval [chapter]

Jung-Woo Ha, Byoung-Hee Kim, Bado Lee, Byoung-Tak Zhang
2010 Lecture Notes in Computer Science  
Here, we propose a novel text and image keyword generation method by cross-modal associative learning and inference with multimodal queries.  ...  Conventional methods for multimodal data retrieval use text-tag based or cross-modal approaches such as tag-image co-occurrence and canonical correlation analysis.  ...  Cross-Modal Inference for Image and Text Keyword Generation Trained LHNs can generate both text terms and visual words with given multimodal queries by cross-modal associative inference.  ... 
doi:10.1007/978-3-642-15246-7_10 fatcat:yvileqmhfnf6jhmvjwfuzzgmsm

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning [article]

Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
2021 arXiv pre-print
Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models,  ...  Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing  ...  Multimodal models outperform unimodal models when it comes to robustness (and initial performance). This is especially true for imperfections in the image modality.  ... 
arXiv:2107.07502v2 fatcat:ls47dr7lpfhkbfry4r6dtqjtua

Neural Fields in Visual Computing and Beyond [article]

Yiheng Xie, Towaki Takikawa, Shunsuke Saito, Or Litany, Shiqin Yan, Numair Khan, Federico Tombari, James Tompkin, Vincent Sitzmann, Srinath Sridhar
2022 arXiv pre-print
These methods, which we call neural fields, have seen successful application in the synthesis of 3D shapes and image, animation of human bodies, 3D reconstruction, and pose estimation.  ...  In Part I, we focus on techniques in neural fields by identifying common components of neural field methods, including different representations, architectures, forward mapping, and generalization methods  ...  We thank Sunny Li for their help in designing the website, Jayden Yi for a conceptual readthrough, and Alexander Rush and Hendrik Strobelt for the Mini-Conf project.  ... 
arXiv:2111.11426v4 fatcat:yteqzbu6gvgdzobnfzuqohix2e

Beyond AlphaGo 2016 International Symposium on Perception, Action, and Cognitive Systems PACS2016

Date & Location
Grand Hall (5F), aT Center unpublished
The model successfully generated plausible images from sentences and generalized a dual representation of texts and images. IV.  ...  CONCLUSION The model generalized a dual representation and connected both modalities to generate images from sentences. Yet the output image is not clear enough and the connection is ambiguous.  ...  Keywords: chatbot; conversational model; seq2seq. In this experiment, children asked our model unexpected questions.  ...