Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings [article]

Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, Matthieu Cord
2018 arXiv   pre-print
In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space.  ...  Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing  ...  within the Investissements d'Avenir program under reference ANR-11-LABX-65.  ... 
arXiv:1804.11146v1 fatcat:kihkqzsbqbebdkcuvcqdrq34ie

Cross-Modal Retrieval in the Cooking Context

Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, Matthieu Cord
2018 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18  
In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space.  ...  Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing  ...  within the Investissements d'Avenir program under reference ANR-11-LABX-65.  ... 
doi:10.1145/3209978.3210036 dblp:conf/sigir/CarvalhoCPSTC18 fatcat:lue266vpufhpveg6shnbgw3lfm

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces [article]

Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C.V. Jawahar
2018 arXiv   pre-print
We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more probable to appear as an illustration.  ...  Our experiments demonstrate state-of-the-art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or naturally-supervised approaches.  ...  We gratefully acknowledge the support of the NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.  ... 
arXiv:1807.02110v1 fatcat:3qe3xgsuzfem5j5doiak5bexeq

Images and Recipes: Retrieval in the Cooking Context

Micael Carvalho, Remi Cadene, David Picard, Laure Soulier, Matthieu Cord
2018 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW)  
Recent advances in the machine learning community allowed different use cases to emerge, as its association to domains like cooking which created the computational cuisine.  ...  In this paper, we tackle the picture-recipe alignment problem, having as target application the large-scale retrieval task (finding a recipe given a picture, and vice versa).  ...  In this paper, we are interested in smart retrieval between recipe component modalities (namely recipe texts and cooked dish pictures) in the cooking context.  ... 
doi:10.1109/icdew.2018.00035 dblp:conf/icde/CarvalhoCPSC18 fatcat:43smvsvqzvdj7dwd7susx7z6wm

Food recognition and recipe analysis: integrating visual content, context and external knowledge [article]

Luis Herranz, Weiqing Min, Shuqiang Jiang
2018 arXiv   pre-print
the restaurant context as emerging directions.  ...  as the exploration and retrieval of food-related information.  ...  Cross-modal recipe modeling and retrieval Modeling the cross-modal correlation between recipes and images has multiple applications in recognition and retrieval.  ... 
arXiv:1801.07239v1 fatcat:kbcpto5iznhkddvdklwxxbtehm

Efficient Deep Feature Calibration for Cross-Modal Joint Embedding Learning [article]

Zhongwei Xie, Ling Liu, Lin Li, Luo Zhong
2021 arXiv   pre-print
This paper introduces a two-phase deep feature calibration framework for efficient learning of semantics enhanced text-image cross-modal joint embedding, which clearly separates the deep feature calibration  ...  We leverage wideResNet50 to extract and encode the image category semantics to help semantic alignment of the learned recipe and image embeddings in the joint latent space.  ...  in terms of both image-to-recipe and recipe-to-image cross-modal retrieval performance.  ... 
arXiv:2108.00705v1 fatcat:cggupnupfbehbfhzxdlcx3hp4m

Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service

Zhongwei Xie, Ling Liu, Yanzhao Wu, Lin Li, Luo Zhong
2021 IEEE Transactions on Services Computing  
We present a Multi-modal Semantics enhanced Joint Embedding approach (MSJE) for learning a common feature space between the two modalities (text and image), with the ultimate goal of providing high-performance  ...  cross-modal retrieval services.  ...  The first author Zhongwei Xie has performed this work as a two-year visiting PhD student at Georgia Institute of Technology (2019-2021), under the support from China Scholarship Council (CSC) and Wuhan  ... 
doi:10.1109/tsc.2021.3098834 fatcat:p6qstgiejbe53p7gnyl2mrfxce

Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration [chapter]

Vishwash Batra, Aparajita Haldar, Yulan He, Hakan Ferhatosmanoglu, George Vogiatzis, Tanaya Guha
2020 Lecture Notes in Computer Science  
Unlike most cross-modal methods, we generate an image vector corresponding to the latent topic obtained from combining the text semantics and context.  ...  This new task extends the traditional cross-modal retrieval, where each image-text pair is treated independently ignoring broader context.  ...  Related Work Our work is related to: cross-modal retrieval, story picturing, variational recurrent neural networks, and cooking recipe datasets. Cross-Modal Retrieval.  ... 
doi:10.1007/978-3-030-45439-5_4 fatcat:bjd23a7kfnednokmphtjpg6ttm

R²GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network

Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Yanbin Hao
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
The motivation of using GAN is twofold: learning compatible cross-modal features in an adversarial way, and explanation of search results by showing the images generated from recipes.  ...  Furthermore, empowered by the generated images, a two-level ranking loss in both embedding and image spaces are considered.  ...  Acknowledgement The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 11203517).  ... 
doi:10.1109/cvpr.2019.01174 dblp:conf/cvpr/ZhuNCH19 fatcat:o2h5oqmohzcpdd2plfimfj5mxy

MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model [article]

Han Fu, Rui Wu, Chenghao Liu, Jianling Sun
2020 arXiv   pre-print
In this paper, we focus on the task of cross-modal retrieval between food images and cooking recipes.  ...  We present Modality-Consistent Embedding Network (MCEN) that learns modality-invariant representations by projecting images and texts to the same embedding space.  ...  Acknowledge We would like to thank the reviewers for their detailed comments and constructive suggestions.  ... 
arXiv:2004.01095v1 fatcat:bgxe4ogkobeqth5uldyettrwiq

Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [article]

Hao Wang, Doyen Sahoo, Chenghao Liu, Ke Shu, Palakorn Achananuparp, Ee-peng Lim, Steven C. H. Hoi
2021 arXiv   pre-print
In this paper, we investigate cross-modal retrieval between food images and cooking recipes.  ...  The goal is to learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.  ...  Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.  ... 
arXiv:2003.03955v3 fatcat:aqy7vykr5favzdfhhamkz3k6wa

Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images [article]

Zhongwei Xie, Ling Liu, Lin Li, Luo Zhong
2021 arXiv   pre-print
This paper presents a three-tier modality alignment approach to learning text-image joint embedding, coined as JEMA, for cross-modal retrieval of cooking recipes and food images.  ...  The third modality alignment incorporates two types of cross-modality alignments as the auxiliary loss regularizations to further reduce the alignment errors in the joint learning of the two modality-specific  ...  The first author has performed this work as a two-year visiting PhD student at Georgia Institute of Technology (2019-2021), under the support from China Scholarship Council (CSC) and Wuhan University of  ... 
arXiv:2108.03788v1 fatcat:6vi5ileyq5cidk2pimegn4clfq

Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

Amaia Salvador, Nicholas Hynes, Yusuf Aytar, Javier Marin, Ferda Ofli, Ingmar Weber, Antonio Torralba
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general.  ...  In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images.  ...  and the European Regional Development Fund (ERDF).  ... 
doi:10.1109/cvpr.2017.327 dblp:conf/cvpr/SalvadorHAMOW017 fatcat:dganhxaqebdrhfnvngdziivxe4

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [article]

Amaia Salvador, Erhan Gundogdu, Loris Bazzani, Michael Donoser
2021 arXiv   pre-print
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives, as well as the availability of vast amounts of digital cooking recipes and food images  ...  In this work, we revisit existing approaches for cross-modal recipe retrieval and propose a simplified end-to-end model based on well established and high performing encoders for text and images.  ...  Cross-Modal Recipe Retrieval Learning cross-modal embeddings for images and text is currently an active research area [19, 15, 18] .  ... 
arXiv:2103.13061v1 fatcat:smg4gd3hevgxtgg2f6swyvlt3a

Out of context: Computer systems that adapt to, and learn from, context

H. Lieberman, T. Selker
2000 IBM Systems Journal  
These operations may be dependent on time, place, weather, user preferences, or the history of interaction. In other words, context. But what, exactly, is context?  ...  We look at perspectives from software agents, sensors, and embedded devices, and also contrast traditional mathematical and formal approaches. We see how each treats the problem of context and discuss the implications  ...  The availability of semantic knowledge bases such as WordNet 27 also encourages partial understanding of context expressed in natural language text.  ... 
doi:10.1147/sj.393.0617 fatcat:3roilp7exvbk3d3nteukkdyu3u