Integrating Image Captioning with Rule-based Entity Masking [article]

Aditya Mogadala and Xiaoyu Shen and Dietrich Klakow
2020 arXiv   pre-print
Given an image, generating its natural language description (i.e., caption) is a well-studied problem. Approaches proposed to address this problem usually rely on image features that are difficult to interpret. Particularly, these image features are subdivided into global and local features, where global features are extracted from the global representation of the image, while local features are extracted from the objects detected locally in an image. Although local features extract rich
... information from the image, existing models generate captions in a black-box manner and humans have difficulty interpreting which local objects the caption is aimed to represent. Hence in this paper, we propose a novel framework for image captioning with an explicit object (e.g., knowledge graph entity) selection process while still maintaining its end-to-end training ability. The model first explicitly selects which local entities to include in the caption according to a human-interpretable mask, then generates proper captions by attending to selected entities. Experiments conducted on the MSCOCO dataset demonstrate that our method achieves good performance in terms of the caption quality and diversity with a more interpretable generating process than previous counterparts.
arXiv:2007.11690v1 fatcat:eyfv6fifdvgqpfp2cdsls7464q

Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance [article]

Aditya Mogadala, Umanga Bista, Lexing Xie, Achim Rettinger
2017 arXiv   pre-print
Images in the wild encapsulate rich knowledge about varied abstract concepts and cannot be sufficiently described with models built only using image-caption pairs containing selected objects. We propose to handle such a task with the guidance of a knowledge base that incorporates many abstract concepts. Our method is a two-step process where we first build a multi-entity-label image recognition model to predict abstract concepts as image labels and then leverage them in the second step as an
... rnal semantic attention and constrained inference in the caption generation model for describing images that depict unseen/novel objects. Evaluations show that our models outperform most of the prior work for out-of-domain captioning on MSCOCO and are useful for integration of knowledge and vision in general.
arXiv:1710.06303v1 fatcat:mu6zbevjbvd2jfl6sjd6yqmisy

Fusion Models for Improved Visual Captioning [article]

Marimuthu Kalimuthu, Aditya Mogadala, Marius Mosbach, Dietrich Klakow
2020 arXiv   pre-print
Visual captioning aims to generate textual descriptions given images or videos. Traditionally, image captioning models are trained on human annotated datasets such as Flickr30k and MS-COCO, which are limited in size and diversity. This limitation hinders the generalization capabilities of these models while also rendering them liable to making mistakes. Language models can, however, be trained on vast amounts of freely available unlabelled data and have recently emerged as successful language
... coders and coherent text generators. Meanwhile, several unimodal and multimodal fusion techniques have been proven to work well for natural language generation and automatic speech recognition. Building on these recent developments, and with the aim of improving the quality of generated captions, the contribution of our work in this paper is two-fold: First, we propose a generic multimodal model fusion framework for caption generation as well as emendation where we utilize different fusion strategies to integrate a pretrained Auxiliary Language Model (AuxLM) within the traditional encoder-decoder visual captioning frameworks. Next, we employ the same fusion strategies to integrate a pretrained Masked Language Model (MLM), namely BERT, with a visual captioning model, viz. Show, Attend, and Tell, for emending both syntactic and semantic errors in captions. Our caption emendation experiments on three benchmark image captioning datasets, viz. Flickr8k, Flickr30k, and MSCOCO, show improvements over the baseline, indicating the usefulness of our proposed multimodal fusion strategies. Further, we perform a preliminary qualitative analysis on the emended captions and identify error categories based on the type of corrections.
arXiv:2010.15251v2 fatcat:xs4qgzicyfdyzkotqi6bfndlu4

Linking Tweets with Monolingual and Cross-Lingual News using Transformed Word Embeddings [article]

Aditya Mogadala, Dominik Jung, Achim Rettinger
2017 arXiv   pre-print
Social media platforms have grown into an important medium to spread information about an event published by the traditional media, such as news articles. Grouping such diverse sources of information that discuss the same topic from varied perspectives provides new insights. But the gap in word usage between informal social media content such as tweets and diligently written content (e.g., news articles) makes such assembling difficult. In this paper, we propose a transformation framework to bridge
... he word usage gap between tweets and online news articles across languages by leveraging their word embeddings. Using our framework, word embeddings extracted from tweets and news articles are aligned closer to each other across languages, thus facilitating the identification of similarity between news articles and tweets. Experimental results show a notable improvement over baselines for monolingual tweets and news articles comparison, while new findings are reported for cross-lingual comparison.
arXiv:1710.09137v1 fatcat:632evogk3zdonc2kpdsbofzxpm

Image Manipulation with Natural Language using Two-sided Attentive Conditional Generative Adversarial Network [article]

Dawei Zhu, Aditya Mogadala, Dietrich Klakow
2019 arXiv   pre-print
Acknowledgements Aditya Mogadala is supported by the German Research Foundation (DFG) as part of SFB1102. A.  ...  There also exist some more variations of image manipulation, more details can be referred from the recent surveys Mogadala, Kalimuthu and Klakow (2019).  ... 
arXiv:1912.07478v1 fatcat:kwk6wafaanfstafp27ozzszi64

Author Profiling using LDA and Maximum Entropy Notebook for PAN at CLEF 2013

Aditya Pavan, Aditya Mogadala, Vasudeva Varma
2013 Conference and Labs of the Evaluation Forum  
This paper describes the traditional authorship attribution subtask of the PAN/CLEF 2013 workshop. In our attempt to classify the documents based on gender and age of an author, we have applied a traditional approach of topic modeling using Latent Dirichlet Allocation (LDA). We used content-based features like topics and style-based features like preposition frequencies, which act as efficient markers to demarcate the authorship attributes based on age and gender. We demonstrated
... cross validation and observed that our classification approach using Maxent and LDA gave an accuracy of 53.3% for English and 52% for Spanish.
dblp:conf/clef/PavanMV13 fatcat:xg4dmxsdgbgcperybcl4izotau

Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation [article]

Aditya Mogadala and Marius Mosbach and Dietrich Klakow
2020 arXiv   pre-print
Correspondence to: Aditya Mogadala <>.  ... 
arXiv:2007.06077v1 fatcat:a7kf5fp4xjebfms654tsg2pr3a

Finding Influence by Cross-Lingual Blog Mining through Multiple Language Lists [chapter]

Aditya Mogadala, Vasudeva Varma
2011 Communications in Computer and Information Science  
Blogs have been one of the important resources of information on the internet. Nowadays, a lot of Indian-language content is being generated in the form of blogs. People express their opinions on various situations and events. The content in the blogs may contain named entities: names of people, places, and organizations. Named entities also contain names of eminent personalities who are famous in or out of that language community. The goal of this paper is to find the influence of a personality
... cross-language bloggers. The approach we follow is to collect information from blog pages and index the named entities along with their probabilities of occurrence by removing irrelevant information from the blog. When a user searches to find the influence of a personality through a query in an Indian language, we use a cross-language lexicon in the form of multiple-language parallel lists to transliterate the query into other Indian languages and mine blogs to return the influence of the personality across Indian-language bloggers. An overview of the system and preliminary results are described.
doi:10.1007/978-3-642-19403-0_9 fatcat:6uag7ibm2nblzbfdur5y4u57kq

IIIT Hyderabad in Summarization and Knowledge Base Population at TAC 2011

Vasudeva Varma, Sudheer Kovelamudi, Arpit Sood, Jayant Gupta, Harshit Jain, Pattisapu Nikhil Priyatam, Aditya Mogadala, Srikanth Reddy Vaddepally
2011 Text Analysis Conference  
In this report, we present details about the participation of IIIT Hyderabad in the Guided Summarization and Knowledge Base Population tracks at TAC 2011. We have enhanced our summarization system with knowledge-based measures. Wikipedia-based extraction methods and topic modelling are used to score sentences in the guided summarization track. For the multilingual summarization task, we investigated the HAL (Hyperspace Analogue to Language Model) where we created a semantic space from word co-occurrences.
... We show that the results obtained with this unsupervised language-independent method are competitive with other state-of-the-art systems. For the monolingual and multilingual entity linking task, we extended our previous year's model to a lightweight language-independent system without utilizing any other external knowledge or resource.
dblp:conf/tac/VarmaKSGJPMR11 fatcat:yl6gzgxnk5g2bo7wknu7msjpuq

Twitter user behavior understanding with mood transition prediction

Aditya Mogadala, Vasudeva Varma
2012 Proceedings of the 2012 workshop on Data-driven user behavioral modelling and mining from social media - DUBMMSM '12  
Human moods continuously change over time. Tracking moods can provide important information about the psychological and health behavior of an individual. Also, the history of mood information can be used to predict the future moods of individuals. In this paper, we try to predict the mood transition of a Twitter user by regression analysis on the tweets posted on the Twitter timeline. Initially, user tweets are automatically labeled with mood labels from time 0 to t-1. It is then used to predict user
... d transition information at time t. Experiments show that SVM regression attained lower root-mean-square error compared to other regression approaches for mood transition prediction.
doi:10.1145/2390131.2390145 dblp:conf/cikm/MogadalaV12 fatcat:2rapbg2ft5fvnhojklrxryev3i

Discovering Connotations as Labels for Weakly Supervised Image-Sentence Data

Aditya Mogadala, Bhargav Kanuparthi, Achim Rettinger, York Sure-Vetter
2018 Companion of the The Web Conference 2018 on The Web Conference 2018 - WWW '18  
Growth of multimodal content on the web and social media has generated abundant weakly aligned image-sentence pairs. However, it is hard to interpret them directly due to intrinsic "intension". In this paper, we aim to annotate such image-sentence pairs with connotations as labels to capture the intrinsic "intension". We achieve it with a connotation multimodal embedding model (CMEM) using a novel loss function. Its unique characteristics over previous models include: (i) the exploitation of
... ltimodal data as opposed to only visual information, (ii) robustness to outlier labels in a multi-label scenario and (iii) effective operation with large-scale weakly supervised data. With extensive quantitative evaluation, we exhibit the effectiveness of CMEM for detection of multiple labels over other state-of-the-art approaches. Also, we show that in addition to annotation of image-sentence pairs with connotation labels, a byproduct of our model inherently supports cross-modal retrieval, i.e., image query - sentence retrieval.
doi:10.1145/3184558.3186352 dblp:conf/www/MogadalaKRS18 fatcat:7tvmbfqrv5cg3h4ccj7pahszqe

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow
2021 The Journal of Artificial Intelligence Research  
., linguistic and visual information) is usually considered to be a sub-part of multimodal learning models (Mogadala, 2015).  ...  A few approaches (Mogadala et al., 2018a; Lu et al., 2018) have transferred information both before and during inference.  ... 
doi:10.1613/jair.1.11688 fatcat:kvfdrg3bwrh35fns4z67adqp6i

Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification

Aditya Mogadala, Achim Rettinger
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
In many languages, sparse availability of resources causes numerous challenges for textual analysis tasks. Text classification is one such standard task that is hindered by the limited availability of label information in low-resource languages. Transferring knowledge (i.e., label information) from high-resource to low-resource languages might improve text classification as compared to other approaches like machine translation. We introduce BRAVE (Bilingual paRAgraph VEctors), a model to
... earn bilingual distributed representations (i.e., embeddings) of words without word alignments either from sentence-aligned parallel or label-aligned non-parallel document corpora to support cross-language text classification. Empirical analysis shows that classification models trained with our bilingual embeddings outperform other state-of-the-art systems on three different cross-language text classification tasks.
doi:10.18653/v1/n16-1083 dblp:conf/naacl/MogadalaR16 fatcat:z2qri3ovsvhwrhjkqfkw4plxhi

Retrieval approach to extract opinions about people from resource scarce language news articles

Aditya Mogadala, Vasudeva Varma
2012 Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining - WISDOM '12  
We wish to address the challenging task of opinion mining about organizations, people and places from different languages. It is known that resources and tools for mining opinions are scarce. In our study, we leverage a collection of comparable news articles to retrieve opinions about people (opinion targets) in a resource-scarce language like Hindi. Opinions expressed about opinion targets (Named Entities) given by adjectives and verbs known as opinion words are extracted from English collection of
... parable corpora to get transliterated and translated to resource-scarce languages. Transformed opinion words are then used to create a subjective language model (SLM) and structured opinion queries (OQs) using an inference network (IN) for retrieval to confirm the opinion about opinion targets in documents. Experiments have shown that OQs and SLM with the IN framework are effective for opinion mining tasks in minimal-resource languages when compared to other retrieval approaches.
doi:10.1145/2346676.2346680 fatcat:g6gzv47ipnd2ndq4q3g2ykm5em

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods [article]

Aditya Mogadala and Marimuthu Kalimuthu and Dietrich Klakow
2020 arXiv   pre-print
A few approaches (Mogadala et al., 2018a; Lu et al., 2018) have transferred information both before and during inference.  ...  Additionally, Aditya et al. (2019) used spatial knowledge to aid visual reasoning. Their framework combined knowledge distillation, relational reasoning, and probabilistic logical languages.  ... 
arXiv:1907.09358v2 fatcat:4fyf6kscy5dfbewll3zs7yzsuq