
"360° user profiling: past, future, and applications" by Aleksandr Farseev, Mohammad Akbari, Ivan Samborskii and Tat-Seng Chua with Martin Vesely as coordinator

Aleksandr Farseev, Mohammad Akbari, Ivan Samborskii, Tat-Seng Chua
2016 ACM SIGWEB Newsletter  
., 2009] or NUS-WIDE [Chua et al., 2009] ).  ... 
doi:10.1145/2956573.2956577 fatcat:l6eaj76gvnahdkw7wlrhmtiohi

Document Visualization using Topic Clouds [article]

Shaohua Li, Tat-Seng Chua
2017 arXiv   pre-print
Traditionally a document is visualized by a word cloud. Recently, distributed representation methods for documents have been developed, which map a document to a set of topic embeddings. Visualizing such a representation is useful to present the semantics of a document in higher granularity; it is also challenging, as there are multiple topics, each containing multiple words. We propose to visualize a set of topics using Topic Cloud, which is a pie chart consisting of topic slices, where each slice contains important words in this topic. To make important topics/words visually prominent, the sizes of topic slices and word fonts are proportional to their importance in the document. A topic cloud can help the user quickly evaluate the quality of derived document representations. For NLP practitioners, it can be used to qualitatively compare the topic quality of different document representation algorithms, or to inspect how model parameters impact the derived representations.
arXiv:1702.01520v1 fatcat:hxkzdvdpizflfb6t2w5dix6nou

Attributed Social Network Embedding [article]

Lizi Liao, Xiangnan He, Hanwang Zhang, Tat-Seng Chua
2017 arXiv   pre-print
Embedding network data into a low-dimensional vector space has shown promising performance for many real-world applications, such as node classification and entity retrieval. However, most existing methods focused only on leveraging network structure. For social networks, besides the network structure, there also exists rich information about social actors, such as user profiles of friendship networks and textual content of citation networks. This rich attribute information of social actors reveals the homophily effect, exerting huge impacts on the formation of social networks. In this paper, we explore the rich evidence source of attributes in social networks to improve network embedding. We propose a generic Social Network Embedding framework (SNE), which learns representations for social actors (i.e., nodes) by preserving both the structural proximity and attribute proximity. While the structural proximity captures the global network structure, the attribute proximity accounts for the homophily effect. To justify our proposal, we conduct extensive experiments on four real-world social networks. Compared to the state-of-the-art network embedding approaches, SNE can learn more informative representations, achieving substantial gains on the tasks of link prediction and node classification. Specifically, SNE significantly outperforms node2vec with an 8.2% relative improvement on the link prediction task, and a 12.7% gain on the node classification task.
arXiv:1705.04969v1 fatcat:3hkmsphxibendcrqskaqky45um

Neural Factorization Machines for Sparse Predictive Analytics [article]

Xiangnan He, Tat-Seng Chua
2017 arXiv   pre-print
Many predictive tasks of web applications need to model categorical variables, such as user IDs and demographics like genders and occupations. To apply standard machine learning techniques, these categorical predictors are always converted to a set of binary features via one-hot encoding, making the resultant feature vector highly sparse. To learn from such sparse data effectively, it is crucial to account for the interactions between features. Factorization Machines (FMs) are a popular solution for efficiently using the second-order feature interactions. However, FM models feature interactions in a linear way, which can be insufficient for capturing the non-linear and complex inherent structure of real-world data. While deep neural networks have recently been applied to learn non-linear feature interactions in industry, such as the Wide&Deep by Google and DeepCross by Microsoft, the deep structure meanwhile makes them difficult to train. In this paper, we propose a novel model Neural Factorization Machine (NFM) for prediction under sparse settings. NFM seamlessly combines the linearity of FM in modelling second-order feature interactions and the non-linearity of neural network in modelling higher-order feature interactions. Conceptually, NFM is more expressive than FM since FM can be seen as a special case of NFM without hidden layers. Empirical results on two regression tasks show that with one hidden layer only, NFM significantly outperforms FM with a 7.3% relative improvement. Compared to the recent deep learning methods Wide&Deep and DeepCross, our NFM uses a shallower structure but offers better performance, being much easier to train and tune in practice.
arXiv:1708.05027v1 fatcat:7owdrtpnpbhxpjpmik6gbmrtxq
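
The abstract above says FMs use second-order feature interactions efficiently. A minimal sketch of what that can mean, assuming the standard FM reformulation of the pairwise term (the function names, toy feature vector, and toy factor matrix below are hypothetical and not taken from the paper): the sum over all feature pairs can be rewritten so it costs time linear in the number of features.

```python
from itertools import combinations


def fm_pairwise_naive(x, v):
    """Sum of <v_i, v_j> * x_i * x_j over all feature pairs i < j."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(v[i], v[j]))
        total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - s_sq
    return 0.5 * total


# Hypothetical sparse one-hot style input with three active features.
x = [1.0, 0.0, 1.0, 1.0]
# Hypothetical 2-dimensional latent factor per feature.
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.0]]

naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
```

Both functions return the same value; the second avoids enumerating pairs, which matters when the one-hot feature vector is very wide.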

Meta-Transfer Learning through Hard Tasks [article]

Qianru Sun, Yaoyao Liu, Zhaozheng Chen, Tat-Seng Chua, Bernt Schiele
2019 arXiv   pre-print
Tat-Seng Chua is the KITHCT Chair Professor at the School of Computing, National University of Singapore. He is also the Distinguished Visiting Professor of Tsinghua University.  ... 
arXiv:1910.03648v1 fatcat:l2z7dowb5bclzgr2a3ofk3z2za

Neural Collaborative Filtering [article]

Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua
2017 arXiv   pre-print
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation -- collaborative filtering -- on the basis of implicit feedback. Although some recent work has employed deep learning for recommendation, they primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of music. When it comes to modeling the key factor in collaborative filtering -- the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance.
arXiv:1708.05031v2 fatcat:gam2aezz2retvlf2cqqrqv7oni
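
The abstract above contrasts an inner product on user and item latent features with a learned interaction function. A minimal sketch of that contrast, assuming hand-picked toy embeddings and weights (every name and number here is hypothetical; no training is shown and this is not the authors' implementation):

```python
def mf_score(p_u, q_i):
    """Matrix-factorization score: inner product of user and item embeddings."""
    return sum(a * b for a, b in zip(p_u, q_i))


def weighted_product_score(p_u, q_i, h):
    """Element-wise product followed by a linear output layer with weights h.

    With h set to all ones this reduces to the inner product above.
    """
    return sum(w * a * b for w, a, b in zip(h, p_u, q_i))


def dense(x, weights, bias):
    """One fully connected layer: weights is a list of rows, one per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(z):
    return [max(0.0, t) for t in z]


def mlp_score(p_u, q_i, w1, b1, w_out):
    """Concatenate the embeddings, apply one ReLU hidden layer, then a linear output."""
    hidden = relu(dense(p_u + q_i, w1, b1))
    return sum(w * h for w, h in zip(w_out, hidden))


# Hypothetical 2-dimensional embeddings for one user and one item.
p_u = [1.0, 2.0]
q_i = [0.5, -1.0]

inner = mf_score(p_u, q_i)
as_special_case = weighted_product_score(p_u, q_i, [1.0, 1.0])
learned = mlp_score(
    p_u, q_i,
    w1=[[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]],  # hypothetical fixed weights
    b1=[0.0, 0.0],
    w_out=[1.0, -1.0],
)
```

The second function illustrates one way a neural formulation can recover matrix factorization as a special case, while the third shows an interaction function that is not constrained to be an inner product.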

Semantic Graphs for Generating Deep Questions [article]

Liangming Pan, Yuxi Xie, Yansong Feng, Tat-Seng Chua, Min-Yen Kan
2020 arXiv   pre-print
This paper proposes the problem of Deep Question Generation (DQG), which aims to generate complex questions that require reasoning over multiple pieces of information of the input passage. In order to capture the global structure of the document and facilitate reasoning, we propose a novel framework which first constructs a semantic-level graph for the input document and then encodes the semantic graph by introducing an attention-based GGNN (Att-GGNN). Afterwards, we fuse the document-level and graph-level representations to perform joint training of content selection and question decoding. On the HotpotQA deep-question centric dataset, our model greatly improves performance over questions requiring reasoning over multiple facts, leading to state-of-the-art performance. The code is publicly available at
arXiv:2004.12704v1 fatcat:ng7zxxvz3bh7hgluyms24hcdcm

Multi-source Domain Adaptation for Visual Sentiment Classification [article]

Chuang Lin, Sicheng Zhao, Lei Meng, Tat-Seng Chua
2020 arXiv   pre-print
Existing domain adaptation methods on visual sentiment classification typically are investigated under the single-source scenario, where the knowledge learned from a source domain of sufficient labeled data is transferred to the target domain of loosely labeled or unlabeled data. However, in practice, data from a single source domain usually have a limited volume and can hardly cover the characteristics of the target domain. In this paper, we propose a novel multi-source domain adaptation (MDA) method, termed Multi-source Sentiment Generative Adversarial Network (MSGAN), for visual sentiment classification. To handle data from multiple source domains, it learns to find a unified sentiment latent space where data from both the source and target domains share a similar distribution. This is achieved via cycle consistent adversarial learning in an end-to-end manner. Extensive experiments conducted on four benchmark datasets demonstrate that MSGAN significantly outperforms the state-of-the-art MDA approaches for visual sentiment classification.
arXiv:2001.03886v1 fatcat:hthefzghcbfspg3d7dzilkekgm

Assistive tagging

Meng Wang, Bingbing Ni, Xian-Sheng Hua, Tat-Seng Chua
2012 ACM Computing Surveys  
Chua et al. [2009] have employed a filtering process to remove the tags that are out of WordNet or have too low appearance frequencies.  ... 
doi:10.1145/2333112.2333120 fatcat:cvlxcazimvdjxigdvchjinnewe

Deconfounded Video Moment Retrieval with Causal Intervention [article]

Xun Yang, Fuli Feng, Wei Ji, Meng Wang, Tat-Seng Chua
2021 arXiv   pre-print
We tackle the task of video moment retrieval (VMR), which aims to localize a specific moment in a video according to a textual query. Existing methods primarily model the matching relationship between query and moment by complex cross-modal interactions. Despite their effectiveness, current models mostly exploit dataset biases while ignoring the video content, thus leading to poor generalizability. We argue that the issue is caused by the hidden confounder in VMR, i.e., temporal location of moments, that spuriously correlates the model input and prediction. How to design robust matching models against the temporal location biases is crucial but, as far as we know, has not been studied yet for VMR. To fill the research gap, we propose a causality-inspired VMR framework that builds a structural causal model to capture the true effect of query and video content on the prediction. Specifically, we develop a Deconfounded Cross-modal Matching (DCM) method to remove the confounding effects of moment location. It first disentangles moment representation to infer the core feature of visual content, and then applies causal intervention on the disentangled multimodal input based on backdoor adjustment, which forces the model to fairly incorporate each possible location of the target into consideration. Extensive experiments clearly show that our approach can achieve significant improvement over the state-of-the-art methods in terms of both accuracy and generalization (Codes:
arXiv:2106.01534v1 fatcat:didn3mesqjfpvgkfcep3pcvz7m


Aleksandr Farseev, Ivan Samborskii, Tat-Seng Chua
2016 Proceedings of the 2016 ACM on Multimedia Conference - MM '16  
In this technical demonstration, we propose a cloud-based Big Data Platform for Social Multimedia Analytics called bBridge [9] that automatically detects and profiles meaningful user communities in a specified geographical region, followed by rich analytics on communities' multimedia streams. The system executes a community detection approach that considers the ability of social networks to complement each other during the process of latent representation learning, while the community profiling is implemented based on the state-of-the-art multi-modal latent topic modeling and personal user profiling techniques. The stream analytics is performed via a cloud-based stream analytics engine, while the multi-source data crawler is deployed as distributed cloud jobs. Overall, the bBridge platform integrates all the above techniques to serve both business and personal objectives.
doi:10.1145/2964284.2973836 dblp:conf/mm/FarseevSC16 fatcat:qp5fusg3tnhgdat4llmiu5lq3a

Interactive multimedia computing

Meng Wang, Jinhui Tang, Xian-Sheng Hua, Tat-Seng Chua
2010 Multimedia Systems  
In recent years, we have witnessed the flourishing of multimedia data on the Internet. To facilitate humans in accessing and managing the explosively growing multimedia contents, extensive research efforts have been dedicated to automatic multimedia analysis and processing in the past decades, such as categorization, annotation and indexing. However, despite great advances achieved, several key difficulties still exist, such as the well-known semantic gap in multimedia modeling. It is evident from recent results that, without additional information resources, most of the semantic gap problems can hardly be solved automatically within the near future. On the other hand, we have witnessed the power of collective human efforts in the Web 2.0 era in providing high-quality tags and comments to large amounts of images and videos in sites such as Flickr and YouTube. In fact, a lot more can be accomplished through simple online games such as the ESP. Hence, more and more researchers believe that a possible approach to addressing the semantic gap problem is to incorporate the efforts of humans into the computational process, i.e., by combining human intelligence and automated computer processing to jointly tackle the problems in a collaborative manner. The past decade has witnessed the increase of such efforts, such as relevance feedback in content-based image retrieval, active learning in multimedia modeling, the interactive video search evaluation task in TRECVID, new search and browsing interfaces in VideoOlympics to facilitate humans' interaction, and the recent human computation efforts such as the ESP game on Google image search website. This special issue is organized with the purpose of introducing novel research work on interactive multimedia computing. Submissions have come from an open call for papers. With the assistance of dedicated referees, five papers have been selected after two rounds of rigorous reviews. These papers cover a wide range of subtopics of interactive multimedia computing, including game-based image annotation, interactive TV, interactive cartoon synthesis, and so on. In the first paper "Adding Semantics to Image Region Annotations with the Name-It-Game", Steggink and Snoek introduce a system that accomplishes region-level image annotation with a game. It establishes a set of keywords that describe objects by exploring WordNet, and the keywords are assigned to image regions with a two-player "reveal and guess" game. They also explore WordNet to address the word ambiguity problem. In addition to introducing the system, another contribution of the paper is its review of existing manual image annotation techniques, in particular the comprehensive study of game-based annotation. In the second paper "Interactive Browsing via Diversified Visual Summarization for Image Search Results", Wang et al. introduce a scheme for the summarization and browsing of image search results. It adopts a dynamic absorbing random walk approach to summarize the image search results. The summarization is visualized on a 2D panel and users' browsing is facilitated with dynamic scale change and a browsing path tracking tool. Experiments with a set of diverse queries have demonstrated the effectiveness of the approach. The third paper, "Security and Privacy Requirements in Interactive TV", discusses the security and privacy issues in the context of interactive TV. It introduces an interactive
doi:10.1007/s00530-010-0222-9 fatcat:czw7vahn4vc2jlqkgb5nofcnye

TransNFCM: Translation-Based Neural Fashion Compatibility Modeling [article]

Xun Yang, Yunshan Ma, Lizi Liao, Meng Wang, Tat-Seng Chua
2018 arXiv   pre-print
Identifying mix-and-match relationships between fashion items is an urgent task in a fashion e-commerce recommender system. It will significantly enhance user experience and satisfaction. However, due to the challenges of inferring the rich yet complicated set of compatibility patterns in a large e-commerce corpus of fashion items, this task is still underexplored. Inspired by the recent advances in multi-relational knowledge representation learning and deep neural networks, this paper proposes a novel Translation-based Neural Fashion Compatibility Modeling (TransNFCM) framework, which jointly optimizes fashion item embeddings and category-specific complementary relations in a unified space in an end-to-end manner. TransNFCM places items in a unified embedding space where a category-specific relation (category-comp-category) is modeled as a vector translation operating on the embeddings of compatible items from the corresponding categories. In this way, we not only capture the specific notion of compatibility conditioned on a specific pair of complementary categories, but also preserve the global notion of compatibility. We also design a deep fashion item encoder which exploits the complementary characteristic of visual and textual features to represent the fashion products. To the best of our knowledge, this is the first work that uses category-specific complementary relations to model the category-aware compatibility between items in a translation-based embedding space. Extensive experiments demonstrate the effectiveness of TransNFCM over the state-of-the-arts on two real-world datasets.
arXiv:1812.10021v1 fatcat:yh2tb6cpyzgshejdhffmmzy4ae
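
The abstract above describes a category-specific relation as a vector translation between the embeddings of compatible items. A minimal sketch of that idea, assuming a negative Euclidean distance as the compatibility score in the style of translation-based knowledge-graph embeddings (the scoring function, the item vectors, and the relation vector below are hypothetical; the abstract does not specify the exact distance used):

```python
import math


def translation_score(item_a, relation, item_b):
    """Higher is more compatible: item_a shifted by the relation should land near item_b."""
    translated = [a + r for a, r in zip(item_a, relation)]
    return -math.dist(translated, item_b)


# Hypothetical 2-dimensional embeddings.
top = [0.0, 1.0]
top_to_bottom = [1.0, 0.0]      # hypothetical "top-comp-bottom" relation vector
matching_bottom = [1.0, 1.0]    # lies exactly at top + relation
clashing_bottom = [-1.0, 0.0]   # lies far from top + relation

good = translation_score(top, top_to_bottom, matching_bottom)
bad = translation_score(top, top_to_bottom, clashing_bottom)
```

Because each category pair gets its own relation vector, the same two item embeddings can score differently under different relations, which is one way to read the "category-aware" compatibility the abstract mentions.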

Neural Sparse Voxel Fields [article]

Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, Christian Theobalt
2021 arXiv   pre-print
Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a differentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website:
arXiv:2007.11571v2 fatcat:4bqg6zxnezfelpnd4cmgldnxl4

Outer Product-based Neural Collaborative Filtering [article]

Xiangnan He, Xiaoyu Du, Xiang Wang, Feng Tian, Jinhui Tang, Tat-Seng Chua
2018 arXiv   pre-print
This is consistent with the recent finding of [He and Chua, 2017] in using MLP for sparse data prediction. Efficacy of CNN.  ...  that, modeling the interaction of feature embeddings explicitly is particularly useful for a deep learning model to generalize well on sparse data, whereas using concatenation is sub-optimal [He and Chua  ... 
arXiv:1808.03912v1 fatcat:s2e7hamhibfadjpzercijcd4q4