Attentive Memory Networks: Efficient Machine Reading for Conversational Search [article]

Tom Kenter, Maarten de Rijke
2017 arXiv   pre-print
Recent advances in conversational systems have changed the search paradigm. Traditionally, a user poses a query to a search engine that returns an answer based on its index, possibly leveraging external knowledge bases and conditioning the response on earlier interactions in the search session. In a natural conversation, there is an additional source of information to take into account: utterances produced earlier in a conversation can also be referred to and a conversational IR system has to
... ep track of information conveyed by the user during the conversation, even if it is implicit. We argue that the process of building a representation of the conversation can be framed as a machine reading task, where an automated system is presented with a number of statements about which it should answer questions. The questions should be answered solely by referring to the statements provided, without consulting external knowledge. The time is right for the information retrieval community to embrace this task, both as a stand-alone task and integrated in a broader conversational search setting. In this paper, we focus on machine reading as a stand-alone task and present the Attentive Memory Network (AMN), an end-to-end trainable machine reading algorithm. Its key contribution is in efficiency, achieved by having a hierarchical input encoder, iterating over the input only once. Speed is an important requirement in the setting of conversational search, as gaps between conversational turns have a detrimental effect on naturalness. On 20 datasets commonly used for evaluating machine reading algorithms we show that the AMN achieves performance comparable to the state-of-the-art models, while using considerably fewer computations.
arXiv:1712.07229v1 fatcat:uprn6jkvrnfmtp4r5nihhst4sa

Siamese CBOW: Optimizing Word Embeddings for Sentence Representations [article]

Tom Kenter, Alexey Borisov, Maarten de Rijke
2016 arXiv   pre-print
We present the Siamese Continuous Bag of Words (Siamese CBOW) model, a neural network for efficient estimation of high-quality sentence embeddings. Averaging the embeddings of words in a sentence has proven to be a surprisingly successful and efficient way of obtaining sentence embeddings. However, word embeddings trained with the methods currently available are not optimized for the task of sentence representation, and, thus, likely to be suboptimal. Siamese CBOW handles this problem by
... g word embeddings directly for the purpose of being averaged. The underlying neural network learns word embeddings by predicting, from a sentence representation, its surrounding sentences. We show the robustness of the Siamese CBOW model by evaluating it on 20 datasets stemming from a wide variety of sources.
arXiv:1606.04640v1 fatcat:gebil5wpingavjchb5cuequdxa

Neural Networks for Information Retrieval [article]

Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra
2017 arXiv   pre-print
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give
... s. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions.
arXiv:1707.04242v1 fatcat:4idscmq26fa5bjupldwuyghq4m

Neural Networks for Information Retrieval [article]

Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra
2018 arXiv   pre-print
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR.
arXiv:1801.02178v1 fatcat:c3kevelcrffodift2vvwnoscjq

Evaluating document filtering systems over time

Tom Kenter, Krisztian Balog, Maarten de Rijke
2015 Information Processing & Management  
doi:10.1016/j.ipm.2015.03.005 fatcat:luykqbq27vdjtpt65ohwm3sp2q

Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs [article]

Rob Clark, Hanna Silen, Tom Kenter, Ralph Leith
2019 arXiv   pre-print
Text-to-speech systems are typically evaluated on single sentences. When long-form content, such as data consisting of full paragraphs or dialogues, is considered, evaluating sentences in isolation is not always appropriate as the context in which the sentences are synthesized is missing. In this paper, we investigate three different ways of evaluating the naturalness of long-form text-to-speech synthesis. We compare the results obtained from evaluating sentences in isolation, evaluating whole
... ragraphs of speech, and presenting a selection of speech or text as context and evaluating the subsequent speech. We find that, even though these three evaluations are based upon the same material, the outcomes differ per setting, and moreover that these outcomes do not necessarily correlate with each other. We show that our findings are consistent between a single speaker setting of read paragraphs and a two-speaker dialogue scenario. We conclude that to evaluate the quality of long-form speech, the traditional way of evaluating sentences in isolation does not suffice, and that multiple evaluations are required.
arXiv:1909.03965v1 fatcat:5izucgp26fhrrmywsnqvihkf7a

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity [article]

Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke
2017 arXiv   pre-print
A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring diversity of documents is suboptimal due to generality and impurity. General topics only include common information from a
... d corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models and impure topics are likely to get assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments.
arXiv:1701.04273v1 fatcat:h2smns34qnc7bglwpm6fmyryly

StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes

Manish Sharma, Tom Kenter, Rob Clark
2020 Interspeech 2020  
Recently, WaveNet has become a popular choice of neural network to synthesize speech audio. Autoregressive WaveNet is capable of producing high-fidelity audio, but is too slow for real-time synthesis. As a remedy, Parallel WaveNet was proposed, which can produce audio faster than real time through distillation of an autoregressive teacher into a feedforward student network. A shortcoming of this approach, however, is that a large amount of recorded speech data is required to produce
... student models, and this data is not always available. In this paper, we propose StrawNet: a self-training approach to train a Parallel WaveNet. Self-training is performed using the synthetic examples generated by the autoregressive WaveNet teacher. We show that, in low-data regimes, training on high-fidelity synthetic data from an autoregressive teacher model is superior to training the student model on (much fewer) examples of recorded speech. We compare StrawNet to a baseline Parallel WaveNet, using both side-by-side tests and Mean Opinion Score evaluations. To our knowledge, synthetic speech has not been used to train neural text-to-speech before. Figure 1 shows a schematic overview of both the baseline and the StrawNet approach. The conventional way of training a Parallel WaveNet [5] is a two-step procedure. In the first step, shown in the top-left corner of Figure 1, an autoregressive
doi:10.21437/interspeech.2020-1437 dblp:conf/interspeech/SharmaKC20 fatcat:se4ehpmmhjfdbdqy43dp3xqyhy

HiTR: Hierarchical Topic Model Re-estimation for Measuring Topical Diversity of Documents [article]

Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke
2018 arXiv   pre-print
Kenter, M. Marx, and M. de Rijke are with Informatics Institute, University of Amsterdam. • M. Dehghani and J. Kamps are with Institute for Logic, Language, and Computation, University of Amsterdam.  ... 
arXiv:1810.05436v1 fatcat:y7qhgfr62zfhxl4webmf7nugna

Context-Based Entity Linking - University of Amsterdam at TAC 2012

David Graus, Tom Kenter, Marc Bron, Edgar Meij, Maarten de Rijke
2012 Text Analysis Conference  
This paper describes our approach to the 2012 Text Analysis Conference (TAC) Knowledge Base Population (KBP) entity linking track. For this task, we turn to a state-of-the-art system for entity linking in microblog posts. Compared to the little context microblog posts provide, the documents in the TAC KBP track provide context of greater length and of a less noisy nature. In this paper, we adapt the entity linking system for microblog posts to the KBP task by extending it with approaches that
... plicitly rely on the query's context. We show that incorporating novel features that leverage the context on the entity-level can lead to improved performance in the TAC KBP task.
dblp:conf/tac/GrausKBMR12 fatcat:43p3mjgfdbduxjjl6mj5xkhyvm

Personal Knowledge Graphs

Krisztian Balog, Tom Kenter
2019 Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval - ICTIR '19  
Knowledge graphs, organizing structured information about entities, and their attributes and relationships, are ubiquitous today. Entities, in this context, are usually taken to be anyone or anything considered to be globally important. This, however, rules out many entities people interact with on a daily basis. In this position paper, we present the concept of personal knowledge graphs: resources of structured information about entities personally related to its user, including the ones that
... ight not be globally important. We discuss key aspects that separate them from general knowledge graphs, identify the main challenges involved in constructing and using them, and define a research agenda. CCS Concepts: Information systems → Entity relationship models.
doi:10.1145/3341981.3344241 dblp:conf/ictir/BalogK19 fatcat:m6g54ipn4zb6bpc2wu2ksb4kkq

Filtering Documents over Time on Evolving Topics - The University of Amsterdam at TREC 2013 KBA CCR

Tom Kenter
2013 Text Retrieval Conference  
We use a multinomial Naive Bayes classifier with feature selection based on time-aware χ² as described in (Kenter et al., 2013).  ...  A multinomial Naive Bayes classifier is used for the experiments, which is able to adapt to changes in the classes it monitors over time by selecting features based on time-aware χ² (Kenter et al., 2013  ... 
dblp:conf/trec/Kenter13 fatcat:552epmznwzdtlcavg6qlnmnwyq

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity [chapter]

Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke
2017 Lecture Notes in Computer Science  
.; Kenter, T.M.; Marx, M.J.; Kamps, J.; de Rijke, M. Abstract. A high degree of topical diversity is often considered to be an important characteristic of interesting text documents.  ... 
doi:10.1007/978-3-319-56608-5_6 fatcat:xu7xbtl25nedbk55bdcc6kvroa

Design and Implementation of ShiCo: Visualising Shifting Concepts over Time

Carlos Martinez-Ortiz, Tom Kenter, Melvin Wevers, Pim Huijnen, Jaap Verheul, Joris van Eijnatten
2016 Digital Humanities Conference  
In different times, people use different words to describe concepts. Change and stability in word usage are possible indicators of wider socio-cultural changes. To gain insight into how people perceive concepts, it is valuable to trace how the words denoting a certain concept change over time. Existing tools for exploring historical concepts, such as keyword searching or topic modeling, are ill-suited for the task; they are either too top-down or too rigid for an iterative exploration of
... cal concepts in large data sets. In this article, we present ShiCo: a graphical interface for visualising concepts over time by monitoring shifts in word usage in a document corpus. As the dimension of time plays a crucial role in ShiCo, this article demonstrates ShiCo on a large corpus of newspaper articles spanning several decades. We describe the design choices made during the development of ShiCo and the key parameters that control the tool's behaviour. Lastly, as ShiCo is meant to be used by the broader community, we describe the steps required for running ShiCo on a novel data set.
dblp:conf/dihu/Martinez-OrtizK16 fatcat:oiqojg46trbs7oxpdpwvcgrn3q

Short Text Similarity with Word Embeddings

Tom Kenter, Maarten de Rijke
2015 Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM '15  
doi:10.1145/2806416.2806475 fatcat:etilsmje7feo5pxftnitwhz73a