Document Similarity Search Based on Manifold-Ranking of TextTiles [chapter]

Xiaojun Wan, Jianwu Yang, Jianguo Xiao
2006 Lecture Notes in Computer Science  
In this paper, we proposed a novel retrieval approach based on manifold-ranking of TextTiles to re-rank the initially retrieved documents.  ...  Document similarity search aims to find documents similar to a query document in a text corpus and return a ranked list of similar documents.  ...  The proposed approach re-ranks a small number of initially retrieved documents based on manifold-ranking of TextTiles.  ... 
doi:10.1007/11880592_2 fatcat:o2xpyycvkja33jni6hdni4al34

A segment-based approach to clustering multi-topic documents

Andrea Tagarelli, George Karypis
2012 Knowledge and Information Systems  
We empirically give evidence of the significance of our segment-based approach on large collections of multi-topic documents, and we compare it to conventional methods for document clustering.  ...  We propose a novel document clustering framework that is designed to induce a document organization from the identification of cohesive groups of segment-based portions of the original documents.  ...  Acknowledgments Portions of this work appeared in SDM 2008 Workshop on Text Mining [46].  ... 
doi:10.1007/s10115-012-0556-z fatcat:istysfkpabglvpvyiz6il4vcwy

Predicting the effectiveness of queries and retrieval systems

Claudia Hauff
2010 SIGIR Forum  
Based on a document's score vector y, a perturbed score vector ỹ is derived, which is based on the similarity between the ranked documents of a system.  ...  based on the similarity between query and result ranking.  ...  All reported results are based on the Lemur Toolkit for Language Modeling and Information Retrieval, version 4.3.2.  ... 
doi:10.1145/1842890.1842906 fatcat:jkesk5hrvfe77bg5xr7yg7bbie

Multidimensional topic analysis in political texts

Cäcilia Zirn, Heiner Stuckenschmidt
2014 Data & Knowledge Engineering  
In this paper, we propose a method for analyzing and comparing documents according to a set of predefined topics that is based on an extension of Latent Dirichlet Allocation (LDA) for inducing knowledge  ...  We validate the method by showing that it can guess which member of a coalition was assigned a certain ministry based on a comparison of the parties' election manifestos with the coalition contract.  ...  Acknowledgements We want to thank Sven-Oliver Proksch of the Mannheim Centre for European Social Research (MZES) for providing the idea to evaluate our approach by comparing party positions to the coalition  ... 
doi:10.1016/j.datak.2013.07.003 fatcat:xavbwve4ejcvjegagt5onmbrm4

User Group Analytics Survey and Research Opportunities

Behrooz Omidvar-Tehrani, Sihem Amer-Yahia
2019 IEEE Transactions on Knowledge and Data Engineering  
We focus on related work which arises from combining those components. We also discuss challenges and future directions of having an all-in-one system, where all those components are combined.  ...  This survey has been presented in the form of two tutorials [1], [2].  ...  While t-SNE has the same manifold nature as MS, it focuses on local structures of group members to obtain a clearer view [108].  ... 
doi:10.1109/tkde.2019.2913651 fatcat:6csthrt4zngqzkaur2lfytrupq

A Semantically-Based Computational Approach to Narrative Structure

Rodolfo Delmonte, Giulia Marchesini
2017 International Conference on Computational Semantics  
structures, and semantic features is used to highlight specific portions of the role of each character in the narrative depending strictly on his/her personality traits and on the structure of the story  ...  factuality and subjectivity, and the other focuses on evaluative features derived from the Appraisal Theory framework.  ...  similarity is measured on the basis of a number of lexically-based similarity criteria, without statistical measures.  ... 
dblp:conf/iwcs/DelmonteM17 fatcat:kfo5oovxuvf6pcwopkuheyv6pq

Discourse analysis of asynchronous conversations

Shafiq Rayhan Joty
Our graph-based approach extends state-of-the-art methods by integrating a fine-grained conversational structure with other conversational features.  ...  This thesis focuses on building novel computational models of different discourse analysis tasks in asynchronous conversations; i.e., conversations where participants communicate with each other at different  ...  This method is very similar to TextTiling [83] except that the similarity is computed based on the scores of the chains instead of term frequencies.  ... 
doi:10.14288/1.0165726 fatcat:jixdchdwqzecployo5xgznb534

Native language identification: explorations and applications

Shervin Malmasi
Most work hitherto has focused on the core machine learning and feature engineering facets of the task, obtaining suitable data and unifying the area with a common evaluation framework.  ...  This thesis makes three broad contributions: (1) exploring the task in new ways; (2) investigating how NLI can inform SLA; and (3) introducing the novel task of L1-based text segmentation.  ...  The hyperparameter θ 0 can be chosen, or can be learned via an Expectation-Maximization process. The authors evaluate their method on two corpora from different domains: the ICSI corpus of meeting transcripts  ... 
doi:10.25949/19437986 fatcat:wnf7vdyrsjfjrmbf3nclwdrire

The Third International Conference on Creative Content Technologies

Hans-Werner Sehring, Wolfgang Fohl
The definition of content is manifold.  ...  This research is not advocating the end of document based search; however, we propose that a new search engine architecture, which aims to inspire the creativity of its users, can only be beneficial to  ...  Definition 7: If the same quality of data collection, its test cases is similar.  ...