5,002 Hits in 12.2 sec

Processing Long Queries Against Short Text

Dongxiang Zhang, Yuchen Li, Ju Fan, Lianli Gao, Fumin Shen, Heng Tao Shen
2017 ACM Transactions on Information Systems  
Processing long queries against short text: Top-k advertisement matching in news stream applications. ACM Trans. Inf.  ...  Many real applications in real-time news stream advertising call for efficient processing of long queries against short text.  ...  In this article, we study efficient processing of the remaining setup (long query against short text), which finds useful applications in real-time news stream advertising.  ... 
doi:10.1145/3052772 fatcat:i5sii7dr4nfhfhvi6n3mz5otna

Taming Pretrained Transformers for Extreme Multi-label Text Classification [article]

Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit Dhillon
2020 arXiv   pre-print
For example, the input text could be a product description on Amazon.com and the labels could be product categories. XMC is an important yet challenging problem in the NLP community.  ...  The proposed method achieves new state-of-the-art results on four XMC benchmark datasets.  ...  Given text information about labels, such as a short description of categories in the Wikipedia dataset or search queries on the Amazon shopping website, we can use this short text to represent the labels  ... 
arXiv:1905.02331v4 fatcat:wm3x3jwpnngvfgjpr3ljos3srq

The TREC-2002 Video Track Report

Alan F. Smeaton, Paul Over
2002 Text Retrieval Conference  
The reliable usefulness of features in search generally or in specific situations has yet to be  ...  Matching the text of the topic against the text derived by automatic speech recognition on the video's audio  ...  As last year, gradual transitions could only match gradual transitions and cuts match only cuts, except in the case of very short gradual transitions (5 frames or less), which, whether in the reference  ... 
dblp:conf/trec/SmeatonO02 fatcat:csw2fkotczf6vfpuocivvs7iga

LeadLine: Interactive visual analysis of text data through event identification and exploration

Wenwen Dou, Xiaoyu Wang, Drew Skau, William Ribarsky, Michelle X. Zhou
2012 2012 IEEE Conference on Visual Analytics Science and Technology (VAST)  
Top right: people and entities related to President Obama (selected) are shown in the graph. Bottom right: locations mentioned in news articles related to the president.  ...  In this paper, we propose an interactive visual analytics system, LeadLine, to automatically identify meaningful events in news and social media data and support exploration of the events.  ...  ACKNOWLEDGEMENTS This work was supported in part by a grant from the National Science Foundation under award number SBE-0915528.  ... 
doi:10.1109/vast.2012.6400485 dblp:conf/ieeevast/DouWSRZ12 fatcat:icluncbxo5fo5d4se4pawagseq

Data-Intensive Text Processing with MapReduce

Jimmy Lin, Chris Dyer
2010 Synthesis Lectures on Human Language Technologies  
In the context of text processing, streaming algorithms have been applied to language modeling [90], translation modeling [89], and detecting the first mention of news event in a stream [121].  ...  Similarly, Hive [68], another open-source project, provides an abstraction on top of Hadoop that allows users to issue SQL queries against large relational datasets stored in HDFS.  ...  large datasets in parallel, provides researchers with an effective strategy for developing increasingly-effective applications.  ... 
doi:10.2200/s00274ed1v01y201006hlt007 fatcat:daso2mcdvrg6jev3vbospnsvpe

Pangloss: Fast Entity Linking in Noisy Text Environments [article]

Michael Conover, Matthew Hayes, Scott Blackburn, Pete Skomoroch, Sam Shah
2018 arXiv   pre-print
text adds a new dimension to this problem.  ...  Traditionally, this work has focused on text that has been well-formed, like news articles, but in common real world datasets such as messaging, resumes, or short-form social media, non-grammatical, loosely-structured  ...  Even conventional commercial machine learning problems such as content recommendation, query understanding, and targeted advertising can benefit from the structure afforded by connecting raw text to knowledge  ... 
arXiv:1807.06036v1 fatcat:jvjs2ioqibc2vg72ng3gwyql4y

Facebook 'Regulation': a process not a text

Leighton Andrews
2020 Zenodo  
The regulatory process uncovered builds on existing regulatory frameworks and illustrates that platform regulation is a process, not a finished text.  ...  These politicians and regulators have engaged in a process of sense-making, building their discursive capacity in a range of technical and novel issues.  ...  Zealand terrorist action live-streamed on Facebook, though actions fell short of the prohibition of live-streaming called for shortly afterwards by the UK Digital Minister (Andrews, 2019a).  ... 
doi:10.5281/zenodo.3933014 fatcat:n6sapmoqdrhtvesyneorkz7tme

Uncertainty Handling in Named Entity Extraction and Disambiguation for Informal Text [chapter]

Maurice van Keulen, Mena B. Habib
2014 Lecture Notes in Computer Science  
These streams of user generated content (UGC) provide an opportunity and challenge for media analysts to analyze huge amount of new data and use them to infer and reason with new information.  ...  We propose a robust combined framework for NEE and NED in semiformal and informal text.  ...  in informal short text in tweets.  ... 
doi:10.1007/978-3-319-13413-0_16 fatcat:y4ptlh5nsrcqpdd3zb73u3tpye

Text Extraction and Retrieval from Smartphone Screenshots: Building a Repository for Life in Media [article]

Agnese Chiatti, Mu Jung Cho, Anupriya Gagneja, Xiao Yang, Miriam Brinberg, Katie Roehrick, Sagnik Ray Choudhury, Nilam Ram, Byron Reeves, C. Lee Giles
2018 arXiv   pre-print
We show how combining OpenCV-based pre-processing modules with a Long short-term memory (LSTM) based release of Tesseract OCR, without ad hoc training, led to a 74% character-level accuracy of the extracted  ...  In this paper, we present the experimental workflow we exploited to: (i) pre-process a unique collection of screen captures, (ii) extract unstructured text embedded in the images, (iii) organize image  ...  data are produced for new data streams.  ... 
arXiv:1801.01316v1 fatcat:y56cjn7nrrf5xgolceimrs6h3q

Comparing automated text classification methods

Jochen Hartmann, Juliana Huppertz, Christina Schamp, Mark Heitmann
2019 International Journal of Research in Marketing  
All lexicon-based approaches, LIWC in particular, perform poorly compared with machine learning. In some applications, accuracies only slightly exceed chance.  ...  Many marketing applications require structuring this data at scales non-accessible to human coding, e.g., to detect communication shifts in sentiment or other researcher-defined content categories.  ...  Their customer service department of 150 service representatives is structured in five teams mirroring typical customer queries (i.e., questions about activities and the booking process, questions about  ... 
doi:10.1016/j.ijresmar.2018.09.009 fatcat:g2la5l2abbg4pbbl2dckhyyeum

Text-Based Twitter User Geolocation Prediction

B. Han, P. Cook, T. Baldwin
2014 The Journal of Artificial Intelligence Research  
., gazetteer terms, dialectal words) in a text are indicative of its author's location.  ...  In this paper, we investigate and improve on the task of text-based geolocation prediction of Twitter users.  ...  By decomposing the stacked models and evaluating against the base classifiers, we find the accuracy declines are primarily caused by accuracy drops in the LOC classifier on the new LIVE data, of approximately  ... 
doi:10.1613/jair.4200 fatcat:jvdb3fdb4ngoxmsjo4fm2jbqve

Summarization

Jade Goldstein, Jaime Carbonell
1998 Proceedings of a workshop held at Baltimore, Maryland, October 13-15, 1998  
The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in reranking retrieved documents and in selecting appropriate passages for text summarization  ...  Preliminary results indicate some benefits for MMR diversity ranking in ad-hoc query and in single document summarization.  ...  The top 10 sentences for λ = 1 (effectively query relevance, but no MMR) and λ = .3 (both query relevance and MMR anti-redundancy) are shown in Figures 4 and 5 respectively.  ... 
doi:10.3115/1119089.1119120 dblp:conf/tipster/GoldsteinC98 fatcat:kjgirip4ffewrhb6h4vtyzhz2e

Keyword query cleaning

Ken Q. Pu, Xiaohui Yu
2008 Proceedings of the VLDB Endowment  
The top-k query cleaning algorithm is guaranteed to return the best k cleaned keyword queries in ranked order.  ...  We further extend the basic algorithm to address incremental query cleaning and top-k optimal query cleaning.  ...  Example: Matching bodies of text with advertising postings is a common technique used in internet marketing (e.g. Google's Ad-Sense, or Google Mail's Ad posting).  ... 
doi:10.14778/1453856.1453955 fatcat:yso5pfzpwzh7hbampcvq6wlzvi

Continuous top-k query for graph streams

Shirui Pan, Xingquan Zhu
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
In this paper, we propose to query correlated graphs in a data stream scenario, where an algorithm is required to retrieve the top k graphs which are mostly correlated to a query graph q.  ...  Our method represents the first research endeavor for data stream based top-k correlated graph query.  ...  A typical top-k correlated graph query in a data stream with window size w=3 is shown in Fig. 1.  ... 
doi:10.1145/2396761.2398717 dblp:conf/cikm/PanZ12a fatcat:lugcho6xdbdtzgg5srgcqkwvsa

A Data Structure for Sponsored Search

Arnd Christian König, Kenneth Church, Martin Markov
2009 Proceedings / International Conference on Data Engineering  
When a user issues a search query, bids are typically matched to the query using broad-match semantics: all the terms in the bid need to be in the query (but not vice versa).  ...  This paper proposes novel index structures and query processing algorithms for sponsored search. We evaluate these structures using a real corpus of 180 million advertisements.  ...  However, our main ideas are applicable to tree-like structures as well, something we discuss in Section III-B.  ... 
doi:10.1109/icde.2009.37 dblp:conf/icde/KonigCM09 fatcat:q67vkxvwlrfd3hfpmwo64gd6pu