
Optimizing scoring functions and indexes for proximity search in type-annotated corpora

Soumen Chakrabarti, Kriti Puniyani, Sujatha Das
2006 Proceedings of the 15th international conference on World Wide Web - WWW '06  
Second, we exploit the skew in the distribution over types seen in query logs to optimize the space required by the new index structures required by our system.  ...  First, we propose a new algorithm that estimates a scoring function from past logs of queries and answer spans.  ...  Therefore, we must learn good proximity scoring functions automatically before we can consider index and query optimization.  ... 
doi:10.1145/1135777.1135882 dblp:conf/www/ChakrabartiPD06 fatcat:fzgzxrakmffpni67r56plfn5aq

Making Watson fast

E. A. Epstein, M. I. Schor, B. S. Iyer, A. Lally, E. W. Brown, J. Cwiklik
2012 IBM Journal of Research and Development  
IBM Watson is a system created to demonstrate DeepQA technology by competing against human champions in a question-answering game designed for people.  ...  This paper describes how a large set of deep natural-language processing programs were integrated into a single application, scaled out across thousands of central processing unit cores, and optimized  ...  The reranking was implemented by using Indri's capability of plugging in a custom scoring function, and the sentence boundary expansion by precomputing the sentence boundaries over the entire corpus, indexing  ... 
doi:10.1147/jrd.2012.2188761 fatcat:k5mdcdtmkbfsldzidqaj3smgtu

NER in Archival Finding Aids: Extended

Luís Filipe da Costa Cunha, José Carlos Ramalho
2022 Machine Learning and Knowledge Extraction  
In order to achieve high result scores, we annotated several corpora to train our own Machine Learning algorithms in this context domain.  ...  These named entities translate into crucial information about their context and, with high confidence results, they can be used for several purposes, for example, the creation of smart browsing tools by  ...  In total, the resultant annotated corpora contain 164,478 tokens that make up 6302 phrases where the following named entity types were annotated: Person, Profession or Title, Place, Date, and Organization  ... 
doi:10.3390/make4010003 fatcat:xfgvkz5rzfdehidl2vss6heypq

Expansion of medical vocabularies using distributional semantics on Japanese patient blogs

Magnus Ahltorp, Maria Skeppstedt, Shiho Kitajima, Aron Henriksson, Rafal Rzepka, Kenji Araki
2016 Journal of Biomedical Semantics  
Medical vocabularies are, however, essential also for text mining from corpora written in other languages than English and belonging to a variety of medical genres.  ...  Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to a limited availability of large biomedical corpora in  ...  We would also like to thank the Swedish Foundation for Strategic Research, as well as the anonymous reviewers.  ... 
doi:10.1186/s13326-016-0093-x pmid:27671202 pmcid:PMC5037651 fatcat:rzbt3q6wbnb3flcrj7hcee6cwa

CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases [article]

Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han
2017 arXiv   pre-print
We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object "translation  ...  Extracting entities and relations for types of interest from text is important for understanding massive text corpora.  ...  The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies.  ... 
arXiv:1610.08763v2 fatcat:sgowp5ul6jgqxli5yqlnlscmwu


CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han
2017 Proceedings of the 26th International Conference on World Wide Web - WWW '17  
We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object "translation  ...  Extracting entities and relations for types of interest from text is important for understanding massive text corpora.  ...  In this work, we define function Q(c) as the equally weighted combination of the phrase quality score and POS pattern quality score for candidate segment c, which is estimated in step (2) .  ... 
doi:10.1145/3038912.3052708 dblp:conf/www/RenWHQVJAH17 fatcat:lw3fcefeibca5bdr4ezkkqtczu

Data-oriented content query system

Mianwei Zhou, Tao Cheng, Kevin Chen-Chuan Chang
2010 Proceedings of the third ACM international conference on Web search and data mining - WSDM '10  
, typed-entity search, and question answering.  ...  For efficient processing, we design novel index structures and query processing algorithms.  ...  [6] propose to search for annotations using proximity in documents. EntityRank [7] proposes the problem of entity search, and studies a probabilistic ranking model.  ... 
doi:10.1145/1718487.1718503 dblp:conf/wsdm/ZhouCC10 fatcat:eguz6sc3wbdjlasvhu4zxapcmi

Crowd-annotation and LoD-based semantic indexing of content in multi-disciplinary web repositories to improve search results

Arshad Khan, Thanassis Tiropanis, David Martin
2017 Proceedings of the Australasian Computer Science Week Multiconference on - ACSW '17  
We deployed a custom-built annotation, indexing and searching environment in a web repository website that has been used by expert annotators to annotate webpages using free text and vocabulary terms.  ...  Searching for relevant information in multi-disciplinary web repositories is becoming a topic of increasing interest among the computer science research community.  ...  The IDF in our case will be based on N=3400 and DF 10 which is the optimal figure for search results in a web application.  ... 
doi:10.1145/3014812.3014867 dblp:conf/acsw/KhanTM17 fatcat:wbtxxwbwq5bw5hqnkvivozko6u

Early Fusion Strategy for Entity-Relationship Retrieval [article]

Pedro Saleiro, Natasa Milic-Frayling, Eduarda Mendes Rodrigues, Carlos Soares
2017 arXiv   pre-print
In this work, we consider entity and relationships of any type, i.e., characterized by context terms instead of pre-defined types or relationships.  ...  We address the task of entity-relationship (E-R) retrieval, i.e., given a query characterizing types of two or more entities and relationships between them, retrieve the relevant tuples of related entities  ...  Concluding Remarks Work reported in this paper is concerned with expanding the scope of entity-relationship search methods to enable search over large corpora with flexible entity types and complex relationships  ... 
arXiv:1707.09075v2 fatcat:tydfq2bvincixpf6tk5sem6jq4

Improving Automatic Phonetic Segmentation For Creating Singing Voice Synthesizer Corpora

Varun Jewalikar, Jordi Bonada, Merlijn Blaauw
2013 Zenodo  
A score function is calculated for candidate boundaries in the train set. The score and features for the train set are used for training random forest regression models.  ...  A detailed description of how score predictive modelling is adapted for our corpora and how it is implemented is presented.  ...  Acknowledgements I would like to thank Xavier Serra and the staff at the MTG for giving me an  ... 
doi:10.5281/zenodo.1161281 fatcat:o7ui44zjxnd6hhg7r5vdkl5jsi

A Multimedia Search And Navigation Prototype, Including Music And Video-Clips

Geoffroy Peeters, Frédéric Cornu, Christophe Charbuillet, Damien Tardieu, Juan José Burred, Marie Vian, Valérie Botherel, Jean-Bernard Rault, Jean-Philippe Cabanal
2012 Zenodo  
This work was supported by the "Quaero" Program funded by Oseo French State agency for innovation.  ...  These annotated corpora are then used to train the corresponding technologies and optimization is performed to reduce computation time, disk access and memory load.  ...  ANNOTATED CORPORA FOR TRAINING Corpus creation for the UBM training Since both auto-tagging and search-by-similarity modules rely on Super-Vectors, the corresponding UBM needs to be trained in advance  ... 
doi:10.5281/zenodo.1417760 fatcat:rntgn53s2re2baw5k3uclcft7a

Generating Pseudo Test Collections for Learning to Rank Scientific Articles [chapter]

Richard Berendsen, Manos Tsagkias, Maarten de Rijke, Edgar Meij
2012 Lecture Notes in Computer Science  
We use these annotations and the associated documents as a source for pairs of queries and relevant documents.  ...  We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse, but comes with rich annotations.  ...  Terrier-DSM a DFR proximity dependence model, with proximity ngram length of 2, SD = 1, FD = 1, and using pBiL. For this model, block indexing has to be performed. We set block.size to 1.  ... 
doi:10.1007/978-3-642-33247-0_6 fatcat:ydc5auu5ibepjnfmuwdaeddbqy

Ranking suspected answers to natural language questions using predictive annotation

Dragomir R. Radev, John Prager, Valerie Samn
2000 Proceedings of the sixth conference on Applied natural language processing  
We process both corpus and query using a new technique, predictive annotation, which augments phrases in texts with labels anticipating their being targets of certain kinds of questions.  ...  Given a natural language question, our IR system returns a set of matching passages, which we then rank using a linear function of seven predictor variables.  ...  Acknowledgments We would like to thank Eric Brown, Anni Coden, and Wlodek Zadrozny from IBM Research for useful comments and collaboration.  ... 
doi:10.3115/974147.974168 dblp:conf/anlp/RadevPS00 fatcat:lrrdqh5osjdg3lt7o3okcvicvi

A search engine for natural language applications

Michael J. Cafarella, Oren Etzioni
2005 Proceedings of the 14th international conference on World Wide Web - WWW '05  
Yet Web search engines are designed and optimized for simple human queries; they are not well suited to support such applications.  ...  In response, this paper introduces the Bindings Engine (be), which supports queries containing typed variables and string-processing functions.  ...  This research was supported in part by NSF grant IIS-0312988, DARPA contract NBCHD030010, ONR grant N00014-02-1-0324, and a gift from Google.  ... 
doi:10.1145/1060745.1060811 dblp:conf/www/CafarellaE05 fatcat:5tys5wig2bcapgfqq4jbciirlm


ImageTerrier: an extensible platform for scalable high-performance image retrieval

Jonathon S. Hare, Sina Samangooei, David P. Dupplaw, Paul H. Lewis
2012 Proceedings of the 2nd ACM International Conference on Multimedia Retrieval - ICMR '12  
It incorporates a state-of-the-art implementation of the single-pass indexing technique for constructing inverted indexes and is capable of producing highly compressed index data structures.  ...  The ImageTerrier platform is demonstrated to successfully index and search a corpus of over 10 million images containing just under 10,000,000,000 quantised SIFT visual terms.  ...  Systems such as Lucignolo and VIRaL allow search across pre-defined image corpora; not allowing custom corpora to be indexed.  ... 
doi:10.1145/2324796.2324844 dblp:conf/mir/HareSDL12 fatcat:ewmrnnd3avcohdbowsywvmfi2e