
The Wikipedia XML corpus

Ludovic Denoyer, Patrick Gallinari
<span title="2006-06-01">2006</span> <i title="Association for Computing Machinery (ACM)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/emlsu7gwyfalfms3c35odq5qlm" style="color: black;">SIGIR Forum</a> </i> &nbsp;
These corpora are currently used for both INEX 2006 and the XML Document Mining Challenge. The article provides a description of the corpus.  ...  Statistics about the collections: these statistics are given in Table 2.  ...  Categories: the documents of the Wikipedia XML collections are organized in a hierarchy of categories  ...  Entity corpus: we provide an Entity Corpus where each article of the Main English Corpus has been tagged using a set of possible entity types extracted using the different categories of Wikipedia.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1147197.1147210">doi:10.1145/1147197.1147210</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/yawgcuzx6rgl5csrav57ldosle">fatcat:yawgcuzx6rgl5csrav57ldosle</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170809102647/http://www-connex.lip6.fr/~denoyer/homepage/publications/wikipediaXML.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/03/f5/03f5c49449cb0aec8cfd51714316dc52b67e02dd.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1147197.1147210"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> acm.org </button> </a>

The Wikipedia XML Corpus [chapter]

Ludovic Denoyer, Patrick Gallinari
<i title="Springer Berlin Heidelberg"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2w3awgokqne6te4nvlofavy5a4" style="color: black;">Lecture Notes in Computer Science</a> </i> &nbsp;
These corpora are currently used for both INEX 2006 and the XML Document Mining Challenge. The article provides a description of the corpus.  ...  Statistics about the collections: these statistics are given in Table 2.  ...  Categories: the documents of the Wikipedia XML collections are organized in a hierarchy of categories  ...  Entity corpus: we provide an Entity Corpus where each article of the Main English Corpus has been tagged using a set of possible entity types extracted using the different categories of Wikipedia.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-540-73888-6_2">doi:10.1007/978-3-540-73888-6_2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/vpmxiwonqffx3jqu4ineddmv2y">fatcat:vpmxiwonqffx3jqu4ineddmv2y</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170809102647/http://www-connex.lip6.fr/~denoyer/homepage/publications/wikipediaXML.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/03/f5/03f5c49449cb0aec8cfd51714316dc52b67e02dd.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-540-73888-6_2"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

Report on the XML mining track at INEX 2005 and INEX 2006

Ludovic Denoyer, Patrick Gallinari
<span title="2007-06-01">2007</span> <i title="Association for Computing Machinery (ACM)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/emlsu7gwyfalfms3c35odq5qlm" style="color: black;">SIGIR Forum</a> </i> &nbsp;
This article is a report concerning the two years of the XML Mining track at INEX (2005 and 2006). We focus here on the classification and clustering of XML documents.  ...  We detail these two tasks and the corpus used for this challenge and then present a summary of the different methods proposed by the participants.  ...  definition of the different tasks and the construction of the different corpora.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1273221.1273230">doi:10.1145/1273221.1273230</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/pv56jdndhjfpngmqubiqh57ufu">fatcat:pv56jdndhjfpngmqubiqh57ufu</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170829005317/https://who.rocq.inria.fr/Anne-Marie.Vercoustre/PAPERS/xmlminingtrack-final.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/f3/31/f3311da87dd265cdc72e4cf91668b76afe9672a5.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1273221.1273230"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> acm.org </button> </a>

CLEF 2017 Microblog Cultural Contextualization Lab Overview [chapter]

Liana Ermakova, Lorraine Goeuriot, Josiane Mothe, Philippe Mulhem, Jian-Yun Nie, Eric SanJuan
<span title="">2017</span> <i title="Springer International Publishing"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2w3awgokqne6te4nvlofavy5a4" style="color: black;">Lecture Notes in Computer Science</a> </i> &nbsp;
The resulting microblog stream and related URLs are appropriate to experiment on advanced social media search and mining methods.  ...  The MC2 CLEF 2017 Content Analysis task deals with classification, filtering, language recognition, localization, entity extraction, linking open data, and summarization.  ...  For each Wikipedia we provided an XML retrieval system powered by Indri, a Perl API for the XML retrieval system using standard LWP (short for "Library for WWW in Perl"), the corpus in a single XML file  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-319-65813-1_27">doi:10.1007/978-3-319-65813-1_27</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/2ekp7grjpbg7ffnorb5653euku">fatcat:2ekp7grjpbg7ffnorb5653euku</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200115134334/http://ceur-ws.org/Vol-1866/invited_paper_14.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/5f/9e/5f9eb802e6250f03bfe87f55601fdaa4502d5d5f.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-319-65813-1_27"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

WikiOnto: A System for Semi-automatic Extraction and Modeling of Ontologies Using Wikipedia XML Corpus

Lalindra Niranjan De Silva, Lakshman Jayaratne
<span title="">2009</span> <i title="IEEE"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/wckytungwvfnhotxmw56y6k3q4" style="color: black;">2009 IEEE International Conference on Semantic Computing</a> </i> &nbsp;
Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies.  ...  bases in the world: the Wikipedia.  ...  In section 3, we describe our source, the Wikipedia XML Corpus, and the structure of the documents in the corpus, along with recognition as to why it is an ideal source for such research.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/icsc.2009.93">doi:10.1109/icsc.2009.93</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/semco/SilvaJ09.html">dblp:conf/semco/SilvaJ09</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ts7doavlrfg43deh3zagubz2yq">fatcat:ts7doavlrfg43deh3zagubz2yq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20150603074303/http://www.cs.utah.edu/~alnds/papers/wikionto_2009.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/5e/f5/5ef5b293d31d1fec98b83c369ccc3287d08b7166.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/icsc.2009.93"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>

An English-translated parallel corpus for the CJK Wikipedia collections

Ling-Xiang Tang, Shlomo Geva, Andrew Trotman
<span title="">2012</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ym4qbiso25flxacjl7t5dh6k3i" style="color: black;">Proceedings of the Seventeenth Australasian Document Computing Symposium on - ADCS &#39;12</a> </i> &nbsp;
This document collection is named the CJK2E Wikipedia XML corpus.  ...  The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information  ...  The collections were created from the Wikipedia XML dumps taken in January 2012. The original article text with Wikipedia mark-up was converted to XML using the YAWN system [7].  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2407085.2407099">doi:10.1145/2407085.2407099</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/adcs/TangGT12.html">dblp:conf/adcs/TangGT12</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/rchf2w7imjem3jyk7mkjufxrwa">fatcat:rchf2w7imjem3jyk7mkjufxrwa</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170705224749/http://eprints.qut.edu.au/57835/1/CJK2E-Wikipedia-XML-Corpus-V7.1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/16/b6/16b662aa3e9877b1a1f23634ed9e4d7ebefb74ab.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2407085.2407099"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

WIScking Ideas

Andrea Budac, Geoffrey Rockwell, Zachary Palmer, Robert Budac, Todd Suomela, Stéfan Sinclair, Stan Ruecker, the INKE Team
<span title="">2017</span> <i title="Japanese Association for Digital Humanities"> Journal of the Japanese Association for Digital Humanities </i> &nbsp;
in order to build a corpus for subsequent text analysis and visualization.  ...  This paper describes the development of a tool called WIScker that works with the Wikipedia Application Programming Interface (API) to scrape, or "wisck," the revision history of any Wikipedia article,  ...  Figure 2 shows what the opening XML of the resulting corpus looks like.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.17928/jjadh.2.1_73">doi:10.17928/jjadh.2.1_73</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/vtlh4trjjjfk7ldkepx2yhbita">fatcat:vtlh4trjjjfk7ldkepx2yhbita</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20181102181407/https://www.jstage.jst.go.jp/article/jjadh/2/1/2_73/_pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/58/25/58258e806112650b1dda4ad2510fff4ab270483b.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.17928/jjadh.2.1_73"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

An Application of Topic Map-Based Ontology Generated from Wikipedia for Query Expansion

S. Eslami, E. Nazemi
<span title="">2013</span> <i title="EJournal Publishing"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2uckwik5xjerdg36acn26gjq6e" style="color: black;">International Journal of Machine Learning and Computing</a> </i> &nbsp;
topic maps from the Wikipedia XML corpus.  ...  Wikipedia is general-purpose, freely available online, and contains up-to-date information, so it is a suitable option for topic map development.  ...  EXPERIMENTAL RESULTS: For experiments we used the Wikipedia XML corpus.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.7763/ijmlc.2013.v3.337">doi:10.7763/ijmlc.2013.v3.337</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/7xwi4kiiw5gszmeghucz2xvdy4">fatcat:7xwi4kiiw5gszmeghucz2xvdy4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170707120043/http://www.ijmlc.org/papers/337-L369.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/27/08/2708faca994b12b495c1e15fd1eb9066846fbc0f.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.7763/ijmlc.2013.v3.337"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

Clustering XML Documents Using Frequent Subtrees [chapter]

Sangeetha Kutty, Tien Tran, Richi Nayak, Yuefeng Li
<span title="">2009</span> <i title="Springer Berlin Heidelberg"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2w3awgokqne6te4nvlofavy5a4" style="color: black;">Lecture Notes in Computer Science</a> </i> &nbsp;
This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them.  ...  In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents.  ...  A number of experiments were conducted on the Wikipedia corpus using the INEX XML Mining Challenge 2008 testing dataset.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-642-03761-0_45">doi:10.1007/978-3-642-03761-0_45</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/3jopojjmmnhapkqyixqezs75ie">fatcat:3jopojjmmnhapkqyixqezs75ie</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170829082038/http://eprints.qut.edu.au/18216/1/c18216.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/e5/59/e55927e14c4a0a41d19d5462ba0b88fca47cf092.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-642-03761-0_45"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

Report on INEX 2008

Gianluca Demartini, Gabriella Kazai, Marijn Koolen, Monica Landoni, Ragnar Nordlie, Nils Pharo, Ralf Schenkel, Martin Theobald, Andrew Trotman, Arjen P. de Vries, Alan Woodley, Ludovic Denoyer (+8 others)
<span title="2009-06-25">2009</span> <i title="Association for Computing Machinery (ACM)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/emlsu7gwyfalfms3c35odq5qlm" style="color: black;">SIGIR Forum</a> </i> &nbsp;
Track Investigating link discovery between Wikipedia documents, both at the file level and at the element level. • XML-Mining Track Investigating structured document mining, especially the classification  ...  This paper reports on the INEX 2008 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking, Interactive, QA, Link the Wiki, and XML Mining. • Link-the-Wiki  ...  INEX 2008 used the Wikipedia XML Corpus based on the English Wikipedia in early 2006, containing a total of 659,338 Wikipedia articles [3]. On average an article contains 161 XML nodes.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1670598.1670603">doi:10.1145/1670598.1670603</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/zoxbecrybrf63fg54g4kt7w7na">fatcat:zoxbecrybrf63fg54g4kt7w7na</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170706004930/http://eprints.qut.edu.au/72708/3/72708.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/b2/6a/b26a9d539364a237818fbbcbe41934039c00013c.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1670598.1670603"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> acm.org </button> </a>

Overview of the INEX 2009 XML Mining Track: Clustering and Classification of XML Documents [chapter]

Richi Nayak, Christopher M. De Vries, Sangeetha Kutty, Shlomo Geva, Ludovic Denoyer, Patrick Gallinari
<span title="">2010</span> <i title="Springer Berlin Heidelberg"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2w3awgokqne6te4nvlofavy5a4" style="color: black;">Lecture Notes in Computer Science</a> </i> &nbsp;
This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track.  ...  The report also describes the approaches and results obtained by the different participants.  ...  Acknowledgments We would like to thank all the participants for their efforts and hard work. 6.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-642-14556-8_36">doi:10.1007/978-3-642-14556-8_36</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/2gyfgsqdfngpbeichjycrgmtau">fatcat:2gyfgsqdfngpbeichjycrgmtau</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20171108134705/https://core.ac.uk/download/pdf/10898572.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/15/86/158682cc980412fbe9790598218cfab0dffdb9e6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-642-14556-8_36"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

Components for information extraction

Daya C. Wimalasuriya, Dejing Dou
<span title="">2010</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/6g37zvjwwrhv3dizi6ffue642m" style="color: black;">Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM &#39;10</a> </i> &nbsp;
are domain and corpus independent implementations of IE techniques.  ...  Here, we mean not only the reuse of the same IE technique in different situations but also the reuse of information related to the application of IE techniques (e.g., features used for classification).  ...  Arthur Farley of the Computer and Information Science Department of University of Oregon for their help in this work.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1871437.1871444">doi:10.1145/1871437.1871444</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/cikm/WimalasuriyaD10.html">dblp:conf/cikm/WimalasuriyaD10</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mv72pxgosrbdbjy75qpm727zxi">fatcat:mv72pxgosrbdbjy75qpm727zxi</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20110401190226/http://ix.cs.uoregon.edu/~dou/research/papers/cikm10.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/25/33/2533e6ef309f625f891d506d013af47715f8b95f.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1871437.1871444"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

The TopX DB&IR engine

Martin Theobald, Ralf Schenkel, Gerhard Weikum
<span title="">2007</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/vxrc3vebzzachiwy3nopwi3h5u" style="color: black;">Proceedings of the 2007 ACM SIGMOD international conference on Management of data - SIGMOD &#39;07</a> </i> &nbsp;
This paper proposes a demo of the TopX search engine, an extensive framework for unified indexing, querying, and ranking of large collections of unstructured, semistructured, and structured data.  ...  TopX integrates efficient algorithms for top-k-style ranked retrieval with powerful scoring models for text and XML documents, as well as dynamic and self-tuning query expansion based on background ontologies  ...  For example, we have converted the IMDB movie database, a structured dataset, and the Wikipedia encyclopedia, a hyperlinked text corpus, into XML as test cases for ranked retrieval of XML data.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1247480.1247635">doi:10.1145/1247480.1247635</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/sigmod/TheobaldSW07.html">dblp:conf/sigmod/TheobaldSW07</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ehiibvqfarh3rkyilvcnmsjr3i">fatcat:ehiibvqfarh3rkyilvcnmsjr3i</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170705073056/http://infolab.stanford.edu/~theobald/pub/sigmod07_topx.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/ed/3f/ed3fb3edb23645bfa89f1a3ad0d7227a933fa056.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1247480.1247635"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

Excavating the mother lode of human-generated text: A systematic review of research that uses the wikipedia corpus

Mohamad Mehdi, Chitu Okoli, Mostafa Mesgari, Finn Årup Nielsen, Arto Lanamäki
<span title="">2017</span> <i title="Elsevier BV"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/v5dch4enzne6phusiwdh25za24" style="color: black;">Information Processing &amp; Management</a> </i> &nbsp;
[113] exploited Wikipedia's categories structure to add semantics to XML data. They generated an XML corpus (YAWN) from a 2006 Wikipedia dump using a Wiki2XML that converts Wiki markup to XML.  ...  For testing purposes, the authors used a corpus of 100,000 Wikipedia XML documents, their internal structure and the link information between documents.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.ipm.2016.07.003">doi:10.1016/j.ipm.2016.07.003</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/qgjeatizfzbyjkbo4rsuxea76y">fatcat:qgjeatizfzbyjkbo4rsuxea76y</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190426105424/http://orbit.dtu.dk/files/127258892/WikilitCorpus_IP_M_Rev3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/f9/5c/f95c2ae6a7b11f3e5db731aa527379eadc3b2477.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.ipm.2016.07.003"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> elsevier.com </button> </a>

Overview of the INEX 2011 Question Answering Track (QA@INEX) [chapter]

Eric SanJuan, Véronique Moriceau, Xavier Tannier, Patrice Bellot, Josiane Mothe
<span title="">2012</span> <i title="Springer Berlin Heidelberg"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2w3awgokqne6te4nvlofavy5a4" style="color: black;">Lecture Notes in Computer Science</a> </i> &nbsp;
of the Wikipedia.  ...  The INEX QA track aimed to evaluate complex question-answering tasks where answers are short texts generated from the Wikipedia by extraction of relevant short passages and aggregation into a coherent summary  ...  Like in recent FIR INEX tasks, the corpus is a clean XML extraction of the content of a dump from Wikipedia.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-642-35734-3_17">doi:10.1007/978-3-642-35734-3_17</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ehcf5zqytzh4bdpkybkyftymfq">fatcat:ehcf5zqytzh4bdpkybkyftymfq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20150328040408/http://www.irit.fr/publis/SIG/inexqa2011.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/80/9f/809ff74fe3a8d8e8dd21f4e3504ad3e012304d18.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-642-35734-3_17"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>