OXPath-Based Data Acquisition for dblp

Christopher Michels, Ruslan R. Fayzrakhmanov, Michael Ley, Emanuel Sallinger, Ralf Schenkel
2017, 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), IEEE
We demonstrate how the contemporary problems of data acquisition for dblp can be tackled with OXPath.  ...  It enables web data extraction and wrapper maintenance for heterogeneous data sources on a simple declarative level.  ...  However, most of them settle for electronic catalogs designed for human consumption. Thus, data acquisition requires simulated user interaction with sophisticated interfaces to query web documents. The increasing  ... 
doi:10.1109/jcdl.2017.7991609 dblp:conf/jcdl/MichelsFLSS17 fatcat:q4zaqyad4nbbpdr6ide2lp5czu

Enriching Existing Test Collections with OXPath [chapter]

Philipp Schaer, Mandy Neumann
2017, Lecture Notes in Computer Science, Springer International Publishing
We present a light-weight alternative that employs the web data extraction language OXPath to harvest data to be added to an existing test collection from web resources.  ...  This allows the re-use of this collection for other evaluation purposes like bibliometrics-enhanced retrieval.  ...  OXPath can be used e.g. for harvesting bibliographic metadata for digital libraries like dblp, as presented by Michels et al. [7].  ... 
doi:10.1007/978-3-319-65813-1_16 fatcat:ureylxntjreblpb2ty52ifqzt4

Web-Scraping for Non-Programmers: Introducing OXPath for Digital Library Metadata Harvesting

Mandy Neumann, Jan Steinberg, Philipp Schaer
2017, Code4Lib Journal
By taking one of our own use cases as an example, we guide you in more detail through the process of creating an OXPath wrapper for metadata harvesting.  ...  We present the open-source tool OXPath, an extension of XPath, that allows the user to define data to be extracted from websites in a declarative way.  ...  Oxpath-based data acquisition for dblp. In: JCDL ’17: Proceedings of the 17th ACM/IEEE-CS on Joint Conference on Digital Libraries; 2017 June 19-23; New York, NY, USA. ACM. p. 319-320.  ... 
doaj:a9fc5108b6b74633a1a2885f2fdbedad fatcat:nz6wjaxv3bhured3dl5wfnogme

Introduction to OXPath [article]

Ruslan R. Fayzrakhmanov, Christopher Michels, Mandy Neumann
2018-06-28, arXiv, pre-print
From the automatic data acquisition point of view, thus, it is essential to be able to correctly render web pages and mimic user actions to obtain relevant data from the web page content.  ...  OXPath integrates Firefox for correct rendering of web pages and extends XPath 1.0 for the DOM node selection, interaction, and extraction.  ...  pos=1") 2 //div[@id="search"]/form[1]/field()[1]/{"Very Large Data Bases (VLDB)"} This expression inputs "Very Large Data Bases (VLDB)" into the search field.  ... 
arXiv:1806.10899v1 fatcat:ol7lbsd3zzfgpgdmgy2koa2wde