
Extracting data records from the web using tag path clustering

Gengxin Miao, Junichi Tatemura, Wang-Pin Hsiung, Arsany Sawires, Louise E. Moser
<span title="">2009</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/s4hirppq3jalbopssw22crbwwa" style="color: black;">Proceedings of the 18th international conference on World wide web - WWW &#39;09</a> </i> &nbsp;
Clustering of tag paths is then performed based on this similarity measure, and sets of tag paths that form the structure of data records are extracted.  ...  Fully automatic methods that extract lists of objects from the Web have been studied extensively.  ...  We apply clustering of tag paths based on this similarity measure, and extract sets of tag paths that form the structure of the data records.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1526709.1526841">doi:10.1145/1526709.1526841</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/www/MiaoTHSM09.html">dblp:conf/www/MiaoTHSM09</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/7xr44fj3grfjpdqlvw7wcoaih4">fatcat:7xr44fj3grfjpdqlvw7wcoaih4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170809081223/http://www2009.wwwconference.org/proceedings/pdf/p981.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/87/cf/87cfd3ac19dfba177e03fdf1f3cf93a4878dbca1.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1526709.1526841"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

Extracting data records from web using suffix tree

Xiaoqin Xie, Yixiang Fang, Zhiqiang Zhang, Li Li
<span title="">2012</span> <i title="ACM Press"> Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics - MDS &#39;12 </i> &nbsp;
After the refining processes we can capture the useful data region patterns which can be used to extract data records.  ...  Our method first transforms a distinct group of tag paths appearing repeatedly in the DOM tree of the Web document into a sequence of integers, and then builds a suffix tree from this sequence.  ...  Miao et al. view the web page as a string of HTML tags. TPC clusters these tag paths based on this similarity measure and extracts sets of tag paths that form the structure of the data records.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2350190.2350202">doi:10.1145/2350190.2350202</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/dxqeahnj4venrdly5fr2evgvna">fatcat:dxqeahnj4venrdly5fr2evgvna</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170830054830/http://wan.poly.edu/KDD2012/forms/workshop/MDS12/doc/mds2012_submission_12.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/1f/86/1f8682ea5816063ba7a52ee0c5fdf2a0fac2f026.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2350190.2350202"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

Mining templates from search result records of search engines

Hongkun Zhao, Weiyi Meng, Clement Yu
<span title="">2007</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/fqqihtxlu5bvfaqxjyvqcob35a" style="color: black;">Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD &#39;07</a> </i> &nbsp;
Precisely identifying this template can greatly help extract and annotate the data units within each record correctly.  ...  Metasearch engine, Comparison-shopping and Deep Web crawling applications need to extract search result records enwrapped in result pages returned from search engines in response to user queries.  ...  Web information extraction can be at the record level or data unit level.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1281192.1281286">doi:10.1145/1281192.1281286</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/kdd/ZhaoMY07.html">dblp:conf/kdd/ZhaoMY07</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/gxk42v6dsrcxbmvaify2juo7pq">fatcat:gxk42v6dsrcxbmvaify2juo7pq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170706054237/http://www.cs.binghamton.edu/~meng/pub.d/frp551-kdd-zhao.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/27/5b/275bc0f67fc89495ae2e3a7be3ceda770c63f756.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1281192.1281286"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

Top K List Extraction from Web Pages

Priyanka Deshmane, Pramod Patil, Abha Pathak
<span title="2016-09-15">2016</span> <i title="Foundation of Computer Science"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/b637noqf3vhmhjevdfk3h5pdsu" style="color: black;">International Journal of Computer Applications</a> </i> &nbsp;
The paper provides a solution to the problem by extracting information from top-k websites, which consist of the top k instances of a subject, for example "top 5 football teams in the world".  ...  The proposed system extracts the top-k list using a title classifier, parser, candidate picker, ranker, and content processor.  ...  Comparison of similar systems 1 Extracting Extracting data Proposed General list records using tag approach Hybrid path clustering approach Working/Al HyLiEn Tag path Tag path gorithm (Hybrid clustering  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5120/ijca2016911394">doi:10.5120/ijca2016911394</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/2ss2gtrqlrh43gfkiogankvrhm">fatcat:2ss2gtrqlrh43gfkiogankvrhm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20180603022116/https://www.ijcaonline.org/archives/volume149/number5/deshmane-2016-ijca-911394.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/32/24/322424993a58c09723d93ce818fbc80f80e05fa8.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5120/ijca2016911394"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

Design and development of text extraction and retrieval using style of documents in web searching

S. Balan, P. Ponmuthuramalingam
<span title="2017-12-28">2017</span> <i title="Science Publishing Corporation"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/piy2nrvrjrfcfoz5nmre6zwa4i" style="color: black;">International Journal of Engineering &amp; Technology</a> </i> &nbsp;
Query Result Records (QRRs) are used to extract text information from different types of documents.  ...  This research focuses on the study and extraction of web pages and documents returned from the Google search engine. The useful task of the web is to match accurate information exactly.  ...  The amount of encouragement received especially from my friends requires a special mention. I record my deep indebtedness to them for their support.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.14419/ijet.v7i1.2.9038">doi:10.14419/ijet.v7i1.2.9038</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/eidt6qv3onhqpfmz2aefzg3uia">fatcat:eidt6qv3onhqpfmz2aefzg3uia</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20180721013030/https://www.sciencepubco.com/index.php/ijet/article/download/9038/3082" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/6c/52/6c524b0e5080ff83ad962c96cc338bddd2690d74.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.14419/ijet.v7i1.2.9038"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> Publisher / doi.org </button> </a>

An Automatic Annotation Technique for Web Search Results

Rosamma KS, Jiby J Puthiyidam
<span title="2015-06-20">2015</span> <i title="Foundation of Computer Science"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/b637noqf3vhmhjevdfk3h5pdsu" style="color: black;">International Journal of Computer Applications</a> </i> &nbsp;
Every generated web page contains many results to display for a particular query, called Search Result Records (SRRs). Sometimes it becomes troublesome to extract relevant data from diverse sources.  ...  A web search engine takes the query request from the end user and executes that query on the relational database used to store information on behalf of that web search engine.  ...  Web data extraction based on partial tree alignment [5] studies the problem of extracting data records, then segments these records, and finally puts them into a database table.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5120/21383-4375">doi:10.5120/21383-4375</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/7hbigf53sndlddkotjgfyqxzba">fatcat:7hbigf53sndlddkotjgfyqxzba</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170922212535/http://research.ijcaonline.org/volume119/number24/pxc3904375.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/8f/5d/8f5d19762627b12e5700e301beba4b7a21f77c3d.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5120/21383-4375"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

Efficient record-level wrapper induction

Shuyi Zheng, Ruihua Song, Ji-Rong Wen, C. Lee Giles
<span title="">2009</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/6g37zvjwwrhv3dizi6ffue642m" style="color: black;">Proceeding of the 18th ACM conference on Information and knowledge management - CIKM &#39;09</a> </i> &nbsp;
However, most traditional wrapper techniques have issues dealing with web records since they are designed to extract information from a page, not a record. We propose a record-level wrapper system.  ...  Web information is often presented in the form of records, e.g., a product record on a shopping website or a personal profile on a social utility website.  ...  will be used to perform data extraction from the DOM-tree.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1645953.1645962">doi:10.1145/1645953.1645962</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/cikm/ZhengSWG09.html">dblp:conf/cikm/ZhengSWG09</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/pudk2mafurgehcb2z24caaiphe">fatcat:pudk2mafurgehcb2z24caaiphe</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20110331173149/http://clgiles.ist.psu.edu/pubs/CIKM2009-wrapper-induction.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/22/ec/22eca7b8a3ab6ea577790eefbefa1e0f3da9e2bb.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1645953.1645962"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

Clustering Visually Similar Web Page Elements for Structured Web Data Extraction [chapter]

Tomas Grigalis, Lukas Radvilavičius, Antanas Čenys, Juozas Gordevičius
<span title="">2012</span> <i title="Springer Berlin Heidelberg"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2w3awgokqne6te4nvlofavy5a4" style="color: black;">Lecture Notes in Computer Science</a> </i> &nbsp;
Clusters are then used to derive extraction rules.  ...  We propose a novel approach for extraction of structured web data called ClustVX. It clusters visually similar web page elements by exploiting their visual formatting and structural features.  ...  This also means that each DR has almost the same XPath (tag path from the root node in the HTML tree to a particular web page element), where only a few node numbers differ.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-642-31753-8_38">doi:10.1007/978-3-642-31753-8_38</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/v5inxjfvyfe6xghcwfkcouxaxa">fatcat:v5inxjfvyfe6xghcwfkcouxaxa</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20180726144532/https://link.springer.com/content/pdf/10.1007%2F978-3-642-31753-8_38.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/80/32/80328aad0112379fc95d60d6281041866eae0018.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-642-31753-8_38"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

Adaptive and Optimization of Personalized Information Retrieval Model in Semantic Web

<span title="2019-11-02">2019</span> <i title="Blue Eyes Intelligence Engineering and Sciences Engineering and Sciences Publication - BEIESP"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/3sfifsouvjgadp4gfj54u3z2ku" style="color: black;">International journal of recent technology and engineering</a> </i> &nbsp;
To understand user interest patterns, the web access log files that depict user behavior are extracted.  ...  The semantic information retrieval supported the user access pages are preprocessed and the web log data of the particular user is analyzed to identify the user profile.  ...  Web mining is an information extraction system to analyze and obtain valuable information from web data.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.35940/ijrte.b1131.0982s1119">doi:10.35940/ijrte.b1131.0982s1119</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/lwwayem545hyzhespbmvh7tzce">fatcat:lwwayem545hyzhespbmvh7tzce</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200215054113/https://www.ijrte.org/wp-content/uploads/papers/v8i2S11/B11310982S1119.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/51/b5/51b5a98e5eaafd2c49482f72b32b52b4cc04176d.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.35940/ijrte.b1131.0982s1119"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> Publisher / doi.org </button> </a>

A Travel Planning System Based on Travel Trajectories Extracted from a Large Number of Geotagged Photos on the Web [chapter]

Kohya Okuyama, Keiji Yanai
<span title="2012-08-03">2012</span> <i title="Springer New York"> The Era of Interactive Media </i> &nbsp;
We propose a travel route recommendation system which utilizes actual travel paths extracted from a large number of photos uploaded by many people on the Web.  ...  Some image retrieval systems and travel recommendation systems which make use of geotagged images on the Web have been proposed so far.  ...  To generate travel paths, they used all the places extracted from geotagged photos.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-1-4614-3501-3_54">doi:10.1007/978-1-4614-3501-3_54</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/zvw7fr3ogbeblitspvs4wp4er4">fatcat:zvw7fr3ogbeblitspvs4wp4er4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170808184907/http://img.cs.uec.ac.jp/e/pub/conf11/111220yanai_2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/ab/6e/ab6e62cc0fdcd394dd3e31ecbfdbbb42ec8dbcf0.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-1-4614-3501-3_54"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

AMBER

Cheng Wang
<span title="">2012</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/s4hirppq3jalbopssw22crbwwa" style="color: black;">Proceedings of the 21st international conference companion on World Wide Web - WWW &#39;12 Companion</a> </i> &nbsp;
Web extraction is the task of turning unstructured HTML into knowledge.  ...  Unfortunately, the current systems extracting knowledge from result pages lack accuracy.  ...  Acknowledgments The research leading to these results has received funding from the European Research  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2187980.2188007">doi:10.1145/2187980.2188007</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/www/Wang12.html">dblp:conf/www/Wang12</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/rbd4hdhzkbeitbqcvos62n5ip4">fatcat:rbd4hdhzkbeitbqcvos62n5ip4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20120528123602/http://www2012.org/proceedings/companion/p191.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/50/01/500152dd2c4180ebd94f9ef717b2bdca264a3d92.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2187980.2188007"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

Automatic Data Extraction from Template-Generated Web Pages

Shao-Hua YANG
<span title="2008-07-09">2008</span> <i title="China Science Publishing &amp; Media Ltd."> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/zwqx233xnvggdgdmdkqawwikpy" style="color: black;">Journal of Software (Chinese)</a> </i> &nbsp;
A template detection approach is presented and the detected templates are used to extract data from instance pages.  ...  A substantial fraction of the Web consists of pages that are dynamically generated using a common template populated with data from databases, such as product description pages on e-commerce sites.  ...  Table 1 shows the extracted data from detail pages in Fig.1 . There are many challenges in automatically extracting data from template-generated web pages.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3724/sp.j.1001.2008.00209">doi:10.3724/sp.j.1001.2008.00209</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/jxxhacmgcbezhoqjm3hkl4sbyq">fatcat:jxxhacmgcbezhoqjm3hkl4sbyq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170706035207/http://osm.cs.byu.edu/CS652s09/papers/Yang08.DataExtrFromTemplate-generated.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/b7/64/b7647b68b7d352acd076b4459dd2940fe822a763.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3724/sp.j.1001.2008.00209"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

Mining data records in Web pages

Bing Liu, Robert Grossman, Yanhong Zhai
<span title="">2003</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/fqqihtxlu5bvfaqxjyvqcob35a" style="color: black;">Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD &#39;03</a> </i> &nbsp;
It is useful to mine such data records in order to extract information from them to provide value-added services.  ...  A large amount of information on the Web is contained in regularly structured objects, which we call data records.  ...  Acknowledgement: We thank Chris Livadas for identifying some errors in the original pseudo-code.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/956750.956826">doi:10.1145/956750.956826</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/kdd/LiuGZ03.html">dblp:conf/kdd/LiuGZ03</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/aacxpche3vaztmee3mlxjyyhz4">fatcat:aacxpche3vaztmee3mlxjyyhz4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170808032357/https://www.cs.uic.edu/~liub/publications/KDD-03-techReport.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/f2/c7/f2c7324c1931d60ae6b9dface7f5254113f0bd35.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/956750.956826"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

Mining data records in Web pages

Bing Liu, Robert Grossman, Yanhong Zhai
<span title="">2003</span> <i title="ACM Press"> Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD &#39;03 </i> &nbsp;
It is useful to mine such data records in order to extract information from them to provide value-added services.  ...  A large amount of information on the Web is contained in regularly structured objects, which we call data records.  ...  Acknowledgement: We thank Chris Livadas for identifying some errors in the original pseudo-code.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/956804.956826">doi:10.1145/956804.956826</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/cv5l6nk3vjhevoorg7hp2y2txu">fatcat:cv5l6nk3vjhevoorg7hp2y2txu</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170808032357/https://www.cs.uic.edu/~liub/publications/KDD-03-techReport.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/f2/c7/f2c7324c1931d60ae6b9dface7f5254113f0bd35.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/956804.956826"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>

Exploiting Multi-Category Characteristics and Unified Framework to Extract Web Content

Jingwei Zhang, Qian Wang, Qing Yang, Rui Zhou, Yanchun Zhang
<span title="">2018</span> <i title="Springer Nature"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/4pfqfq76vvcxljfl7mvdl37n5q" style="color: black;">Data Science and Engineering</a> </i> &nbsp;
Extracting web content is to obtain the required data embedded in web pages, usually including structured records, such as product information, and text content, such as news.  ...  Web pages use a large number of HTML tags to organize and to present various information.  ...  [9] defined tag path edit distance and tag path ratios to extract news from web pages.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s41019-018-0067-3">doi:10.1007/s41019-018-0067-3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/vmlxpckmo5ailin4rpwys6w34u">fatcat:vmlxpckmo5ailin4rpwys6w34u</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20180729190035/https://link.springer.com/content/pdf/10.1007%2Fs41019-018-0067-3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/8c/ff/8cff7e93b8f62319f88e5625ed59e953c8ca9bc5.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s41019-018-0067-3"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> springer.com </button> </a>
Showing results 1 &mdash; 15 out of 15,933 results