15,311 Hits in 7.2 sec

An Expert System for Quality Assurance of Document Image Collections [chapter]

Roman Graf, Reinhold Huber-Mörk, Alexander Schindler, Sven Schlarb
2012 · Lecture Notes in Computer Science · Springer Berlin Heidelberg
This paper presents an expert system that supports decision making for page duplicate detection in document image collections.  ...  Our goal is to create a reliable inference engine and a solid knowledge base from the output of an image processing tool that detects duplicates based on methods of computer vision.  ...  "Sequence detected" leverages the fact that duplicates mostly originate from automatically performed scan runs resulting in a sequence of documents.  ... 
doi:10.1007/978-3-642-34234-9_25, fatcat:3bxk326bynervoasg3n3slgfzq

Document matching on CCITT Group 4 compressed images

Jonathan J. Hull, Luc M. Vincent
1997-04-03 · Document Recognition IV · SPIE
A method is proposed for detecting whether two CCITT group 4 images were scanned from the same document.  ...  ., they were scanned from the same document) if the Hausdorff measure finds that a specified number of features are located within a given distance of one another in both images.  ...  INTRODUCTION A useful function in a document image database system is the detection of whether a given image already exists in the database.  ... 
doi:10.1117/12.270061, dblp:conf/drr/Hull97, fatcat:wlngakbls5d6rpaxclkwpbs7pm

String Distances for Near-duplicate Detection

Iulia Dănăilă, Liviu P. Dinu, Vlad Niculae, Octavia-Maria Șulea
2012-06-30 · POLIBITS Research Journal on Computer Science and Computer Engineering With Applications · Centro de Innovacion y Desarrollo Tecnologico en Computo
disjoint set data structure, for the problem of near-duplicate detection.  ...  Near-duplicate detection is important when dealing with large, noisy databases in data mining tasks.  ...  ACKNOWLEDGEMENTS All authors contributed equally to the work presented in this paper. The research of Liviu P.  ... 
doi:10.17562/pb-45-3, fatcat:24lipm5aazh5jph6dq3fx2bkxu

Detecting duplicates among symbolically compressed images in a large document database

Dar-Shyang Lee, Jonathan J. Hull
2001 · Pattern Recognition Letters · Elsevier BV
The detection of duplicate images is a useful means of indexing a large database of documents.  ...  Experimental results show that it can recover better than 90% of the text in compressed document images and that this is sufficient to identify duplicates in a large database.  ...  The performance of the conditional n-gram method for text string comparison was tested on the 979 documents in the University of Washington (UW) database (Phillips et al., 1993).  ... 
doi:10.1016/s0167-8655(00)00115-x, fatcat:zpitjv3j2nb4npgg4kn3g63riu

Compound Method Based on Frequent Terms for Near Duplicate Documents Detection

Gaudence Uwamahoro, Zuping Zhang, Ambele Robert Mtafya, Jun Long
2014-12-31 · International Journal of Database Theory and Application · Science and Engineering Research Support Society
Another method proposed for duplicate documents detection in [5] is Locality Sensitive Hashing Algorithm (LSH).  ...  A method based on conceptual tree has been proposed in [8] where each document is presented as a tree.  ...  Acknowledgements Project supported the National Natural Science Foundation of China (Grant No. 61379109, M1321007) and Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20120162110077  ... 
doi:10.14257/ijdta.2014.7.6.05, fatcat:uxlbtgjqfzbbpdseuq6z5b73aq

Source Retrieval for Plagiarism Detection

Šimon Suchomel, Michal Brandejs
2015 · Journal of Advances in Information Technology · Engineering and Technology Publishing
Up to date systems for plagiarism detection are discussed from the source retrieval perspective. The key approaches of source retrieval are compared.  ...  Plagiarism has become a serious problem mainly because of the electronically available documents. An online document retrieval is a weighty part of a modern antiplagiarism tool.  ...  ACKNOWLEDGMENT The authors would like to thank to the Information System of Masaryk University for creating an opportunity to improve the plagiarism issue in Europe.  ... 
doi:10.12720/jait.6.1.18-26, fatcat:hhkvfizjyrdllfqxmnzhdsglfu

Record Duplication Detection in Database: A Review

Saleh Rehiel Alenazi, Kamsuriah Ahmad
2016-12-25 · International Journal on Advanced Science, Engineering and Information Technology · Insight Society
Improving the efficiency in identifying duplicate records in databases is an essential step for data cleaning and data integration methods.  ...  Despite several techniques proposed to recognize and locate duplication of database records, there is a dearth of studies available which rate the effectiveness of the diverse techniques used for duplicate  ...  These two factors are very important assessment criteria for duplication detection techniques. Since detection methods work with huge databases, scanning them can consume considerable time.  ... 
doi:10.18517/ijaseit.6.6.1368, fatcat:kr4x3ynvpbbmho7b5nkqea2dbe

Data Mining Model for the Data Retrieval from Central Server Configuration

Srivatsan Sridharan, Kausal Malladi, Yamini Muralitharan
2013-10-31 · International Journal of Computer Science & Information Technology (IJCSIT) · Academy and Industry Research Collaboration Center (AIRCC)
It also ensures elimination of duplicate document retrieval using unsupervised duplicate detection. The documents are ranked based on user feedback and given higher priority for retrieval.  ...  A server, which is to keep track of heavy document traffic, is unable to filter the documents that are most relevant and updated for continuous text search queries.  ...  Parvatham Assistant Professor, Sri SaiRam Engineering College for her continuous support in implementing this work.  ... 
doi:10.5121/ijcsit.2013.5514, fatcat:imwhnsw5hndjvbvlkz6rx4gxfm

Uncovering hidden duplicated content in public transcriptomics data

Marta Rosikiewicz, Aurélie Comte, Anne Niknejad, Marc Robinson-Rechavi, Frederic B. Bastian
2013-01-01 · Database: The Journal of Biological Databases and Curation · Oxford University Press (OUP)
Materials and methods Affymetrix data We have established a procedure to detect duplicated Affymetrix data.  ...  Moreover, as this practice is not documented, it is misleading for the construction of secondary databases such as Bgee based on the primary transcriptome data.  ... 
doi:10.1093/database/bat010, pmid:23487185, pmcid:PMC3595988, fatcat:wh6jny7hmvclhpspvbwzonkw7i

The method for detecting plagiarism in a collection of documents

Natalya Shakhovska, Iryna Shvorob
2015 · 2015 Xth International Scientific and Technical Conference "Computer Sciences and Information Technologies" (CSIT) · IEEE
The development of the intelligent system for searching for plagiarism by combining two algorithms of searching fuzzy duplicate is considered in this article.  ...  The practical use of the algorithm makes it possible to improve the quality of the detection of plagiarism. Also, this algorithm can be used in different systems text search.  ...  Then identify duplicates in a particular class of documents, signatures using methods based on the analysis of special characters.  ... 
doi:10.1109/stc-csit.2015.7325453, fatcat:wgznhlrgazfixhfqcz6xtao5l4

Plagiarism Detection by using Karp-Rabin and String Matching Algorithm Together

Sonawane KiranShivaji, Prabhudeva S
2015-04-22 · International Journal of Computer Applications · Foundation of Computer Science
In this paper, our algorithm divides submitted articles in small pieces and scans it to compare with connected databases to the server on internet.  ...  for markers.  ...  Computational cost of DR and PD is very high due to capacity of huge document scanning at a once since it is useful for large database [2].  ... 
doi:10.5120/20294-2734, fatcat:ekbbo2g2ffarvk64lzvnnixzvm

A Near-Duplicate Detection Algorithm to Facilitate Document Clustering

Lavanya Pamulaparty, Guru Rao C.V, Sreenivasa Rao M
2014-11-30 · International Journal of Data Mining & Knowledge Management Process · Academy and Industry Research Collaboration Center (AIRCC)
This paper concerns detecting, and optionally removing, duplicate and near-duplicate documents which are used to perform clustering of documents. We demonstrated our approach in web news articles domain  ...  Web Mining faces huge problems due to Duplicate and Near Duplicate Web pages. Detecting Near Duplicates is very difficult in large collection of data like "internet".  ...  The first algorithms for detecting near-duplicate documents with a reduced number of comparisons were proposed by Manber [25] and Heintze [17].  ... 
doi:10.5121/ijdkp.2014.4604, fatcat:66jmy6xqhrbqdistbyawfrfo6u

Identifying duplicate content using statistically improbable phrases

M. Errami, Z. Sun, A. C. George, T. C. Long, M. A. Skinner, J. D. Wren, H. R. Garner
2010-05-13 · Bioinformatics · Oxford University Press (OUP)
However, the CPU-intensive nature of document comparison has limited MEDLINE text similarity studies to the comparison of abstracts, which constitute only a small fraction of a publication's total text  ...  We have derived a method of analyzing statistically improbable phrases (SIPs) for assistance in identifying duplicate content.  ...  ACKNOWLEDGEMENTS We thank Wayne Fisher for helpful comments and discussion and Linda Gunn for administrative assistance. Conflict of Interest: none declared.  ... 
doi:10.1093/bioinformatics/btq146, pmid:20472545, pmcid:PMC2872002, fatcat:6l53dj5exrce5pemgjnuxttkx4

A Pragmatic Review of Data Cleansing models and using Elastic Search shards for Removing Duplicate data

Subhani Shaik
2018-04-30 · International Journal for Research in Applied Science and Engineering Technology · International Journal for Research in Applied Science and Engineering Technology (IJRASET)
Duplication of the information may lead to confusion, accidental deletion of the authentic information, loss of time, etc.  ...  The quality of the data is significant issue for the accomplishment of any development.  ...  In this Paper we have offered a wide-ranging review of the existing techniques used for detecting non identical duplicate entries in database records.  ... 
doi:10.22214/ijraset.2018.4566, fatcat:jclrmnv4znc5vdcmhiavi2yqoi

Full-privacy secured search engine empowered by efficient genome-mapping algorithms [article]

Yuan-Yu Chang, Sheng-Tang Wong, Emmanuel O Salawu, Yu-Xuan Wang, Jui-Hung Hung, Lee-Wei Yang
2022-04-25 · arXiv · pre-print
Since the 90s, keyword-based search engines have been helping people locate relevant web content via a simple query, so have the recent full-text-based search engines mainly used for plagiarism detection  ...  ., functioning in both regular and in-private search modes, provides a new option for efficient internet search and plagiarism detection in a compressed search space without a chance of storing and revealing  ...  Part of this work was financially supported by the National Tsing Hua University, Taiwan (10020B002).  ... 
arXiv:2201.00696v2, fatcat:lexisazibzharol25mo7sq3zqe
Showing results 1-15 out of 15,311 results