20,158 Hits in 4.7 sec

Mining Historical Documents for Near-Duplicate Figures

Thanawin Rakthanmanon, Qiang Zhu, Eamonn J. Keogh
2011 2011 IEEE 11th International Conference on Data Mining  
While querying/indexing systems can undoubtedly be useful, we believe that the historical manuscript domain is finally ripe for true unsupervised discovery of patterns and regularities.  ...  This fact has driven much of the recent increase in interest in query-by-content systems for images.  ...  ACKNOWLEDGEMENTS We would like to acknowledge the financial support for our research provided by the Royal Thai Government and NSF grants 0803410 and 0808770.  ... 
doi:10.1109/icdm.2011.102 dblp:conf/icdm/RakthanmanonZK11 fatcat:wkwzoiw2rfexzo742jjzulusda

Mining future spatiotemporal events and their sentiment from online news articles for location-aware recommendation system

Shen-Shyang Ho, Mike Lieberman, Pu Wang, Hanan Samet
2012 Proceedings of the First ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems - MobiGIS '12  
In this paper, we describe a systematic approach for mining future spatiotemporal events from the web, in particular, news articles.  ...  In the matching step, we perform spatiotemporal disambiguation, de-duplication, and pairing.  ...  Figure 2: An overview of mining spatiotemporal future events from a news article. Figure 3: Number of extracted unique near-past and future temporal patterns over 14 days of news documents.  ...
doi:10.1145/2442810.2442816 dblp:conf/gis/HoLWS12 fatcat:2h3cyaa5gvdnvfg4rdokiudv7e

News Topic Tracking and Re-ranking with Query Expansion Based on Near-Duplicate Detection [chapter]

Xiaomeng Wu, Ichiro Ide, Shin'ichi Satoh
2009 Lecture Notes in Computer Science  
This paper proposes a novel scheme of mining topic-related stories through a query-expansion algorithm on the basis of near duplicates built on top of text.  ...  Experiments showed that the query-expansion algorithm based on near-duplicate constraints outperformed traditional methods that only use textual features.  ...  [2] presented a technique for mining and tracking the repeated sequence of shots based only on near-duplicate detection.  ...
doi:10.1007/978-3-642-10467-1_66 fatcat:bciuylu6nzblzm56nch7iq67fu

Efficiently Finding Near Duplicate Figures in Archives of Historical Documents

Thanawin Rakthanmanon, Qiang Zhu, Eamonn J. Keogh
2012 Journal of Multimedia  
While querying/indexing systems can undoubtedly be useful, we believe that the historical manuscript domain is finally ripe for true unsupervised discovery of patterns and regularities.  ...  This fact has driven much of the recent increase in interest in query-by-content systems for images.  ...  ACKNOWLEDGEMENTS We would like to acknowledge the financial support for our research provided by the Royal Thai Government and NSF grants 0803410 and 0808770.  ... 
doi:10.4304/jmm.7.2.109-123 fatcat:s5ppjnprhfdadenh2jiwovyjrm

Applications of Data Mining Techniques in Software Engineering

A. R. Pon Periasamy, A. Mishbahulhuda
2017 International Journal of Advanced Research in Computer Science and Software Engineering  
By uncovering hidden patterns using data mining, software engineering data is made actionable. There are various goals in software engineering, such as optimization, documentation, and cost estimation.  ...  There are numerous types of data available in software engineering, such as graphs, text, facts, and figures.  ...  Documents (HTML, PDF, text, etc.) are available in large variety, and another important source is multimedia data (audio, video, figures) [3].  ...
doi:10.23956/ijarcsse/v7i3/0174 fatcat:smhdq7wbbrhi3mrlfzlwmnox7m

Detection of Text Reuse in French Medical Corpora

Eva D'hondt, Cyril Grouin, Aurélie Névéol, Efstathios Stamatatos, Pierre Zweigenbaum
2016 International Conference on Computational Linguistics  
the digitization of historical paper records.  ...  Herein, we address the detection of two types of text reuse in French EHRs: 1) the detection of updated versions of the same document and 2) the detection of document duplicates that still bear surface  ...  Acknowledgements This work was supported by the French National Agency for Research under the Accordys ANR-12-CORD-0007 and CABeRneT ANR-13-JS02-0009-01 grants.  ...
dblp:conf/coling/DhondtGNSZ16 fatcat:k7wcuefbqrc7tkyznu47og3nmq

PageRank with Text Similarity and Video Near-Duplicate Constraints for News Story Re-ranking [chapter]

Xiaomeng Wu, Ichiro Ide, Shin'ichi Satoh
2010 Lecture Notes in Computer Science  
This paper focuses on news story retrieval and re-ranking, and offers a new perspective through the exploration of the pair-wise constraints derived from video near-duplicates for constraint-driven re-ranking  ...  Pseudo-relevance feedback is a popular and widely accepted query reformulation strategy for document retrieval and re-ranking.  ...  The simplicity and effectiveness of PageRank for text mining were demonstrated through document summarization studies [12, 14], which suggested combining PageRank with text similarity.  ...
doi:10.1007/978-3-642-11301-7_53 fatcat:zjdy5rrqpvfzleatin6jmxde5e

SocialSpamGuard: A Data Mining-Based Spam Detection System for Social Media Networks

Xin Jin, Cindy Xide Lin, Jiebo Luo, Jiawei Han
2011 Proceedings of the VLDB Endowment  
In this demo, we propose SocialSpamGuard, a scalable and online social media spam detection system based on data mining for social network security.  ...  Business entities or public figures set up social networking pages to enhance direct interactions with online users. Social media systems heavily depend on users for content contribution and sharing.  ...  GAD Clustering for Smarter Sampling. Because of the huge number of posts, random sampling may not be a good choice due to the uneven distribution and duplicate (or near-duplicate) posts.  ...
dblp:journals/pvldb/JinLLH11 fatcat:syec47lac5guzmsjp7akcf5bpa

Impact of historical gold mining activities on marine sediments in Wine Harbour, Nova Scotia, Canada

Megan E Little, Michael B Parsons, Brent A Law, Timothy G. Milligan, John N Smith
2015 Atlantic geology  
Historical maps document tailings deposits near former stamp mill sites; however, the extent to which these mine wastes influence environmental quality in the adjacent marine environment is uncertain.  ...  Results from this study have been used to help assess potential ecosystem and human health risks associated with historical gold mine wastes in the Wine Harbour area.  ...  The authors are grateful to Anne-Marie Ryan (Dalhousie University) for her helpful suggestions throughout this study, and to Brian Fisher and Paul Smith (Nova Scotia Department of Natural Resources) for  ... 
doi:10.4138/atlgeol.2015.016 fatcat:kqkd5j57xfcnzammf2bqeij3yy

Fraud Detection Using A New Multilayered Detection System

Kaminee Gurav, Manisha Gurabe
2015 International Journal of Innovative Research in Computer and Communication Engineering  
CD can detect more types of attacks and better account for changing legal behavior, and spike detection (SD) complements CD.  ...  Existing non-data-mining detection systems that use business rules, scorecards, and known fraud matching have limitations.  ...  Duplicates are of two types: exact and near duplicates. Exact (or identical) duplicates have all the same values, whereas near (or approximate) duplicates have some identical values (or characters), some similar  ...
doi:10.15680/ijircce.2015.0302007 fatcat:neublt3s2jf3xi7c3iwfv2nvia

Towards the bibliography of life

David King, David Morse, Alistair Willis, Anton Dil
2011 ZooKeys  
Such a bibliography has been achieved for specific study areas within taxonomy, but not for "life" as a whole.  ...  Firstly, it will be easier for them to discover relevant literature, especially pre-digital literature; and secondly, it will be easier for them to identify the canonical form for a citation.  ...  All the authors are researchers at the Open University, which is the lead institution for Work Package 7 on biodiversity literature access and data mining in the ViBRANT project.  ... 
doi:10.3897/zookeys.150.2167 pmid:22207811 pmcid:PMC3234436 fatcat:tbrjvomhabh57bjov3j3hyweam

Mining Relevant Time for Query Subtopics in Web Archives

Tu Ngoc Nguyen, Nattiya Kanhabua, Wolfgang Nejdl, Claudia Niederée
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion  
We introduce a brute-force approach to detect a time-reliable sub-collection and propose a method to leverage it for relevant time mining of subtopics.  ...  However, searching in this unique longitudinal collection of huge redundancy (pages of near-identical content are crawled all over again) is completely different from searching over the web.  ...  [2] also study the trending of the anchor texts by looking back at the historical web snapshot to improve the weighting function for the document retrieval task.  ...
doi:10.1145/2740908.2741702 dblp:conf/www/NguyenKNN15 fatcat:qwip7zhuzrdfjghpr6wzryoopi

XML structural delta mining: Issues and challenges

Qiankun Zhao, Ling Chen, Sourav S. Bhowmick, Sanjay Madria
2006 Data & Knowledge Engineering  
The distinct feature of XML structural delta mining is that it is based on the dynamic and temporal characteristics obtained from the historical versions of XML documents.  ...  The challenges for conducting such XML structural delta mining tasks are also discussed.  ...  Figure 1(b) is the tree representation of the XML document. Figures 1(c), (d), and (e) are the tree representations of another three historical versions of the same XML document in Figure 1(a).  ...
doi:10.1016/j.datak.2005.10.002 fatcat:ynzathh2nbhufjjygmvt6hueva

Introduction to Formal Concept Analysis and Its Applications in Information Retrieval and Related Fields [chapter]

Dmitry I. Ignatov
2015 Communications in Computer and Information Science  
...  Text Mining and several others.  ...  Since the tutorial was specially prepared for RuSSIR 2014, the covered FCA topics include Information Retrieval with a focus on visualisation aspects, Machine Learning, Data Mining and Knowledge Discovery  ...  The author was partially supported by the Russian Foundation for Basic Research grants no. 13-07-00504 and 14-01-93960 and prepared the tutorial within the project "Data mining based on applied ontologies  ...
doi:10.1007/978-3-319-25485-2_3 fatcat:m2zad3btkjhkja2mdwmjfbkahi


Chris L. Walla, Daniel B. Adams
2006 Journal American Society of Mining and Reclamation  
The final configuration of the ponds was constructed to reduce wave erosion, increase shoreline sinuosity, provide mini-environments for each specific herbaceous plant species required, and duplicate the  ...  mandates and approved park planning documents.  ...  seed (see Figure 3).  ...
doi:10.21000/jasmr06010763 fatcat:ipeilrplorebxj7ajtmada4jwu