15,925 Hits in 6.1 sec

Automatic Identification and Data Extraction from 2-Dimensional Plots in Digital Documents [article]

William Brouwer, Saurabh Kataria, Sujatha Das, Prasenjit Mitra, C. L. Giles
2008 arXiv pre-print
If we can extract data from these images automatically and store them in a database, an end-user can query and combine data from multiple digital documents simultaneously and efficiently.  ...  Therefore, searching for figures and extracting figure data are important problems. To the best of our knowledge, there exists no tool to automatically extract data from figures in digital documents.  ...  Recall by shape: Diamond, 64 of 72 correct (88.9%); Triangle, 71 of 78 correct (91.0%). CONCLUSIONS AND FURTHER WORK We have outlined a system that can identify 2-D plots in digital documents and extract data from the  ... 
arXiv:0809.1802v1 fatcat:dxeuw7aukbb3bn5xq523v6nfbe

Segregating and extracting overlapping data points in two-dimensional plots

William Brouwer, Saurabh Kataria, Sujatha Das, Prasenjit Mitra, C. Lee Giles
2008 Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries - JCDL '08  
... and store them in a database, an end-user can query and combine data from multiple digital documents simultaneously and efficiently.  ...  The proposed algorithm identifies a 2-D plot and extracts the axis labels, legend and the data points from the 2-D plot. We also segregate overlapping shapes that correspond to different data points.  ...  CONCLUSIONS AND FURTHER WORK We have outlined a system that can identify 2-D plots in digital documents and extract data from the identified documents.  ... 
doi:10.1145/1378889.1378936 dblp:conf/jcdl/BrowuerKDMG08 fatcat:6rm6dtxd75he5jprpvd34lattu

Automatic Extraction of Data from 2-D Plots in Documents

X. Lu, J. Wang, P. Mitra, C.L. Giles
2007 Proceedings of the International Conference on Document Analysis and Recognition  
Two-dimensional (2-D) plots in digital documents contain important information. Often, the results of scientific experiments and performance of businesses are summarized using plots.  ...  We propose an automated algorithm for extracting information from line curves in 2-D plots.  ...  Acknowledgements This work was supported in part by the US National Science Foundation under grants 0535656, 0347148, 0454052, and 0202007, Microsoft Research, and the Internet Archive.  ... 
doi:10.1109/icdar.2007.4378701 dblp:conf/icdar/LuWMG07 fatcat:om7mg23nsbhfvenycp2aij23ry

Automated analysis of images in documents for intelligent document search

Xiaonan Lu, Saurabh Kataria, William J. Brouwer, James Z. Wang, Prasenjit Mitra, C. Lee Giles
2009 International Journal on Document Analysis and Recognition  
Then, an integrated algorithm is used to extract numerical data from data points and lines in the 2-D plot images along with the axes and their labels, the data symbols in the figure's legend and their  ...  Authors use images to present a wide variety of important information in documents. For example, two-dimensional (2-D) plots display important data in scientific publications.  ...  Acknowledgments This work was supported in part by the US National Science Foundation under grants 0535656, 0347148, 0454052, and 0202007, Microsoft Research, and Internet Archive.  ... 
doi:10.1007/s10032-009-0081-0 fatcat:myywoqxytnhfhhw44zkprz73cm

Automatic categorization of figures in scientific documents

Xiaonan Lu, Prasenjit Mitra, James Z. Wang, C. Lee Giles
2006 Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries - JCDL '06  
The proposed approach has been evaluated on a testbed document set collected from the CiteSeer scientific literature digital library.  ...  We propose an architecture for retrieving documents by integrating figures and other information.  ...  Specifically, Figures 1(a) and 2(b) were published in [20]. Figures 1(b), (c), and 6(a), (c), (d) were published in [29]. Figures 1(d) and 2(a) were published in [5].  ... 
doi:10.1145/1141753.1141778 dblp:conf/jcdl/LuMWG06 fatcat:sp4l3nickrgq3novmzxdopd5ja

An Architecture for Information Extraction from Figures in Digital Libraries

Sagnik Ray Choudhury, Clyde Lee Giles
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion  
An extractor for figures and associated metadata (figure captions and mentions) from PDF documents; 2. A search engine on the extracted figures and metadata; 3.  ...  We discuss the challenges in each step, report an extractor algorithm to extract vector graphics from scholarly documents and a classification algorithm for figures.  ...  ACKNOWLEDGEMENTS We gratefully acknowledge partial support from the National Science Foundation and NPRP grant # 4-029-1-007 from the Qatar National Research Fund (a member of Qatar Foundation).  ... 
doi:10.1145/2740908.2741712 dblp:conf/www/ChoudhuryG15 fatcat:s5tuhz767vfcjmhuuakibxszwu

Elimination of junk document surrogate candidates through pattern recognition

Eunyee Koh, Daniel Caruso, Andruid Kerne, Ricardo Gutierrez-Osuna
2007 Proceedings of the 2007 ACM symposium on Document engineering - DocEng '07  
While processing these surrogate candidates from an HTML document, relevant information may appear together with less useful junk material, such as navigation bars and advertisements.  ...  A surrogate is an object that stands for a document and enables navigation to that document.  ...  In order to extract information from large and diverse collections of documents, it is necessary to utilize human cognitive feedback in collecting training data that can be used later by procedural classifiers  ... 
doi:10.1145/1284420.1284466 dblp:conf/doceng/KohCKG07 fatcat:fg5rt5pfdvbsxn6w33njheth7q

Event Detection with Convolutional Neural Networks for Forensic Investigation [chapter]

Bo Yang, Ning Li, Zhigang Lu, Jianguo Jiang
2016 IFIP Advances in Information and Communication Technology  
Accuracy and loss plots of CSV-CNN (blue is training data, red is 10% dev data). (c) Accuracy, (d) loss. Figure 4. Accuracy and loss plots of CNN (blue is training data, red is 10% dev data).  ...  valuable information from them is based on the event detection task, which involves identifying events of specific types in the artifacts.  ...  Builds an embedding mapping from words in each sentence to vectors 5. end 6. end 7. output(embedding); 8. for (int k = 0; k < number of windows in the convolution; k++) 9.  ... 
doi:10.1007/978-3-319-48390-0_11 fatcat:6n7dhnv3pjetfdrtyckhg2r7cm

A Novel Interpolation Perspective for Handwritten Digit Recognition using Neural Network

2020 International journal of recent technology and engineering  
It seeks to classify the numerical digits so that digits can be translated into pixels.  ...  Because of the availability of enormous computational resources and numerous emerging algorithmic advances, it has become easier today to train deep neural networks.  ...  The author [3] describes real-life document identification systems composed of multiple sections, including field extraction, partitioning, and language patterns.  ... 
doi:10.35940/ijrte.b3148.079220 fatcat:jshf2cz2djdvvktfq2qel2e75i

An Algorithm Search Engine for Extracting Algorithm From PDF Document

Akshata R. Sanas, Pallavi S Patil
2019 International Journal of Scientific Research in Computer Science Engineering and Information Technology  
It is used to automatically detect and extract algorithms from this large collection of documents, enabling algorithm indexing, searching, discovery, and analysis.  ...  In support of lectures and self-learning, the highlighted documents can be shared with others.  ...  They developed automated methods for extracting data and text from two-dimensional plots in digital documents and applied them to documents published on the web.  ... 
doi:10.32628/cseit195454 fatcat:uhkedlce4bcwbncolqcrwv7ezm

Handwritten Digit Recognition Using Different Dimensionality Reduction Techniques

2019 International journal of recent technology and engineering  
This method is evaluated on the MNIST handwritten digit data set.  ...  PCA reduces the dimensionality of the data while preserving maximum variance in the form of new variables called principal components, whereas LDA minimizes the distance within each class and maximizes the separation between classes  ...  First, the images are plotted without dimensionality reduction in Figure 1; then, after dimensionality reduction, they are plotted in Figures 2 and 3.  ... 
doi:10.35940/ijrte.b1798.078219 fatcat:nn4omcugenhjddvm7iw2vmat4a

Enhancement and Recognition of Number Plate using OCR Technique

Gayathri Devi S
2019 International Journal for Research in Applied Science and Engineering Technology  
Advances in image processing include anticipating the data needs of governments, recognizing and tracking people and objects, diagnosing diseases, performing medical procedures, and automated driving  ...  From the extracted number plate, each character is isolated by segmentation in the character segmentation phase.  ...  From the obtained coordinates, the Y width is plotted as a histogram to find the frequency and data centers, which are used to determine the values of the numbers.  ... 
doi:10.22214/ijraset.2019.5115 fatcat:7xskqxpxejfepbrbn2oi7dudw4

Exploring the Field of Text Mining

Radha Guha
2017 International Journal of Computer Applications  
Text mining is the technique of automatically deducing nonobvious but statistically supported novel information from various text data sources written in natural languages.  ...  In today's big data and cloud computing era, huge amounts of text data are being generated online.  ...  So again we need a computer to solve the challenging task of summarizing documents and extracting other structural, syntactic, and semantic relation information from the documents automatically.  ... 
doi:10.5120/ijca2017915682 fatcat:5slojxqclfcybgrhmyfvtr5dl4


M. Murphy, A. Corns, J. Cahill, K. Eliashvili, A. Chenau, C. Pybus, R. Shaw, G. Devlin, A. Deevy, L. Truong-Hong
2017 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Case studies in Ireland are used to test and develop suitable systems for (a) data capture/digital surveying/processing, (b) developing a library of architectural components, and (c) mapping these architectural  ...  The testing of open BIM approaches, in particular IFCs, and the use of game engine platforms is a fundamental component for developing much wider dissemination.  ...  PROTOTYPICAL STANDARDS FOR DATA CAPTURE - LASER SCANNING AND DIGITAL PHOTOGRAMMETRY Data Capture The terrestrial laser scanner is a device that automatically measures the three-dimensional co-ordinates  ... 
doi:10.5194/isprs-archives-xlii-2-w5-539-2017 fatcat:t4n4zwrlffh27f5c6w2f6z2qxy

Assessing earthquake effects on archaeological sites using photogrammetry and 3D model analysis

Paolo Forlin, Riccardo Valente, Miklós Kázmér
2018 Digital Applications in Archaeology and Cultural Heritage  
Three-dimensional data extraction and analysis The three-dimensional surfaces produced through the photogrammetry process resulted in detailed and highly accurate models with which to perform advanced  ...  field, a series of vertical and horizontal sections were extracted from the three-dimensional models.  ... 
doi:10.1016/j.daach.2018.e00073 fatcat:jawi5wlemza3fgrq3xxit57aya
Showing results 1–15 out of 15,925 results