22,314 Hits in 7.0 sec

Deep learning applications and challenges in big data analytics

Maryam M Najafabadi, Flavio Villanustre, Taghi M Khoshgoftaar, Naeem Seliya, Randall Wald, Edin Muharemagic
2015 Journal of Big Data  
Big Data has become important as many organizations both public and private have been collecting massive amounts of domain-specific information, which can contain useful information about problems such  ...  We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional  ...  To deal with large scale image data collections, one approach to consider is to automate the process of tagging images and extracting semantic information from the images.  ... 
doi:10.1186/s40537-014-0007-7 fatcat:65mi6dnv5rg6poesotupqbsm7y

Deep Web Crawling for Insights from Polar Data

Siri Jodha S. Khalsa, Chris A. Mattmann, Ruth Duerr
2017 Zenodo  
We use the Polar domain to motivate the problem and our proposed solution. However, our techniques are applicable and scalable to other domains.  ...  We describe efforts to bring new methods of search analytics, machine learning, natural language processing and data visualization to address the challenge of finding and extracting meaning from unstructured  ...  • Documents, datasets, images, video, data services specific to polar region • How is this better than using Google?  ... 
doi:10.5281/zenodo.4659689 fatcat:xxnldvbd75fupfolyjhulzuh34

A scalable architecture for extracting, aligning, linking, and visualizing multi-Int data

Craig A. Knoblock, Pedro Szekely, Barbara D. Broome, Timothy P. Hanratty, David L. Hall, James Llinas
2015 Next-Generation Analyst III  
Under the DARPA Memex program we have already successfully applied this architecture to multiple application domains, including the enormous international problem of human trafficking, where we extracted  ...  We have developed the Domain-Insight Graph (DIG) system, an innovative architecture for extracting, aligning, linking, and visualizing massive amounts of domain-specific content from unstructured sources  ...  In our human trafficking application we used this approach with a database of 20 million images.  ... 
doi:10.1117/12.2177119 fatcat:onqbj5a7rzaypdzkmzppzmh2ia

"STRETCH": a system for document storage and retrieval by content

E. Appiani, L. Boato, S. Bruzzo, A.M. Colla, M. Davite, D. Sciarra
1999 Proceedings. Tenth International Workshop on Database and Expert Systems Applications. DEXA 99  
heterogeneous documents with variable layout and subsequently retrieve them by answering to complex queries.  ...  In this paper a system for storing and retrieving imaged multimedia documents by content is described.  ...  Acknowledgements We would like to acknowledge the contributions by all STRETCH workteam, in particular by P. Penna (AET, Genova), E. Francesconi and S. Marinai (DSI University of Firenze), M.  ... 
doi:10.1109/dexa.1999.795251 dblp:conf/dexaw/AppianiBBCDS99 fatcat:ytfo5vlqrfbvjbay2gqzwokwqm

Information Extraction from Multifaceted Unstructured Big Data

2019 International journal of recent technology and engineering  
In this regard, this paper presents a short review of information extraction process w.r.t. input data type, extraction methods with their corresponding techniques, and representation of extracted information  ...  The issues with unstructured data and the challenges to information extraction from multifaceted unstructured big data as well as the future research directions have also been discussed  ...  There is no single unified technique to extract textual information from images for all applications [25].  ... 
doi:10.35940/ijrte.b1074.0882s819 fatcat:iwpaamsftrgztduhgfkbmgo3du

INFOHARNESS: managing distributed, heterogeneous information

I. Shah, A. Sheth
1999 IEEE Internet Computing  
Through a powerful, consistent user interface, InfoHarness provides rapid search of and access to information assets including documents and parts of documents, mail messages, images, code files, video  ...  APPLICATIONS Using metadata extraction methods, InfoHarness provides integrated, rapid access to huge amounts of heterogeneous information, regardless of type, representation, location, and medium.  ...  In modeling application domain-specific information, it is crucial to capture the semantic content at a level of abstraction similar to that a human would employ.  ... 
doi:10.1109/4236.806994 fatcat:n3uhguojs5gi7lcpyfoegjzc5e

XML-based Exploitation of Region of Interest Scalability in Scalable Video Coding

Davy De Schrijver, Wesley De Neve, Davy Van Deursen, Yves Dhondt, Rik Van de Walle
2007 Eighth International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS '07)  
In this paper, we show how Flexible Macroblock Ordering can be used in the scalable extension of the H.264/AVC specification in order to define the ROIs in the coded bitstream.  ...  From the performance analysis of our adaptation framework, we can conclude that the ROIs can be extracted in the XML domain and that the ROIs in the adapted bitstream are still intact without quality degradation  ...  To summarize, our approach towards ROI scalability results in bitstreams compliant with the specification (before and after the ROI extraction), in a limitation to inside the GOPs of the drift after the  ... 
doi:10.1109/wiamis.2007.92 dblp:conf/wiamis/SchrijverNDDW07 fatcat:pjtxrxum3retnn4r5ks7stbira

Fully-Automatic Pipeline for Document Signature Analysis to Detect Money Laundering Activities [article]

Nikhil Woodruff, Amir Enshaei, Bashar Awwad Shiekh Hasan
2021 arXiv   pre-print
We propose an integrated pipeline of signature extraction and curation, with no human assistance from the obtaining of company documents to the clustering of individual signatures.  ...  We evaluate both the effectiveness of the pipeline at matching obscured same-author signature pairs and the effectiveness of the entire pipeline against a human baseline for document signature analysis  ...  Connected Component Analysis Connected component analysis uses a graph representation of an image to extract high-level information and separate regions which contain similar characteristics.  ... 
arXiv:2107.14091v1 fatcat:2a5nifbq3zavrnwrpqw6fofo2a

Bootstrapping Ontology Evolution with Multimedia Information Extraction [chapter]

Georgios Paliouras, Constantine D. Spyropoulos, George Tsatsaronis
2011 Lecture Notes in Computer Science  
Thus, in addition to annotating multimedia content with semantics, the extracted knowledge is used to expand our understanding of the domain and extract even more useful knowledge.  ...  This chapter summarises the approach and main achievements of the research project BOEMIE (Bootstrapping Ontology Evolution with Multimedia Information Extraction).  ...  Driven by domain-specific multimedia ontologies, BOEMIE information extraction systems are able to identify high-level semantic features in image, video, audio and text and fuse these features for improved  ... 
doi:10.1007/978-3-642-20795-2_1 fatcat:rnn442weubdrvhp2bzza6odaca

Semantic Research for Digital Libraries

Hsinchun Chen
1999 D-Lib Magazine  
wttwt any specific system.  ...  A textual semantic analysis pyramid was developed by The University of Arizona AI Lab to assist in semantic indexing, analysis, and visualization of textual documents.  ... 
doi:10.1045/october99-chen fatcat:gxlv5znrxza2xmpcak7h4noway

Knowledge as a Service Framework for Disaster Data Management

Katarina Grolinger, Miriam A.M. Capretz, Emna Mezghani, Ernesto Exposito
2013 2013 Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises  
As the capabilities of software and hardware evolve, so does the role of information and communication technology in disaster mitigation, preparation, response, and recovery.  ...  In this paper, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM), with the objectives of 1) storing large amounts of disaster-related data from diverse  ...  MS Word documents also go through text extraction because they often contain images that may include relevant information.  ... 
doi:10.1109/wetice.2013.48 dblp:conf/wetice/GrolingerCME13 fatcat:sp6kt3gcefalbaqwmqmiib3m7u

Towards content-based patent image retrieval: A framework perspective

Stefanos Vrochidis, Symeon Papadopoulos, Anastasia Moumtzidou, Panagiotis Sidiropoulos, Emanuelle Pianta, Ioannis Kompatsiaris
2010 World Patent Information  
The proposed framework involves the application of document image pre-processing, image feature and textual metadata extraction in order to support effectively content-based image retrieval in the patent  ...  domain.  ...  Finally, the application of image retrieval technologies in real-world patent search scenarios brings to surface the problem of scalability.  ... 
doi:10.1016/j.wpi.2009.05.010 fatcat:4wyfy2didfa3hflf76uvkfuqfu

ToMaR -- A Data Generator for Large Volumes of Content

Rainer Schmidt, Matthias Rella, Sven Schlarb
2014 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
ToMaR specifically addresses the need for extracting data sets from large volumes of binary content based on existing, content-specific applications within a scalable data management environment.  ...  We present ToMaR, a scalable application that supports the efficient integration of legacy applications within a MapReduce environment.  ...  specifically support scalability and integration with digital preservation processes as well as to integrate with other SCAPE components.  ... 
doi:10.1109/ccgrid.2014.88 dblp:conf/ccgrid/SchmidtRS14 fatcat:olnhs66avbah7m2l7clnobxpfq

Design Issues in Web Crawlers and Review of Parallel Crawlers

2016 International Journal of Science and Research (IJSR)  
With the increase in size of web, search engine depends upon the Web Crawler to download and build index of million/billion of pages for efficient information retrieval when user interact through search  ...  This paper will include the definition of Web Crawler, criteria on the basis of which various types of crawler are defined [4] and some common issues with the design of crawler, parallel crawler, its issues  ...  The information which it extracts can be in the form of web pages, images, video, pdf files or various other types of files.  ... 
doi:10.21275/v5i6.nov163887 fatcat:fydxnzn2inhe7kb2wvbgi6ekea

Efficient Automated Processing of the Unstructured Documents using Artificial Intelligence: A Systematic Literature Review and Future Directions

Dipali Baviskar, Swati Ahirrao, Vidyasagar Potdar, Ketan Kotecha
2021 IEEE Access  
have used different datasets to train the model for specific information extraction tasks, or the unstructured document analysis tasks.  ...  TABLE 1. Various Application Domains for Automatic Information Extraction Techniques from Unstructured Documents.  ... 
doi:10.1109/access.2021.3072900 fatcat:lrbzlmo5gnczhadnrxd2aoqz4u
Showing results 1 to 15 out of 22,314 results