ScienceTreks: an autonomous digital library system

Alexander Ivanyukovich, Maurizio Marchese, Fausto Giunchiglia, A.R.D. Prasad
2008 Online information review (Print)  
Originality/value: High quality automatic metadata extraction is a crucial step in order to move from linguistic entities to logical entities, relation information and logical relations and therefore to  ...  Findings: The proposed pipeline is implemented in a working prototype of an Autonomous Digital Library system (the ScienceTreks system) that: (1) supports a broad range of methods for document acquisition  ...  Lee Giles for useful comments and advice during initial brainstorming on the system architecture.  ... 
doi:10.1108/14684520810897368 fatcat:x7dhcwnjcreqzladrxidw6xniu

Bringing taxonomic structure to large digital libraries

David Sanchez, Antonio Moreno
2007 International Journal of Metadata, Semantics and Ontologies  
The system has been tested for several digital libraries and domains of knowledge, providing good quality results in all cases.  ...  In this paper, we present an automatic, unsupervised, domain-independent and scalable approach for structuring the resources available in a certain electronic repository for a particular domain.  ...  In this sense, web-based digital libraries (e.g., Citeseer, PubMed, etc.) provide an environment in which the scientific production for a particular domain is stored, configuring a trusted, updated and  ... 
doi:10.1504/ijmso.2007.016805 fatcat:j6fygsin6zco7gm4xcfec6fccq

Multimedia content analysis, management and retrieval: trends and challenges

Alan Hanjalic, Nicu Sebe, Edward Y. Chang
2006 Multimedia Content Analysis, Management, and Retrieval 2006  
The origin of this tendency we see in the fact that the domain knowledge used to bridge the semantic gap between the features and semantic concepts is in most cases far too specific.  ...  to domain/context-specific conditions, are applicable in a much broader scope than what is currently possible.  ...  More flexibility and less complexity can be obtained by using more generic domain knowledge.  ... 
doi:10.1117/12.673788 fatcat:klfzds3zuvdcfd2czupjqafefe

Grappling with the Scale of Born-Digital Government Publications: Toward Pipelines for Processing and Searching Millions of PDFs [article]

Benjamin Charles Germain Lee, Trevor Owens
2021 arXiv pre-print
This paper utilizes a Library of Congress dataset of 1,000 government PDFs in order to offer initial approaches for searching and analyzing these PDFs at scale.  ...  Government documents posted to the web in PDF form have been archived by libraries to date.  ...  In many cases, a user must either know a priori exactly what they are looking for when they access archived digital resources or traverse the entirety of an archived website to find an individual digital  ... 
arXiv:2112.02471v1 fatcat:yg2xrmgnwva2lpoc334ptiwpoa

Unsupervised strategies for information extraction by text segmentation

Eli Cortez, Altigran S. da Silva
2010 Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research - IDAR '10  
We report here partial results from a PhD thesis work in which we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS.  ...  As other unsupervised IETS approaches, ONDUX relies on information available on pre-existing data to associate segments in the input string with attributes of a given domain.  ...  Acknowledgements This work was supported by a CNPq fellowship grant to Altigran S. Silva and by a CAPES scholarship to Eli Cortez.  ... 
doi:10.1145/1811136.1811145 fatcat:zawle5n2v5d75fwanj5ysbjrw4

Efficient topic-based unsupervised name disambiguation

Yang Song, Jian Huang, Isaac G. Councill, Jia Li, C. Lee Giles
2007 Proceedings of the 2007 conference on Digital libraries - JCDL '07  
Experiments on web data and scientific documents from CiteSeer indicate that our approach consistently outperforms other unsupervised learning methods such as spectral clustering and DBSCAN clustering  ...  In this paper, we focus on the problem of disambiguating person names within web pages and scientific documents. We present an efficient and effective two-stage approach to disambiguate names.  ...  known a priori for an ever increasing digital library and the computational complexity O(N^2) is intractable for N=739,135 in CiteSeer.  ... 
doi:10.1145/1255175.1255243 dblp:conf/jcdl/SongHCLG07 fatcat:k26gwgsok5cqnas7uapu2obzhy

Old document image segmentation using the autocorrelation function and multiresolution analysis

Maroua Mehri, Petra Gomez-Krämer, Pierre Héroux, Rémy Mullot, Richard Zanibbi, Bertrand Coüasnon
2013 Document Recognition and Retrieval XX  
Therefore, in order to control the quality of historical document image digitization and to meet the need of a characterization of their content using intermediate level metadata (between image and document  ...  Recent progress in the digitization of heterogeneous collections of ancient documents has rekindled new challenges in information retrieval in digital libraries and document layout analysis.  ...  The authors would like to thank Geneviève CRON of the BnF for providing access to the Gallica digital library.  ... 
doi:10.1117/12.2002365 dblp:conf/drr/MehriGHM13 fatcat:32tt4k5wgrcmtjeqvgjcwvzlee

Materials Informatics for Mechanical Deformation: A Review of Applications and Challenges

Karol Frydrych, Kamran Karimi, Michal Pecelerowicz, Rene Alvarez, Francesco Javier Dominguez-Gutiérrez, Fabrizio Rovaris, Stefanos Papanikolaou
2021 Materials  
In this fast-growing field, we focus on reviewing advances at the intersection of data science with mechanical deformation simulations and experiments, with a particular focus on studies of metals and  ...  In the design and development of novel materials that have excellent mechanical properties, classification and regression methods have been diversely used across mechanical deformation simulations or experiments  ...  without any a priori knowledge about the degree of intercorrelations.  ... 
doi:10.3390/ma14195764 pmid:34640157 fatcat:o5csvoojpbhazfo63rcv4ersf4

ThManager: An Open Source Tool for Creating and Visualizing SKOS

Javier Lacasta, Javier Nogueras-Iso, Francisco Javier López-Pellicer, Pedro Rafail Muro-Medrano, Francisco Javier Zarazaga-Soria
2007 Information Technology and Libraries  
Simple knowledge organization systems (SKOS) seem to be the most promising representation for the type of knowledge models used in digital libraries, but there is a lack of tools that are able to properly  ...  Knowledge organization systems denotes formally represented knowledge that is used within the context of digital libraries to improve data sharing and information retrieval.  ...  The authors would like to express their gratitude to Juan José Floristán for his support in the technical development of the tool.  ... 
doi:10.6017/ital.v26i3.3274 fatcat:isyy7nsvmvcb7lksvjexo242hy

Biological imaging software tools

Kevin W Eliceiri, Michael R Berthold, Ilya G Goldberg, Luis Ibáñez, B S Manjunath, Maryann E Martone, Robert F Murphy, Hanchuan Peng, Anne L Plant, Badrinath Roysam, Nico Stuurman, Jason R Swedlow (+2 others)
2012 Nature Methods  
ACKNOwLEdGMENTS We acknowledge our respective funding sources and members of our laboratories for feedback and useful comments, in particular A. Merouane and A.  ...  Narayanswamy of the Roysam lab for their assistance in preparing figures, and L. Kamentsky and M. Bray of the Carpenter lab for useful input and edits on the manuscript.  ...  Ontologies are formal expressions of human knowledge about a domain in machinereadable form 58 .  ... 
doi:10.1038/nmeth.2084 pmid:22743775 pmcid:PMC3659807 fatcat:6wnzuhdrdfd67o72cmcgwwpgka

Discovering geographic knowledge in data rich environments

Harvey J. Miller, Jiawei Han
2000 SIGKDD Explorations  
This includes a demonstration of GKD techniques leading to new, unexpected knowledge in key geographic research domains.  ...  These techniques are confirmatory and require the researcher to have a priori hypotheses.  ...  The necessary preliminary and tentative modifications of metadata for use by the Alexandria Digital Library Project indicate this gap quite clearly.  ... 
doi:10.1145/846183.846208 fatcat:tnjb5v2ecfczlhk6bxstxdgcxy

Content-Based Image Retrieval in Radiology: Current Status and Future Directions

Ceyhun Burak Akgül, Daniel L. Rubin, Sandy Napel, Christopher F. Beaulieu, Hayit Greenspan, Burak Acar
2010 Journal of digital imaging  
Many advances have occurred in CBIR, and a variety of systems have appeared in nonmedical domains; however, permeation of these methods into radiology has been limited.  ...  Radiology images pose specific challenges compared with images in the consumer domain; they contain varied, rich, and often subtle features that need to be recognized in assessing image similarity.  ...  Since no a priori domain-specific knowledge is exploited in their computation, they are primarily used with vector distance-based similarity analyses [14,40,55,57] in high-dimensional feature vector spaces  ... 
doi:10.1007/s10278-010-9290-9 pmid:20376525 pmcid:PMC3056970 fatcat:54efjsvb2vdxhecunbmv7pazei

Citation recommendation: approaches and datasets

Michael Färber, Adam Jatowt
2020 International Journal on Digital Libraries  
In this article, we give a thorough introduction to automatic citation recommendation research.  ...  Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation  ...  For instance, in the medical digital library database PubMed, the number of publications in 2014 (514k) was more than triple the amount published in 1990 (137k) and more than 100 times the amount published  ... 
doi:10.1007/s00799-020-00288-2 fatcat:6gig2fv6uvfipipphtemb5thfm

Camera-based Sudoku recognition with deep belief network

Baptiste Wicht, Jean Hennebert
2014 2014 6th International Conference of Soft Computing and Pattern Recognition (SoCPaR)  
The digit positions are extracted from the grid and finally, each character is recognized using a Deep Belief Network (DBN).  ...  On average, our solution is able to produce a result from a Sudoku in less than 100 ms.  ...  Acknowledgment: We would like to thank all the people who contributed to the dataset by sending us Sudoku images taken from their phones, in particular Patrick Anagnostaras.  ... 
doi:10.1109/socpar.2014.7007986 dblp:conf/socpar/WichtH14 fatcat:jyxkv3z37fecfddyi2jviqc7w4

Visual Analysis and Knowledge Discovery for Text [chapter]

Christin Seifert, Vedran Sabol, Wolfgang Kienreich, Elisabeth Lex, Michael Granitzer
2013 Large-Scale Data Analytics  
We argue that visual analysis, in combination with automatic knowledge discovery methods, provides several advantages.  ...  Providing means for effectively accessing and exploring large textual data sets is a problem attracting the attention of text mining and information visualization experts alike.  ...  Semantic Enrichment Semantic enrichment extracts domain-specific semantics from single documents and enriches each document with external knowledge.  ... 
doi:10.1007/978-1-4614-9242-9_7 fatcat:5ti6ubvuj5d5dhfjg6eyiyld6m