Redundancy-driven web data extraction and integration

Lorenzo Blanco, Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, Paolo Papotti
2010 Proceedings of the 13th International Workshop on the Web and Databases - WebDB '10  
We present a domain independent system that exploits the redundancy of information to automatically extract and integrate data from the Web.  ...  Our proposal is based on an original approach that exploits the mutual dependency between the data extraction and the data integration tasks.  ...  In large data intensive web sites, we observe two important characteristics that suggest new opportunities for the automatic extraction and integration of web data.  ... 
doi:10.1145/1859127.1859137 dblp:conf/webdb/PapottiCMBB10 fatcat:izz4jb5q2zeo5kzhsjvfrxy64u

Exploiting information redundancy to wring out structured data from the web

Lorenzo Blanco, Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, Paolo Papotti
2010 Proceedings of the 19th international conference on World wide web - WWW '10  
We present a domain independent system that exploits the redundancy of information to automatically extract and integrate data from the Web.  ...  Our proposal is based on an original approach that exploits the mutual dependency between the data extraction and the data integration tasks.  ...  We introduce an automatic, domain independent technique that exploits an unexplored publishing pattern to extract and integrate data from the Web.  ... 
doi:10.1145/1772690.1772805 dblp:conf/www/BlancoBCMP10 fatcat:3acznfpf7fhgxojerdnedg66zu

Automatically building probabilistic databases from the web

Lorenzo Blanco, Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, Paolo Papotti
2011 Proceedings of the 20th international conference companion on World wide web - WWW '11  
, extracts and integrate the published data, and finally performs a probabilistic analysis to characterize the impreciseness of the data and the accuracy of the sources.  ...  We present an automatic and domain independent system that performs all the steps required to benefit from these data: it discovers data intensive web sites containing information about an entity of interest  ...  ; (ii) data extraction and integration: inferring wrappers to extract data from each source and then integrating the redundant data; (iii) probabilistic analysis: to address the intrinsic imprecision of  ... 
doi:10.1145/1963192.1963285 dblp:conf/www/BlancoBCMP11 fatcat:jtlpz53wjraijh7mkl6lwqpxia

Using Conditional Random Field in Named Entity Recognition for Crime Location Identification

Quintin Jackson Goraseb, Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia, Nathar Shah
2020 International Journal of Mechanical Engineering and Robotics Research  
This paper will discuss the mining of crime data from electronic news sources in Malaysia, and how this data is further transformed to extract meaningful information from it.  ...  Electronic data or information comes in different forms, some are structured data and others unstructured data. The act of collecting such data is known as data mining.  ...  It is a software type that standardizes the integration of data from different sources. It has a wrapper per data source for extraction and a mediator for integration.  ... 
doi:10.18178/ijmerr.9.2.252-257 fatcat:trfuexd4brflhnqin3b4lmg4sm

PIQMIe: a web server for semi-quantitative proteomics data management and analysis

Arnold Kuzniar, Roland Kanaar
2014 Nucleic Acids Research  
data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries.  ...  We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semiquantitative  ...  ACKNOWLEDGMENTS The authors thank Jeroen Demmers for providing the MS-based proteomics data set, Joyce Lebbink, Charlie Laffeber, Karen Sap and Harm Nijveen for beta testing and helpful suggestions.  ... 
doi:10.1093/nar/gku478 pmid:24861615 pmcid:PMC4086067 fatcat:2xxqywbd4vex5m6hayxrenon6i

An exploration of the principles underlying redundancy-based factoid question answering

Jimmy Lin
2007 ACM Transactions on Information Systems  
Specifically, we develop two theses: that stable characteristics of data redundancy allow factoid systems to rely on external "black box" components, and that despite embodying a data-driven approach,  ...  The so-called "redundancy-based" approach to question answering represents a successful strategy for mining answers to factoid questions such as "Who shot Abraham Lincoln?" from the World Wide Web.  ...  Data Redundancy At the highest level, data redundancy allows systems to capitalize on statistical regularities to extract "easy" answers to factoid questions from the Web.  ... 
doi:10.1145/1229179.1229180 fatcat:l2pwnam7qvh6xbp6a3krpoidpq

Design, Implementation, and Assessment of Innovative Data Warehousing; Extract, Transformation, and Load(ETL); and Online Analytical Processing(OLAP) on BI

Ramesh Venkatakrishnan
2020 International Journal of Database Management Systems  
The predominant challenges with these fundamental components are Data Volume, Data Variety, Data Integration, Complex Analytics, Constant Business changes, Lack of skill sets, Compliance, Security, Data  ...  changes with the explosion of data) and Self-Service BI.  ...  Data reduction techniques such as data virtualization can avoid data redundancy and optimize storage needs indirectly, helping with the performance and data integrity.  ... 
doi:10.5121/ijdms.2020.12301 fatcat:ztaew4ehmrfzldrfwcktx64rce

Knowledge Curation and Knowledge Fusion

Xin Luna Dong, Divesh Srivastava
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
Our tutorial highlights the similarities and differences between knowledge management and data integration, and has two goals.  ...  even more challenges in extracting knowledge from both structured and unstructured data, across a large variety of domains, and in multiple languages.  ...  Relation extraction is analogous to data extraction and schema alignment in data integration.  ... 
doi:10.1145/2723372.2731083 dblp:conf/sigmod/DongS15 fatcat:ox27mcqbwvgq3lxebvupl25yte

Data Conceptualisation for Web-Based Data-Centred Application Design [chapter]

Julien Vilz, Anne France Brogneaux, Ravi Ramdoyal, Vincent Englebert, Jean Luc Hainaut
2006 Lecture Notes in Computer Science  
The paper describes the conceptualisation process in the ReQuest approach, a wide-spectrum methodology for web-based information systems analysis and development.  ...  The analysis includes a tree-based representation of the fragments, the detection of shared subtrees through mining techniques, their normalisation and the derivation of the conceptual schema.  ...  Integrating existing services is sketched for data only. The logical schema and the conceptual schemas of the legacy database are extracted through reverse engineering techniques [8] .  ... 
doi:10.1007/11767138_15 fatcat:rt252eobpjgsjhsc3myjlss3ey

13th international workshop on the web and databases

Xin Luna Dong, Felix Naumann
2011 SIGMOD record  
In 2010 WebDB focused on Quality of Web Data and on Linked Data, but papers on all aspects of the web and databases were solicited, such as unstructured and semi-structured data management, data -extraction  ...  , -integration, -cleansing, and -mining, web applications and privacy, search and information retrieval, and distributed data management.  ...  Redundancy-Driven Web Data Extraction and Integration. Paolo Papotti, Valter Crescenzi, Paolo Merialdo, Mirko Bronzi, Lorenzo Blanco (Università Roma Tre).  ... 
doi:10.1145/1942776.1942787 fatcat:xmbrywp4w5edjfg5xmqn5zlf3y

A semantic approach to retrieving, linking, and integrating heterogeneous geospatial data

Ying Zhang, Yao-Yi Chiang, Pedro Szekely, Craig A. Knoblock
2013 Joint Proceedings of the Workshop on AI Problems and Approaches for Intelligent Environments and Workshop on Semantic Cities - AIIP '13  
First, we encapsulate the retrieval algorithms as web services and invoke the services to extract geospatial data from various sources.  ...  There is a tremendous amount of geospatial data available, and there are numerous methods for extracting, processing and integrating geospatial sources.  ...  The integration process eliminates the data redundance, and combines the complementary properties from the linked data.  ... 
doi:10.1145/2516911.2516914 dblp:conf/ijcai/ZhangCSK13 fatcat:obp3wt5zrzfhbolokruwfkvucm

SEEDEEP: A System for Exploring and Querying Scientific Deep Web Data Sources [chapter]

Fan Wang, Gagan Agrawal
2009 Lecture Notes in Computer Science  
SEEDEEP is able to automatically mine deep web data source schemas, integrate heterogeneous data sources, answer cross-source keyword queries, and incorporates features like caching and fault-tolerance  ...  Currently, SEEDEEP integrates 16 deep web data sources in the biological domain.  ...  Acknowledgements This work was supported by NSF grants 0541058 and 0619041. The equipment used for the experiments reported here was purchased under the grant 0403342.  ... 
doi:10.1007/978-3-642-02279-1_6 fatcat:dc5i2cvgmvh4rjgawhimeoa36i

Adoption of the Semantic Web for overcoming technical limitations of knowledge management systems

Jaehun Joo, Sang M. Lee
2009 Expert systems with applications  
We found that inconvenience, search and integration were statistically significant limitation factors for system quality.  ...  The purpose of this study is to analyze the limitations of current KM systems and to propose an approach for applying the Semantic Web to KM.  ...  Although the traditional integration approaches such as middleware and standardization easily integrate structured data extracted from heterogeneous databases, they have limitations when integrating unstructured  ... 
doi:10.1016/j.eswa.2008.09.005 fatcat:4gfk3gaw3jch5hvkjvca5igm4q

Question answering from the web using knowledge annotation and knowledge mining techniques

Jimmy Lin, Boris Katz
2003 Proceedings of the twelfth international conference on Information and knowledge management - CIKM '03  
Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining.  ...  Knowledge mining is a statistical approach that leverages massive amounts of Web data to overcome many natural language processing challenges.  ...  In contrast, the AskMSR [5] , 4 one of the top performers at TREC-2001, embraced data-redundancy and applied extremely simple word-counting techniques on Web data.  ... 
doi:10.1145/956884.956886 fatcat:pv5khxeuknedpilvfsuqxlemlq

Question answering from the web using knowledge annotation and knowledge mining techniques

Jimmy Lin, Boris Katz
2003 Proceedings of the twelfth international conference on Information and knowledge management - CIKM '03  
Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining.  ...  Knowledge mining is a statistical approach that leverages massive amounts of Web data to overcome many natural language processing challenges.  ...  In contrast, the AskMSR [5] , 4 one of the top performers at TREC-2001, embraced data-redundancy and applied extremely simple word-counting techniques on Web data.  ... 
doi:10.1145/956863.956886 dblp:conf/cikm/LinK03 fatcat:nejpsaxtkja2pkzdy7ncgnryf4