21,178 Hits in 8.3 sec

Data Sources and Persistent Identifiers in the Open Science Research Graph of OpenAIRE

Jochen Schirrwagen, Alessia Bardi, Andreas Czerniak, Aenne Loehden, Najla Rettberg, Mike Mertens, Paolo Manghi
2020 International Journal of Digital Curation  
In this article, we give an overview of the data source typologies used in OpenAIRE and provide an outline on the role of persistent identifiers in the aggregation, curation and provision workflows that  ...  lead to the generation of the Research Graph in OpenAIRE.  ...  This paper will outline all the types of data sources that OpenAIRE gathers from and will focus on the importance of persistent identifiers as enabling technology for metadata curation, enrichment, monitoring  ... 
doi:10.2218/ijdc.v15i1.722 fatcat:cofhbu6hwrebjilrdi7piun2v4

The OpenAIRE Workflows for Data Management

Claudio Atzori, Alessia Bardi, Paolo Manghi, Andrea Mannocci
2017 Zenodo  
Such an information space graph is constructed by a set of autonomic (orchestrated) workflows operating in a regimen of continuous data integration.  ...  The OpenAIRE initiative is the point of reference for Open Access in Europe and aims at the creation of an e-Infrastructure for the free flow, access, sharing, and re-use of research outcomes, services  ...  Grouping duplicates requires the identification of the connected components formed by the equivalence relationships identified by duplicate identification.  ... 
doi:10.5281/zenodo.996005 fatcat:b3z6ooajxrhhjmwsblhstdilxa

Specimens as Research Objects: Reconciliation Across Distributed Repositories to Enable Metadata Propagation

Nicky Nicolson, Alan Paton, Sarah Phillips, Allan Tucker
2018 2018 IEEE 14th International Conference on e-Science (e-Science)  
Following a data mining exercise applied to an aggregated dataset of 19,827,998 specimen records from 292 separate specimen repositories, 36% or 7,102,710 specimens are assessed to participate in duplication  ...  The results enable the creation of networks to identify which repositories could work in collaboration.  ...  Duplicate identification and assessment Of the 19,489,798 data mined records, 7,347,705 records participate in a duplicate relationship, forming 2,914,181 duplicate groups.  ... 
doi:10.1109/escience.2018.00028 dblp:conf/eScience/NicolsonPPT18 fatcat:bo3obmou7vakbk3w73xjba5hp4

GDup: An Integrated and Scalable Graph Deduplication System

Claudio Atzori, Cinzia Bernardeschi, Paolo Manghi
2016 Zenodo  
In this thesis we start from the experiences and solutions for duplicate identification in Big Data collections and address the broader and more complex problem of 'Entity Deduplication over Big Graphs  ...  As things stand today, data curators can find a plethora of tools supporting duplicate identification for Big collections of objects, which they can adopt to efficiently process the objects of individual  ...  Pain, frustration, hardships, encouragement, trust and joy are just a few of the endless moods that accompanied me during this journey.  ... 
doi:10.5281/zenodo.1454879 fatcat:2gwjffyufreoxohd5khem77kpq

OpenAIRE LOD Services: Scholarly Communication Data as Linked Data

Giorgos Alexiou, Sahar Vahdati, Christoph Lange, George Papastefanatos, Steffen Lohmann
2017 Zenodo  
We furthermore explore how this novel integration of data about research can facilitate scholarly communication.  ...  We present a scalable and maintainable architecture that converts the OpenAIRE data from its original HBase NoSQL source to RDF.  ...  A typical problem is related to the persistent identification of published entities.  ... 
doi:10.5281/zenodo.293836 fatcat:3thu6lztuzgjxgrgynlhmlpacm

Advanced grouping and aggregation for data integration

Eike Schallehn, Kai-Uwe Sattler, Gunter Saake
2001 Proceedings of the tenth international conference on Information and knowledge management - CIKM'01  
The general concept of grouping and aggregation appears to be a fitting paradigm for a number of the mentioned issues, but in its common form of equality-based groups and restricted aggregate functions  ...  We propose generic interfaces for user-defined grouping and aggregation as part of a SQL extension, allowing for more complex functions, for instance integration of data mining algorithms.  ...  Basics of user-defined Grouping and Aggregation Obviously, using grouping for duplicate identification and aggregation for reconciliation depends heavily on the problem domain.  ... 
doi:10.1145/502684.502685 fatcat:5oipvqkm45hwzifrvmr4wu2ari

Advanced grouping and aggregation for data integration

Eike Schallehn, Kai-Uwe Sattler, Gunter Saake
2001 Proceedings of the tenth international conference on Information and knowledge management - CIKM'01  
The general concept of grouping and aggregation appears to be a fitting paradigm for a number of the mentioned issues, but in its common form of equality-based groups and restricted aggregate functions  ...  We propose generic interfaces for user-defined grouping and aggregation as part of a SQL extension, allowing for more complex functions, for instance integration of data mining algorithms.  ...  Basics of user-defined Grouping and Aggregation Obviously, using grouping for duplicate identification and aggregation for reconciliation depends heavily on the problem domain.  ... 
doi:10.1145/502585.502685 dblp:conf/cikm/SchallehnSS01 fatcat:zol5g4tmvzev5feyvvjel77fhm

Bridging registries of research organizations. Supporting disambiguation and improving the quality of data

Claudio Atzori, Gina Pavone
2021 Zenodo  
In OpenOrgs, data curators can enrich the metadata description of organizations and resolve the ambiguity of duplicates detected with an automated process by stating whether two or more entities correspond  ...  Their names can be derived from various data sources, each of which often contains a different version of the organization's name (full legal name, short or alternative names, acronym, and so on) and different  ...  quality of the data is a key aspect enabling the realization of data-centric services.  ... 
doi:10.5281/zenodo.5101095 fatcat:5fd2k5wvyvg63dxi5ld36cm3va

The data-literature interlinking service

Adrian Burton, Hylke Koers, Paolo Manghi, Sandro La Bruzzo, Amir Aryani, Michael Diepenbroek, Uwe Schindler
2017 Program  
The Research Data Alliance (RDA) Publishing Data Services Working Group (PDS-WG) aims to address this issue of fragmentation by bringing together different stakeholders to agree on a common infrastructure  ...  Design/methodology/approach - This paper presents the synergic effort of the RDA PDS-WG and the OpenAIRE infrastructure toward enabling a common infrastructure for exchanging data-literature links by realizing  ...  The service requires de-duplication tools, capable of identifying groups of duplicates by matching their properties and merging them into one "representative" object.  ... 
doi:10.1108/prog-06-2016-0048 fatcat:kdy4hurpvzeaddpyy4vnr5nmsu

Tools for Collection, Analysis and Visualization of Data from the Stockholm Convention Global Monitoring Plan on Persistent Organic Pollutants [chapter]

Jakub Gregor, Richard Hůlek, Jiří Jarkovský, Jana Borůvková, Jiří Kalina, Kateřina Šebková, Daniel Schwarz, Jana Klánová, Ladislav Dušek
2013 IFIP Advances in Information and Communication Technology  
been performed on the basis of the mandate given by Global Coordination Group for GMP and Secretariat of the Stockholm Convention: content analysis of the GMP monitoring reports published in 2009, on-line  ...  visualization tool for browsing and analyzing collected data from the monitoring reports, and proposal of a design of future data collection campaigns.  ...  This work has been supported by the project TB010MZP058 "Development of the system for spatial evaluation of the environmental contamination". On-Line Tools for GMP Data  ... 
doi:10.1007/978-3-642-41151-9_21 fatcat:3ojenj4hbjcotgmrby65tblixu

The Data-Literature Interlinking Service

Adrian Burton, Hylke Koers, Paolo Manghi, Sandro La Bruzzo, Amir Aryani, Michael Diepenbroek, Uwe Schindler
2020 Zenodo  
These operate in silos so that content cannot be readily combined to deliver a network graph connecting research data and literature in a comprehensive and reliable way.  ...  The RDA Publishing Data Services Working Group (PDS-WG) aims to address this issue of fragmentation by bringing together different stakeholders to agree on a common infrastructure for sharing links between  ...  The authors would like to thank the PDS-WG members and representatives from CrossRef, DataCite, The National Data Service, ORCID, The Research Data Alliance, ICSU World Data Systems, and the RMap project  ... 
doi:10.5281/zenodo.3776087 fatcat:cbvln3psurfcfjvccjtwh4t2hq

Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens

Norbert Kilian, Tilo Henning, Patrick Plitzner, Andreas Müller, Anton Güntsch, Ben C. Stöver, Kai F. Müller, Walter G. Berendsohn, Thomas Borsch
2015 Database: The Journal of Biological Databases and Curation  
Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens.  ...  data exchange via standard exchange formats and enable the link between the character datasets and samples in research collections, ensuring high visibility and instant reusability of  ...  The concept of linking derivatives and their visualization in the EDIT Data Portal has been developed in discussion with and using the work by Wolf-Henning Kusber (BGBM, Berlin) for AlgaTerra.  ... 
doi:10.1093/database/bav094 pmid:26424081 pmcid:PMC4589695 fatcat:5p43axpn2raoleckxhxpszozjq

Posting Behaviour Patterns in an Online Smoking Cessation Social Network: Implications for Intervention Design and Development

Benjamin Healey, Janet Hoek, Richard Edwards, Lion Shahab
2014 PLoS ONE  
First-week interaction data enabled identification of Minimally Engaged Users with high specificity and sensitivity (AUROC = 0.94).  ...  Results: Repeating periodic peaks and troughs in aggregate activity related not only to seasonality (e.g., New Year), but also to day of the week.  ...  Acknowledgments The authors wish to thank the Quit Group, and Hayley Guiney (Analyst) in particular, for their assistance and insights during the data collection and interpretation of results for this  ... 
doi:10.1371/journal.pone.0106603 pmid:25192174 pmcid:PMC4156345 fatcat:talecivrzrb27hzxedcwpe7sk4

Consistency and Trends of Technological Innovations: A Network Approach to the International Patent Classification Data [chapter]

Yuan Gao, Zhen Zhu, Massimo Riccaboni
2017 Studies in Computational Intelligence  
Based on the OECD Triadic Patent Family database, this study constructs a cohort network based on the grouping of IPC subclasses in the same patent families, and a citation network based on citations between  ...  Classifying patents by the technology areas they pertain to is important to enable information search and facilitate policy analysis and socio-economic studies.  ...  Temporal Networks The 2 networks mentioned above are aggregated from all the available years. In order to capture the changes over time, we split the data by year.  ... 
doi:10.1007/978-3-319-72150-7_60 fatcat:vg6nwftgcje5feheeaarmdkb7q

uBioRSS: Tracking taxonomic literature using RSS

P. R. Leary, D. P. Remsen, C. N. Norton, D. J. Patterson, I. N. Sarkar
2007 Bioinformatics  
These standardized syndication formats deliver content directly to the subscriber, allowing them to locally aggregate content from a variety of sources instead of having to find the information on multiple  ...  It aggregates syndicated content from academic publishers and science news feeds, and then uses a taxonomic Named Entity Recognition algorithm to identify and index taxonomic names within those data streams  ...  Conflict of Interest: none declared.  ... 
doi:10.1093/bioinformatics/btm109 pmid:17392332 fatcat:gnitncs27jgknp4odho7ksy63a
Showing results 1–15 out of 21,178 results