
Heterogeneous Data and Big Data Analytics

Lidong Wang
2017 Automatic Control and Information Sciences  
This paper introduces data processing methods for heterogeneous data and Big Data analytics, Big Data tools, some traditional data mining (DM) and machine learning (ML) methods.  ...  Deep learning and its potential in Big Data analytics are analysed.  ...  Techniques for feature selection can be divided in two approaches: feature ranking and subset selection.  ... 
doi:10.12691/acis-3-1-3 fatcat:t3yzrk4r2bfornki34khobe4su

Spatial data fusion in Spatial Data Infrastructures using Linked Data

Stefan Wiemann, Lars Bernard
2015 International Journal of Geographical Information Science  
Acknowledgments We would like to thank the editor and the anonymous reviewers for their suggestions and comments that helped to improve the quality of the article.  ...  Disclosure statement No potential conflict of interest was reported by the authors.  ...  A method of unique identification management within SDIs is a prerequisite for the application of Linked Data concepts.  ... 
doi:10.1080/13658816.2015.1084420 fatcat:at7pszz4ajej3c4xwcylfcyfpm

Identifying and Preventing Data Leakage in Multi-relational Classification

Hongyu Guo, Herna L. Viktor, Eric Paquet
2010 2010 IEEE International Conference on Data Mining Workshops  
This paper demonstrates this potential for privacy leakage in multirelational classification and illustrates how such potential leaks may be detected.  ...  We propose a method to generate a ranked list of subschemas which maintains the predictive performance on the class attribute, while limiting the disclosure risk, and predictive accuracy, of confidential  ...  For complex databases, it is becoming more difficult to detect, avoid and limit the inference capabilities between attributes, especially during data mining.  ... 
doi:10.1109/icdmw.2010.33 dblp:conf/icdm/GuoVP10 fatcat:q3e6lsi2urefnlquzkdjbyumim

Sampling labelled profile data for identity resolution

Matthew Edwards, Stephen Wattam, Paul Rayson, Awais Rashid
2016 2016 IEEE International Conference on Big Data (Big Data)  
We validate the comparability of samples drawn through this method and discuss the implications of this mechanism for researchers as well as potential alternatives and extensions.  ...  Identity resolution capability for social networking profiles is important for a range of purposes, from open-source intelligence applications to forming semantic web connections.  ...  Limitations of the tool We have realised our implementation in a Python tool capable of sampling ground truth data from the primary and secondary networks given in this paper.  ... 
doi:10.1109/bigdata.2016.7840645 dblp:conf/bigdataconf/EdwardsWRR16 fatcat:daveavhj4rc2nng4skbwibcszu

A Comparative Analysis of Different Categorical Data Clustering Ensemble Methods in Data Mining

S. Sarumathi, N. Shanthi, M. Sharmila
2013 International Journal of Computer Applications  
This paper reveals the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods.  ...  Moreover a myriad of algorithms and methods has been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters.  ...  Hence the main salient features of this method are the capability of handling the high dimensionality and multi view data issues.  ... 
doi:10.5120/14004-2050 fatcat:6m2ztxxf7jfapnlkmt5hbrlbyi

Data Augmentation for Graph Data: Recent Advancements [article]

Maria Marrium, Arif Mahmood
2022 arXiv   pre-print
Data Augmentation techniques for images and text data cannot be used for graph data because of the complex and non-Euclidean structure of graph data.  ...  Graph Neural Network (GNN) based methods have recently become a popular tool to deal with graph data because of their ability to incorporate structural information.  ...  Augmentation Methods for Link Prediction with SOTAs.  ... 
arXiv:2208.11973v1 fatcat:alz3xcoz3rfd7m6vbeikmubjy4

Population Data Centre Profiles: Centre for Data Linkage

James H Boyd, Sean Randall, Adrian P Brown, Max Maller, Davie Botes, Margo Gillies, Anna Ferrante
2020 International Journal of Population Data Science  
The Centre has been instrumental in the development of practical methods for privacy-preserving record linkage, with this methodology now regularly used for real-world linkages.  ...  The Centre for Data Linkage (CDL) was established at Curtin University, Western Australia, to develop infrastructure to enable cross-jurisdictional record linkage in Australia.  ...  demand for linked data and the increasing volume and complexity of datasets to be linked.  ... 
doi:10.23889/ijpds.v4i2.1139 pmid:32935041 pmcid:PMC7473267 fatcat:qrwp4s4fmrcr7camnpfygzkikq

Data Anonymization using Pseudonym System to Preserve Data Privacy

Shukor Razak, Nur Hafizah, Arafat Al-Dhaqm
2020 IEEE Access  
General services employ a unique identifier for the aim of storing data in a digital database. However, it may be associated with some limitations and challenges.  ...  There is a link between the unique identifier and the data holder, e.g., name, address, Identity card number, etc. Attackers can manipulate a unique identifier for stealing the whole data.  ...  Their method was found capable of combining multiparty vertically-partitioned data in a secure way.  ... 
doi:10.1109/access.2020.2977117 fatcat:ugad6oujcfdkzg7q5h4jdbyetm

Developing an Online Resource Center about Geospatial Data Preservation

Robert R. Downs, Robert C. Chen
2020 Zenodo  
Enabling the future use of geospatial data can foster new opportunities for learning and facilitate capabilities for scientific investigations to build on the results of previous research.  ...  A key challenge is to promote awareness of the need for preservation and the approaches, methods, and tools available to support preservation efforts.  ...  & Training, Tools, and Digital Preservation Policy -Quick links for categories of users (data managers, system developers, researchers) • Content and search capabilities -Thank you!  ... 
doi:10.5281/zenodo.3781628 fatcat:mmevzsuj45bnljebcdwye4abz4

Data Wrangling- A Goliath of Data Industry

Ritvik Voleti, KCC ITM
2020 International Journal of Engineering Research and  
It is a method in which we have data identification, extracting, cleaning and integrating data for a dataset which would be analyzed as needed.  ...  Data Wrangling is much more than just modifying and cleaning data and provides user the benefit of interactive and an efficient data.  ...  Regularly, in large data of files, a group of columns that are closely linked, hence showing that containing redundant data, that only provides in featuring and making selecting the model tougher.  ... 
doi:10.17577/ijertv9is080122 fatcat:2vxeguat6baxbg2i2b3oi5yjny

Social Media Data Extraction Method Benchmarking Comparison

Zhenhuan Sui
2019 International Journal on Data Science and Technology  
Seven criteria related to features are applied to compare the methods for ease of use, extraction timing and capability to accommodate big data.  ...  Given that our results may be approximate because we might not be able to observe all the capability and features of the software, our results show that Python plus Tweepy method is the most ideal one  ...  Next Analytics method should be selected.  ... 
doi:10.11648/j.ijdst.20190502.12 fatcat:a63l7g6225h6dbath7akdfnnsu

Carpé data

Max Van Kleek, Daniel A. Smith, Heather S. Packer, Jim Skinner, Nigel R. Shadbolt
2013 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '13  
Observations from these pre-studies led to DataPalette, an interface that introduced simple co-reference and group multi-path-selection mechanisms for working with terminologically and structurally heterogeneous  ...  The information processing capabilities of humans enable them to opportunistically draw and integrate knowledge from nearly any information source.  ...  ACKNOWLEDGEMENTS The work presented in this paper was supported by the SOCIAM project, funded by the Engineering and Physical Sciences Research Council under contract EP/J017728/1. Dr  ... 
doi:10.1145/2470654.2481324 dblp:conf/chi/KleekSPSS13 fatcat:wuwo75xtarfqzlfdsz2jusluum

Big Data Visualisation [chapter]

Miguel Ángel Esbrí, Eva Klien, Karel Charvát, Christian Zinke-Wehlmann, Javier Hitado, Caj Södergård
2021 Big Data in Bioeconomy  
These examples show that there are many technologies and software components available for big data visualisation, but they also point to limitations and the need for further research and development.  ...  In this chapter, we introduce the topic of big data visualization with a focus on the challenges related to geospatial data.  ...  An extension of JSON for Linked Data is JSON-LD (JavaScript Object Notation for Linked Data), which is a method of encoding Linked Data using JSON  ... 
doi:10.1007/978-3-030-71069-9_13 fatcat:sj5iht2ngzcftmjnufcry7avie

A Domain-Agnostic Tool for Scalable Ontology Population and Enrichment from Diverse Linked Data Sources

Efstratios Kontopoulos, Panagiotis Mitzias, Marina Riga, Ioannis Kompatsiaris
2017 International Conference on Data Analytics and Management in Data Intensive Domains  
The paper argues that the rapidly increasing array of published Linked Datasets can serve as the input for large-scale ontology population in data-intensive domains and presents PROPheT, a novel software  ...  tool for ontology population and enrichment.  ...  We would also like to thank the anonymous reviewers for their valuable remarks, thanks to which the paper has been significantly improved.  ... 
dblp:conf/rcdl/KontopoulosMRK17 fatcat:dcnqstac5nettedtohptiziwra

Efficient Model-Data Integration for Flexible Modeling, Parameter Analysis and Visualization, and Data Management

Angela Gregory, Chao Chen, Rui Wu, Sarah Miller, Sajjad Ahmad, John W. Anderson, Hays Barrett, Karl Benedict, Dan Cadol, Sergiu M. Dascalu, Donna Delparte, Lynn Fenstermaker (+8 others)
2020 Frontiers in Water  
The developed data management technologies provide a suite of capabilities, enabling diverse computation capabilities, data storage capacity, connectivity, and accessibility.  ...  and centralized data storage, enabled the statistical distribution of hydrometeorological model input, and coupled models using multiple methods, both to each other and to a distributed data management  ...  HRU Selection Methods Two different methods of selecting HRUs are available in the system: parameter and manual selection.  ... 
doi:10.3389/frwa.2020.00002 fatcat:qiedbymr3rcq3ab36iziswe3dy