Advantage of integration in big data: Feature generation in multi-relational databases for imbalanced learning

Farrukh Ahmed, Michele Samorani, Colin Bellinger, Osmar R. Zaiane
2016 2016 IEEE International Conference on Big Data (Big Data)  
It becomes further complicated in the realm of Big Data where related information is spread over different data repositories.  ...  This work focuses on the automatic construction of a mining table by aggregating information from multiple local tables and additional data sources as external tables in a multi-relational database.  ...  However, the value of data is in the integration of disparate data sources and this integration is a major pillar of Big Data, yet very few tools exist to truly take advantage of data integration in Big  ... 
doi:10.1109/bigdata.2016.7840644 dblp:conf/bigdataconf/AhmedSBZ16 fatcat:yxwoeguvg5bz7o4muol2ctsa2q

A Review on Classification of Data Imbalance using BigData

Ramasubramanian, Hariharan Shanmugasundaram
2021 International Journal of Managing Information Technology  
With the advancement of technology and the increase in the generation of real-time data from various sources like the Internet, IoT and social media, it needs more processing and is more challenging.  ...  In this paper, the author analyses the data imbalance models using big data and classification algorithm.  ...  ACKNOWLEDGEMENTS The authors would like to extend sincere thanks to the management for providing us support and environment for carrying out the research and to other fellow colleagues for their support  ... 
doi:10.5121/ijmit.2021.13302 fatcat:7v52ofngqvgyjarvlsrqqzoimm

Big data preprocessing: methods and prospects

Salvador García, Sergio Ramírez-Gallego, Julián Luengo, José Manuel Benítez, Francisco Herrera
2016 Big Data Analytics  
The massive growth in the scale of data has been observed in recent years being a key factor of the Big Data scenario.  ...  Big Data can be defined as high volume, velocity and variety of data that require a new high-performance processing.  ...  Acknowledgements This work was partially supported by the Spanish Ministry of Science and Technology under project TIN2014-57251-P and the Andalusian Research Plan P11-TIC-7765.  ... 
doi:10.1186/s41044-016-0014-0 fatcat:z3lqu2yi3vey3khbdal6mu34qa

Intelligent fault diagnosis system based on big data

Tianshu Wu, Shuyu Chen, Peng Wu
2019 The Journal of Engineering  
In view of the actual problems existing in life-cycle health monitoring and diagnosis of large complex equipment, the machine-learning algorithm is applied to data mining of the equipment operation big  ...  The system uses uncertain fault prediction method and hybrid intelligent algorithm to discover the hierarchical association between operation feature big data and operation faults, the feature extraction  ...  Acknowledgments This work is supported by National Natural Science Foundation of China (Grant No. 61272399, 61572090).  ... 
doi:10.1049/joe.2018.9162 fatcat:6mvo2d4dfzfj7nxtvs7xf6epuy

Exploring complex and big data

Jerzy Stefanowski, Krzysztof Krawiec, Robert Wrembel
2017 International Journal of Applied Mathematics and Computer Science  
All in all, we consider it to be the truly defining feature of big data (posing particular research and technological challenges), which ultimately seems to be of greater importance than the sheer data  ...  We then survey the dedicated solutions for storing and processing big data, including a data lake, virtual integration, and a polystore architecture.  ...  Researchers have also started studying streams of graphs and more sophisticated relational structures (e.g., relational learning in streams) as well as multi-labeled data streams or sequence predictions  ... 
doi:10.1515/amcs-2017-0046 fatcat:q6ugvobzi5cmbos4ct52mb3d34

Imbalanced Data Classification for Multi-source Heterogenous Sensor Networks

Wei Wang, Mengjun Zhang, Li Zhang, Qiong Bai
2020 IEEE Access  
Therefore, we propose the imbalanced multi-source heterogeneous data classification algorithms in this paper, which are mainly based on the expansion and extension of Support Vector Machines.  ...  We perform tensor representation and feature extraction on the heterogeneous data, and two different classification algorithms are proposed in this paper.  ...  Zieba et al. in [9] used active learning to remove redundant samples and classify imbalanced data sets by integrating multiple SVM classifiers.  ... 
doi:10.1109/access.2020.2966324 fatcat:h3eta76sh5hdvphgtz76lfmwiy

A Profit Function-Maximizing Inventory Backorder Prediction System using Big Data Analytics

Petr Hajek, Mohammad Zoynul Abedin
2020 IEEE Access  
We show that the proposed inventory backorder prediction model shows better prediction and profit function performance than the state-of-the-art machine learning methods used for large imbalanced data.  ...  To provide those inventory models with a big data-driven backorder prediction, we propose a machine learning model equipped with an undersampling procedure to maximize the expected profit of backorder  ...  Indeed, big data analytics generate competitive advantages by mining important information from the high-volume databases.  ... 
doi:10.1109/access.2020.2983118 fatcat:wuuaycvs2vhljd3af6adhrxwey

Feature selection methods and genomic big data: a systematic review

Khawla Tadist, Said Najah, Nikola S. Nikolov, Fatiha Mrabti, Azeddine Zahi
2019 Journal of Big Data  
In this paper, we present a systematic and structured literature review of the feature-selection techniques used in studies related to big genomic data analytics.  ...  With the absence of a thorough investigation of the field, it is almost impossible for researchers to get an idea of how their work relates to existing studies as well as how it contributes to the research  ...  Acknowledgements The authors thank the anonymous reviewers for their helpful suggestions and comments. Authors' contributions All listed authors contributed to the preparation of the paper.  ... 
doi:10.1186/s40537-019-0241-0 fatcat:ju4fy5sh3vh6lfuofiwbbizvsu

Table2Vec: Automated Universal Representation Learning to Encode All-round Data DNA for Benchmarkable and Explainable Enterprise Data Science [article]

Longbing Cao, Chengzhang Zhu
2021 arXiv pre-print
We illustrate Table2Vec in characterizing all-round customer data DNA in an enterprise on complex heterogeneous multi-relational big tables to build universal customer vector representations.  ...  Table2Vec integrates automated universal representation learning on low-quality enterprise data and downstream learning tasks.  ...  They are effective and efficient in terms of tailoring data, features and processes for specific learning tasks and business needs, enabling personalized processing and modeling by taking advantage of  ... 
arXiv:2112.01830v1 fatcat:jolg4eiidvekbhbo2s4brkac5e

Analysis of Bayesian optimization algorithms for big data classification based on Map Reduce framework

Chitrakant Banchhor, N. Srinivasu
2021 Journal of Big Data  
The process of big data handling refers to the efficient management of storage and processing of a very large volume of data.  ...  The data in a structured and unstructured format require a specific approach for overall handling.  ...  Srinivasu for his valuable and constructive suggestions during the planning and development of this research work. His willingness to give his time so generously has been very much appreciated.  ... 
doi:10.1186/s40537-021-00464-4 fatcat:upfhfxiw2fdxhixuwys2sn4f5e

ExtremeEarth Meets Satellite Data From Space

Desta Haileselassie Hagos, Theofilos Kakantousis, Vladimir Vlassov, Sina Sheikholeslami, Tianze Wang, Jim Dowling, Claudia Paris, Daniele Marinelli, Giulio Weikmann, Lorenzo Bruzzone, Salman Khaleghian, Thomas Krämer (+14 others)
2021 IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing  
that enables scalable data processing, machine learning, and deep learning on Copernicus data, and development of very large training datasets for deep learning architectures targeting the classification  ...  Bringing together a number of cutting-edge technologies that range from storing extremely large volumes of data all the way to developing scalable machine learning and deep learning algorithms in a distributed  ...  His work focuses on distributed tools for the transformation of large volumes of data in the RDF model and interlinking techniques for linked geospatial data.  ... 
doi:10.1109/jstars.2021.3107982 fatcat:fxmpayska5bvlj7ibw3peqhuzu

Survey on deep learning with class imbalance

Justin M. Johnson, Taghi M. Khoshgoftaar
2019 Journal of Big Data  
deep learning techniques for addressing class imbalanced data.  ...  Several areas of focus include: data complexity, architectures tested, performance interpretation, ease of use, big data application, and generalization to other domains.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their constructive evaluation of this paper, and the various members of the Data Mining and Machine Learning Laboratory, Florida  ... 
doi:10.1186/s40537-019-0192-5 fatcat:dor65fgn7ffhxmqqv3mkold6wq

Predictive Analytics On Big Data - An Overview

Gayathri Nagarajan, Dhinesh Babu L.D
2019 Informatica (Ljubljana, print edition)
Big data generated in different domains and industries are voluminous and the velocity at which they are generated is pretty high.  ...  The overview throws light on the core predictive models, challenges of these models on big data, research gaps in several domain sectors and using different techniques.  ...  Traditional relational databases, data warehouses and many visualization tools and analytical tools are developed for structured data.  ... 
doi:10.31449/inf.v43i4.2577 fatcat:hqi45o6t7jb63dr3aaesink6l4

Evolving scenario of big data and Artificial Intelligence (AI) in drug discovery

Manish Kumar Tripathi, Abhigyan Nath, Tej P. Singh, A. S. Ethayathulla, Punit Kaur
2021 Molecular Diversity
The accumulation of massive data in the plethora of Cheminformatics databases has made the role of big data and artificial intelligence (AI) indispensable in drug design.  ...  novo molecule design and discovery in this big data era.  ...  Deep learning methods permit automatic generation of higher level hierarchical abstractions from big data that can be used as features, thus reducing the dependency for feature generation in ML.  ... 
doi:10.1007/s11030-021-10256-w pmid:34159484 pmcid:PMC8219515 fatcat:p3lsp57x6rbnxgxdu7y5dggdeu

A survey on data‐efficient algorithms in big data era

Amina Adadi
2021 Journal of Big Data  
The leading approaches in Machine Learning are notoriously data-hungry.  ...  This has triggered a serious debate in both the industrial and academic communities calling for more data-efficient models that harness the power of artificial learners while achieving good results with  ...  Indeed, it is common to think of "big data", "machine learning" and related technologies as relatively modern technologies.  ... 
doi:10.1186/s40537-021-00419-9 fatcat:v4uahsvhlzdldlxqf24bshmja4