
Modeling and Imputation of Large Incomplete Multidimensional Datasets [chapter]

Xintao Wu, Daniel Barbará
2002 Lecture Notes in Computer Science  
The presence of missing or incomplete data is commonplace in large real-world databases. In this paper, we study the problem of missing values which occur at the measure dimension of a data cube.  ...  We propose a two-part mixture model, which combines the logistic model and loglinear model together, to predict and impute the missing values.  ...  Empirical evaluation: This paper targets modeling and imputation of missing semicontinuous attribute values in large multidimensional datasets.  ... 
doi:10.1007/3-540-46145-0_28 fatcat:gjqqlznqbjf7veqm6wuzyazfum

Multiple Imputation of missing values in exploratory factor analysis of multidimensional scales: estimating latent trait scores

Urbano Lorenzo-Seva, Joost R. Van Ginkel
2016 Anales de Psicología  
We applied the approach in a real dataset where missing responses were artificially introduced following a real pattern of non-responses and a simulation study based on artificial datasets.  ...  In this paper we focus on the exploratory factor analysis of multidimensional scales (i.e., scales that consist of a number of subscales) where each subscale is made up of a number of Likert-type items  ...  incomplete data can be analyzed with IRT models and estimates of latent abilities.  ...  Even if the problem of item nonresponse is as old as  ... 
doi:10.6018/analesps.32.2.215161 fatcat:q7be2tgtwjerxhqnhb4ecd3j6e

Simultaneous Incomplete Traffic Data Imputation and Similarity Pattern Discovery with Bayesian Nonparametric Tensor Decomposition

Yaxiong Han, Zhaocheng He
2020 Journal of Advanced Transportation  
In this paper, we propose the Bayesian nonparametric tensor decomposition (BNPTD) to achieve incomplete traffic data imputation and similarity pattern discovery simultaneously.  ...  BNPTD is a hierarchical probabilistic model, which is comprised of Bayesian tensor decomposition and Dirichlet process mixture model.  ...  a hierarchical probabilistic model, which can achieve incomplete traffic data imputation and similarity pattern discovery simultaneously.  ... 
doi:10.1155/2020/8810753 fatcat:osj2i6ukinezvicl2zyrsqvoue

Internal Data Imputation in Data Warehouse Dimensions [chapter]

Yuzhao Yang, Fatma Abdelhédi, Jérôme Darmont, Franck Ravat, Olivier Teste
2021 Lecture Notes in Computer Science  
As a consequence, we propose in this article an internal data imputation method for multidimensional data warehouses based on the existing data and considering the intra-dimension and inter-dimension relationships  ...  Some other data imputation methods need extra time and effort costs.  ...  Data in DWs are usually modeled in a multidimensional way, which helps users consult and analyze aggregated data with On-Line Analytical Processing (OLAP).  ... 
doi:10.1007/978-3-030-86472-9_22 fatcat:hevso3cezzc2hl4sisxebwn7dy

Similarity Detection for Higher-Order Structure of DNA Sequences

Nguyen Thi Ngoc Anh, Ho Phan Hieu, Tran Anh Kiet, Vo Trung Hung
2017 Journal of Science and Technology Issue on Information and Communications Technology  
With the advances in data collection and storage capabilities, large amounts of multidimensional data, known as higher-order data representations, have recently been generated in bioinformatics applications  ...  This paper thus proposes a mathematical model capable of handling the multidimensional problem of DNA similarity detection with high accuracy and reliability.  ...  Acknowledgment: This research is funded by Funds for Science and Technology Development of the University of Danang under grant number B2017-DN01-07 and B2017-DN03-07.  ... 
doi:10.31130/jst.2017.51 fatcat:w7gl57cayzbvvgknnnqwdoh3ga

Missing Value Imputation on Multidimensional Time Series [article]

Parikshit Bansal, Prathamesh Deshpande, Sunita Sarawagi
2021 arXiv   pre-print
We present DeepMVI, a deep learning method for missing value imputation in multidimensional time-series datasets.  ...  One strategy is imputing the missing values, and a wide variety of algorithms exist spanning simple interpolation, matrix factorization methods like SVD, statistical models like Kalman filters, and recent  ...  Comparison on Imputation Accuracy Given the large number of datasets, methods, missing scenarios and missing sizes we present our numbers in stages.  ... 
arXiv:2103.01600v2 fatcat:bhojfu55ujev3kl46qziowjatu

Time Series Data Imputation: A Survey on Deep Learning Approaches [article]

Chenguang Fang, Chen Wang
2020 arXiv   pre-print
We will review and discuss their model architectures, their pros and cons as well as their effects to show the development of the time series imputation methods.  ...  Currently, time series data imputation is a well-studied problem with different categories of methods.  ...  Constraint-based methods [43, 42] discover the rules in the dataset, and take advantage of these rules to impute.  ... 
arXiv:2011.11347v1 fatcat:a27t7fsu7bcbxkwjamcjpxhzea

Differentiable and Scalable Generative Adversarial Models for Data Imputation [article]

Yangyang Wu and Jun Wang and Xiaoye Miao and Wenjia Wang and Jianwei Yin
2022 arXiv   pre-print
SCIS consists of two modules, differentiable imputation modeling (DIM) and sample size estimation (SSE).  ...  for large-scale incomplete data.  ...  The above methods calculate the model gradients with a series of random partitions of the dataset, to train the imputation models over large-scale incomplete data.  ... 
arXiv:2201.03202v1 fatcat:nhzoo7hixjha5opg5qb6kn2sb4

Tensor Data Imputation by PARAFAC with Updated Chaotic Biases by Adam Optimizer

Pooja Choudhary, Kanwal Garg
2021 International journal of recent technology and engineering  
The idea has been tested on Netflix and traffic datasets from Guangzhou, China.  ...  The biases are created and updated by a chaotic exponential factor in Adam's optimization, which reduces the imputation error.  ...  A multidimensional EEG dataset was analyzed by the Bayesian Tensor Factorization model [20].  ... 
doi:10.35940/ijrte.e5291.039621 fatcat:gg52oizc3ja6xk7by4ydzniqvq

Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework

Valentin Voillet, Philippe Besse, Laurence Liaubet, Magali San Cristobal, Ignacio González
2016 BMC Bioinformatics  
Results: We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data.  ...  ., regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA).  ...  Ethics approval and consent to participate: Not applicable. The liver toxicity data set was already published in [29].  ... 
doi:10.1186/s12859-016-1273-5 pmid:27716030 pmcid:PMC5048483 fatcat:usj7eysd2zgxridpo3swuviaxm

Decomposition Methods for Machine Learning with Small, Incomplete or Noisy Datasets

Cesar Federico Caiafa, Jordi Solé-Casals, Pere Marti-Puig, Sun Zhe, Toshihisa Tanaka
2020 Applied Sciences  
Correct handling of incomplete, noisy or small datasets in machine learning is a fundamental and classic challenge.  ...  In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing features imputation (data completion), classification of noisy samples and artificial  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/app10238481 fatcat:2gqm3tos4vdorptqayewqu2mum

DTW-Approach for uncorrelated multivariate time series imputation

Thi-Thu-Hong Phan, Emilie Poisson Caillault, André Bigand, Alain Lefebvre
2017 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)  
Data analysis with missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s).  ...  Some well-known methods for multivariate time series imputation require high correlations between series or their features.  ...  In particular, our method further proves the ability to fill in incomplete data with large missing rates (7.5% and 10% on Marel Carnot dataset).  ... 
doi:10.1109/mlsp.2017.8168165 dblp:conf/mlsp/PhanCBL17 fatcat:6mamdwptavbqrge54dvaveej4a

Imputation techniques on missing values in breast cancer treatment and fertility data [article]

Xuetong Wu, Hadi Akbarzadeh Khorshidi, Uwe Aickelin, Zobaida Edib, Michelle Peate
2020 arXiv   pre-print
However, clinical datasets often suffer from high missingness, which adversely impacts the quality of modelling if handled improperly.  ...  This study examines a series of machine learning based imputation methods and suggests an efficient approach to preparing a good-quality breast cancer (BC) dataset, to find the relationship between  ...  Another way to utilise the knowledge of the whole dataset is the model-based approach.  ... 
arXiv:2011.09912v1 fatcat:su2nw4kspnczrdbqzss2i5gazm

Statistical file matching of flow cytometry data

Gyemin Lee, William Finn, Clayton Scott
2011 Journal of Biomedical Informatics  
This requires us to perform clustering with missing data, which we address with a mixture model approach and novel EM algorithm.  ...  We show that simple nearest neighbor based imputation can lead to spurious subpopulations in the imputed data, and introduce an alternative approach based on nearest neighbor imputation restricted to a  ...  When imputing incomplete units in file 2, the roles change.  ... 
doi:10.1016/j.jbi.2011.03.004 pmid:21406248 fatcat:n6g33e3rtnddxoloko6sioc4zm

Statistical File Matching of Flow Cytometry Data [article]

Gyemin Lee (Department of Electrical Engineering and Computer Science, University of Michigan; Department of Statistics, University of Michigan)
2010 arXiv   pre-print
This requires us to perform clustering with missing data, which we address with a mixture model approach and novel EM algorithm.  ...  We show that simple nearest neighbor based imputation can lead to spurious subpopulations in the imputed data, and introduce an alternative approach based on nearest neighbor imputation restricted to a  ...  Then pairs of markers that are not measured together can still be visualized through scatter plots, and methods of multidimensional analysis may be applied to the full dataset.  ... 
arXiv:1003.5539v1 fatcat:sf6ddm2tabg53f3wzhanxiggbq