
A Map Reduce Hadoop Implementation of Random Tree Algorithm based on Correlation Feature Selection

Aman Gupta, Pranita Jain
2017 International Journal of Computer Applications  
For speedily processing, we introduce a scalable fast approximate attribute reduction algorithm with Map Reduce.  ...  On the other hand, they still cannot deal with massive data. Massive data processing is a difficult problem in the age of big data.  ...  Random Tree algorithm is a commonly used algorithm applied to data classification. But traditional Random Tree algorithm is not fit for the massive data.  ... 
doi:10.5120/ijca2017913055 fatcat:b7tfw5itoffxnijfiqkpfhuac4

Hadoop based Feature Selection and Decision Making Models on Big Data

Thulasi Bikku, N. Sambasiva Rao, Ananda Rao Akepogu
2016 Indian Journal of Science and Technology  
It becomes computationally inaccurate to analyze such big data for decision making systems.  ...  Findings: Most of the traditional classification algorithms have issues such as class imbalance and dimension reduction on Big Data.  ...  selection are implemented for attribute reduction.  ... 
doi:10.17485/ijst/2016/v9i10/88905 fatcat:2h6kz3u6qzf5xgt2coshk6sgqi

Sampling for Scalable Visual Analytics

Bum Chul Kwon, Janu Verma, Peter J. Haas, Cagatay Demiralp
2017 IEEE Computer Graphics and Applications  
Acknowledgments The authors thank Theresa-Marie Rhyne and the anonymous reviewers for their many suggestions, which greatly improved this article.  ...  For massive datasets, it is time-consuming to access and render even a single tile.  ...  Here, we make a case for sampling as an essential tool for scalable interactive visual analysis.  ... 
doi:10.1109/mcg.2017.6 pmid:28103544 fatcat:a2pru5jsmzbttfo56c3mwocz54

Mining Quality Phrases from Massive Text Corpora

Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, Jiawei Han
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
on time and space • Looking forward: High-quality, scalable phrase mining • Facilitate entity recognition and typing in large corpora • Transform massive unstructured data into semi-structured knowledge  ...  Phrase Mining Document 3 Principal Component Analysis is a linear dimensionality reduction technique commonly used in machine learning applications.  ...  Similarity Search • Find high-quality similar phrases based on user's phrase query • In response to a user's phrase query, SegPhrase+ generates high quality, semantically similar phrases • In DBLP, query on "data  ... 
doi:10.1145/2723372.2751523 pmid:26705375 pmcid:PMC4688018 dblp:conf/sigmod/LiuSWRH15 fatcat:6gme5yjdzjdzpaxdywqq7koqiy

Scalable multi stage clustering of tagged micro-messages

Oren Tsur, Adi Littman, Ari Rappoport
2012 Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion  
In this work we propose SMSC, a scalable, accurate and efficient multi stage clustering algorithm.  ...  Our algorithm leverages users' practice of adding tags to some messages by bootstrapping over virtual non sparse documents.  ...  We attribute that to the batch sampling used in order to achieve scalability. This sampling cannot handle extremely sparse micro-messages, although proven suitable for clustering web pages.  ... 
doi:10.1145/2187980.2188157 dblp:conf/www/TsurLR12 fatcat:tzha5j7eabcfnkadms63i3i5ee

Distributed Pasting of Small Votes [chapter]

N. V. Chawla, L. O. Hall, K. W. Bowyer, T. E. Moore, W. P. Kegelmeyer
2002 Lecture Notes in Computer Science  
Voting many classifiers built on small subsets of data ("pasting small votes") is a promising approach for learning from massive datasets.  ...  Experiments show this approach is fast, accurate, and scalable to massive datasets.  ...  Acknowledgments This work was supported in part by the United States Department of Energy through the Sandia National Laboratories LDRD program and ASCI VIEWS Data Discovery Program, contract number DE-AC04  ... 
doi:10.1007/3-540-45428-4_5 fatcat:6y5xzyvfqfaoxlvp5zwdthug2a

Automatic Entity Recognition and Typing in Massive Text Corpora

Xiang Ren, Ahmed El-Kishky, Chi Wang, Jiawei Han
2016 Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion  
To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationships between them.  ...  In this tutorial, we introduce data-driven methods to recognize typed entities of interest in different kinds of text corpora (especially in massive, domain-specific text corpora).  ...  to Knowledge (BD2K) initiative (www.bd2k.nih.gov), and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.  ... 
doi:10.1145/2872518.2891065 dblp:conf/www/RenEWH16 fatcat:nhwxdfwwpbdgpm2mhuitgnmtkm

2018 Index IEEE Transactions on Knowledge and Data Engineering Vol. 30

2019 IEEE Transactions on Knowledge and Data Engineering  
., +, TKDE March 2018 449-459 Data reduction Multi-Label Learning with Emerging New Labels. Zhu, Y., +, TKDE Oct. 2018 1901-1914 Redundancy Reduction for Prevalent Co-Location Patterns.  ...  Tang, J., +, TKDE June 2018 1095-1108 PurTreeClust: A Clustering Algorithm for Customer Segmentation from Massive Customer Transaction Data.  ... 
doi:10.1109/tkde.2018.2882359 fatcat:asiids266jagrkx5eac6higrlq

Bootstrapping Privacy Compliance in Big Data Systems

Shayak Sen, Saikat Guha, Anupam Datta, Sriram K. Rajamani, Janice Tsai, Jeannette M. Wing
2014 2014 IEEE Symposium on Security and Privacy  
for building user trust.  ...  Central to the design of the system are (a) LEGALEASE, a language that allows specification of privacy policies that impose restrictions on how user data is handled; and (b) GROK, a data inventory for Map-Reduce-like  ...  We thank Leena Sheth, Carrie Culley, Boris Asipov and Robert Chen for their contributions to the operational system. We thank Michael Tschantz and the anonymous reviewers for useful feedback.  ... 
doi:10.1109/sp.2014.28 dblp:conf/sp/SenGDRTW14 fatcat:2zour3xzdrdblcstjudopr3pzy

Automatic Entity Recognition and Typing in Massive Text Data

Xiang Ren, Ahmed El-Kishky, Heng Ji, Jiawei Han
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora.  ...  ., people, product and food) in a scalable way.  ...  applied to modern-day massive text data.  ... 
doi:10.1145/2882903.2912567 dblp:conf/sigmod/RenEJH16 fatcat:ycuyy6wdt5ffjjucikqbamvbhq

A Fast and Scalable Implementation Method for Competing Risks Data with the R Package fastcmprsk [article]

Eric S Kawaguchi, Jenny I Shen, Gang Li, Marc A Suchard
2019 arXiv pre-print
However, current implementations are not computationally scalable for large-scale competing risks data.  ...  Numerical studies compare the speed and scalability of our implementation to current methods for unpenalized and penalized Fine-Gray regression and show impressive gains in computational efficiency.  ...  The manuscript was reviewed and approved for publication by an officer of the National Institute of Diabetes and Digestive and Kidney Diseases. Data reported herein were supplied by the USRDS.  ... 
arXiv:1905.07438v1 fatcat:jawe6iw7nna5pplwrzbim7oqqq

Big Data Analytics: Importance, Challenges, Categories, Techniques, and Tools

Sarah Alswedani
2020 International Journal of Advanced Trends in Computer Science and Engineering  
With the gigantic explosion of the volume of data generated every single day, big data analytics has born as a powerful technology for various organizations.  ...  This paper explores the importance of big data analytics for different domains, the challenges of utilizing big data analytics due to the complex nature of big data, and the technical approaches that are  ...  For semi-supervised, bootstrapping technique is an example while for unsupervised learning clustering algorithms can be used [25] .  ... 
doi:10.30534/ijatcse/2020/17 fatcat:wnd4pn5b6fazbfwzhajtlm6qvy

Toward enhanced understanding and projections of climate extremes using physics-guided data mining techniques

A. R. Ganguly, E. A. Kodra, A. Banerjee, S. Boriah, S. Chatterjee, S. Chatterjee, A. Choudhary, D. Das, J. Faghmous, P. Ganguli, S. Ghosh, K. Hayhoe (+15 others)
2014 Nonlinear Processes in Geophysics Discussions  
To be successful, scalable methods will need to handle what has been called "big data" to tease out elusive but robust statistics of extremes and change from what is ultimately small data.  ...  This perspectives article explores the possibility that physically cognizant mining of massive climate data may lead to significant advances in generating credible predictive insights about climate extremes  ...  Erickson III, Joseph Kanney, Vimal Mishra and Habib Najm for helpful discussions.  ... 
doi:10.5194/npgd-1-51-2014 fatcat:5pyuk62mxjgqdnof3dvuyjcmum

Toward enhanced understanding and projections of climate extremes using physics-guided data mining techniques

A. R. Ganguly, E. A. Kodra, A. Agrawal, A. Banerjee, S. Boriah, Sn. Chatterjee, So. Chatterjee, A. Choudhary, D. Das, J. Faghmous, P. Ganguli, S. Ghosh (+17 others)
2014 Nonlinear Processes in Geophysics  
To be successful, scalable methods will need to handle what has been called "big data" to tease out elusive but robust statistics of extremes and change from what is ultimately small data.  ...  This perspectives article explores the possibility that physically cognizant mining of massive climate data may lead to significant advances in generating credible predictive insights about climate extremes  ...  Erickson III, Joseph Kanney, Vimal Mishra and Habib Najm for helpful discussions.  ... 
doi:10.5194/npg-21-777-2014 fatcat:z4ozxvkazjfenhj3f64usn632m

Leveraging Identity-Based Cryptography for Node ID Assignment in Structured P2P Systems

K.R.B. Butler, S. Ryu, P. Traynor, P.D. McDaniel
2009 IEEE Transactions on Parallel and Distributed Systems  
Structured peer-to-peer systems have grown enormously because of their scalability, efficiency and reliability. These systems assign a unique identifier to each user and object.  ...  This accounts for approximately 80% of the time required for this exchange, with the remaining 20% attributed to network delay, software initialization, etc.  ...  The last message cost can be attributed to signature costs associated with token generation.  ... 
doi:10.1109/tpds.2008.249 fatcat:7zvzadzzhvfx7nmj5nxsynbazi