5. Computational Issues in Statistical Data Analysis

Hiroyuki Minami, Yuriko Komiya, Masahiro Mizuta
2003 Journal of the Japanese Society of Computational Statistics  
We often have to analyze enormous data sets. A personal computer can handle them, but the analysis takes a long time even on a well-specified modern machine, so a faster analysis environment is needed. A parallel computer with large computing power would meet this need. Parallel Virtual Machine (PVM) is a popular library that makes many computers connected via a network behave as one (virtual) parallel computer. If we could use thousands of ... nected computers concurrently, we could analyze various data quickly with PVM. We have investigated the features of PVM through many simulations and found some interesting ones. Accordingly, we construct a generic experimental model of execution time in PVM. This model is applicable to most data analysis methods that can be implemented in master-slave style, in other words, that can be divided into one main part and several sub-parts. In terms of this model, we evaluate turn-around time in relation to the amount of transferred data, the load (described by execution time) on each slave computer, and the number of (part-)jobs. The model is generic enough that we can estimate execution time for analysis methods such as the bootstrap and k-means. We can also derive how many computers are required to complete an analysis within a given time. In this paper, we summarize our work with numerical examples and discuss some practical points on using our framework.
doi:10.5183/jjscs1988.15.2_193 fatcat:psn5evf2ijawtcspfuix5omtjq
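
The abstract does not state the model itself, so the following is only a toy sketch of how turn-around time in a master-slave setup might depend on the three quantities it names: amount of transferred data, per-job load on each slave, and number of (part-)jobs. Every name and formula here is an assumption made for illustration (`turnaround_time`, `min_slaves_for_deadline`, serial sends from the master, identical slaves, jobs processed in rounds); it is not the authors' model and it does not call PVM.

```python
import math
from typing import Optional


def turnaround_time(n_jobs: int, n_slaves: int, job_time: float,
                    data_per_job: float, bandwidth: float,
                    latency: float = 0.0) -> float:
    """Toy estimate of turn-around time (hypothetical, not the paper's model).

    Assumptions: the master sends each job's data one after another
    (communication is serial), all slaves are identical, and jobs are
    processed in ceil(n_jobs / n_slaves) rounds of length job_time.
    """
    if n_jobs < 1 or n_slaves < 1:
        raise ValueError("n_jobs and n_slaves must be at least 1")
    comm = n_jobs * (latency + data_per_job / bandwidth)  # serial transfers
    rounds = math.ceil(n_jobs / n_slaves)                 # parallel compute
    return comm + rounds * job_time


def min_slaves_for_deadline(n_jobs: int, job_time: float,
                            data_per_job: float, bandwidth: float,
                            deadline: float,
                            latency: float = 0.0) -> Optional[int]:
    """Smallest number of slaves meeting the deadline under the toy model.

    Returns None when even one slave per job is too slow, which happens
    when communication alone exceeds the deadline.
    """
    for n_slaves in range(1, n_jobs + 1):
        t = turnaround_time(n_jobs, n_slaves, job_time,
                            data_per_job, bandwidth, latency)
        if t <= deadline:
            return n_slaves
    return None


if __name__ == "__main__":
    # Example: 100 bootstrap replicates of 2 s each, 1 MB per job at 4 MB/s.
    print(turnaround_time(100, 10, 2.0, 1.0, 4.0))
    print(min_slaves_for_deadline(100, 2.0, 1.0, 4.0, deadline=50.0))
```

Under these assumptions the communication term grows with the number of jobs regardless of how many slaves are added, so a deadline can be unreachable no matter how many computers are used. That is one reason a model of this kind is useful before committing machines to an analysis.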