Data intensive applications on clouds

Geoffrey C. Fox
2011 Proceedings of the second international workshop on Data intensive computing in the clouds - DataCloud-SC '11  
The cyberinfrastructure supporting science will include large-scale simulation systems headed to exascale, combined with cloud-like systems supporting data-intensive and high-throughput computing, pleasingly parallel jobs, and the long tail of science. Clouds offer economies of scale, elasticity supporting real-time and interactive use, and powerful new programming models such as MapReduce. We stress that iterative extensions of MapReduce will be necessary to get good performance for several ... data mining (analytics) applications. We give several illustrations, mainly from bioinformatics. We suggest that the data deluge implies a corresponding increase in the computational resources needed to support analysis, and that this suggests new architectures for large-scale data repositories.
doi:10.1145/2087522.2087524 fatcat:ilmqpvteszclliewspmzz6lt6q