403 Hits in 2.8 sec
An experience report on scaling tools for mining software repositories using MapReduce

Weiyi Shang, Bram Adams, Ahmed E. Hassan
2010 Proceedings of the IEEE/ACM international conference on Automated software engineering - ASE '10  
We use three representative case studies from the MSR field to analyze the potential of the MapReduce platform to scale MSR tools with minimal effort.  ...  We find that many of the web field's guidelines for using the MapReduce platform need to be modified to better fit the characteristics of software engineering problems.  ...  [28] claim that using CC-Finder to detect code clones in the FreeBSD source code requires 40 days.  ... 
doi:10.1145/1858996.1859050 dblp:conf/kbse/ShangAH10 fatcat:bpmjx35m5jflzphl2mgelso5pi

A parallel and efficient approach to large scale clone detection

Hitesh Sajnani, Cristina Lopes
2013 7th International Workshop on Software Clones (IWSC)  
We propose a new token-based approach for large scale code clone detection. It is based on a filtering heuristic which reduces the number of token comparisons.  ...  We also implement a MapReduce based parallel algorithm that implements the filtering heuristic and scales to thousands of projects.  ...  Subsequently, we will present a MapReduce based clone detection algorithm which makes use of this computed list.  ... 
doi:10.1109/iwsc.2013.6613042 dblp:conf/iwsc/SajnaniL13 fatcat:q3563inqtfb5za5jtpvmuqiw3y
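The filtering heuristic described in this entry can be illustrated with a minimal sketch. This assumes a prefix-filtering scheme over token bags (a standard way to cut pairwise token comparisons); the function and parameter names are hypothetical, not taken from the paper:

```python
import math
from collections import defaultdict

def candidates(blocks, threshold=0.8):
    """Prefix-filtering sketch: two token bags can only reach `threshold`
    overlap if they share a token in their prefixes, so we index just a
    short prefix of each block instead of comparing all pairs."""
    index = defaultdict(set)          # token -> ids whose prefix holds it
    pairs = set()
    for bid, tokens in blocks.items():
        toks = sorted(set(tokens))    # any fixed global order works
        prefix_len = len(toks) - math.ceil(threshold * len(toks)) + 1
        for tok in toks[:prefix_len]:
            for other in index[tok]:
                pairs.add((other, bid))
            index[tok].add(bid)
    # verify the surviving candidates with the exact overlap measure
    def overlap(a, b):
        sa, sb = set(blocks[a]), set(blocks[b])
        return len(sa & sb) / max(len(sa), len(sb))
    return {p for p in pairs if overlap(*p) >= threshold}
```

Blocks that share no prefix token are never compared, which is where the savings over all-pairs comparison come from.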

Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report

Weiyi Shang, Bram Adams, Ahmed E. Hassan
2012 Journal of Systems and Software  
In this paper, we report on our experience in using a web-scale platform (i.e., Pig) as a data preparation language to aid large-scale MSR studies.  ...  Through three case studies, we carefully validate the use of this web platform to prepare (i.e., Extract, Transform, and Load, ETL) data for further analysis.  ...  As the authors are not clone detection experts, they choose to use an existing clone detection tool.  ... 
doi:10.1016/j.jss.2011.07.034 fatcat:6lupcadplnaivf6flcm7olqs24

Automatic contention detection and amelioration for data-intensive operations

John Cieslewicz, Kenneth A. Ross, Kyoho Satsumi, Yang Ye
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
To take full advantage of the parallelism offered by a multicore machine, one must write parallel code. Writing parallel code is difficult.  ...  Rather, we aim to provide a framework within which a programmer can, without detailed knowledge of concurrent and parallel programming, develop code that efficiently utilizes a multi-core machine.  ...  The programmer does not need to write code to address these issues. Not all computations can be abstracted into a MapReduce framework.  ... 
doi:10.1145/1807167.1807221 dblp:conf/sigmod/CieslewiczRSY10 fatcat:jr3qatygmnff7h7q2wttb2rac4

Improved Hadoop Cluster Performance by Dynamic Load and Resource Aware Speculative Execution and Straggler Node Detection

2020 International Journal of Engineering and Advanced Technology  
The detection and cloning of tasks assigned to stragglers alone will not be enough to enhance performance unless the cloning of tasks is allocated in a resource-aware manner.  ...  For the lightly loaded case, a task cloning scheme, namely the combined file task cloning algorithm (based on maximizing the overall system utility), and a straggler detection algorithm are proposed.  ...  Detailed performance analysis uses several classical MapReduce programs, including word count; performance is also analyzed using Spark with parallelism tuning.  ... 
doi:10.35940/ijeat.d8017.049420 fatcat:6b7d6gkh4nc3xkz7qpu65anqia

Monte Carlo simulation of photon migration in a cloud computing environment with MapReduce

Guillem Pratx, Lei Xing
2011 Journal of Biomedical Optics  
However, its widespread use is hindered by the high computational cost.  ...  The purpose of this work is to report on our implementation of a simple MapReduce method for performing fault-tolerant Monte Carlo computations in a massively-parallel cloud computing environment.  ...  In our MapReduce implementation, each task allocates its own detection grid locally, which avoids data write hazards. Local detection grids are combined using parallel Reduce tasks.  ... 
doi:10.1117/1.3656964 pmid:22191916 pmcid:PMC3273307 fatcat:ftanxuevnnfzbeo342gf57mmcu
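The per-task local detection grids mentioned in this entry can be sketched as follows. This is a toy illustration of the pattern only (each Map task writes to a private grid, then Reduce sums the grids element-wise, avoiding write hazards); the photon "migration" here is a random placeholder, not the paper's physics:

```python
import random

def map_task(n_photons, grid_size, seed):
    """Map: simulate this task's photons into a private local grid,
    so parallel workers never write to shared memory."""
    rng = random.Random(seed)
    grid = [0] * grid_size
    for _ in range(n_photons):
        grid[rng.randrange(grid_size)] += 1   # toy stand-in for migration
    return grid

def reduce_task(grids):
    """Reduce: combine local detection grids by element-wise summation."""
    return [sum(cells) for cells in zip(*grids)]
```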

On how often code is cloned across repositories

Niko Schwarz, Mircea Lungu, Romain Robbes
2012 34th International Conference on Software Engineering (ICSE)  
Large-scale clone detection also opens new challenges beyond asking for the provenance of a single clone fragment, such as assessing the prevalence of code clones on the entire code base, and their evolution  ...  Detecting code duplication in large code bases, or even across project boundaries, is problematic due to the massive amount of data involved.  ...  In their paper, Hummel et al. describe how they implemented their own tables that could be queried in parallel using MapReduce.  ... 
doi:10.1109/icse.2012.6227097 dblp:conf/icse/SchwarzLR12 fatcat:pdcdibyjzjfwpkfhkoyh54rvb4

Index-based code clone detection: incremental, distributed, scalable

Benjamin Hummel, Elmar Juergens, Lars Heinemann, Michael Conradt
2010 IEEE International Conference on Software Maintenance  
Although numerous different clone detection approaches have been proposed to date, not a single one is both incremental and scalable to very large code bases.  ...  We report on several case studies that show both its suitability for real-time clone detection and its scalability: on 42 MLOC of Eclipse code, average time to retrieve all clones for a file was below  ...  The first MapReduce program constructs the clone index and stores it in a Bigtable. As the addition of different files to the index is completely independent, it can be easily parallelized.  ... 
doi:10.1109/icsm.2010.5609665 dblp:conf/icsm/HummelJHC10 fatcat:adtomikryngdrpxyzoryjpqr64
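The index construction this entry describes, where each file is processed independently and so parallelizes trivially, can be sketched as a hash-based clone index. The chunk length and helper names here are hypothetical, and the grouping step stands in for the Bigtable-backed reduce:

```python
import hashlib
from collections import defaultdict

CHUNK = 3  # hypothetical chunk length, in lines

def map_file(path, lines):
    """Map: one file in isolation -> (chunk-hash, location) pairs,
    so adding files to the index is embarrassingly parallel."""
    for i in range(len(lines) - CHUNK + 1):
        digest = hashlib.sha1("\n".join(lines[i:i + CHUNK]).encode()).hexdigest()
        yield digest, (path, i)

def build_index(files):
    """Reduce: group locations by chunk hash; any hash seen at more
    than one location marks a clone candidate."""
    index = defaultdict(list)
    for path, lines in files.items():
        for digest, loc in map_file(path, lines):
            index[digest].append(loc)
    return {h: locs for h, locs in index.items() if len(locs) > 1}
```

Because `map_file` touches only its own file, the map phase scales out with no coordination; all sharing happens in the grouping step.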

Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends

Emad A Mohammed, Behrouz H Far, Christopher Naugler
2014 BioData Mining  
The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce.  ...  MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters.  ...  MapReduce and Hadoop can be consciously used to train detection and forecasting models.  ... 
doi:10.1186/1756-0381-7-22 pmid:25383096 pmcid:PMC4224309 fatcat:zpis7kklerh2vna5le2gtxc5vi
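The two functional-programming tasks this entry names, Map and Reduce, are conventionally illustrated with word count. A minimal single-process sketch of the pattern (the shuffle is just an in-memory grouping here):

```python
from collections import defaultdict
from itertools import chain

def map_fn(record):
    # Map: emit a (word, 1) pair for every word in a record
    return [(word, 1) for word in record.split()]

def reduce_fn(key, values):
    # Reduce: sum all counts observed for one key
    return key, sum(values)

def mapreduce(records):
    groups = defaultdict(list)                 # the shuffle step
    for k, v in chain.from_iterable(map(map_fn, records)):
        groups[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())
```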

Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories

Anastasios Papagiannis, Dimitrios S. Nikolopoulos
2010 39th International Conference on Parallel Processing  
We advance the state of the art in runtime support for MapReduce using five instruments: (1) A new multi-threaded, event-driven controller for task instantiation, task scheduling, synchronization, and  ...  bulk-synchronous execution of MapReduce stages.  ...  They enable high-performance vectorization of data-parallel code.  ... 
doi:10.1109/icpp.2010.21 dblp:conf/icpp/PapagiannisN10 fatcat:wrn7enbn45axtnkcv5vpb6zgge

Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds [article]

Huanle Xu, Wing Cheong Lau
2015 arXiv   pre-print
To tackle this online job scheduling challenge, we adopt the task cloning approach and design the corresponding scheduling algorithms which aim at minimizing the weighted sum of job flowtimes in a MapReduce  ...  Job scheduling for a MapReduce cluster has been an active research topic in recent years.  ...  Under the Cloning approach, extra copies of a task are scheduled in parallel with the initial task and the one which finishes first is used for the subsequent computation.  ... 
arXiv:1501.02330v1 fatcat:n5hoiptz7benvp3jjqg32txnvq
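The cloning approach in this entry's snippet, where extra copies of a task run in parallel and the first to finish wins, can be sketched with thread pools. This is an illustration of the idea only, not the paper's scheduler; the cancellation of losing clones is best-effort:

```python
import concurrent.futures as cf

def run_with_clones(task, n_clones=2):
    """Launch n_clones copies of a task in parallel and return the
    result of whichever finishes first, masking a slow (straggler) copy."""
    with cf.ThreadPoolExecutor(max_workers=n_clones) as pool:
        futures = [pool.submit(task) for _ in range(n_clones)]
        done, not_done = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        for f in not_done:
            f.cancel()            # best-effort: already-running clones finish
        return next(iter(done)).result()
```

The trade-off the paper analyzes is implicit here: each clone consumes a worker slot, so cloning buys tail-latency reduction at the cost of cluster capacity.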

Optimization for Speculative Execution of Multiple Jobs in a MapReduce-like Cluster [article]

Huanle Xu, Wing Cheong Lau
2015 arXiv   pre-print
For the lightly loaded case, we analyze and propose two optimization-based schemes, namely, the Smart Cloning Algorithm (SCA) which is based on maximizing the job utility and the Straggler Detection Algorithm  ...  To tackle this so-called straggler problem, most parallel processing frameworks such as MapReduce have adopted various strategies under which the system may speculatively launch additional copies of the  ...  The corresponding pseudo-code is given in Algorithm 1 as below.  ... 
arXiv:1406.0609v3 fatcat:ept2yi2xabhs3ldn5r6heag2c4

CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications

Andréa Matsunaga, Maurício Tsugawa, José Fortes
2008 IEEE Fourth International Conference on eScience  
The proposed approach uses the MapReduce paradigm to parallelize tools and manage their execution, machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly  ...  The implementation integrates Hadoop, Virtual Workspaces, and ViNe as the MapReduce, virtual machine and virtual network technologies, respectively, to deploy the commonly used bioinformatics tool NCBI  ...  Using virtual machines with software and data needed for execution of both the application and MapReduce greatly facilitates the distributed deployment of sequential codes.  ... 
doi:10.1109/escience.2008.62 dblp:conf/eScience/MatsunagaTF08 fatcat:nrvf6gbjrjcznnjd6f5k5aqqym

MapReduce in MPI for Large-scale graph algorithms

Steven J. Plimpton, Karen D. Devine
2011 Parallel Computing  
This means the calling program does not need to include explicit parallel code, but instead provides "map" and "reduce" functions that operate independently on elements of a data set distributed across  ...  We describe a parallel library written with message-passing (MPI) calls that allows algorithms to be expressed in the MapReduce paradigm.  ... 
doi:10.1016/j.parco.2011.02.004 fatcat:icat6ghmqvaetevtbhkwslzqbq

Fault Tolerance in MapReduce: A Survey [chapter]

Bunjamin Memishi, Shadi Ibrahim, María S. Pérez, Gabriel Antoniu
2016 Computer Communications and Networks  
Data-intensive computing has become one of the most popular forms of parallel computing. This is due to the explosion of digital data we are living through.  ...  In particular, MapReduce frameworks tolerate machine failures (crash failures) by re-executing all the tasks of the failed machine, by virtue of data replication.  ...  The classic implementation of MapReduce has no mechanism for dealing with the failure of the master, since the heartbeat mechanism is not used to detect this kind of failure.  ... 
doi:10.1007/978-3-319-44881-7_11 dblp:series/ccn/MemishiIPA16 fatcat:m5x33gpzunhzzgrdslagndiwzy
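The crash-failure handling this entry describes, heartbeat-based detection of workers followed by re-execution of all their tasks, can be sketched as follows. The class, field names, and timeout are hypothetical, a minimal illustration of the mechanism rather than any framework's actual API:

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # hypothetical deadline, in seconds

class Master:
    """Sketch of worker crash handling: a worker that misses the
    heartbeat deadline is declared failed and all its tasks are
    re-queued, relying on input replication to re-run them elsewhere."""
    def __init__(self):
        self.last_beat = {}   # worker -> timestamp of last heartbeat
        self.assigned = {}    # worker -> list of task ids
        self.pending = []     # tasks awaiting (re)scheduling

    def heartbeat(self, worker, now=None):
        self.last_beat[worker] = time.monotonic() if now is None else now

    def reap_failures(self, now=None):
        now = time.monotonic() if now is None else now
        for worker, beat in list(self.last_beat.items()):
            if now - beat > HEARTBEAT_TIMEOUT:
                # crash failure: re-execute every task of the dead worker
                self.pending.extend(self.assigned.pop(worker, []))
                del self.last_beat[worker]
```

As the snippet notes, this covers worker crashes only; a failure of the master itself is outside what this mechanism detects.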
Showing results 1 — 15 out of 403 results