A Proposal for Duplicate Data Detection in Big Data

Nancy Jasmine Goldena
2018 International Journal for Research in Applied Science and Engineering Technology  
Big Data is now among the most talked-about research subjects. With the expansion of the internet and of storage capacity over the years, vast amounts of data have become available to anyone who wants to search them. A problem that plagues internet storage, however, is that multiple copies of the same data exist. This degrades search results, wastes time, and prevents accurate data analysis. This paper presents a proposal to address these problems. Traditional data mining approaches work well with small datasets. As a dataset grows, newer techniques are needed, because running an operation on a large Big Data dataset takes more time. Hence a simpler approach is proposed here: it does not require creating a new technique to process Big Data, but instead proposes a strategy that makes efficient use of existing data mining techniques.
doi:10.22214/ijraset.2018.3361 fatcat:5jqbsrotebcujllazwt3lazjii