High Frequency Rule Synthesis in a Large Scale Multiple Database with MapReduce

Sudhanshu Shekhar Bisoyi, Pragnyaban Mishra, Sarojananda Mishra

Abstract


Continued development in information and communication technology leads to the generation of large amounts of data from various sources. Data collected from multiple sources grow exponentially and may not be structurally uniform. In general, they are heterogeneous and distributed across multiple databases. Because of the large volume, high velocity, and variety of the data, mining knowledge in this environment becomes a big data challenge. Distributed Association Rule Mining (DARM) in these circumstances becomes a tedious task for an effective global Decision Support System (DSS). DARM algorithms generate a large number of association rules and frequent itemsets in the big data environment. In this situation, synthesizing high-frequency rules from the big database becomes more challenging. Many algorithms for synthesizing association rules have been proposed for multiple-database mining environments. They face enormous challenges in terms of high availability, scalability, efficiency, the high cost of storing and processing large intermediate results, and multiple redundant rules. In this paper, we first propose a model to collect data from multiple sources into a big data storage framework based on HDFS. Second, we propose a weighted multi-partitioned method for synthesizing high-frequency rules using the MapReduce programming paradigm. Experiments have been conducted in a parallel and distributed environment using commodity hardware. We ensure the efficiency, scalability, high availability, and cost-effectiveness of the proposed method.
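The abstract does not spell out the weighting scheme or the partitioning, so the following is only a minimal single-process sketch of how a weighted synthesis of local rules could be phrased as a map step and a reduce step. Everything specific in it is an assumption made for illustration: the function names, the choice of weights proportional to database size, the `min_vote` threshold, and the toy rule data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def map_phase(db_id, local_rules):
    """Map step: emit (rule, (db_id, support, confidence)) for one database."""
    for rule, (support, confidence) in local_rules.items():
        yield rule, (db_id, support, confidence)

def shuffle(pairs):
    """Group emitted values by rule, as the MapReduce shuffle would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(values, weights, n_databases, min_vote):
    """Reduce step: keep rules reported by enough databases and
    combine their local measures as weighted sums."""
    if len(values) / n_databases < min_vote:
        return None  # not a high-frequency rule under this assumed threshold
    support = sum(weights[db] * s for db, s, _ in values)
    confidence = sum(weights[db] * c for db, _, c in values)
    return round(support, 4), round(confidence, 4)

def synthesize(databases, sizes, min_vote=0.5):
    # Assumption: each database is weighted by its share of all transactions.
    total = sum(sizes.values())
    weights = {db: n / total for db, n in sizes.items()}
    pairs = [p for db, rules in databases.items() for p in map_phase(db, rules)]
    result = {}
    for rule, values in shuffle(pairs).items():
        combined = reduce_phase(values, weights, len(databases), min_vote)
        if combined is not None:
            result[rule] = combined
    return result

# Hypothetical local rules: (antecedent, consequent) -> (support, confidence)
databases = {
    "D1": {("A", "B"): (0.4, 0.8), ("B", "C"): (0.2, 0.5)},
    "D2": {("A", "B"): (0.6, 0.9)},
    "D3": {("A", "B"): (0.5, 0.7), ("C", "D"): (0.3, 0.6)},
}
sizes = {"D1": 1000, "D2": 3000, "D3": 1000}  # hypothetical transaction counts

result = synthesize(databases, sizes)
print(result)
```

In this toy run only the rule A → B is reported by at least half of the databases, so it is the only rule kept. In an actual Hadoop job the map and reduce functions would run on separate nodes over data stored in HDFS; the sketch keeps them in one process purely to show the data flow.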


International Journal of Electronics and Telecommunications
is a periodical of the Electronics and Telecommunications Committee
of the Polish Academy of Sciences

eISSN: 2300-1933