Description
Title: HYBRID PARALLEL DIMENSIONALITY REDUCTION BASED BIG DATA CLASSIFICATION APPROACH USING SPARK CLUSTER
Abstract: The big data concept has prompted studies on how to extract useful information from large datasets precisely and thoroughly. High data dimensionality, which arises from the large number of features in such datasets, is the main issue encountered during big data mining. Because such datasets contain many redundant features, high dimensionality reduces the accuracy of machine learning (ML) classifiers and increases processing time. Fast feature reduction techniques are one possible solution. To make feature reduction on shared/distributed-memory clusters easier, this study presents HP-PL, a fast new hybrid parallel feature reduction framework built on Spark. The proposed HP-PL was tested on the CICIDS2017 dataset, and the results indicated that it was significantly faster than traditional feature reduction methods. On a 3-node cluster (21 cores in total), the proposed technique took more than a minute to choose 4 features from a dataset of more than 79 features and 3,000,000 samples. The same task took the comparison algorithm more than two hours. The proposed system uses Apache Spark as the computing engine and the Hadoop Distributed File System (HDFS) for distributed storage. The model was developed as a parallel model designed for the high throughput and performance of distributed computing. In conclusion, compared with traditional feature reduction methods, the proposed HP-PL method can achieve good accuracy with less memory and time. The tool is publicly available at https://github.com/ahmed/Fast-HP-PL.
Keywords: big data, dimensionality reduction, parallel processing, Spark, PCA, LDA
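The listing does not describe the internals of HP-PL, so the following is only an illustrative sketch of the general idea behind parallel dimensionality reduction with PCA (one of the listed keywords), not the paper's method. It assumes the data is split into partitions, as it would be on a Spark cluster; each partition computes small summary statistics (the map step), the summaries are merged (the reduce step), and the leading principal component is then found from the merged covariance matrix. Plain Python stands in for Spark here, and all function names and the toy data are hypothetical.

```python
from functools import reduce

def partial_stats(rows):
    """Map step: per-partition row count, feature sums, and sums of products."""
    d = len(rows[0])
    n = 0
    s = [0.0] * d
    ss = [[0.0] * d for _ in range(d)]
    for x in rows:
        n += 1
        for i in range(d):
            s[i] += x[i]
            for j in range(d):
                ss[i][j] += x[i] * x[j]
    return n, s, ss

def merge_stats(a, b):
    """Reduce step: combine two partition summaries element-wise."""
    n = a[0] + b[0]
    s = [u + v for u, v in zip(a[1], b[1])]
    ss = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, s, ss

def covariance(stats):
    """Sample covariance matrix recovered from the merged summaries."""
    n, s, ss = stats
    d = len(s)
    mean = [v / n for v in s]
    return [[(ss[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Power iteration: approximates the leading eigenvector of cov."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

# Toy data (hypothetical): two partitions, three features per sample.
partitions = [
    [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4]],
    [[3.0, 6.2, 0.6], [4.0, 7.9, 0.5]],
]

stats = reduce(merge_stats, map(partial_stats, partitions))
cov = covariance(stats)
pc1 = top_component(cov)

# Project each centered sample onto the first principal component (3 features -> 1).
mean = [v / stats[0] for v in stats[1]]
projected = [sum((xi - m) * p for xi, m, p in zip(x, mean, pc1))
             for rows in partitions for x in rows]
```

Because each partition only emits a count, a vector of length d, and a d-by-d matrix, the amount of data exchanged between nodes depends on the number of features rather than the number of samples, which is why this pattern suits datasets with millions of rows.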
Paper Quality: SCOPUS / Web of Science Level Research Paper
Paper type: Analysis Based Research Paper
Subject: Computer Science
Writer Experience: 20+ Years
Plagiarism Report: Turnitin Plagiarism Report will be less than 10%
Restriction: Each paper is sold to one author only. Once purchased, the paper is marked as out of stock.
What will I get after the purchase?
A Turnitin plagiarism report of less than 10% as a PDF file and the full research paper as a Word document.
If you have any questions about this research paper, please feel free to call or WhatsApp +919726999915.