Title:
Performance Evaluation of Data-driven Intelligent Algorithms for Big data Ecosystem.
Authors:
Junaid M; Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea.
Ali S; Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, South Korea.
Siddiqui IF; Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan.
Nam C; Department of Software Convergence Engineering, Inha University, Incheon, South Korea.
Qureshi NMF; Department of Computer Education, Sungkyunkwan University, Seoul, South Korea.
Kim J; Department of Computer Education, Sungkyunkwan University, Seoul, South Korea.
Shin DR; Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea.
Source:
Wireless personal communications [Wirel Pers Commun] 2022; Vol. 126 (3), pp. 2403-2423. Date of Electronic Publication: 2022 Aug 23.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Springer
Country of Publication: Netherlands
NLM ID: 101670529
Publication Model: Print-Electronic
Cited Medium: Print
ISSN: 0929-6212 (Print)
Linking ISSN: 09296212
NLM ISO Abbreviation: Wirel Pers Commun
Subsets: PubMed not MEDLINE
Imprint Name(s):
Publication: 2005- : Heidelberg, Germany : Springer
Original Publication: 1994- : Dordrecht ; Boston : Kluwer Academic Publishers
Contributed Indexing:
Keywords: Big Data; Machine Learning; Predictive analytic; PySpark Ml-lib; Rapid Miner; SK-Learn
Entry Date(s):
Date Created: 20220829 Latest Revision: 20221014
Update Code:
20250114
PubMed Central ID:
PMC9396610
DOI:
10.1007/s11277-021-09362-7
PMID:
36033548
Database:
MEDLINE

Further Information

Artificial intelligence, and machine learning in particular, has been applied by researchers in a variety of ways to transform diverse data sources into valuable facts and understanding, enabling superior pattern identification. Running machine learning algorithms on huge and complicated data sets is, however, computationally expensive: processing requires hardware and logical resources such as storage space, CPU, and memory. As the amount of data created daily reaches quintillions of bytes, a complex big data infrastructure becomes more and more relevant. The Apache Spark machine learning library (ML-lib) is a well-known platform for big data analysis; it includes several useful features for machine learning applications, including regression, classification, dimension reduction, clustering, and feature extraction. In this contribution, we consider Apache Spark ML-lib as a computationally independent machine learning library that is open-source, distributed, scalable, and platform independent. We evaluated and compared several ML algorithms to analyze the platform's qualities, and compared Apache Spark ML-lib against Rapid Miner and Sklearn, two other big data and machine learning processing platforms. Four classification algorithms are compared across platforms: Logistic Classifier (LC), Decision Tree Classifier (DTC), Random Forest Classifier (RFC), and Gradient Boosted Tree Classifier (GBTC). In addition, we tested general regression methods, namely Linear Regressor (LR), Decision Tree Regressor (DTR), Random Forest Regressor (RFR), and Gradient Boosted Tree Regressor (GBTR), on the SUSY and HIGGS datasets. Moreover, we evaluated unsupervised learning methods such as K-means and Gaussian Mixture Models on the SUSY and HEPMASS datasets to determine the robustness of PySpark in comparison with the classification and regression models. We used the "SUSY," "HIGGS," "BANK," and "HEPMASS" datasets from the UCI data repository. We also discuss recent developments in research into Big Data machines and provide future research directions.
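The kind of cross-algorithm comparison described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' benchmark: the abstract does not state which metrics were used, so wall-clock training time and test accuracy are assumptions here, and `MajorityClassifier`, `NearestCentroidClassifier`, `benchmark`, and `make_toy_data` are hypothetical names introduced only for illustration. The synthetic data stands in for the UCI datasets, which are not loaded.

```python
import random
import time
from collections import Counter


class MajorityClassifier:
    """Toy baseline (hypothetical): always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroidClassifier:
    """Toy model (hypothetical): predicts the class whose feature mean is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
            for row in X
        ]


def benchmark(models, X_train, y_train, X_test, y_test):
    """Return {name: (fit_seconds, accuracy)} for every model with fit/predict."""
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        fit_seconds = time.perf_counter() - start
        predictions = model.predict(X_test)
        correct = sum(p == t for p, t in zip(predictions, y_test))
        results[name] = (fit_seconds, correct / len(y_test))
    return results


def make_toy_data(n, seed):
    """Synthetic two-class data: class 1 is shifted by +3 in both features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0)])
        y.append(label)
    return X, y


X_train, y_train = make_toy_data(400, seed=0)
X_test, y_test = make_toy_data(200, seed=1)
results = benchmark(
    {"majority": MajorityClassifier(), "nearest_centroid": NearestCentroidClassifier()},
    X_train, y_train, X_test, y_test,
)
for name, (fit_seconds, accuracy) in results.items():
    print(f"{name}: fit={fit_seconds:.4f}s accuracy={accuracy:.3f}")
```

Any object exposing `fit` and `predict` can be passed to `benchmark`, which is the convention scikit-learn estimators follow. PySpark ML-lib estimators instead operate on DataFrames through `fit` and `transform`, so they would need a thin adapter before being compared this way.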
(© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022.)