Title:
PWFS: A scalable parallel Python module for wrapper feature selection
Source:
Volume: 5, Issue: 2, pp. 704-719 ; ISSN 2791-7630 ; Yenilikçi Mühendislik ve Doğa Bilimleri ; Journal of Innovative Engineering and Natural Science
Publisher Information:
İdris Karagöz
Publication Year:
2025
Collection:
DergiPark Akademik (E-Journals)
Document Type:
Academic journal; article in journal/newspaper
File Description:
application/pdf
Language:
English
DOI:
10.61112/jiens.1639780
Accession Number:
edsbas.2A3DAAE4
Database:
BASE

Further Information

In machine learning, feature selection is a crucial step that can significantly affect the performance of predictive models. Various time-efficient algorithms exist, but exhaustive search is the only method guaranteed to find the optimal feature subset, and it carries an enormous computational load. Beyond a certain number of features, even a lifetime would not be enough to complete an exhaustive search. This study proposes a generic, scalable, open-source parallel Python module that finds the best wrapper feature subset with a fully optimized execution time, particularly for moderate feature counts. This parallel wrapper feature selection module, PWFS, is independent of any particular machine learning algorithm and can work with user-defined methods. The framework aims to maximize the benefit for machine learning workloads by improving parallel performance and efficiency. The system is built on efficient message-passing communication, and the framework distributes the computational load equally among the parallel agents via feature masking. The module was validated on two workstations, one of which supports hyper-threading. Hyper-threading yields an overall performance gain of 19.77%. The scenarios and experiments produce varying speedups, with efficiencies of up to 96.74%, which validates the flexible design of the proposed parallel framework. The source code of the module is available at https://github.com/haeren/parallel-feature-selector and https://pypi.org/project/parallel-feature-selector/.
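The abstract describes splitting an exhaustive wrapper search among parallel agents via feature masking. The following is a minimal sketch of that idea under stated assumptions. Each candidate subset is encoded as an integer bitmask, where bit i set means feature i is used. The 2^n - 1 non-empty masks are split into contiguous, near-equal blocks, one per worker. All function names are hypothetical and are not the PWFS API, and the real module may partition the work differently. A message-passing run would execute one block per process and gather the partial maxima. Here that step is simulated with a serial loop over ranks.

```python
from typing import Callable, List, Tuple

# Hypothetical scoring callback: takes selected feature indices, returns a score
# (in a real wrapper method this would train and validate a model).
ScoreFn = Callable[[List[int]], float]


def mask_to_features(mask: int, n_features: int) -> List[int]:
    """Return the feature indices whose bits are set in `mask` (assumed encoding)."""
    return [i for i in range(n_features) if (mask >> i) & 1]


def worker_masks(rank: int, n_workers: int, n_features: int) -> range:
    """Contiguous, near-equal block of non-empty masks (1 .. 2**n - 1) for one worker."""
    total = 2 ** n_features - 1               # number of non-empty subsets
    base, extra = divmod(total, n_workers)    # first `extra` workers get one more mask
    start = 1 + rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def best_in_block(masks: range, n_features: int, score: ScoreFn) -> Tuple[float, int]:
    """Evaluate every mask in one worker's block and keep the highest-scoring one."""
    best = (float("-inf"), 0)
    for mask in masks:
        s = score(mask_to_features(mask, n_features))
        if s > best[0]:
            best = (s, mask)
    return best


def exhaustive_select(n_features: int, n_workers: int, score: ScoreFn) -> Tuple[float, List[int]]:
    """Serial stand-in for a gather/reduce over workers: each rank's block, then the global max."""
    partial = [best_in_block(worker_masks(r, n_workers, n_features), n_features, score)
               for r in range(n_workers)]
    best_score, best_mask = max(partial)
    return best_score, mask_to_features(best_mask, n_features)


def parallel_efficiency(t_serial: float, t_parallel: float, n_workers: int) -> float:
    """Standard definition: speedup (t_serial / t_parallel) divided by the worker count."""
    return (t_serial / t_parallel) / n_workers
```

With four features and four workers, the 15 non-empty subsets are split into blocks of 4, 4, 4 and 3 masks. Because the mask count doubles with every added feature, 30 features already give more than 10^9 subsets, which is why even the parallel search targets moderate feature counts. Under the usual definition used in `parallel_efficiency`, an efficiency of 96.74% means the measured speedup was about 0.9674 times the number of workers.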