ABSTRACT: As machine learning classification algorithms are applied across a wide spectrum of domains, feature selection (FS) has become an important data preprocessing technique because the data used in these domains are often high-dimensional. While efforts have been made to study various filters for ranking features, scholars have paid little attention to developing a unified framework that can serve as an interface for any filter. Such a framework would formalize the understanding of filter-based FS and give scholars a common perspective when analyzing new FS algorithms. This study proposes a new filter-based FS framework based on the best–worst multi-attribute decision-making method. The proposed algorithm is compared to two control groups: (a) no FS and (b) a randomized algorithm. Furthermore, two blocking variables are considered: (i) the classifier and (ii) the training dataset. Classifier performance is measured using the area under the receiver operating characteristic (ROC) curve (AUC). A three-way analysis of variance (ANOVA) is used to compare the proposed approach to the control groups while accounting for the blocking variables. This paper offers several contributions to the literature. In particular, it is one of the few works to put forward a framework for performing filter-based FS. Moreover, to the best of the authors’ knowledge, the study is the first to provide empirical evidence about the interaction between the factors considered in the literature for evaluating FS algorithms.
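To make two of the terms above concrete, the following is a minimal Python sketch of (1) the AUC computed as a pairwise ranking statistic and (2) a generic "any filter" interface that scores each feature and returns a ranking. It is not the best–worst framework proposed in the paper. The function names `auc`, `rank_features`, and `auc_filter`, as well as the choice of single-feature AUC as the example filter, are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def rank_features(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    filter_score: Callable[[List[float], Sequence[int]], float],
) -> List[int]:
    """Generic filter interface (illustrative): score every column with the
    supplied filter and return feature indices ordered from best to worst."""
    n_features = len(X[0])
    scores = [filter_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def auc_filter(column: List[float], y: Sequence[int]) -> float:
    """Hypothetical example filter: distance of the single-feature AUC
    from chance level (0.5)."""
    return abs(auc(column, y) - 0.5)


if __name__ == "__main__":
    # Toy data: 4 samples, 3 features, binary labels.
    X = [[0.1, 5, 1], [0.4, 4, 0], [0.35, 3, 1], [0.8, 1, 0]]
    y = [0, 0, 1, 1]
    print(rank_features(X, y, auc_filter))
```

Any other filter (for example a correlation or mutual-information score) could be passed as `filter_score` without changing `rank_features`, which is the sense in which a framework can act as an interface for arbitrary filters.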
