Balancing performance and environmental efficiency: a multiclass classification study of textual data
Abstract
This study evaluates multiclass classification (MCC) strategies -- One-vs-All (OVA), One-vs-One (OVO), Best-of-Best (BOB), and Error-Correcting Output Codes (ECOC) -- using classifiers including Naïve Bayes, Random Forest, Linear Discriminant Analysis, Logistic Regression, Neural Networks, Support Vector Machines, and Threshold-based Naïve Bayes on the 20 Newsgroups text dataset, well known in the literature for its complexity. The findings show that the choice of classifier significantly affects both accuracy and computational effort. Threshold-based Naïve Bayes excels with OVO, OVA, and BOB but declines with ECOC. The Artificial Neural Network and Random Forest, the slowest classifiers, pair well with BOB and OVA respectively. In contrast, Naïve Bayes and Logistic Regression stand out for speed, particularly with OVA. Together with the Support Vector Machine, these classifiers demonstrate versatility across all strategies, balancing accuracy and training time. Additionally, OVO and BOB prove advantageous for handling imbalanced data by focusing on individual class pairings. OVA emerges as the fastest strategy, while ECOC's performance is classifier-dependent. Our analysis underscores the importance of selecting the appropriate classifier-strategy pairing in MCC tasks, particularly on imbalanced datasets. Importantly, this study highlights the environmental impact of computational choices, advocating for efficient, accurate predictions that minimize energy consumption and reduce the ecological footprint of machine learning applications.
Keywords:
Statistical Learning; Multiclass Classification; One-vs-One; One-vs-All; Supervised Learning; Green AI
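The three standard decomposition strategies compared in the abstract (OVA, OVO, and ECOC) can be sketched with scikit-learn's multiclass wrappers; this is an illustrative assumption, not the paper's actual implementation, and the BOB strategy has no off-the-shelf scikit-learn equivalent, so it is omitted. A small synthetic feature matrix stands in for TF-IDF vectors of the 20 Newsgroups corpus:

```python
# Hedged sketch: the standard MCC decomposition strategies via scikit-learn.
# The base classifier, dataset, and parameters below are illustrative
# assumptions, not taken from the paper.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import (
    OneVsRestClassifier,   # OVA: one binary model per class
    OneVsOneClassifier,    # OVO: one binary model per class pair
    OutputCodeClassifier,  # ECOC: classes encoded as binary code words
)

# Small synthetic stand-in for a text feature matrix (e.g. TF-IDF vectors).
X, y = make_classification(n_samples=600, n_features=50, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = LogisticRegression(max_iter=1000)
strategies = {
    "OVA": OneVsRestClassifier(base),
    "OVO": OneVsOneClassifier(base),
    "ECOC": OutputCodeClassifier(base, code_size=2, random_state=0),
}
for name, clf in strategies.items():
    clf.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {clf.score(X_te, y_te):.3f}")
```

Swapping `base` for another estimator (e.g. `MultinomialNB` or a `RandomForestClassifier`) reproduces the kind of classifier-by-strategy grid the study explores, including the accuracy-versus-training-time trade-off it highlights.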
