ENHANCED FEATURE SELECTION AND CLASSIFICATION OF BREAST CANCER SUBTYPES USING HEURISTIC OPTIMIZATION AND ENSEMBLE MODELS ON MICROARRAY DATA
Keywords:
Feature Selection, Gene Expression, Breast Cancer, Optimization Techniques, Ensemble Learning, Biomarker Identification
Abstract
Feature selection plays a crucial role in analyzing high-dimensional gene expression datasets such as the GSE45827 breast cancer dataset, which contains expression measurements for many genes but only a limited number of samples. Irrelevant or redundant genes can reduce classification accuracy and obscure biological interpretation. This study improves classification performance by selecting the most informative genes with three optimization techniques: the Self-Organizing Migrating Algorithm (SOMA), Particle Swarm Optimization (PSO), and Stellar Mass Black Hole Optimization (SMBO). ElasticNet is then applied as a second-level feature selection method to further refine the selected genes. The optimized gene subsets are used to train ensemble learning models, namely Random Forest, Extremely Randomized Trees (ERT), and XGBoost, for breast cancer classification. Performance is evaluated using accuracy, precision, recall, F1-score, and the kappa coefficient. Random Forest achieves 100% accuracy with PSO, 90% with Cuckoo Search, and 97% with SOMA, while ERT reaches 100% accuracy with SMBO. In addition, differential expression analysis, gene fold-change analysis, pathway analysis, and Kaplan-Meier survival analysis provide biological insight into candidate breast cancer biomarkers. These findings highlight the importance of feature selection for improving classification accuracy and biomarker discovery, supporting early detection and personalized oncology treatment strategies.
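The abstract does not specify the PSO variant, its fitness function, or which kappa statistic is reported. As a minimal sketch, assuming a standard binary PSO with a sigmoid transfer function and Cohen's kappa (the usual kappa for comparing predicted and true class labels), wrapper-style gene selection can be illustrated in plain Python. The fitness used here is hypothetical: it rewards a made-up set of "informative" gene indices and penalizes subset size. In a real pipeline it would more plausibly be the cross-validated accuracy of a classifier trained on the selected GSE45827 columns. The names `binary_pso`, `toy_fitness`, `INFORMATIVE`, and all hyperparameter values are illustrative assumptions, not taken from the study; the ElasticNet refinement and the ensemble classifiers are not shown.

```python
import math
import random
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed label agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal class frequencies of each label vector
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # single class everywhere: kappa undefined, 1.0 by convention here
    return (observed - expected) / (1.0 - expected)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(fitness, n_features, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO that maximises fitness(mask); mask[j] == 1 keeps gene j."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best masks
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best mask
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Bit d is set to 1 with probability sigmoid(velocity)
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy problem: genes 0, 3 and 7 are treated as "informative"
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[j] for j in INFORMATIVE)
    return hits - 0.1 * sum(mask)  # reward informative genes, penalise size


best, best_fit = binary_pso(toy_fitness, n_features=12)
selected = [j for j, bit in enumerate(best) if bit]
print("selected gene indices:", selected, "fitness:", best_fit)
print("kappa:", cohens_kappa([0, 0, 0, 1], [0, 0, 1, 1]))
```

Replacing `toy_fitness` with a cross-validated classifier score turns this into a wrapper feature selector; a second-level method such as ElasticNet would then be fitted only on the columns where `best` is 1.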
License

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.