ENHANCED FEATURE SELECTION AND CLASSIFICATION OF BREAST CANCER SUBTYPES USING HEURISTIC OPTIMIZATION AND ENSEMBLE MODELS ON MICROARRAY DATA

Authors

  • PREMALATHA KANDHASAMY, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, BANNARI AMMAN INSTITUTE OF TECHNOLOGY, ERODE, TAMIL NADU, INDIA
  • MR. WASIM RAJA A, ASSISTANT PROFESSOR, DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE, SRI KRISHNA COLLEGE OF ENGINEERING AND TECHNOLOGY, COIMBATORE
  • M. VIGNESH, ASSISTANT PROFESSOR, DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE, KARPAGAM INSTITUTE OF TECHNOLOGY, COIMBATORE - 641 105, TAMIL NADU, INDIA
  • DR. D. MADESWARAN, PROFESSOR, DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING, SSM COLLEGE OF ENGINEERING, KOMARAPALAYAM
  • JANANI S, ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, KGISL INSTITUTE OF TECHNOLOGY, COIMBATORE
  • SIVAKUMAR KANDHASAMY, KARPAGA VINAYAGA COLLEGE OF ENGINEERING AND TECHNOLOGY, CHENGALPATTU, TAMIL NADU, INDIA

Keywords:

Feature Selection, Gene Expression, Breast Cancer, Optimization Techniques, Ensemble Learning, Biomarker Identification

Abstract

Feature selection is crucial when analyzing high-dimensional gene expression datasets such as the GSE45827 breast cancer dataset, which contains many genes but few samples. Irrelevant or redundant genes can degrade both classification accuracy and biological interpretation. This study improves classification performance by selecting the most informative genes with heuristic optimization techniques, including the Self-Organizing Migrating Algorithm (SOMA), Particle Swarm Optimization (PSO), and Stellar Mass Black Hole Optimization (SMBO). ElasticNet is then applied as a second-level feature selection method to refine the selected genes further. The optimized gene subsets are used to train ensemble learning models, namely Random Forest, Extremely Randomized Trees (ERT), and XGBoost, for breast cancer subtype classification. Performance is evaluated using accuracy, precision, recall, F1-score, and the kappa coefficient. Random Forest achieves 100% accuracy with PSO, 90% with Cuckoo Search, and 97% with SOMA, while ERT reaches 100% accuracy with SMBO. In addition, differential gene expression, pathway analysis, gene fold change, and Kaplan-Meier survival analysis provide biological insight into breast cancer biomarkers. These findings highlight the importance of feature selection for improving classification accuracy and biomarker discovery, supporting early detection and personalized oncology treatment strategies.
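
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name evaluate, the macro averaging across classes, the use of Cohen's kappa as the kappa statistic, and the toy subtype labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy, macro precision/recall/F1, and Cohen's kappa.

    Hypothetical helper for illustration; macro averaging is an assumption,
    since the abstract does not state how per-class scores are combined.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
    true_counts = Counter(y_true)          # row totals of the confusion matrix
    pred_counts = Counter(y_pred)          # column totals

    # Observed agreement: fraction of samples on the diagonal.
    accuracy = sum(pairs[(c, c)] for c in labels) / n

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        p = tp / pred_counts[c] if pred_counts[c] else 0.0
        r = tp / true_counts[c] if true_counts[c] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else math.nan

    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels only; these are not results from the study.
    y_true = ["basal", "basal", "HER", "HER", "luminal_A", "luminal_A"]
    y_pred = ["basal", "basal", "HER", "luminal_A", "luminal_A", "luminal_A"]
    for name, value in evaluate(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Kappa corrects accuracy for the agreement a classifier would reach by chance given the class frequencies, which makes it a useful companion to accuracy when subtype classes are imbalanced.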

How to Cite

KANDHASAMY, P., RAJA A, M. W., VIGNESH, M., MADESWARAN, D., S, J., & KANDHASAMY, S. (2025). ENHANCED FEATURE SELECTION AND CLASSIFICATION OF BREAST CANCER SUBTYPES USING HEURISTIC OPTIMIZATION AND ENSEMBLE MODELS ON MICROARRAY DATA. TPM – Testing, Psychometrics, Methodology in Applied Psychology, 32(S4(2025): Posted 17 July), 750–774. Retrieved from https://tpmap.org/submission/index.php/tpm/article/view/621