Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner

Xubin Wang¹ · Yunhe Wang²* · Zhiqiang Ma³ · Ka-Chun Wong⁴ · Xiangtao Li¹*

¹Jilin University · ²Hebei University of Technology · ³Northeast Normal University · ⁴City University of Hong Kong

*corresponding authors

PDF · Code


Overview

Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers.

Framework

Overview of the proposed EODE algorithm. In the grey wolf optimization (GWO) feature selection phase, all base classifiers are trained on the original cancer gene expression training data, and the best-performing classifier is chosen as the evaluation classifier. The feature-selected data is then used to construct an ensemble model. Specifically, the training data is clustered incrementally with K-means to form subspace clusters. These clusters are used to train individual base classifiers, which are added to the model pool, and any classifier in the pool with below-average performance is filtered out. GWO is then applied to the classifier pool to determine the best ensemble combination. Finally, the optimized ensemble is evaluated on the independent test set, using plurality voting to produce the final cancer type predictions.
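The pruning step above (dropping pool members whose performance is below the pool average) can be sketched as follows. This is a minimal illustration, not code taken from the repository: the variable names `poolAcc` and `keep` and the accuracy values are hypothetical, and we assume performance is measured as validation accuracy.

```matlab
% Hypothetical validation accuracies of five classifiers in the model pool
poolAcc = [0.82 0.91 0.75 0.88 0.79];

% Keep only classifiers whose accuracy is at least the pool average;
% those strictly below the average are filtered out
avgAcc = mean(poolAcc);
keep   = find(poolAcc >= avgAcc);

fprintf('Average accuracy: %.3f\n', avgAcc);
fprintf('Kept classifiers: %s\n', mat2str(keep));
```

Here the average is 0.83, so only classifiers 2 and 4 remain in the pool.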

Data and Baseline Availability

  • ComparisonMethods: The baselines for comparison, including nature-inspired methods, machine learning methods and ensemble methods.
  • OriginalData: The original datasets, randomly split into training and test sets at an 8:2 ratio.
  • TrainData: Training data used in the experiment.
  • TestData: Test data used in the experiment.

Dependencies

  • This project was developed with MATLAB R2021a. Earlier versions of MATLAB may be incompatible.

Instructions

1. Main Code

  • EODE.m (This is the main file of the proposed model)
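    • To use your own data, replace the dataset name in Problem. For example:
      • Problem = {'The_name_of_your_own_data'};
    • To load your own data:
        % p_name: name of the dataset being loaded (e.g. an entry of Problem)
        traindata = load(['C:\Users\c\Desktop\EODE\train\',p_name]);  % replace with your own folder
        traindata = getfield(traindata, p_name);  % extract the variable named after the dataset
        data = traindata;
        feat = data(:,1:end-1);  % features: all columns except the last
        label = data(:,end);     % class labels: last column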
      
    • Set how many times the whole experiment is repeated with numRun.
    • Change the data file paths where traindata and testdata are loaded.
    • The parameters of the GWO algorithm can be changed in:
      • opts.k = 3; % number of neighbors k in K-nearest neighbor
      • opts.N = 100; % number of solutions (wolves)
      • opts.T = 50; % maximum number of iterations

To reproduce our experiments, you can run EODE.m ten times and take the average of the results.
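Repeating the experiment and averaging can be summarized as below. This is a hypothetical sketch: the loop body stands in for one complete run of EODE.m and only stores a dummy accuracy, so the numbers are placeholders and not results from the paper.

```matlab
numRun = 10;                 % number of independent runs
accs   = zeros(numRun, 1);   % test accuracy of each run

for r = 1:numRun
    % Placeholder for one complete run of the pipeline (hypothetical):
    % in practice, run the EODE workflow here and store its test accuracy.
    accs(r) = 0.90 + 0.01 * mod(r, 3);   % dummy value for illustration only
end

fprintf('Mean accuracy over %d runs: %.4f (std %.4f)\n', numRun, mean(accs), std(accs));
```

Reporting the standard deviation alongside the mean is a common way to show run-to-run variability of a stochastic optimizer such as GWO.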

2. Data Partition

  • DataPartition.m (To split the raw data into training and test sets at an 8:2 ratio)
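A minimal sketch of a simple random 8:2 split is shown below, assuming the same layout as above (features in all columns but the last, class label in the last column). The toy data matrix is hypothetical, and DataPartition.m may differ, for example by stratifying by class or fixing a random seed.

```matlab
% Hypothetical toy dataset: 10 samples, 3 features, class label in the last column
data = [rand(10, 3), repmat([1; 2], 5, 1)];

n      = size(data, 1);
idx    = randperm(n);        % random permutation of sample indices
nTrain = round(0.8 * n);     % 80% of the samples go to the training set

trainData = data(idx(1:nTrain), :);       % first 80% of the shuffled rows
testData  = data(idx(nTrain+1:end), :);   % remaining 20%
```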

3. Feature Selection Phase

  • jGreyWolfOptimizer.m (To find an optimal feature subset)
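To illustrate how GWO can search over feature subsets, here is a minimal grey wolf optimizer under stated assumptions: wolf positions lie in [0, 1] and a feature counts as selected when its position exceeds 0.5. The toy fitness (mismatches against a known "ideal" mask) is purely hypothetical; in EODE the fitness would instead be based on the evaluation classifier, and jGreyWolfOptimizer.m may differ in its details.

```matlab
% Toy fitness (hypothetical): number of positions where a mask differs from an "ideal" mask
target  = logical([1 0 1 0 0 1 0 0]);
fitness = @(mask) sum(mask ~= target);

dim   = numel(target);
N     = 10;     % number of wolves (cf. opts.N)
T     = 50;     % maximum number of iterations (cf. opts.T)
thres = 0.5;    % a feature is selected when its position exceeds this value

% Initialise the pack and pick the alpha, beta and delta wolves (three best solutions)
X       = rand(N, dim);
fitVals = zeros(N, 1);
for i = 1:N
    fitVals(i) = fitness(X(i, :) > thres);
end
[~, order] = sort(fitVals);
Xalpha = X(order(1), :);  fAlpha = fitVals(order(1));
Xbeta  = X(order(2), :);  fBeta  = fitVals(order(2));
Xdelta = X(order(3), :);  fDelta = fitVals(order(3));
fInit  = fAlpha;

for t = 1:T
    a = 2 - 2 * (t - 1) / T;              % control parameter, decreases linearly from 2
    leaders = [Xalpha; Xbeta; Xdelta];    % leader positions at the start of this iteration
    for i = 1:N
        Xnew = zeros(1, dim);
        for k = 1:3
            A = 2 * a * rand(1, dim) - a;
            C = 2 * rand(1, dim);
            D = abs(C .* leaders(k, :) - X(i, :));
            Xnew = Xnew + (leaders(k, :) - A .* D) / 3;   % average of X1, X2, X3
        end
        X(i, :) = min(max(Xnew, 0), 1);   % keep positions inside [0, 1]
        f = fitness(X(i, :) > thres);
        if f < fAlpha                     % new best: shift the leaders down
            Xdelta = Xbeta;   fDelta = fBeta;
            Xbeta  = Xalpha;  fBeta  = fAlpha;
            Xalpha = X(i, :); fAlpha = f;
        elseif f < fBeta
            Xdelta = Xbeta;   fDelta = fBeta;
            Xbeta  = X(i, :); fBeta  = f;
        elseif f < fDelta
            Xdelta = X(i, :); fDelta = f;
        end
    end
end

bestMask = Xalpha > thres;
fprintf('Best fitness: %d, selected features: %s\n', fAlpha, mat2str(find(bestMask)));
```

Because the alpha wolf is only replaced by a strictly better solution, the best fitness never gets worse across iterations.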

4. Classifier Generation Phase

  • generateClusters.m (To generate multiple clusters)
  • trainClassifiers.m (To train base classifiers on these clusters)
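One way to read "incrementally clustered" is to run K-means for an increasing number of clusters and treat every resulting cluster as a training subset for one base classifier. The sketch below follows that assumption with a plain K-means written in base MATLAB (so no toolbox is needed); the toy data, the upper bound `maxK`, and the deterministic centroid initialisation are all hypothetical, and generateClusters.m may use a different schedule or the toolbox `kmeans` function.

```matlab
% Hypothetical training data: 60 samples x 4 selected features, two classes
feat  = [randn(30, 4); randn(30, 4) + 3];
label = [ones(30, 1); 2 * ones(30, 1)];

maxK    = 4;    % hypothetical upper bound on the number of clusters
subsets = {};   % each cell holds the row indices of one cluster

for K = 2:maxK                                           % increase the number of clusters
    C = feat(round(linspace(1, size(feat, 1), K)), :);   % deterministic initial centroids
    for iter = 1:100                                     % Lloyd iterations
        D = zeros(size(feat, 1), K);                     % squared distance to each centroid
        for k = 1:K
            D(:, k) = sum((feat - C(k, :)).^2, 2);
        end
        [~, assign] = min(D, [], 2);                     % nearest-centroid assignment
        Cold = C;
        for k = 1:K
            if any(assign == k)                          % keep old centroid if cluster is empty
                C(k, :) = mean(feat(assign == k, :), 1);
            end
        end
        if isequal(C, Cold), break; end                  % converged
    end
    for k = 1:K
        subsets{end+1} = find(assign == k);              % one training subset per cluster
    end
end

% Each subset would then train one base classifier, e.g. on
% feat(subsets{j}, :) and label(subsets{j}).
```

With `maxK = 4` this produces 2 + 3 + 4 = 9 subsets, and each value of K partitions all 60 samples.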

5. Classifier Pool Optimization Phase

  • classifierSelectionGWO.m (To find an optimal classifier subset with the GWO algorithm)
  • GWOPredict.m
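When GWO searches over the classifier pool, each wolf encodes a candidate ensemble. A plausible fitness for such a candidate is sketched below, assuming positions are thresholded at 0.5 to select classifiers and the error rate of their plurality vote on held-out predictions is minimized. The prediction matrix `P`, labels `yVal`, and position `pos` are hypothetical, and classifierSelectionGWO.m may score subsets differently.

```matlab
% Hypothetical predictions of 4 pool classifiers (columns) on 6 validation samples (rows)
P = [1 1 2 1;
     2 2 2 1;
     1 2 1 1;
     3 3 3 2;
     2 1 2 2;
     3 1 3 1];
yVal = [1; 2; 1; 3; 2; 1];          % true validation labels

% Decode a wolf position in [0,1]^4 into a classifier subset by thresholding
pos  = [0.9 0.2 0.7 0.6];
mask = pos > 0.5;                   % selects classifiers 1, 3 and 4
if ~any(mask), mask(1) = true; end  % guard: an ensemble needs at least one member

yHat = mode(P(:, mask), 2);         % plurality vote of the selected classifiers
err  = mean(yHat ~= yVal);          % fitness to minimize: ensemble error rate
```

Here the selected ensemble misclassifies only the last sample, giving an error of 1/6.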

6. Model Fusion

  • fusion.m
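Plurality voting, as used for the final predictions, picks the most frequent label in each row of the ensemble's prediction matrix. The sketch below uses base MATLAB `mode`; the prediction matrix is hypothetical, and fusion.m may break ties differently (`mode` returns the smallest of the tied labels).

```matlab
% Hypothetical test-set predictions: rows = samples, columns = selected classifiers
preds = [1 1 2;
         2 3 3;
         1 2 3;
         2 2 2];

% Plurality vote: most frequent label per sample (ties -> smallest label)
finalPred = mode(preds, 2);
disp(finalPred');
```

The third sample is a three-way tie, so `mode` returns label 1.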

7. Fitness Function

  • jFeatureSelectionFunction.m
  • jFitnessFunction.m
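A common form of wrapper feature selection fitness trades classification error against the fraction of selected features. The sketch below assumes that form with weights 0.99 and 0.01; the weights, mask, and error value are hypothetical, and jFeatureSelectionFunction.m and jFitnessFunction.m may use a different formula.

```matlab
wErr  = 0.99;            % weight on classification error (assumed value)
wFeat = 1 - wErr;        % weight on the fraction of selected features

mask    = logical([1 0 1 0 0 1 0 0 0 0]);  % hypothetical selected-feature mask
errRate = 0.10;                            % hypothetical validation error with these features

% Smaller is better: low error first, then fewer features
cost = wErr * errRate + wFeat * (sum(mask) / numel(mask));
fprintf('Fitness (to minimize): %.4f\n', cost);
```

With 3 of 10 features selected, the cost is 0.99 × 0.10 + 0.01 × 0.3 = 0.102.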

Results

We conducted experiments on 35 gene expression datasets covering various cancer types and compared EODE with four nature-inspired ensemble methods (PSOEL, EAEL, FESM, and GA-Bagging-SVM), six benchmark machine learning algorithms (KNN, DT, ANN, SVM, DISCR, and NB), six state-of-the-art ensemble algorithms (RF, ADABOOST, RUSBOOST, SUBSPACE, TOTALBOOST, and LPBOOST), and seven nature-inspired methods (ACO, CS, DE, GA, GWO, PSO, and ABC). EODE outperformed these methods in classification accuracy.

Cite Our Work

@article{wang2024eode,
  title={Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner},
  author={Wang, Xubin and Wang, Yunhe and Ma, Zhiqiang and Wong, Ka-Chun and Li, Xiangtao},
  journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  year={2024},
  publisher={IEEE/ACM}
}

Contact

wangxb19 at mails.jlu.edu.cn