Volume 20, No 8 (2022)

DOI: 10.14704/nq.2022.20.8.NQ44542

Optimizing the Classifier Performance through Similarity Based Entropy Max Score Feature Selection

E. Bharath, T. Rajagopalan


Breast cancer diagnosis plays a vital role in reducing mortality among women. Machine learning has advanced medical practice by helping to identify whether a tumor is benign or malignant. We propose the entropy max score approach (EMSA), which uses class-distribution-based similarity, a threshold constraint on attribute pairs, and scoring. The major contribution of EMSA is to find significant features that improve classifier performance. For this purpose, the Wisconsin Diagnostic Breast Cancer (WDBC) dataset is taken from the UCI repository. Our scheme selects only three of the thirty features: 𝑥23 (radius largest worst), 𝑥24 (texture largest worst), and 𝑥30 (concave points largest worst). Classification performance is measured by accuracy, sensitivity, specificity, precision, F-score, Matthews correlation coefficient (MCC), and negative predictive value. With the significant features selected by EMSA, the classifiers RF, DT, SVM, LM, and NN achieve 99.40%, 95.90%, 96.90%, 94.20%, and 62.70% accuracy, respectively. By comparison, with all features (without EMSA), RF and DT achieve only 99.30% and 95.20% accuracy. Thus, with only three features, EMSA outperforms the all-feature baseline for RF and DT. Finally, the experimental results and discussion are presented.
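The seven evaluation measures named in the abstract all follow from the four cells of a binary confusion matrix. The sketch below shows one way they might be computed, assuming malignant is treated as the positive class. The function name `binary_metrics` and the example counts are hypothetical illustrations, not the authors' code or results.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification scores from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives (positive = malignant, by assumption).
    """
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)     # positive predictive value
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f_score = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
        "npv": npv,
    }


# Hypothetical counts for illustration only; these are not values from the paper.
for name, value in binary_metrics(tp=50, fp=10, tn=30, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Reporting MCC and negative predictive value alongside accuracy is useful on WDBC because the two classes are not equally represented, so accuracy alone can hide errors on the smaller class.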


Keywords: Feature selection; Breast cancer; Classification; Machine Learning (ML)
