Volume 20, No 8 (2022)

DOI: 10.14704/nq.2022.20.8.NQ44961

Data Splitting Techniques to Reduce False-Positive and False-Negative Cases in Breast Cancer Prediction

Vijay Birchha, Bhawna Nigam


Breast cancer affects a massive number of women worldwide and is the most common and severe cause of high mortality among women. False diagnosis can be considered the most significant cause of the late discovery of breast cancer. The chances of curing breast cancer increase if the number of false-positive (FP) and false-negative (FN) predictions is reduced. The research questions are: can the dataset splitting technique used to train machine learning classifiers affect classifier performance, and does it help to minimize false-positive and false-negative predictions of breast cancer? In this work, artificial neural network (NN), support vector machine (SVM), logistic regression (LR) and decision forest (DF) machine learning (ML) classifiers were used with the Breast Cancer Wisconsin (Original) dataset (WBC). The classifiers' false-positive and false-negative predictions were compared across three dataset splitting techniques: train-test (TT), train-test-validation (TTV) and k-fold cross-validation. The neural network classifier produced zero FP predictions with the train-test-validation splitting method. The support vector machine produced zero FN predictions with the k-fold cross-validation splitting method. The results show that the choice of dataset splitting technique significantly impacts machine learning classifier performance. These results can help implement a computer-aided system to diagnose breast cancer more accurately.
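The three splitting techniques and the FP/FN counts compared above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the 20% test and validation fractions, the choice of k = 5, the fixed random seed and the encoding of the positive class as `1` are all assumptions made for illustration, since the abstract does not state them. No classifier is trained here; the sketch only shows how sample indices might be partitioned and how false positives and false negatives might be counted from predictions.

```python
import random


def train_test_split_idx(n, test_frac=0.2, seed=0):
    """Train-test (TT): shuffle indices 0..n-1 and return (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))  # assumed fraction, not from the paper
    return idx[n_test:], idx[:n_test]


def train_test_val_split_idx(n, test_frac=0.2, val_frac=0.2, seed=0):
    """Train-test-validation (TTV): return (train, val, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def k_fold_idx(n, k=5, seed=0):
    """k-fold cross-validation: yield (train, test) once per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def count_fp_fn(y_true, y_pred, positive=1):
    """Count false positives and false negatives for the positive class."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    return fp, fn
```

In such a setup, each classifier would be fitted on the training indices of a given split and its predictions on the held-out indices passed to `count_fp_fn`; for k-fold cross-validation the counts could be summed over the folds so that every sample is scored exactly once.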


Breast cancer, Wisconsin dataset, machine learning, false-positive, false-negative, support vector machine, decision forest, neural network, logistic regression, dataset split
