TY - JOUR
T1 - Performance analysis of 10 machine learning models in lung cancer prediction
AU - Zapata-Paulini, Joselyn
AU - Cabanillas-Carbonell, Michael
N1 - Publisher Copyright: © 2025 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2025/2
Y1 - 2025/2
N2 - Lung cancer is among the diseases with the highest incidence and mortality worldwide. Machine learning (ML) models can play an important role in its early detection. This study aims to identify the ML algorithm that performs best in predicting lung cancer. The algorithms compared were logistic regression (LR), decision tree (DT), k-nearest neighbors (KNN), Gaussian naive Bayes (GNB), multinomial naive Bayes (MNB), support vector classifier (SVC), random forest (RF), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), and gradient boosting (GB). The dataset was obtained from Kaggle and contains 309 records and 16 attributes. The study was carried out in several phases, including the description of the ML models and the analysis of the dataset. The models were compared on specificity, sensitivity, F1 score, accuracy, and precision. The results showed that the SVC, RF, MLP, and GB models obtained the best performance metrics, achieving 98% accuracy, 98% precision, and 98% sensitivity.
AB - Lung cancer is among the diseases with the highest incidence and mortality worldwide. Machine learning (ML) models can play an important role in its early detection. This study aims to identify the ML algorithm that performs best in predicting lung cancer. The algorithms compared were logistic regression (LR), decision tree (DT), k-nearest neighbors (KNN), Gaussian naive Bayes (GNB), multinomial naive Bayes (MNB), support vector classifier (SVC), random forest (RF), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), and gradient boosting (GB). The dataset was obtained from Kaggle and contains 309 records and 16 attributes. The study was carried out in several phases, including the description of the ML models and the analysis of the dataset. The models were compared on specificity, sensitivity, F1 score, accuracy, and precision. The results showed that the SVC, RF, MLP, and GB models obtained the best performance metrics, achieving 98% accuracy, 98% precision, and 98% sensitivity.
KW - Lung cancer
KW - Machine learning
KW - Models
KW - Performance
KW - Predicting
UR - http://www.scopus.com/inward/record.url?scp=85210769598&partnerID=8YFLogxK
U2 - 10.11591/ijeecs.v37.i2.pp1352-1364
DO - 10.11591/ijeecs.v37.i2.pp1352-1364
M3 - Article
AN - SCOPUS:85210769598
SN - 2502-4752
VL - 37
SP - 1352
EP - 1364
JO - Indonesian Journal of Electrical Engineering and Computer Science
JF - Indonesian Journal of Electrical Engineering and Computer Science
IS - 2
ER -