Machine Learning-Based Analysis of Cancer Incidence in Jordan (2020–2021): A Decision Tree Approach


Abstract


This study aimed to predict cancer types across the regions of Jordan using a machine learning-based Decision Tree Algorithm (DTA) model. The model used patients’ demographic information (gender, age, and region of residence) as independent variables (IVs) and cancer type as the dependent variable (DV). The objective was to determine the predictive relationship between these demographic factors and cancer type, in order to support regional cancer profiling and inform targeted public health planning. The study used secondary data from the Ministry of Health, the Directorate of Non-Communicable Diseases, and the Jordan Cancer Registry for 2020 and 2021. A total of 9,547 cancer cases were analyzed with the DTA model, which identified significant incidence patterns; the central region of Jordan accounted for the highest number of cases (n = 6,815; 71.4%). The model classified cases into 25 distinct cancer types based on demographic attributes, with breast cancer the most prevalent, particularly among middle-aged females residing in the central region. The DTA model proved effective at handling and stratifying large-scale medical data, predicting how cancer types interact with demographic attributes, categorizing and labeling datasets, and suggesting potential category mergers. These findings have important implications for the development of focused cancer prevention strategies and the efficient allocation of healthcare resources. However, a key limitation of the study is the incomplete characterization of cancer patient attributes across all Jordanian regions.


Keywords: Cancer incidence; Machine learning; Risk factors; Predictive modeling; Jordan
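
The decision tree setup described in the abstract (gender, age, and region as IVs; cancer type as the DV) can be illustrated with a minimal sketch of how a tree chooses its first split. The abstract does not state the splitting criterion, so Gini impurity is assumed here purely for illustration. The records, age bands, and function names are hypothetical and are not drawn from the Jordan Cancer Registry.

```python
from collections import Counter

# Hypothetical toy records: (gender, age_group, region, cancer_type).
# Invented for illustration only; NOT data from the study.
RECORDS = [
    ("F", "40-59", "central", "breast"),
    ("F", "40-59", "central", "breast"),
    ("F", "60+", "central", "colorectal"),
    ("M", "60+", "central", "lung"),
    ("M", "60+", "north", "lung"),
    ("M", "40-59", "north", "lung"),
    ("F", "40-59", "south", "breast"),
    ("M", "60+", "south", "lung"),
]
PREDICTORS = {"gender": 0, "age_group": 1, "region": 2}  # IV name -> column
TARGET = 3  # column index of the DV (cancer type)


def gini(rows):
    """Gini impurity of the cancer-type labels in rows (assumed criterion)."""
    counts = Counter(row[TARGET] for row in rows)
    total = len(rows)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def impurity_after_split(rows, col):
    """Weighted Gini impurity after partitioning rows on one predictor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def best_split(rows):
    """Name of the predictor whose split leaves the lowest impurity."""
    return min(
        PREDICTORS,
        key=lambda name: impurity_after_split(rows, PREDICTORS[name]),
    )


def majority_label(rows):
    """Most frequent cancer type among rows, used as a leaf's prediction."""
    return Counter(row[TARGET] for row in rows).most_common(1)[0][0]


if __name__ == "__main__":
    print("first split:", best_split(RECORDS))
    females = [r for r in RECORDS if r[0] == "F"]
    print("predicted type for the female node:", majority_label(females))
```

A full tree would repeat the same selection within each resulting node until a stopping rule is met. Which variable wins in this sketch depends entirely on the invented rows and says nothing about the study's findings.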




This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.