Imputation of Missing Values with Adaptive Elastic Net for Gene Selection in High-dimensional Data
References
Algamal, Z. Y., & Lee, M. H. (2019). A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification. Advances in Data Analysis and Classification, 13(3), 753–771. https://doi.org/10.1007/s11634-018-0334-1
Algamal, Z. Y., Lee, M. H., Al-Fakih, A. M., & Aziz, M. (2017). High-dimensional QSAR classification model for anti-hepatitis C virus activity of thiourea derivatives based on the sparse logistic regression model with a bridge penalty. Journal of Chemometrics, 31(6), e2889. https://doi.org/10.1002/cem.2889
Alharthi, A. M., Lee, M. H., & Algamal, Z. Y. (2022). Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data. International Journal of Online and Biomedical Engineering, 18(2), 40–54. https://doi.org/10.3991/ijoe.v18i02.25047
Alon, U., Barka, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 96(12), 6745–6750. https://doi.org/10.1073/pnas.96.12.6745
Bühlmann, P., & van de Geer, S. (2011). Statistics for High-Dimensional Data. In Springer Series in Statistics. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-20192-9
Chen, Q., & Wang, S. (2013). Variable selection for multiply-imputed data with application to dioxin exposure study. Statistics in Medicine, 32(21), 3646–3659. https://doi.org/10.1002/sim.5783
Chen, Y., Wang, A., Ding, H., Que, X., Li, Y., An, N., & Jiang, L. (2016). A global learning with local preservation method for microarray data imputation. Computers in Biology and Medicine, 77, 76–89. https://doi.org/10.1016/j.compbiomed.2016.08.005
Deng, Y., Chang, C., Ido, M. S., & Long, Q. (2016). Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data. Scientific Reports, 6(1), 21689. https://doi.org/10.1038/srep21689
Doerken, S., Avalos, M., Lagarde, E., & Schumacher, M. (2019). Penalized logistic regression with low prevalence exposures beyond high dimensional settings. PLOS ONE, 14(5), e0217057. https://doi.org/10.1371/journal.pone.0217057
El Guide, M., Jbilou, K., Koukouvinos, C., & Lappa, A. (2020). Comparative study of L1 regularized logistic regression methods for variable selection. Communications in Statistics - Simulation and Computation, 1–16. https://doi.org/10.1080/03610918.2020.1752379
Fan, J., & Li, R. (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348–1360. https://doi.org/10.1198/016214501753382273
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
Geronimi, J., & Saporta, G. (2017). Variable selection for multiply-imputed data with penalized generalized estimating equations. Computational Statistics & Data Analysis, 110, 103–114. https://doi.org/10.1016/j.csda.2017.01.001
Ghosh, S. (2011). On the grouped selection and model complexity of the adaptive elastic net. Statistics and Computing, 21(3), 451–462. https://doi.org/10.1007/s11222-010-9181-4
Holman, R., & Glas, C. A. W. (2005). Modelling non‐ignorable missing‐data mechanisms with item response theory models. British Journal of Mathematical and Statistical Psychology, 58(1), 1–17.
Honaker, J., King, G., & Blackwell, M. (2011). Amelia II: A program for missing data. Journal of Statistical Software, 45(7), 1–47.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.
Jiang, W., Josse, J., & Lavielle, M. (2020). Logistic regression with missing covariates—Parameter estimation, model selection and prediction within a joint-modeling framework. Computational Statistics & Data Analysis, 145, 106907. https://doi.org/10.1016/j.csda.2019.106907
Khan, S. I., & Hoque, A. S. M. L. (2020). SICE: an improved missing data imputation technique. Journal of Big Data, 7(1), 37. https://doi.org/10.1186/s40537-020-00313-w
Kwak, S. K., & Kim, J. H. (2017). Statistical data preparation: management of missing values and outliers. Korean Journal of Anesthesiology, 70(4), 407.
Li, X., Wang, Y., & Ruiz, R. (2020). A Survey on Sparse Learning Models for Feature Selection. IEEE Transactions on Cybernetics, 1–19. https://doi.org/10.1109/TCYB.2020.2982445
Liang, Y., Liu, C., Luan, X.-Z., Leung, K.-S., Chan, T.-M., Xu, Z.-B., & Zhang, H. (2013). Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification. BMC Bioinformatics, 14(1), 198. https://doi.org/10.1186/1471-2105-14-198
Little, R. J. A., & Rubin, D. B. (2019). Statistical analysis with missing data (Vol. 793). John Wiley & Sons.
Liu, C., & Wong, H. S. (2019). Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(1), 312–321. https://doi.org/10.1109/TCBB.2017.2767589
Manhrawy, I. I. M., Qaraad, M., & El‐Kafrawy, P. (2021). Hybrid feature selection model based on relief‐based algorithms and regulizer algorithms for cancer classification. Concurrency and Computation: Practice and Experience, 1–17. https://doi.org/10.1002/cpe.6200
Pelckmans, K., De Brabanter, J., Suykens, J. A. K., & De Moor, B. (2005). Handling missing values in support vector machine classifiers. Neural Networks, 18(5–6), 684–692. https://doi.org/10.1016/j.neunet.2005.06.025
Peng, H., Fu, Y., Liu, J., Fang, X., & Jiang, C. (2013). Optimal gene subset selection using the modified SFFS algorithm for tumor classification. Neural Computing and Applications, 23(6), 1531–1538. https://doi.org/10.1007/s00521-012-1148-2
Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473–489.
Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys (Vol. 81). John Wiley & Sons.
Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D’Amico, A. V, & Richie, J. P. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), 203–209.
Su, Y.-S., Gelman, A. E., Hill, J., & Yajima, M. (2011). Multiple imputation with diagnostics (mi) in R: Opening windows into the black box. Journal of Statistical Software, 45(2), 1–31. https://doi.org/10.7916/D8VQ3CD3
Tharwat, A. (2021). Classification assessment methods. Applied Computing and Informatics, 17(1), 168–192. https://doi.org/10.1016/j.aci.2018.08.003
Tibshirani, R. (1996). Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67.
Wang, A., Yang, J., & An, N. (2021). Regularized Sparse Modelling for Microarray Missing Value Estimation. IEEE Access, 9, 16899–16913. https://doi.org/10.1109/ACCESS.2021.3053631
Zahid, F. M., Faisal, S., & Heumann, C. (2020). Variable selection techniques after multiple imputation in high-dimensional data. Statistical Methods & Applications, 29(3), 553–580. https://doi.org/10.1007/s10260-019-00493-7
Zahid, F. M., Faisal, S., & Heumann, C. (2021). Multiple imputation with compatibility for high-dimensional data. PLOS ONE, 16(7), e0254112. https://doi.org/10.1371/journal.pone.0254112
Zahid, F. M., & Heumann, C. (2019). Multiple imputation with sequential penalized regression. Statistical Methods in Medical Research, 28(5), 1311–1327. https://doi.org/10.1177/0962280218755574
Zhang, Z. (2015). Missing values in big data research: some basic skills. Annals of Translational Medicine, 3(21), 323. https://doi.org/10.3978/j.issn.2305-5839.2015.12.11
Zhao, Y., & Long, Q. (2016). Multiple imputation in the presence of high-dimensional data. Statistical Methods in Medical Research, 25(5), 2021–2035. https://doi.org/10.1177/0962280213511027
Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101(476), 1418–1429. https://doi.org/10.1198/016214506000000735
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
Zou, H., & Zhang, H. H. (2009). On the adaptive elastic-net with a diverging number of parameters. The Annals of Statistics, 37(4), 1733–1751. https://doi.org/10.1214/08-AOS625