Improving Stacking Methodology for Combining Classifiers; Applications to Cosmetic Industry
Abstract
Stacking is a way of linearly combining several models. We modify the usual stacking methodology for the case where the response is binary and the predictions are highly correlated, by combining predictions with PLS-Discriminant Analysis instead of ordinary least squares. For small data sets we develop a strategy based on repeated split samples in order to select relevant variables and ensure the robustness of the final model. Five base (or level-0) classifiers are combined to obtain an improved rule, which is applied to a classical benchmark from the UCI Machine Learning Repository. Our methodology is then applied to predicting the dangerousness of 165 chemicals used in the cosmetic industry, each described by 35 in vitro and in silico characteristics: when faced with safety constraints one cannot rely on a single prediction method, especially when the sample size is low.
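To make the combination scheme concrete, here is a minimal sketch in Python with scikit-learn. The five base learners (logistic regression, naive Bayes, a decision tree, an SVM, and k-nearest neighbours), the simulated data from make_classification, and the 0.5 cut-off on the PLS score are all illustrative assumptions, not the paper's actual level-0 classifiers, data, or tuning.

```python
# Sketch of stacking with a PLS-DA combiner: level-0 classifiers produce
# (typically highly correlated) probability predictions, which are combined
# by PLS regression on the 0/1 response instead of ordinary least squares.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=35, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Five illustrative level-0 classifiers (stand-ins for the paper's set).
base = [LogisticRegression(max_iter=1000), GaussianNB(),
        DecisionTreeClassifier(max_depth=4), SVC(probability=True),
        KNeighborsClassifier()]

# Level-1 inputs: out-of-fold probabilities, so the combiner is not fitted
# on predictions the base models made for their own training points.
Z_tr = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base])

# PLS-DA combiner: PLS regression of the binary response on the correlated
# base scores; one latent component is a reasonable default here.
combiner = PLSRegression(n_components=1).fit(Z_tr, y_tr)

# At test time, refit each base learner on the full training set.
Z_te = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base])
y_hat = (combiner.predict(Z_te).ravel() >= 0.5).astype(int)
print("stacked accuracy:", accuracy_score(y_te, y_hat))
```

Replacing the OLS combiner by a single PLS component is what handles the collinearity of the base predictions: the first component captures their common direction without the unstable coefficients least squares would produce.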
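The repeated split-sample strategy for small data sets can be sketched in the same spirit. The snippet below repeatedly splits the data, fits an L1-penalised logistic regression on each training part as an illustrative stand-in for the paper's selection rule, and keeps the variables chosen in at least 80% of splits; the 165 × 35 dimensions only echo the cosmetic data set's size on simulated data, and the number of splits, penalty, and 80% threshold are all assumptions.

```python
# Sketch of repeated split-sample variable selection: variables retained
# are those selected in a large fraction of random train/test splits,
# which guards against selections driven by a single lucky split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=165, n_features=35, n_informative=8,
                           random_state=0)

n_splits = 100
counts = np.zeros(X.shape[1])
for seed in range(n_splits):
    X_tr, _, y_tr, _ = train_test_split(X, y, test_size=0.3,
                                        random_state=seed)
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
    model.fit(X_tr, y_tr)
    # Count a variable as selected when its lasso coefficient is nonzero.
    counts += (model[-1].coef_.ravel() != 0)

# Retain variables selected in at least 80% of the splits.
stable = np.where(counts / n_splits >= 0.8)[0]
print("stable variables:", stable)
```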