Feature stability based machine learning model selection method with usage of Shapley values

Authors

  • Alexander V. Vorobyev Kursk State University

DOI:

https://doi.org/10.52575/2687-0932-2021-48-2-350-359

Keywords:

machine learning, ensemble algorithms, Shapley value, model precision, data noise resistance

Abstract

This article considers the use of Shapley values in regression analysis as a way to reduce the destabilizing effect of feature multicollinearity, and their use in interpreting machine learning results. The limitations of this approach are identified. A Shapley value based method for selecting stable models is proposed. It stabilizes model accuracy when features are distorted or noisy, and it improves the accuracy of both classic and more recent ensemble algorithms while reducing the size of the dataset. The algorithm was tested on synthetic data and on popular publicly available machine learning datasets with different numbers of attributes and observation periods. The experiments showed a consistent reduction in WMAPE (weighted mean absolute percentage error), and the effect grew as the number of features in the sample increased. The proposed algorithm can be used to improve the efficiency of ensemble machine learning algorithms, including fast, high-performance ones.
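
As a rough illustration of the two quantities the abstract relies on, the sketch below computes WMAPE and exact Shapley values in plain Python. It is not the author's algorithm. The WMAPE definition used here (total absolute error divided by total absolute actual value) is the common one and is assumed rather than taken from the paper. The function names `wmape`, `shapley_values`, and `toy_value`, the feature names `x1` to `x3`, and all numbers are hypothetical. The toy value function mimics two collinear features (`x1`, `x2`) that explain the same share of variance, so the Shapley values split that share evenly between them.

```python
from itertools import combinations
from math import factorial


def wmape(actual, predicted):
    """Weighted MAPE: total absolute error over total absolute actual value."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")
    return total_error / total_actual


def shapley_values(players, value):
    """Exact Shapley values for a set function value(frozenset) -> float.

    Enumerates every coalition, so it is only practical for a handful of players.
    """
    n = len(players)
    phi = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value(coalition | {player}) - value(coalition)
                total += weight * marginal
        phi[player] = total
    return phi


def toy_value(subset):
    """Hypothetical 'explained variance' of a model built on a feature subset."""
    score = 0.0
    if "x1" in subset or "x2" in subset:
        score += 0.6  # x1 and x2 are collinear: either one explains the same 0.6
    if "x3" in subset:
        score += 0.3  # x3 contributes independently
    return score


features = ["x1", "x2", "x3"]
phi = shapley_values(features, toy_value)
error = wmape([100, 200, 300], [110, 190, 330])

print(phi)
print(error)
```

In this toy setting the Shapley values sum to the value of the full feature set, which is the efficiency property that makes them usable for attributing model quality to individual features.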

Author Biography

Alexander V. Vorobyev, Kursk State University

Postgraduate, Department of SISA


Published

2021-06-30

How to Cite

Vorobyev, A. V. (2021). Feature stability based machine learning model selection method with usage of Shapley values. Economics. Information Technologies, 48(2), 350-359. https://doi.org/10.52575/2687-0932-2021-48-2-350-359

Section

SYSTEM ANALYSIS AND PROCESSING OF KNOWLEDGE