Automated Reconstruction of Missing Geospatial Data at the Pre-Modeling Stage of Exogenous Geological Processes

Digital technologies for Agricultural and Spatial Territory Planning

Authors

First and Last Name | Academic degree | E-mail | Affiliation
Vasyl Hudak | None | gudak_vasyl [at] knu.ua | Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
Tetiana Kril | Ph.D. | kotkotmag [at] gmail.com | Institute of Geological Sciences of the NAS of Ukraine, Kyiv, Ukraine
Vitalii Zatserkovnyi | Sc.D. | vitalii.zatserkovnyi [at] gmail.com | Taras Shevchenko National University of Kyiv, Kyiv, Ukraine

I and my co-authors (if any) authorize the use of the Paper in accordance with the Creative Commons CC BY license

First published on this website: 17.08.2025 - 18:05
Abstract 

An approach is proposed for preparing geospatial datasets with missing values for neural-network modeling of exogenous geological processes. Based on an analysis of current data reconstruction techniques, the choice of an iterative imputation algorithm with Random Forest as the base estimator is justified. The key advantages of this method are its ability to capture complex nonlinear relationships, its robustness to noise, and its preservation of the correlation structure of the features. The algorithm was implemented in Python. The methodology was tested on monitoring data on the technical condition of engineering structures of the Kyiv-Pechersk Lavra. In the test case, 19% of missing values were reconstructed with a root mean square error of approximately 5% and a coefficient of determination (R²) exceeding 0.9, indicating high recovery accuracy. The reconstructed data improved the quality of the training datasets and the stability of neural-network predictions of geohazard zones. The proposed approach is recommended for preprocessing complex-structured geospatial data in tasks that involve predicting hazardous geological processes.
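
The abstract does not say which library was used. As an illustration only, the following minimal sketch shows iterative imputation with a Random Forest base estimator, assuming scikit-learn's IterativeImputer and RandomForestRegressor. The synthetic three-feature table, the 19% random masking rate, and all hyperparameters (n_estimators, max_iter, random seeds) are assumptions made for the example. They are not the Kyiv-Pechersk Lavra monitoring data or the authors' settings.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a monitoring table (an assumption, not the real data):
# three features linked by nonlinear relationships plus a little noise.
n = 400
x1 = rng.uniform(0.0, 1.0, n)
x2 = np.sin(3.0 * x1) + rng.normal(0.0, 0.05, n)
x3 = x1 ** 2 + 0.5 * x2 + rng.normal(0.0, 0.05, n)
X_true = np.column_stack([x1, x2, x3])

# Hide roughly 19% of the cells completely at random.
mask = rng.random(X_true.shape) < 0.19
X_missing = X_true.copy()
X_missing[mask] = np.nan

# Each feature with gaps is modeled in turn from the other features,
# and the cycle is repeated up to max_iter times.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)

# Score only the cells that were hidden, pooled across all features.
rmse = np.sqrt(mean_squared_error(X_true[mask], X_imputed[mask]))
r2 = r2_score(X_true[mask], X_imputed[mask])
print(f"RMSE on masked cells: {rmse:.3f}, R^2: {r2:.3f}")
```

Masking known values and scoring only those cells is one common way to estimate reconstruction error when the true values are otherwise unavailable. The scores printed here describe the synthetic table only and are not comparable to the figures reported in the abstract.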
