Taylor & Francis Group
Coupling video vision transformer (ViVit) into land change simulation: a comparison with three-dimensional convolutional neural network (3DCNN)

journal contribution
posted on 2024-02-26, 15:00 authored by Haiyang Li, Liang Fan, Yifan Gao, Zhao Liu, Peichao Gao

To enhance land use/cover change (LUCC) simulation accuracy, we introduce ViViT-ANN-CA, which combines the video vision transformer's (ViViT) spatio-temporal feature extraction ability, the artificial neural network's (ANN) non-linear computing ability, and the cellular automaton's (CA) spatial computing. Compared with 3DCNN-ANN-CA, ViViT-ANN-CA achieved higher accuracy in simulating water bodies and vegetation, with overall improvements in Hailing District and Wuxi City. ViViT demonstrates spatio-temporal feature extraction ability comparable to the three-dimensional convolutional neural network (3DCNN), making it promising for future dynamic LUCC simulations.
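The three-stage pipeline the abstract describes (spatio-temporal feature extraction, ANN transition probabilities, CA spatial rules) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the ViViT and ANN stages are stand-ins (simple temporal statistics and a single sigmoid layer), and the CA rule and threshold are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatiotemporal_features(stack):
    """Stand-in for ViViT: collapse a (T, H, W) land-cover time stack
    into per-cell spatio-temporal features (here, temporal mean and std)."""
    return np.stack([stack.mean(axis=0), stack.std(axis=0)], axis=-1)  # (H, W, 2)

def ann_transition_prob(feats, w, b):
    """Stand-in for the ANN: one sigmoid layer mapping per-cell features
    to a land-change transition probability."""
    z = feats @ w + b
    return 1.0 / (1.0 + np.exp(-z))  # (H, W)

def ca_step(state, prob, threshold=0.5):
    """Illustrative CA rule: a cell converts (0 -> 1) only when its
    transition probability exceeds the threshold AND at least two of its
    3x3 neighbors are already converted."""
    H, W = state.shape
    padded = np.pad(state, 1)
    neighbors = sum(padded[i:i + H, j:j + W]
                    for i in range(3) for j in range(3)) - state
    return np.where((prob > threshold) & (neighbors >= 2), 1.0, state)

# Toy run: 4 annual land-cover snapshots on a 16x16 grid (1 = converted)
stack = (rng.random((4, 16, 16)) > 0.6).astype(float)
feats = spatiotemporal_features(stack)
w, b = rng.normal(size=2), 0.0          # hypothetical trained weights
prob = ann_transition_prob(feats, w, b)
next_state = ca_step(stack[-1], prob)
print(next_state.shape)                  # (16, 16)
```

In the actual model, the feature extractor would be a trained ViViT (or 3DCNN for the comparison baseline) and the transition rules would be calibrated against observed LUCC maps; the sketch only shows how the three components chain together.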

Funding

This work was supported by the National Natural Science Foundation of China [42171088]; State Key Laboratory of Earth Surface Processes and Resource Ecology [2022-ZD-04].