Coupling video vision transformer (ViVit) into land change simulation: a comparison with three-dimensional convolutional neural network (3DCNN)

Li, Haiyang; Fan, Liang; Gao, Yifan; Liu, Zhao; Gao, Peichao

doi:10.6084/m9.figshare.25289186.v1

journal contribution

posted on 2024-02-26, 15:00 authored by Haiyang Li, Liang Fan, Yifan Gao, Zhao Liu, Peichao Gao

To enhance land use/cover change (LUCC) simulation accuracy, we introduced ViViT-ANN-CA, which combines the spatio-temporal feature extraction ability of the video vision transformer (ViViT), the non-linear computing ability of an artificial neural network (ANN), and the spatial computing of cellular automata (CA). Compared with 3DCNN-ANN-CA, ViViT-ANN-CA showed higher accuracy in simulating water bodies and vegetation, with overall improvements in Hailing District and Wuxi City. ViViT demonstrates spatio-temporal feature extraction ability comparable to that of the three-dimensional convolutional neural network (3DCNN), making it promising for future dynamic LUCC simulations.