Spatial autoregressive partially linear varying coefficient models

ABSTRACT In this article, we consider a class of partially linear spatially varying coefficient autoregressive models for data distributed over complex domains. We propose approximating the varying coefficient functions via bivariate splines over triangulation to deal with the complex boundary of the spatial domain. Under some regularity conditions, the estimated constant coefficients are asymptotically normally distributed, and the estimated varying coefficients are consistent and possess the optimal convergence rate. A penalized bivariate spline estimation method with a more flexible choice of triangulation is proposed. We further develop a fast algorithm to calculate the geodesic distance. The proposed method is much more computationally efficient than local smoothing methods, and is thus capable of handling large-scale spatial data. In addition, we propose a model selection approach to identify predictors with constant and varying effects. The performance of the proposed method is evaluated by simulation examples and the Sydney real estate dataset.


Introduction
Recently, the wide availability of data observed over space, due to widespread sensor networks and inexpensive geographical information systems, has stimulated spatial data analysis. Modern spatial data are unprecedented in both scale and complexity, and such complex data call for statistical models that are sufficiently flexible to adapt to the underlying signals. Varying coefficient models have attracted significant attention in semiparametric and nonparametric regression studies; see, for example, Hastie and Tibshirani (1993), Huang, Wu, and Zhou (2002), Fan and Huang (2005), and Yang, Park, Xue, and Härdle (2006). In spatial regression, spatially varying coefficient models (SVCMs) have also gained widespread popularity in recent years, enhancing the capability of spatial analysis by exploring spatial non-stationarity of a regression relationship in geo-referenced data analysis.
To estimate the varying coefficient functions in SVCMs, there are three popular approaches: the Bayesian approach (Gelfand, Kim, Sirmans, and Banerjee 2003), the local smoothing method (Brunsdon, Fotheringham, and Charlton 1996; Qingguo 2013), and the basis expansion approach (Mu, Wang, and Wang 2018). Most existing articles on SVCMs have focused on modeling and methodology developments (Fotheringham, Brunsdon, and Charlton 2002), and those with theoretical justifications usually considered the situation where the data are observed on grid points over a rectangular domain. Mu et al. (2018) studied the asymptotic properties of the SVCM estimator for spatial data collected over a complex domain; however, the errors were assumed to be independent, which is too strong to be realistic for spatial data.
To incorporate the spatial dependence and balance the interpretability and flexibility of the SVCM, in this article we consider a spatial autoregressive partially linear varying coefficient model (SAR-PLVCM). For i = 1, . . . , n, let Y_i represent the response variable, Z_i = (Z_{i1}, . . . , Z_{ip_1})' be a p_1-dimensional vector of explanatory variables that are linearly associated with the response, and X_i = (X_{i1}, . . . , X_{ip_2})' be a p_2-dimensional vector of explanatory variables whose relationship with the response varies across locations. Next, let U_i = (U_{i1}, U_{i2})' be the location of the i-th observation, randomly sampled from an arbitrarily shaped spatial domain Ω. Then the SAR-PLVCM can be expressed as

Y_i = α_0 Σ_{j≠i} w_{ij} Y_j + Σ_{l=1}^{p_1} Z_{il} γ_{0l} + Σ_{k=1}^{p_2} X_{ik} β_{0k}(U_i) + ε_i,  i = 1, . . . , n,  (1)

where α_0 ∈ [0, 1] is a global parameter, w_{ij} is the weight of the neighbor effects, satisfying w_{ii} = 0 and Σ_{j≠i} w_{ij} = 1 for any i, j = 1, . . . , n, the γ_{0l}'s are unknown constant coefficient parameters, the β_{0k}(·)'s are unknown varying-coefficient functions, and the ε_i's are i.i.d. random noises with E(ε_i) = 0 and Var(ε_i) = σ_0^2, each ε_i being independent of Z_i and X_i. Model (1) encompasses several existing models as special cases: the SVCM in Mu et al. (2018), when all covariates are assumed to have a varying effect on the response and neighbor effects are not considered; and the SAR model, when all covariates are assumed to have a constant effect on the response.
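To make the data-generating mechanism concrete, the following Python sketch simulates from a model of this form by solving the SAR equation (I_n − α_0 W) Y = Zγ_0 + Σ_k X_k β_0k(U) + ε for Y. The coefficient surfaces, the weight decay rate, and all dimensions below are illustrative choices of ours, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 200, 2, 2
alpha0 = 0.5                      # neighbor-effect parameter
gamma0 = np.array([1.0, -1.0])    # constant coefficients (illustrative)

# Random locations on the unit square and illustrative coefficient surfaces.
U = rng.uniform(size=(n, 2))
beta = [lambda u: np.sin(np.pi * u[:, 1]),   # beta_01(u), illustrative
        lambda u: np.cos(np.pi * u[:, 1])]   # beta_02(u), illustrative

# Row-normalized exponential weights based on Euclidean distance.
D = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=2)
W = np.exp(-10.0 * D)
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

Z = rng.standard_normal((n, p1))
X = rng.standard_normal((n, p2))
eps = rng.standard_normal(n)

# Mean part Z gamma + sum_k X_k beta_k(U); then solve the SAR equation
# (I - alpha0 W) Y = mean + eps for the response Y.
mean = Z @ gamma0 + sum(X[:, k] * beta[k](U) for k in range(p2))
Y = np.linalg.solve(np.eye(n) - alpha0 * W, mean + eps)
```

Solving the linear system, rather than simulating Y_i one at a time, respects the simultaneity of the autoregressive term: every Y_i depends on all the others through W.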
To estimate the unknown parameters in (1), Sun, Yan, Zhang, and Lu (2014) proposed a profile likelihood-based estimation procedure using the local smoothing technique. However, with the increasing volume of spatial data, the computational burden of the local smoothing method is extremely high and usually out of reach of a typical personal computer. In addition, although the traditional local smoothing method works well for a rectangular domain, it encounters the well-known 'leakage' problem when smoothing across a complex boundary, which is common in practice. To overcome these difficulties, in this article we propose using bivariate splines over triangulation for estimation. Furthermore, to handle the spatial dependence, the SAR-PLVCM adds a weighted average of the neighboring responses to the explanatory variables. As a result, to estimate the SAR-PLVC model in (1), one very important step is to determine the weight matrix W ≡ {w_{ij}}_{i,j=1}^n, which is typically based on the distance between observations. One conventional choice is the Euclidean distance. However, it completely ignores the shape of the domain when finding the shortest path from the start point to the endpoint. For a complicated, irregularly shaped domain, the line segment connecting two points is often not contained in the domain, and thus weights based on the Euclidean distance may not be appropriate. A better strategy is to use the geodesic distance, which constrains the path between two points to lie within the spatial domain. For example, Wang and Ranalli (2007) considered the great-circle distance, the shortest distance measured along the surface of a sphere. They used a restricted graph G in which every node is connected only to its k nearest neighbors. However, k must be carefully selected to ensure that G is connected and to avoid the case where the endpoint is itself among the k nearest neighbors of the other point.
In addition, the computational burden dramatically increases for a large-scale dataset. To approximate the geodesic distance accurately and efficiently, in this article we propose two algorithms based on the triangulation technique.
Another major concern when fitting model (1) is the lack of prior knowledge of the true model structure, specifically, which coefficients are constant and which truly vary over space. In this article, we propose a backward selection procedure based on the Bayesian information criterion (BIC) for determining the linear and varying components in the SAR-PLVCM. Our numerical studies confirm the strong performance of our method in terms of estimation accuracy, model structure selection, and computational time.
The rest of the article is organized as follows. In Section 2 we present our estimation procedure. Section 3 is devoted to the asymptotic analysis of the proposed estimators. In Section 4, we discuss the details of the implementation, such as how to estimate the geodesic distance, how to choose the penalty parameters, and how to separate the constant and varying coefficients in the SAR-PLVCM. Section 5 presents simulation results comparing our method with its competitors. Section 6 illustrates our method using the Sydney housing price dataset. Technical details are provided in Appendix A and the online Supplementary Material.

Bivariate spline over triangulation
Note that in the above maximum likelihood estimation, we need an appropriate smoothing method to approximate the coefficient functions. We consider the bivariate spline approximation over triangulations (BST) proposed in Mu et al. (2018). Below we briefly describe the techniques of triangulation and BST when α is fixed in (4). As shown in Lai and Wang (2013) and Mu et al. (2018), triangulation is an effective strategy for handling data distributed over a domain with a complex boundary. We use τ to denote a triangle in the plane; a triangulation Δ of Ω is then defined as a collection of K triangles, Δ = {τ_1, . . . , τ_K}, such that if any two triangles intersect, their intersection is either a common vertex or a shared edge. Let |τ| denote the length of the longest edge of a triangle τ ∈ Δ, and let ρ_τ denote the radius of the inscribed circle of τ. Define the shape parameter of τ as R_τ = |τ|/ρ_τ, and the size of Δ as the length of its longest edge, i.e. |Δ| = max{|τ|, τ ∈ Δ}.
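As a concrete illustration of these quantities, the following Python snippet computes |τ|, ρ_τ, and the shape parameter R_τ = |τ|/ρ_τ for a single triangle, using Heron's formula for the area and the identity ρ_τ = area/semiperimeter:

```python
import numpy as np

def shape_parameter(v1, v2, v3):
    """Return (|tau|, rho_tau, R_tau) for the triangle with vertices v1, v2, v3."""
    v1, v2, v3 = map(np.asarray, (v1, v2, v3))
    a = np.linalg.norm(v2 - v3)
    b = np.linalg.norm(v1 - v3)
    c = np.linalg.norm(v1 - v2)
    longest = max(a, b, c)                            # |tau|
    s = (a + b + c) / 2.0                             # semiperimeter
    area = np.sqrt(s * (s - a) * (s - b) * (s - c))   # Heron's formula
    rho = area / s                                    # inscribed circle radius
    return longest, rho, longest / rho

# An equilateral triangle with unit side length.
L, rho, R = shape_parameter((0, 0), (1, 0), (0.5, np.sqrt(3) / 2))
```

For an equilateral triangle R_τ = 2√3 ≈ 3.46, the smallest possible value; the larger R_τ is, the more degenerate (sliver-like) the triangle.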
For nonnegative integers d and r and a triangulation Δ, define S_d^r(Δ) = {g ∈ C^r(Ω) : g|_τ ∈ P_d(τ), τ ∈ Δ}, where C^r(Ω) is the collection of all r-th continuously differentiable functions over Ω, g|_τ is the polynomial piece of the spline g restricted to triangle τ, and P_d is the space of all polynomials of degree less than or equal to d. It has been proved in Lai and Schumaker (2007) that, when the datasets are noise-free, a spline space with d = 3r + 2 has the optimal approximation rate. In this article, we consider the spline space S ≡ S_{3r+2}^r(Δ).
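The number of bivariate Bernstein basis polynomials of degree d on a single triangle is (d+1)(d+2)/2, so before the C^r smoothness constraints are imposed, the piecewise-polynomial space over K triangles has dimension K(d+1)(d+2)/2. A small helper (our own illustration, not from the paper) makes this count explicit for the choice d = 3r + 2:

```python
from math import comb

def bernstein_dim(d):
    """Number of bivariate Bernstein basis polynomials of degree d on one triangle:
    binom(d+2, 2) = (d+1)(d+2)/2."""
    return comb(d + 2, 2)

def unconstrained_dim(K, r):
    """Dimension of the piecewise-polynomial space over K triangles with
    d = 3r + 2, before imposing the C^r smoothness constraints."""
    return K * bernstein_dim(3 * r + 2)
```

For example, r = 1 gives d = 5 and 21 basis polynomials per triangle; the smoothness constraints then reduce the effective dimension, which is exactly what the constraint matrix H_k in the next paragraph encodes.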
For fixed α, we consider the following minimization problem:

min over γ_l ∈ R, l = 1, . . . , p_1, and g_k ∈ S, k = 1, . . . , p_2, of Σ_{i=1}^n {Y_i − α Σ_{j≠i} w_{ij} Y_j − Σ_{l=1}^{p_1} Z_{il} γ_l − Σ_{k=1}^{p_2} X_{ik} g_k(U_i)}^2,

where each g_k(·), k = 1, . . . , p_2, is approximated by bivariate splines. For any k = 1, . . . , p_2, denote by {B_j}_{j∈J_k} the set of bivariate Bernstein basis polynomials for S. Then g_k(u) can be represented as a linear combination of Bernstein basis polynomials, g_k(u) = Σ_{j∈J_k} c_{kj} B_j(u) = B_k(u)'c_k, where c_k = (c_{kj}, j ∈ J_k)' is the spline coefficient vector. To guarantee the smoothness of the splines, it is necessary to impose constraints on the spline coefficients. Denote by H_k the constraint matrix on the coefficients c_k, which requires H_k c_k = 0 and depends on d, r and the structure of the triangulation. We refer to Section B.2.1 of the Supplementary Materials of Yu, Wang, Wang, Liu, and Yang (2019) for examples of H_k. Thus, we consider the corresponding constrained minimization. Next, we eliminate the constraint by applying a QR decomposition to the transpose of H_k: H_k' = Q_k R_k = (Q_{1,k} Q_{2,k})(R_{1,k}', R_{2,k}')', where Q_k is an orthogonal matrix, the submatrix Q_{1,k} consists of the first r_k columns of Q_k with r_k the rank of H_k, R_{1,k} is an upper triangular matrix, and R_{2,k} is a matrix of zeros. We then reparameterize c_k as c_k = Q_{2,k}θ_k, which guarantees H_k c_k = 0. Thus, the constrained minimization problem can be converted to a conventional regression problem without any constraint. For simplicity, we assume B(u) = B_1(u) = · · · = B_{p_2}(u) = {B_j(u)}_{j∈J}, so that H_1 = · · · = H_{p_2} and Q_{2,1} = · · · = Q_{2,p_2} ≡ Q_2. Denote θ = (θ_1', . . . , θ_{p_2}')'. Then the minimization problem in (6) can be written in matrix form. Let '⊗' denote the Kronecker product, let Z = (Z_1, . . . , Z_n)' and X_B = (X_1 ⊗ B*(U_1), . . . , X_n ⊗ B*(U_n))', and combine them into the design matrix (Z, X_B). Solving the least squares problem in (7) yields the estimators of γ and θ for fixed α. Therefore, for any fixed α, one obtains the bivariate spline estimators of c_k and β_k(·) as c_k(α) = Q_2 θ_k(α) and β_k(u; α) = B(u)'c_k(α).
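The constraint-elimination step can be sketched numerically as follows. Taking the QR decomposition of the transpose of a constraint matrix H (here a hypothetical random matrix, standing in for the smoothness constraints), the trailing columns Q_2 of Q span the null space of H, so every c = Q_2 θ satisfies H c = 0 exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 12, 5                        # p spline coefficients, q constraints
H = rng.standard_normal((q, p))     # hypothetical constraint matrix; require H c = 0

# Full QR of H': H' = (Q1 Q2)(R1; 0). The columns of Q2 are orthogonal to the
# rows of H, i.e. they span the null space of H.
Q, R = np.linalg.qr(H.T, mode="complete")
rk = np.linalg.matrix_rank(H)
Q2 = Q[:, rk:]                      # p x (p - rk) null-space basis

theta = rng.standard_normal(p - rk)
c = Q2 @ theta                      # any such c satisfies H c = 0
```

The unconstrained parameter θ has only p − rank(H) components, which is why the reparameterized problem is an ordinary least squares regression.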
Then, replacing α by its estimator, we obtain the estimators of β_0 and σ_0^2.

Asymptotic results
This section provides the asymptotic results for the proposed estimators. To facilitate the presentation of the main results, we introduce the following notation. Denote G = WT^{-1}. By (2) and (3), and the fact that α_0 G + I_n = T^{-1}, we obtain the representation used in the proofs. The proofs of the following theorems are provided in the Appendix. Theorem 3.1 shows the consistency of the estimators of α, σ^2, and γ; based on this result, we are able to derive the following asymptotic normality. Let ψ = (α, σ^2, γ')' and ψ_0 = (α_0, σ_0^2, γ_0')'. Lemma A.3 in the Supplementary Material gives the explicit formulas of the matrices appearing in the asymptotic covariance.
Theorem 3.2 implies that the constant coefficient estimators from the proposed method have a convergence rate of order n^{-1/2} when the limiting covariance matrix is nonsingular. Using these two theoretical results, we can establish the convergence rate of the BST estimator of β_k, k = 1, . . . , p_2, as stated in Theorem 3.3 below.

Implementation
In this section, we discuss some details in the implementation of the proposed method.

Distance and shortest path over irregular domain
There are two popular row-normalized weight functions in spatial autoregressive (SAR) models, including the exponential weights w_{ij} = exp(−κ d_{ij}) / Σ_{k≠i} exp(−κ d_{ik}) for some κ > 0, where d_{ij} measures the distance between locations i and j.
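A minimal Python sketch of such row-normalized exponential weights (with an illustrative decay rate κ) is:

```python
import numpy as np

def exponential_weights(D, kappa=10.0):
    """Row-normalized exponential weights from a distance matrix D:
    w_ij = exp(-kappa d_ij) / sum_{k != i} exp(-kappa d_ik), with w_ii = 0."""
    W = np.exp(-kappa * D)
    np.fill_diagonal(W, 0.0)          # enforce w_ii = 0
    return W / W.sum(axis=1, keepdims=True)

# Three locations; points 1 and 2 are equidistant from point 0.
U = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=2)
W = exponential_weights(D)
```

Each row of W sums to one, and closer neighbors receive exponentially larger weights; only the distance matrix D changes when Euclidean distances are replaced by geodesic ones.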
In the following, we develop two efficient algorithms to approximate the geodesic distance over a complex domain based on the triangulation technique. When the sample size is moderately large, for example, in the hundreds or thousands, we propose Algorithm 1 for calculating the distance between any pair of sampled locations. In our extensive simulation studies, we find that the number of auxiliary points m does not need to be very large; usually, m = 1000 is sufficient to construct a fine triangulation. In Algorithm 1, δ determines the convergence threshold, which can be set depending on whether the distances or the weights are of primary interest. Section S.5 in the Supplementary Material compares the weights calculated from the distances estimated by Algorithm 1 with those calculated from the 'oracle' geodesic distances obtained using the approach proposed by Miller and Wood (2014); it shows the convergence of this algorithm as S increases. Based on these numerical studies, we recommend setting S = 30-50 for small sample sizes and S = 10-30 for sample sizes larger than 2000. Algorithm 1 is much faster than the traditional methods in the literature; however, according to the simulation studies, it is still not fast enough when a dataset contains more than tens of thousands of observations. For such a scale, we propose a fast version of Algorithm 1; see Algorithm 2 below. Simulation studies in Section S.5 in the Supplementary Material demonstrate the performance of Algorithm 2. From Table S.2, one sees that, when the sample size is large, Algorithm 2 is much faster than Algorithm 1 without sacrificing much accuracy. The Matlab code for Algorithms 1 and 2 can be found at https://github.com/funstatpackages/SARVCM.
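The shortest-path step of Algorithm 1 amounts to a standard computation on the graph whose vertices are the triangulation nodes and whose edge weights are the Euclidean edge lengths. A stdlib-only Dijkstra sketch illustrates this; the tiny three-vertex graph below stands in for a real triangulation and shows how the graph distance can exceed the straight-line distance when the direct segment is not available:

```python
import heapq
import math

def dijkstra(n_vertices, edges, src):
    """Shortest-path lengths from vertex `src` over weighted edges,
    where `edges` maps vertex -> list of (neighbor, edge_length)."""
    dist = [math.inf] * n_vertices
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in edges[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Toy graph: vertices 0 and 2 are not directly connected, so the graph
# distance (1 + 1 = 2) exceeds the Euclidean distance sqrt(2).
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
edges = {i: [] for i in range(len(pts))}
for i, j in [(0, 1), (1, 2)]:
    w = math.dist(pts[i], pts[j])
    edges[i].append((j, w))
    edges[j].append((i, w))
dist = dijkstra(len(pts), edges, 0)
```

With a fine enough triangulation, such edge-path lengths approximate the true geodesic distance, which is the idea behind adding the m auxiliary points.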

Bivariate penalized spline
For fitting sparse spatial data, bivariate penalized splines over triangulation (BPST; Lai and Wang 2013) are more flexible than the unpenalized bivariate spline method referred to as the BST method in Section 2.2. To define the penalized spline method, for directions u_1 and u_2, denote by D_{u_j}^q f(u) the q-th order derivative of f in the direction u_j at the point u.
For fixed α, we now consider the following penalized minimization:

min over γ_l ∈ R, l = 1, . . . , p_1, and g_k ∈ S, k = 1, . . . , p_2, of Σ_{i=1}^n {Y_i − α Σ_{j≠i} w_{ij} Y_j − Σ_{l=1}^{p_1} Z_{il} γ_l − Σ_{k=1}^{p_2} X_{ik} g_k(U_i)}^2 + Σ_{k=1}^{p_2} λ_k E(g_k),

where E(g_k) = ∫_Ω {(D_{u_1}^2 g_k)^2 + 2(D_{u_1} D_{u_2} g_k)^2 + (D_{u_2}^2 g_k)^2} du_1 du_2, and λ_k > 0, k = 1, . . . , p_2, are the tuning parameters that control the smoothness of the functions g_k. Adopting the bivariate spline approximation approach and, similar to Section 2.2, removing the smoothness constraint via the QR decomposition of H_k, the minimization problem is converted to a penalized least squares problem. Let Λ = diag(λ_1, . . . , λ_{p_2}), and let D = diag{0, Λ ⊗ (Q_2'PQ_2)}, where 0 is a p_1 × p_1 matrix of zeros. Assuming the same set of spline basis functions is used for all the coefficient functions, we obtain the estimators by solving the least squares problem in (14). We choose the best combination of (λ_1, . . . , λ_{p_2}) by minimizing the generalized cross-validation (GCV) criterion, where m = tr(S_Λ) can be regarded as the effective degrees of freedom (edf) of the model.

Algorithm 1: Algorithm to estimate the geodesic distance matrix.
Data: locations of the sampled observations {U_i}_{i=1}^n and the boundary of the domain Ω.
Output: the estimated geodesic distance matrix.
(ii) Add m random auxiliary points over Ω.
(iii) Construct a triangulation mesh Δ using the auxiliary points and the original sample points as vertices.
(iv) For any two points U_i and U_j, compute d_{ij} as the length of the shortest path between U_i and U_j through the edges of Δ, based on the edge lengths.
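The GCV criterion can be sketched for a generic penalized least squares smoother S = B(B'B + λP)^{-1}B'. Here a univariate polynomial basis and a ridge-type penalty stand in for the bivariate spline basis and energy penalty, so this is an illustration of the criterion rather than the paper's estimator:

```python
import numpy as np

def gcv(y, B, P, lam):
    """GCV score for the penalized fit with smoother S = B (B'B + lam P)^{-1} B';
    m = tr(S) plays the role of the effective degrees of freedom."""
    n = len(y)
    S = B @ np.linalg.solve(B.T @ B + lam * P, B.T)
    resid = y - S @ y
    m = np.trace(S)
    return n * np.sum(resid ** 2) / (n - m) ** 2

rng = np.random.default_rng(2)
n = 100
x = np.linspace(0, 1, n)
B = np.vander(x, 6, increasing=True)   # stand-in basis (6 columns)
P = np.eye(6)                          # ridge-type penalty for illustration
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

grid = [1e-4, 1e-2, 1.0, 100.0]
best = min(grid, key=lambda lam: gcv(y, B, P, lam))
```

In practice the minimization would run over a grid of (λ_1, . . . , λ_{p_2}) combinations, with the smoother matrix S_Λ coming from the penalized bivariate spline fit.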

Model selection
Algorithm 2: A fast version of Algorithm 1 for estimating the geodesic distance matrix.
Set d_{ij}^0 = ∞. Create a triangulation mesh Δ = {τ_1, . . . , τ_K}, and find the centroid of each triangle τ_k ∈ Δ, k = 1, . . . , K. Using Algorithm 1, find the geodesic distance matrix C = {c_{kl}}_{k,l=1}^K, where c_{kl} is the geodesic distance between the centroids of τ_k and τ_l. Then, for any two points U_i ∈ τ_k and U_j ∈ τ_l: if τ_k = τ_l, set d_{ij} to the Euclidean distance between U_i and U_j; if τ_k ≠ τ_l, calculate the Euclidean distances, v_{ik} and v_{jl}, between each point and its corresponding centroid.

A natural question in the SAR-PLVCM is how to determine which explanatory variables have a linear effect and which ones have a varying effect. As shown in Section 3, if the set of constant effects is correctly specified, the bias in the estimation of these components is eliminated and root-n convergence rates can be obtained for the constant coefficients.
To separate the constant and varying coefficients, we propose a backward selection algorithm based on BIC = −2 log L_n + m log(n), where L_n is the maximized likelihood and m is the edf as in (15). In Algorithm 3, we present our model selection procedure using BIC.
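The backward selection idea can be sketched generically: start with all components varying, repeatedly demote to constant the component whose demotion most reduces the BIC, and stop when no demotion helps. The `fit` callback, which in the paper's setting would refit the SAR-PLVCM and return its BIC, is abstracted away here:

```python
def backward_select(candidates, fit):
    """Backward selection via BIC. `candidates` are component indices; `fit(S)`
    returns the BIC of the model treating the components in S as constant
    (all others varying). Returns the selected constant set and its BIC."""
    constant = set()
    best_bic = fit(frozenset(constant))
    improved = True
    while improved:
        improved = False
        for k in sorted(set(candidates) - constant):
            bic = fit(frozenset(constant | {k}))
            if bic < best_bic:            # best demotion found so far this pass
                best_bic, best_k, improved = bic, k, True
        if improved:
            constant.add(best_k)
    return constant, best_bic
```

With a toy BIC surrogate that is minimized at the constant set {1, 2}, the procedure recovers exactly that set: `backward_select({1, 2, 3}, lambda S: len(S ^ {1, 2}))` returns `({1, 2}, 0)`.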

Simulation studies
In this section, we investigate the numerical performance of the proposed model selection and estimation method. R code of the simulation studies presented in this section can be accessed from Github: https://github.com/funstatpackages/SARVCM. We conduct two simulation studies based on two types of spatial domains: (1) a regular rectangular domain, and (2) an irregular horseshoe domain.
In both simulation studies, for any two different locations i and j, we consider the exponential weights w ij = exp(−10d ij )/ k =i exp(−10d ik ), where d ij is the Euclidean distance for the rectangular domain in Simulation Study 1, and d ij is the geodesic distance for the horseshoe domain with a complex boundary in Simulation Study 2. In both examples, when implementing the proposed BPST method, we use Algorithm 1 to approximate the true distance. For the kernel method proposed in Sun et al. (2014), the Euclidean distance is used.

Simulation study 1
Note that the kernel method proposed in Sun et al. (2014) only works for data distributed over rectangular domains; in this example, we therefore consider a regular square domain, Ω = [0, 1]^2. Similar to Sun et al. (2014), we generate data from the SAR-PLVCM in (16). Following Sun et al. (2014), we set α_0 = 0.5, β_01(u) = sin(u_2 π), β_02(u) = cos(u_2 π), and β_03(u) = exp(u_2). Moreover, we add two constant components into model (16). We first investigate the performance of the BST and BPST methods when the true model structure is known, referred to as BST-ORACLE and BPST-ORACLE. To examine the accuracy of the proposed estimation procedure, we calculate the mean squared error (MSE) for the estimators of the constant coefficients and the estimator of the neighborhood effect parameter. We also calculate the mean integrated squared error (MISE) for the functional coefficient estimators. Table 1 presents these results. From Table 1 and Figures S.2 and S.3, it is easy to see that the estimation accuracy improves for both the BST and BPST methods as n increases. From Table 1, we also observe that, for the BPST method, the two different triangulations yield similar results, while for the BST method, when the sample size is moderate, a fine triangulation is usually not suggested. Overall, the BPST is more convenient for model fitting than the BST since it is less sensitive to the triangulation used in the estimation. Next, we investigate the performance of the proposed model identification methods, abbreviated as BST-BIC and BPST-BIC, in terms of model selection and estimation accuracy. We start with the spatial autoregressive model with all varying coefficients and then implement our proposed Algorithm 3 to choose the model structure. For the BST-BIC and BPST-BIC methods, we still consider the two triangulations Δ_1 and Δ_2 described above. For the Kernel-BIC method, if a component is identified to be constant, Sun et al.
(2014) used the average of the estimated coefficient function over all sampled locations to estimate the constant coefficient. It is well known that the bandwidth of the kernel exhibits a strong influence on model selection and estimation. Since Sun et al. (2014) did not give explicit suggestions on bandwidth selection for model selection, to illustrate the effect of the bandwidth, we consider different bandwidths in the model selection. In all the simulation examples, the bandwidth for kernel estimation is first selected by cross-validation, and we denote it as h_cov. For model selection, we adjust h_cov by multiplying by a constant c, where c = 0.5, 1.0, 2.0, 4.0. We then refit the selected model using bandwidth h_cov. Table 2 presents the proportion of M_c selected among 200 replications using the different methods. From Table 2, one sees that both the BST-BIC and the BPST-BIC far outperform the Kernel-BIC in identifying the correct model structure. The Kernel-BIC appears to be very sensitive to the choice of the bandwidth: when a smaller bandwidth is used, more varying components are identified, whereas when a larger bandwidth is used, more linear components are identified. In contrast, the BST-BIC and BPST-BIC methods can effectively identify constant and varying components regardless of the choice of the triangulation. The proportions of correctly selected nonzero constant and varying components are nearly perfect in almost all scenarios. The estimation results are displayed in Table 1, in which the 'ORACLE' estimators serve as a gold standard. In this setting, the proposed BPST-BIC performs much better than the Kernel-BIC, as indicated by the MISEs, and is much closer to the 'ORACLE', regardless of the sample size.
Finally, we compare the computational efficiency of our method with that of the kernel method. Note that for the kernel method, a locally weighted regression is necessary at every observation in the dataset, which is computationally expensive, especially when the sample size n is large. In contrast, both the BST and the BPST methods are much more computationally efficient as they only require solving a global optimization problem for any fixed α. The advantage enjoyed by the BST and the BPST is even more pronounced in the model selection problem because the estimation procedure has to be repeated over many iterations to achieve convergence.

Simulation study 2
In this example, we consider a more complex domain, and perform the data analysis based on a modified horseshoe domain proposed by Wood, Bravington, and Hedley (2008).
In this simulation study, the lattice points are obtained by dividing the entire domain evenly into N = 80 × 180 = 14,400 points. In this example, we set α_0 = 0.5. The first coefficient function β_1(·) is the same as in Wood (2003), and the second coefficient function is β_2(u_1, u_2) = 4 sin((π/2)u_1 u_2). The contour plots of the true coefficient functions are shown in Figure 1. As in Simulation Study 1, the linear coefficients are γ_01 = 1 and γ_02 = −1. The explanatory variables, X_i's and Z_i's, and the random noises, ε_i's, are generated independently from the standard normal distribution. We perform the study with sample sizes n = 500 and 1000.
To approximate the complex horseshoe domain, we consider the BPST method based on three triangulations in this study, shown in the first column of Figure 1. The first triangulation, Δ_1, contains 89 triangles and 73 vertices; Δ_2 contains 165 triangles with 120 vertices; and Δ_3 has 238 triangles and 161 vertices. For the Kernel-BIC, as described in Simulation Study 1, the bandwidth used in model selection is adjusted by multiplying by a constant c (c = 0.5, 1.0, 2.0, 4.0), and the model is then refitted using bandwidth h_cov.
For each simulation setting, we first calculate the average MSEs and MISEs for the BPST-ORACLE method across 200 replications when the true model structure is known; see Table 3. It also summarizes the MSEs and MISEs for the BPST-BIC and Kernel-BIC methods, which shows that the proposed estimates are far more accurate than the local smoothing estimates.
Next, we compare the performance of the proposed BPST-BIC method with the Kernel-BIC method (Sun et al. 2014). Note that the true constant set is M_c = {1, 2} and the true varying set is M_v = {3, 4}. Table 4 lists the proportion of M_c selected. From Table 4, one sees that the BPST-BIC far outperforms the Kernel-BIC in identifying the correct model structure. Similar to what we found in Simulation Study 1, the Kernel-BIC is very sensitive to the choice of the bandwidth, and it fails to identify the correct model for some bandwidths. In contrast, the BPST-BIC can consistently identify the right model regardless of the choice of the triangulation.
The estimated functions from the two methods for n = 1000 are visualized in Figure 1. The performance of the Kernel-BIC method is impaired over the complex domain, while the results of the BPST and BPST-BIC methods are stable. To save space, Figure S.4 for the case n = 500 is provided in the Supplementary Material.

Application
We apply the proposed method to the analysis of Sydney real estate data. In this study, we are interested in examining how economic and social factors affect housing prices in Sydney. The dataset can be obtained from the R package HRW (Harezlak, Ruppert, and Wand 2018), and it contains 37,676 properties sold in the Sydney Statistical Division (an official geographical region including Sydney) in the calendar year 2001. To avoid temporal effects, we focus on the winter quarter only, which contains 7291 properties. Based on the housing prices, we classify the observations in the dataset into five groups: (1) less than 250K, (2) 250-500K, (3) 500-750K, (4) 750-1000K, and (5) greater than 1000K. These groups are displayed in Figure 2(a).
The factors we consider include lot size (LS), average weekly income (Income), distance from the house to the main road in kilometers (DR), the level of particulate matter with a diameter under 10 micrometers recorded at the air pollution monitoring station nearest to the house (PM10), and the distance from the house to the nearest coastline location in kilometers (DC). We apply model (2) to fit the natural logarithm of the sale price in Australian dollars with the explanatory variables mentioned above, and adopt Algorithm 3 to perform the model selection. As shown in Figure 2(b), we use a triangulation with 197 triangles and 172 vertices for the bivariate spline smoothing. For comparison, we also fit the Kernel-BIC, the linear regression model (LM), and the SAR-LM. Unfortunately, due to the size of the dataset and the huge computational cost, the Kernel-BIC cannot be implemented on a regular PC. The estimated housing prices using BPST-BIC, LM, and SAR-LM are plotted in Figure 2(c-e). All of these plots indicate that inland housing prices are noticeably lower than those in the coastal regions. We observe that the LM significantly overestimates the inland house values while underestimating the coastal housing prices.

Note: BST-ORACLE(Δ_j): bivariate spline estimator based on triangulation Δ_j when the true model structure is known; BPST-ORACLE(Δ_j): bivariate penalized spline estimator based on triangulation Δ_j when the true model structure is known; BST-BIC(Δ_j): bivariate spline estimator based on triangulation Δ_j when the proposed BIC method is used for model selection; BPST-BIC(Δ_j): bivariate penalized spline estimator based on triangulation Δ_j when the proposed BIC method is used for model selection; Kernel-BIC(c h_cov, h_cov): the BIC local smoothing method (Sun et al. 2014) with bandwidth c × h_cov for model selection and bandwidth h_cov for model refitting after selection.
The SAR-LM method suffers from a similar estimation issue when using the Euclidean distance. In contrast, the SAR-PLVCM is able to produce much more accurate estimation results. Figure 3 summarizes the coefficient estimation results from the proposed method. As shown in Figure 3(a), compared with the northwest region, lot size has a higher positive effect on the housing price in the southeast region. Income is generally positively associated with the housing price, as shown in Figure 3(b). The impact of the distance to the coastline on the housing price is negative, especially for the houses located on the east coast.
To evaluate the different methods, we adopt 10-fold cross-validation to calculate the out-of-sample prediction errors. The estimation and prediction results are listed in Table 5. From this table, it is obvious that BPST-BIC provides the most accurate results among the three methods. A histogram and a scatter plot of the residuals of the log-transformed housing prices from BPST-BIC are shown in Figure 4(a,b), respectively.

Conclusion and discussion
We built a semiparametric spatial model for data distributed over complex domains, accounting for spatial nonstationarity and spatial dependence through a spatially varying coefficient autoregressive model. We proposed approximating the varying coefficient functions via bivariate splines over a triangulation; this type of approximation avoids the problem of 'leakage' across complex domains. To facilitate the calculation of the weight matrix in the model for large spatial datasets, we developed two efficient algorithms to approximate the geodesic distance over a complex domain based on the triangulation technique. Our empirical results confirmed that the distance estimated using our method is very close to the geodesic distance. For model selection, we proposed a BIC method that directly identifies predictors with constant and varying effects. Monte Carlo simulations showed that our model selection approach performs better than some existing methods in finite samples. A few more issues still merit further research. For instance, for the model selection procedure, one could also consider regularization methods, such as the group lasso; it would be interesting to study how regularization methods work for spatial regression models. It would also be interesting to consider a unified approach that performs estimation, variable selection, and model structure identification simultaneously for SAR-PLVCMs when the number of predictors is large. In this article we addressed only spatial data. The wide availability of data observed over time and space, due to the widespread collection of networks and inexpensive geographical information systems, has stimulated many studies in a variety of disciplines, and it would be interesting to extend our approach to spatiotemporal problems.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
Li Wang's research was partially supported by the National Science Foundation grants DMS-1542332 and DMS-1916204; Division of Mathematical Sciences.

A.2.2 Proof of Theorem 3.2
To prepare for the proof of the asymptotic results, we first introduce some notation. For X_B = (X_1 ⊗ B*(U_1), . . . , X_n ⊗ B*(U_n))', let P_{X_B} = X_B (X_B' X_B)^{-1} X_B'. The following results hold with probability approaching one.