Estimation and Inference for Generalized Geoadditive Models

Abstract In many application areas, data are collected on a count or binary response together with spatial covariate information. In this article, we introduce a new class of generalized geoadditive models (GGAMs) for spatial data distributed over complex domains. Through a link function, the proposed GGAM assumes that the mean of the discrete response variable depends on additive univariate functions of explanatory variables and a bivariate function that adjusts for the spatial effect. We propose a two-stage approach for estimating and making inferences about the components of the GGAM. In the first stage, the univariate components and the geographical component of the model are approximated via univariate polynomial splines and bivariate penalized splines over triangulation, respectively. In the second stage, local polynomial smoothing is applied to the cleaned univariate data to average out the variation of the first-stage estimators. We investigate the consistency of the proposed estimators and the asymptotic normality of the univariate components. We also establish a simultaneous confidence band for each of the univariate components. The performance of the proposed method is evaluated in two simulation studies. We apply the proposed method to analyze crash count data in the Tampa-St. Petersburg urbanized area in Florida. Supplementary materials for this article are available online.


Introduction
Regression methods are commonly used in statistics to examine associations between a response and a set of explanatory variables. When the relationship between the variables is complex and cannot be easily modeled by specific linear or nonlinear functions, a generalized additive model (GAM) provides a flexible regression relationship. GAMs were first introduced by Hastie and Tibshirani (1987, 1990); these models assume that the mean of the discrete response variable depends on an additive set of predictors through a link function. Since then, GAMs have been widely used in many areas. The focus of this work is on GAMs for spatial data randomly distributed over a particular geographical region.
To motivate the study, consider the analysis of road traffic crashes, which have been one of the major sources of fatalities and injuries in the United States. Count data models are often used to identify factors significant to crash frequencies, where the influence of demographic and socioeconomic conditions on crash occurrence and spatial patterns of crashes can assist decision-makers in implementing appropriate road safety management actions (Miaou, Song, and Mallick 2003; Liu and Sharma 2017, 2018). Figure 1(a) shows the Tampa-St. Petersburg urbanized area in Florida, which consists of 1761 census block groups. A census block group is a geographical unit with homogeneous population characteristics, economic status, and living conditions for the population survey. Rural areas were excluded from this study, as their population densities often drop to nearly zero; thus, it is not meaningful to analyze their crash data. The frequency of crashes off the state highway system within each census block group in the year 2014 was collected from the Florida Department of Transportation; see Figure 1(b). In addition, explanatory variables, including vehicle miles traveled and demographic and commuting variables, were collected from the Florida Department of Transportation and the U.S. Census Bureau, and many of them exhibit pronounced nonlinear relationships with the response variable.
To analyze this kind of data, researchers face at least three main challenges. First, traditional regression methods often assume the data are distributed on a regular sampling grid over a rectangular domain; however, in this study and many other similar cases in spatial studies, observations are dense at some locations and sparse at others, and the shape of the domain is complicated or even contains gaps and holes (see Ramsay 2002; Wood, Bravington, and Hedley 2008). Conventional smoothing tools suffer from the problem of "leakage" across complex domains, which refers to poor estimation over difficult regions caused by smoothing inappropriately across boundary features, such as peninsulas. The second challenge is how to adjust for the spatial effect, provide measures of the nonlinear effects of covariates, and assess the impact of uncertainty. In general, inference for GAMs for spatial data is underdeveloped and often based on ad hoc methods with little understanding of the statistical properties. Last but not least, computation is usually a big challenge for spatial data analysis, as sample sizes tend to be large due to the development of remote sensing technology and automated sensor networks. Thus, there is a great need for methodology that is practical, computationally efficient, and theoretically reliable.
To address the above-mentioned challenges, in this article we introduce a class of generalized geoadditive models (GGAMs), a synthesis of geostatistics and GAMs, for spatial data randomly distributed over irregular domains. We aim to develop the corresponding estimation and inference procedures for the GGAMs. In the literature, there are two main streams of spatial regression modeling approaches for including spatial information in the model. The first approach adds spatial weights or spatial correlation into a regression model, for example, the spatial autoregressive (SAR) model or the conditional autoregressive (CAR) model (Lee 2004; Zhu, Huang, and Reyes 2010). The second approach is based on smoothing techniques such as kernel, wavelet, or spline smoothing. Ramsay (2002) uses a deterministic smooth surface function to describe the variations and connections among values at different locations. Kammann and Wand (2003) combine the kriging method with penalized spline regression and suggest a mixed model representation for model fitting. In this paper, we take the second approach, where the effects of the explanatory variables are modeled with additive univariate functions and the spatial effect is modeled via a bivariate function.
In the proposed GGAM, when the spatial component is ignored, the model becomes the traditional GAM, since all the components left in the model are univariate functions. There have been a number of proposals for fitting GAMs, for instance, the spline methods of Stone (1986), Xue and Liang (2010), and Wang et al. (2011); the kernel methods of Yu, Park, and Mammen (2008), Yang, Sperlich, and Härdle (2003), and Liang et al. (2008); and the two-stage methods of Horowitz and Mammen (2004), Wang and Yang (2007), and Liu, Yang, and Härdle (2013). For estimating bivariate functions defined over rectangular domains, there are several popular smoothing tools: kernels, wavelets, bivariate P-splines (Marx and Eilers 2005), and thin plate splines (Wood 2003). In the past three decades, there has been great interest in developing smoothing techniques that can handle irregular domains with complex boundaries (see, e.g., Ramsay 2002; Wang and Ranalli 2007; Wood, Bravington, and Hedley 2008; Sangalli, Ramsay, and Ramsay 2013; Lai and Wang 2013; Miller and Wood 2014). In this paper, we approximate the bivariate function of the spatial effect using the bivariate splines of Lai and Schumaker (2007), that is, smooth piecewise polynomial functions over a triangulation of the domain of interest. We prefer the bivariate penalized splines (BPS) due to their (i) convenient representations with flexible degrees and various orders of smoothness, (ii) computational efficiency, and (iii) great ability to handle sparse designs.
Although the above-mentioned spline smoothing is very useful, it provides only convergence rates of the estimators and no asymptotic distributions unless we assume the errors are iid Gaussian, so it is difficult to assign any measure of confidence to the estimators. Simultaneous confidence bands (SCBs) are a powerful tool for evaluating and visualizing the variability of the estimators and for making global inferences (see Wang and Yang 2009; Krivobokova, Kneib, and Claeskens 2010; McKeague and Zhao 2006; Wiesenfarth et al. 2012; Zheng et al. 2016 for related theory and applications of SCBs). To develop SCBs for each individual component function of the explanatory variables in GGAMs, we propose a one-step spline-backfitted local polynomial estimator, referred to as the SBL estimator. In the first stage, we use spline smoothers to approximate the univariate additive components and the geographical component. In the second stage, local polynomial smoothing is applied to the cleaned univariate data to average out the variation of the first-stage estimators and obtain SCBs.
Under some regularity conditions, we obtain the asymptotic normality of the SBL estimators and establish SCBs for the functions of the covariates. Our approach merges the advantages of three smoothing methods (polynomial splines, bivariate splines, and local polynomials), which balance each other out. It enables the spline smoothing to act as an efficient pilot, quickly guiding the fitting toward good solutions. In addition, it allows us to keep the asymptotic normality of the local polynomial estimator, without its computational burden. Furthermore, it properly accounts for all covariate and spatial information, improving the estimation accuracy. The entire fitting and inference procedure can be implemented easily and efficiently using standard methodology and software.
The rest of the article is structured as follows. In Section 2, we describe our model, give a brief review of univariate splines and bivariate splines over triangulations, and introduce the penalized quasi-likelihood estimation method. Section 3 provides the SCBs for the functions of the covariates. Section 4 describes how to implement the proposed procedure in practice. In Section 5, we conduct simulation studies to evaluate the finite sample performance of the proposed method. Section 6 illustrates our method using a real dataset. Some concluding remarks are given in Section 7. The regularity assumptions are deferred to Appendix A. Proofs and other technical details are given in the online supplementary materials.

Model Setup
In the following, let S_i = (S_i1, S_i2) be the location of the ith point, i = 1, . . . , n, which ranges over a bounded domain Ω ⊆ R² of arbitrary shape, for example, a domain with a polygonal boundary. Let Y_i be the response variable and X_i = (X_i1, . . . , X_ip) be the explanatory variables at location S_i.
We assume that the conditional density of Y given (X, S) = (x, s) belongs to the exponential family
f_{Y|X,S}(y|x, s) = exp[yξ(x, s) − B{ξ(x, s)} + C(y)],
for known functions B and C, where ξ is the so-called natural parameter, which is related to the unknown mean response by
μ(x, s) = E(Y|X = x, S = s) = B′{ξ(x, s)}.
In this article, μ(x, s) is modeled via a link function g in the following additive form:
g{μ(x, s)} = Σ_{k=1}^{p} β_k(x_k) + α(s),    (1)
where β = (β_1, . . . , β_p) are unknown univariate smooth functions and α(·) is an unknown bivariate smooth function.
If var(Y|X = x, S = s) = σ²V{μ(x, s)} for some known positive function V, then estimation of the mean can be achieved by replacing the conditional log-likelihood function log{f_{Y|X,S}(y|x, s)} with a quasi-likelihood function Q(ϑ, y), which satisfies ∂Q(ϑ, y)/∂ϑ = (y − ϑ)/{σ²V(ϑ)}. Our estimation method is based on a nonparametric quasi-likelihood approach as described below.
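As a quick numerical illustration of this defining property (our own sketch, not code from the article): in the Poisson case, V(μ) = μ and σ² = 1, so Q(μ, y) = y log μ − μ (up to a constant in μ), and its derivative in μ is (y − μ)/V(μ).

```python
import numpy as np

def quasi_loglik_poisson(mu, y):
    """Quasi-likelihood Q(mu, y) with variance function V(mu) = mu and sigma^2 = 1
    (the Poisson case): Q(mu, y) = y*log(mu) - mu, whose derivative in mu is
    (y - mu) / V(mu), matching the defining property of the quasi-likelihood."""
    return y * np.log(mu) - mu

# numerical check of the quasi-score (y - mu) / V(mu) at mu = 2, y = 3
mu, y, eps = 2.0, 3.0, 1e-6
num_grad = (quasi_loglik_poisson(mu + eps, y) - quasi_loglik_poisson(mu - eps, y)) / (2 * eps)
```

Here the numerical derivative should match (y − μ)/μ = 0.5 up to discretization error.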

Spline Approximation
In the first stage, we approximate each of the univariate additive components in the model via univariate polynomial splines. The geographical component is approximated using bivariate penalized splines over triangulation, which have proved efficient for dealing with data distributed on irregular domains with complicated boundaries. Below we start with a brief review of univariate splines and bivariate splines.

Bivariate Spline Over Triangulation
In this article, the spatial domain Ω is a polygon of arbitrary shape, which can be partitioned into finitely many triangles. According to Lai and Schumaker (2007), a collection Δ = {τ_1, . . . , τ_N} of N triangles is called a triangulation of Ω = ∪_{i=1}^{N} τ_i provided that any nonempty intersection between a pair of triangles in Δ is either a shared vertex or a shared edge. Given a triangle τ ∈ Δ, let R_τ be the radius of the largest disk contained in τ, and let |τ| be the length of its longest edge. Define the shape parameter of τ as the ratio π_τ = |τ|/R_τ. Note that when π_τ is small, the triangles are relatively uniform in the sense that all angles of the triangles in the triangulation are roughly comparable. Denote the size of Δ by |Δ| := max{|τ| : τ ∈ Δ}.
For a triangle τ ∈ Δ with nonzero area and any fixed point v ∈ R², let b_1, b_2, and b_3 be the barycentric coordinates of v relative to τ. The Bernstein basis polynomials of degree d ≥ 1 relative to triangle τ are defined as
B^{τ,d}_{ijk}(v) = {d!/(i! j! k!)} b_1^i b_2^j b_3^k,  i + j + k = d.
For any integer d ≥ 1 and triangle τ, let P_d(τ) be the space of all polynomials of degree less than or equal to d on τ; any polynomial ζ ∈ P_d(τ) can be written as a linear combination of these Bernstein basis polynomials. For any integer r ≥ 0, let C^r(Ω) be the collection of all rth-order continuously differentiable functions over Ω. Given a triangulation Δ, we define the spline space of degree d and smoothness r over Δ as S^r_d(Δ) = {ζ ∈ C^r(Ω) : ζ|_τ ∈ P_d(τ), τ ∈ Δ}. Let {B_m}_{m∈M} be the set of bivariate Bernstein basis polynomials for S^r_d(Δ), where M is an index set of |M| = N(d + 1)(d + 2)/2 basis functions. Then we can represent any function ζ ∈ S^r_d(Δ) using the following basis expansion:
ζ(s) = Σ_{m∈M} B_m(s) γ_m,    (2)
where γ = (γ_m, m ∈ M) is the spline coefficient vector. For a smooth join between two polynomials on adjoining triangles, we need to impose some linear constraints on the spline coefficients γ in (2). To be more specific, we assume that γ satisfies Hγ = 0, where H is the matrix that collects the smoothness conditions across all the shared edges of triangles. An example of H is given in Section B.2.1 in the supplementary materials. The above bivariate spline basis can be easily constructed via the R package BPST (Wang et al. 2019).
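The Bernstein representation above is straightforward to compute directly. The following Python sketch (function names are our own; the article itself delegates basis construction to the R package BPST) evaluates barycentric coordinates and the degree-d Bernstein basis on one triangle:

```python
import numpy as np
from math import factorial

def barycentric(v, tri):
    """Barycentric coordinates (b1, b2, b3) of point v relative to triangle tri,
    where tri is a list/array of three vertex coordinate pairs.  Solves the
    linear system b1 + b2 + b3 = 1, sum_i b_i * vertex_i = v."""
    A = np.vstack([np.ones(3), np.asarray(tri, dtype=float).T])  # 3x3 system
    rhs = np.array([1.0, v[0], v[1]])
    return np.linalg.solve(A, rhs)

def bernstein_basis(v, tri, d):
    """All degree-d Bernstein basis polynomials relative to triangle tri,
    B^{tau,d}_{ijk}(v) = d!/(i! j! k!) * b1^i * b2^j * b3^k with i + j + k = d,
    evaluated at v; returns a dict keyed by (i, j, k)."""
    b1, b2, b3 = barycentric(v, tri)
    out = {}
    for i in range(d + 1):
        for j in range(d + 1 - i):
            k = d - i - j
            coef = factorial(d) / (factorial(i) * factorial(j) * factorial(k))
            out[(i, j, k)] = coef * b1**i * b2**j * b3**k
    return out
```

The basis values form a partition of unity, since their sum equals (b_1 + b_2 + b_3)^d = 1, and there are (d + 1)(d + 2)/2 of them per triangle, matching the dimension count above.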

Penalized Quasi-Likelihood Estimators
In data-sparse regions, penalized splines provide a more convenient tool for data fitting than unpenalized splines. As discussed in Lai and Wang (2013), the number of triangles for bivariate penalized smoothing is not essential once it exceeds some minimum depending on the degree of the splines. Another advantage is that, in regions of sparse data, BPS can be viewed as a ridge regression or shrinkage-type method, which can alleviate the multicollinearity issue. Thus, to reduce the complexity of triangulation selection and to enhance the performance of bivariate splines in data fitting, we exploit the BPS in the quasi-likelihood of the model below.
To define the penalized spline method, for any direction s_j, j = 1, 2, let ∇^ν_{s_j} α(s) denote the νth-order derivative of α in the direction s_j, and define the energy functional
E(α) = Σ_{τ∈Δ} ∫_τ {(∇²_{s_1} α)² + 2(∇_{s_1}∇_{s_2} α)² + (∇²_{s_2} α)²} ds_1 ds_2.
To fit the GGAM, we seek a bivariate function α(·) and univariate functions β_k(·), k = 1, . . . , p, that maximize the following penalized quasi-likelihood function:
Σ_{i=1}^{n} Q[g^{−1}{Σ_{k=1}^{p} β_k(X_ik) + α(S_i)}, Y_i] − (λ/2) E(α).    (4)
Directly solving this penalized maximization is challenging, since it involves unstructured nonparametric functions which are subject to the "curse of dimensionality." To overcome this difficulty, we consider spline smoothing. Under suitable smoothness conditions, the β_k's and α can be well approximated by the univariate spline basis functions and the Bernstein basis polynomials introduced in Sections 2.2.1 and 2.2.2, respectively. Let u_kj(x_k), j ∈ J, be the original B-spline basis functions for the kth covariate, where J is the index set of the basis, and suppose each nonlinear component can be well approximated by a spline function, so that β_k(x_k) ≈ Σ_{j∈J} u_kj(x_k) θ_kj for all x_k and α(s) ≈ Σ_{m∈M} B_m(s) γ_m. Let X be the collection of all covariates, and for i = 1, . . . , n, collect the basis evaluations at the ith observation into the vectors U_i and B_i. Then maximizing the penalized quasi-likelihood function in (4) is equivalent to minimizing
L_P(θ, γ) = −Σ_{i=1}^{n} Q[g^{−1}(U_i⊤θ + B_i⊤γ), Y_i] + (λ/2) γ⊤Pγ, subject to Hγ = 0,    (5)
where P is the block diagonal penalty matrix satisfying γ⊤Pγ = E(Bγ). See Section B.2.2 in the supplementary materials for the formula used to construct P. It is worth noting that, according to Assumption (A3), the optimization problem in (5) has a unique solution.
To solve the constrained minimization problem (5), we first remove the constraint via the following QR decomposition of the transpose of the constraint matrix:
H⊤ = QR = (Q_1 Q_2) (R_1; R_2),
where Q is an orthogonal matrix, R_1 is an upper triangular matrix, the submatrix Q_1 consists of the first r columns of Q (r is the rank of the matrix H), and R_2 is a matrix of zeros. We reparametrize using γ = Q_2 γ* for some γ*; then it is guaranteed that Hγ = 0. Problem (5) is now converted to a conventional penalized minimization problem without any restriction:
L_P(θ, γ*) = −Σ_{i=1}^{n} Q[g^{−1}(U_i⊤θ + B_i⊤Q_2 γ*), Y_i] + (λ/2) γ*⊤Q_2⊤ P Q_2 γ*.    (6)
For practitioners, the construction of P and Q_2 can be carried out via the R package BPST. Let θ̂ and γ̂* be the minimizers of (6), that is, (θ̂, γ̂*) = arg min L_P(θ, γ*); then the univariate spline estimator of β_k(x_k) and the bivariate spline estimator of α(s) are
β̂_k(x_k) = Σ_{j∈J} u_kj(x_k) θ̂_kj and α̂(s) = Σ_{m∈M} B_m(s) γ̂_m, with γ̂ = Q_2 γ̂*.
We first establish the L₂ convergence rate of the spline estimators β̂_k(x_k) and α̂(s). For any Lebesgue measurable function φ on a domain D, denote ‖φ‖_{L₂(D)} = {∫_D φ²(z) dz}^{1/2}. The proof is provided in the supplementary materials.
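The constraint-removal step can be sketched in a few lines (a toy illustration under our own naming; in practice the article delegates this to the R package BPST). Any coefficient vector of the form γ = Q_2 γ* automatically satisfies Hγ = 0:

```python
import numpy as np

def constraint_null_basis(H):
    """Given a constraint matrix H (so that H @ gamma = 0 is required), return Q2
    whose columns span the null space of H, via the QR decomposition
    H.T = Q R = (Q1 Q2)(R1; 0).  Any gamma = Q2 @ gamma_star satisfies H @ gamma = 0."""
    H = np.asarray(H, dtype=float)
    Q, _ = np.linalg.qr(H.T, mode="complete")  # full orthogonal factor
    r = np.linalg.matrix_rank(H)               # rank of the constraint matrix
    return Q[:, r:]                            # Q2: trailing columns of Q

# toy constraint: gamma_1 - gamma_2 = 0 and gamma_3 = 0
H = np.array([[1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])
Q2 = constraint_null_basis(H)
gamma = Q2 @ np.array([2.5])                   # any reparametrized coefficient
```

In the toy example, the null space of H is spanned by (1, 1, 0)/√2, so any γ = Q_2 γ* has equal first two entries and a zero third entry, exactly as the constraint demands.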

Two-Stage Estimator and Simultaneous Confidence Band
It is very difficult to derive the asymptotic distribution of the spline estimators introduced in Section 2, so no measures of confidence can be assigned to these estimators. To represent the uncertainty in the estimates of the nonlinear effects of the covariates, we propose SCBs for the β_k's via the backfitting method, using the spline estimators obtained in Section 2 as pilots, followed by local polynomial estimators. The basic idea is that, for every k = 1, . . . , p, we estimate the kth additive function β_k(·) in model (1) nonparametrically by assuming that the other nonparametric components {β_k′(·) : k′ ≠ k} are known; the problem then turns into a univariate function estimation problem. Denote by K a continuous kernel function and by h_k a bandwidth, and let β̂ᵒ_k(x_k) be the resulting local polynomial "oracle" estimator of β_k(x_k). Asymptotic properties of such oracle smoothers can be easily established under these assumptions. We say x ∈ χ_k = [a_k, b_k] is a boundary point if and only if x = a_k + ch_k or x = b_k − ch_k for some 0 ≤ c < 1, and an interior point otherwise. Let χ_k^h be the interior of the support χ_k. Let ν₂ = ∫ u²K(u) du, and let f_k be the probability density function of X_k.
For any α ∈ (0, 1), denote the corresponding (1 − α) quantile used in the SCB construction. Theorem 2 gives the pointwise and uniform asymptotic normality of the "oracle" estimator β̂ᵒ_k(x_k).
Theorem 2. Suppose Assumptions (A1)-(A4) and (A7) in Appendix A hold. Then, for any k = 1, . . . , p, the oracle estimator β̂ᵒ_k(x_k) is asymptotically normal; if Assumptions (A1)-(A4), (A7), and (A8)(ii) are satisfied, then the corresponding SCB covers β_k uniformly with asymptotic confidence level 1 − α. Since the true functions β_k and α are unknown, we replace them with the pilot estimators β̂_k and α̂ obtained in Section 2 and apply the above backfitting idea to obtain the spline-backfitted local polynomial (SBL) estimator β̂_k^SBL(x_k).
Theorem 3. Under Assumptions (A1)-(A5), (A6′), (A7), and (A8)(i) in Appendix A, the SBL estimator is uniformly asymptotically equivalent to the oracle estimator, in the sense that their maximal difference over χ_k is of smaller order than the oracle estimator's convergence rate.
The proofs of Theorems 2 and 3 are given in the supplementary materials. Theorem 3 implies that the difference between the SBL estimator and the "oracle" estimator is negligible compared to the difference between the "oracle" estimator β̂ᵒ_k(x_k) and the true function β_k(x_k); that is, the SBL estimator β̂_k^SBL(x_k) is oracle-efficient. Consequently, the SBL estimator inherits the same asymptotic normality (both pointwise and uniform) from the "oracle" estimator, as stated in Corollary 1 below, which follows directly from Theorems 2 and 3; thus, the proof is omitted.

Knots Selection
For univariate spline smoothing, we suggest placing knots on a grid of evenly spaced sample quantiles. Assumption (A6) in Appendix A suggests that the number of knots J_n needs to satisfy n^{1/(2ℓ+2)}(log n)^{−1/(2ℓ+2)} ≪ J_n ≪ n^{2/5}, where ℓ ≥ 1 is the degree of the polynomial spline basis functions. The widely used quadratic/cubic splines both satisfy this condition. In practice, we suggest taking the following rule-of-thumb number of interior knots: J_n = min{⌈c₁ n^{1/(2ℓ+2)}⌉, ⌊n/(4p)⌋} + 1, where c₁ is a tuning parameter (typically, c₁ ∈ [1, 5]), and the term ⌊n/(4p)⌋ guarantees that there are at least four observations in each subinterval between two adjacent knots, to avoid (near) singular design matrices in smoothing.
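The rule of thumb above can be coded directly; this is a minimal sketch in which the exact placement of ceiling and floor is our reading of the formula:

```python
import math

def n_interior_knots(n, p, degree=3, c1=2.0):
    """Rule-of-thumb number of interior knots for univariate spline smoothing:
    J_n = min(ceil(c1 * n**(1/(2*degree + 2))), floor(n / (4*p))) + 1,
    where `degree` is the spline degree (>= 1) and c1 in [1, 5] is a tuning
    constant.  The floor(n/(4p)) cap keeps at least four observations per
    subinterval between adjacent knots."""
    J = min(math.ceil(c1 * n ** (1.0 / (2 * degree + 2))), n // (4 * p)) + 1
    return J
```

For example, with cubic splines (ℓ = 3), n = 2000, and p = 3, the rule gives J_n = min{⌈2 · 2000^{1/8}⌉, 166} + 1 = 7 interior knots.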

Triangulation Selection
An optimal triangulation is a partition of the domain that is best according to some criterion measuring the shape, size, or number of triangles. For example, a "good" triangulation usually refers to one with well-shaped triangles, with no small angles and/or no obtuse angles. Other criteria include density control (adaptivity), optimal size (number of triangles), etc. For a fixed number of triangles, Lai and Schumaker (2007) and Lindgren, Rue, and Lindström (2011) recommend selecting the triangulation according to the "max-min" criterion, which maximizes the minimum angle among all the angles of the triangles in the triangulation. Monte Carlo experiments in our simulation studies show that there must be enough triangles to capture the features of the surface, but once this minimum necessary number of triangles has been reached, further increasing the number of triangles usually has little effect on the fitting process. Therefore, one needs to make certain that the triangulation is sufficiently fine to capture the features in the dataset, but not so fine that the computational burden becomes unnecessarily heavy. In practice, if the boundary of the spatial domain is not too complicated, we suggest taking the number of triangles as N_n = min{⌈c₂ n^{1/(d+1)}⌉, ⌊n/4⌋} + 1, for some tuning parameter c₂ (typically, c₂ ∈ [1, 20]). However, when the spatial domain does look complicated, N_n can be set (much) larger than n so that the domain can be very well approximated by the triangulation, and the penalty automatically and gracefully handles the situation. Once N_n is chosen, one can build the triangulated meshes using typical triangulation construction methods such as Delaunay triangulation.
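Analogously to the knot rule, the suggested triangle count can be sketched as follows (ceiling and floor are again our reading of the formula; constructing the actual mesh is left to standard Delaunay generators):

```python
import math

def n_triangles(n, d=2, c2=5.0):
    """Suggested number of triangles for a not-too-complicated domain:
    N_n = min(ceil(c2 * n**(1/(d + 1))), floor(n / 4)) + 1,
    where d is the degree of the Bernstein polynomials and c2 in [1, 20] is a
    tuning constant.  For highly irregular domains, the text instead recommends
    a much larger N_n and lets the roughness penalty do the work."""
    return min(math.ceil(c2 * n ** (1.0 / (d + 1))), n // 4) + 1
```

For example, with d = 2, c₂ = 5, and n = 2000, the rule gives N_n = min{⌈5 · 2000^{1/3}⌉, 500} + 1 = 64 triangles.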

Roughness Penalty Selection
The roughness penalty parameter λ can be selected using data-driven approaches; for example, generalized cross-validation (GCV; Craven and Wahba 1979; Wahba 1990) is such a criterion and is widely used for choosing the penalty parameter. Let ρ̂₂ be the diagonal matrix with elements ρ₂(μ̂_i), and denote the smoothing, or hat, matrix as S(λ) = W(W⊤ρ̂₂W + λP*)^{−1}W⊤ρ̂₂, where W = (U, BQ_2) is the working design matrix and P* is the penalty matrix for the reparametrized coefficients.
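For any linear smoother with hat matrix S(λ), the GCV criterion of Craven and Wahba (1979) takes the familiar form below; this sketch is our own illustration on a generic working response, not the article's exact implementation:

```python
import numpy as np

def gcv_score(y_work, S):
    """Generalized cross-validation score for a linear smoother with hat matrix S:
    GCV(lambda) = n * ||(I - S) y||^2 / (n - tr(S))^2  (Craven and Wahba 1979).
    y_work is the working response vector and S = S(lambda) is the hat matrix;
    lambda is chosen to minimize this score over a grid."""
    n = len(y_work)
    resid = y_work - S @ y_work          # residuals of the smoother
    return n * float(resid @ resid) / (n - np.trace(S)) ** 2
```

As a sanity check, for the trivial smoother that projects onto the constant function (trace one), the score reduces to n·RSS/(n − 1)².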

Second Stage
The performance of the SBL estimator also depends on the bandwidth and on the estimators of the other unknown quantities in the second stage.

Bandwidth Selection
Note that Assumption (A8) in Appendix A requires that the bandwidths in the backfitting are of order around n^{−1/5}. Thus, the bandwidth selection can be done using a standard routine in the literature; see the discussions in Fan, Heckman, and Wand (1995) and Fan and Gijbels (1996). By minimizing the asymptotic mean integrated squared error (AMISE), we can obtain the optimal bandwidth h_{k,AMISE}, which can be approximated by a plug-in version in which f_k is estimated by kernel density estimation, β_k is replaced by the spline estimator β̂_k, and ρ_{2,k}(X_ik) is replaced by an estimator ρ̂_{2,k}(X_ik) computed with a rule-of-thumb bandwidth h. To construct pointwise confidence intervals or SCBs for the univariate functions, we suggest taking h_k = h_{k,AMISE}(log n)^{−1/4} to satisfy the requirement of Assumption (A8).

Estimating σ_n,k(x_k)
For any k = 1, . . . , p, we obtain the estimator σ̂_{n,k}(x_k) of σ_{n,k}(x_k) by plugging the estimates of the unknown quantities into its definition.

Simulation
In this section, we conduct two simulation studies to evaluate the finite sample performance of the proposed method.

Example 1
We consider a modified horseshoe domain from Sangalli, Ramsay, and Ramsay (2013) and randomly sample n = 1000 and 2000 locations on the domain. The response variable Y_i is generated from a Poisson distribution and a negative binomial distribution with log link functions, as described in the following two cases.
To implement the proposed procedure, one needs to select the knots for the univariate splines, the triangulation for the bivariate spline, the bandwidth for the local linear method, as well as the smoothness penalty parameter. We have conducted extensive simulations to examine whether and how sensitive the performance of the proposed method is to the choice of knots, triangulation, and bandwidth. For all the univariate splines, we use cubic B-splines with J_n = 2, 4, 6, 8 interior knots equally spaced on the sample quantiles. For the bivariate spline smoothing, we consider d = 2, r = 1 with three different triangulations, shown in Figure 3(a)-(c). There are 109 triangles (95 vertices), 163 triangles (124 vertices), and 237 triangles (165 vertices) in Δ₁, Δ₂, and Δ₃, respectively.
We compare our method with the thin plate spline method (TPS) and the soap film smoother (SOAP), which are commonly used for fitting GAMs. The TPS and the SOAP estimators are obtained from the mgcv package in R (Wood 2017). For all three methods, GCV is used to choose the values of the penalty parameter. In Case II, the estimator of θ is chosen to ensure that the Pearson estimate of the scale parameter is as close as possible to 1.
To evaluate the accuracy of the estimators, we calculate the mean integrated squared error (MISE) for each of the components based on 1000 Monte Carlo samples. Also, to illustrate the prediction capability, we conduct 10-fold cross-validation for each Monte Carlo sample and compare the cross-validated mean squared prediction error (MSPE). Table 1 presents the MISEs of β̂₁, β̂₂, β̂₃, and α̂ and the 10-fold cross-validated MSPE, where the SBL results are based on four interior knots for the univariate splines and Δ₁ for the bivariate splines. Table 1 compares the performance of the three methods: all three are similar in terms of estimating the β_k(·)'s; however, our method significantly outperforms TPS and SOAP when estimating the bivariate function α(·), which results in a big improvement in the cross-validated MSPE. Figure 4 depicts the true univariate functions β_k (dotted curves), together with the corresponding estimators β̂_k^SBL (solid curves) and the 95% SCBs for β_k (gray bands), from a typical run generated from Case I or Case II with sample size n = 2000, where the estimation and SCB construction are based on four interior quantile knots for the univariate splines and triangulation Δ₁ for the bivariate splines. Figure 5(b)-(g) shows the contour maps of the estimated bivariate function α̂ over a grid of 500 × 200 points using our method, TPS, and SOAP.
We evaluate the coverage of the proposed SCBs over 20 equally spaced points on [0, 1] and test whether the true functions are covered by the SCBs at these points. Table 2 summarizes the empirical coverage rates of the 95% SCBs from 1000 Monte Carlo experiments. The results clearly show a very good coverage rate of the SCBs. Figures B.2 and B.3 in the supplementary materials present the MISEs of the estimators based on different combinations of knots and triangulations for Cases I and II. For the univariate components, one sees that the MISEs are very similar regardless of the choice of knots and triangulations. For the bivariate function α, we have found in our simulation studies that there is a minimum adequate value for the number of triangles in the fitting. Fits using fewer than this minimum number of triangles have low statistical accuracy. We have also found that, once this minimum number of triangles is reached, further refining the triangulation has little effect on the fitting process, but makes the computational burden unnecessarily heavy.
Finally, our proposed method is very user-friendly and computationally efficient. Take the case of fitting GGAMs with the Poisson distribution as an example. Remarkably, it takes only 3.1 sec to estimate all the components in the GGAM with 2000 observations on a standard PC with processor Core i5 @2.7GHz CPU and 8.00GB RAM. This is extremely fast considering that the entire nonparametric regression is done without WARPing.

Example 2
We conduct another simulation study using the covariates and domain of the crash data analyzed in Section 6. We consider a negative binomial setting in which, for i = 1, . . . , 1761, the covariates X_ik, k = 1, . . . , 12, are the same as in the crash dataset described in Section 6. The significant univariate functions and the bivariate function are set to be the same as the estimates obtained in the crash data analysis, and the insignificant univariate functions are set to zero.
The average MISE of each functional component from 1000 Monte Carlo experiments is reported in Table 3. From Table 3, one sees that the MISEs are all very small, which indicates that our method performs very well. We also examine the behavior of the proposed SCBs for the univariate functions; the "coverage" rows in Table 3 report the empirical coverage rates of the SCBs.

Domain of Interest
Traffic crashes have been one of the major sources of fatalities and injuries in the United States, and crash frequency analysis is critical for developing and implementing effective safety countermeasures. The study region and its census block groups are shown in Figure 1(a) and (b). A census block group is a geographical unit designed by the U.S. Census Bureau to be "relatively homogeneous units with respect to population characteristics, economic status, and living conditions" (Song et al. 2006). Rural areas were excluded from this study, as their population densities are often too small to yield meaningful crash frequency data. The Tampa Bay and the Wilderness Preserve in the northeast are also excluded, since they have no traffic. Crash frequencies on roadways owned or operated by nonstate entities in the year 2014 were collected for each census block group from the Florida Department of Transportation.
In addition, 12 covariates covering demographic, economic, and transportation characteristics were collected from the Florida Department of Transportation and the U.S. Census Bureau; see Table 4 for details. Note that the covariates marked with "*" are transformed from the original values by f(x) = log(x + 0.01); for example, VMT* = log(VMT + 0.01). The whole dataset contains 1761 observations with 12 covariates.

Analysis and Findings
We analyze the crash frequency data using the following GGAM:
log{E(Y_i | X_i, S_i)} = Σ_{j=1}^{12} β_j(X_ij) + α(S_i),
where the β_j(·)'s are unknown univariate functions and α(·) is an unknown bivariate function used to control for the spatial structure of the observations. For both TPS and our method, the roughness parameter is selected by GCV. Unlike other traditional methods, the BPS smoother does not smooth over the Tampa Bay area and the Wilderness Preserve, which makes more sense, as there is no traffic across these uninhabited areas, and our estimate does not link data points on either side of them. Figure 6(a) shows the triangulation adopted by our method, which contains 1869 triangles with 1098 vertices. Fitting the model using our method identifies relatively high crash frequencies in cities such as Brandon, St. Petersburg, and New Port Richey, and in the college town area. Next, we examine the effects of the predictors and test the following hypotheses for the individual functions: H₀: β_k(·) = 0, k = 1, . . . , 12. In Figure 7, the central black line shows the SBL fit, and the gray bands represent the 95% SCBs constructed according to (7). The corresponding p-values of the tests are given in Table 4; each is calculated as the largest value of α such that the 100(1 − α)% SCB covers the zero horizontal line.
Clearly, the null hypotheses that β₁(VMT*) = 0, β₂(Population*) = 0, β₃(Rmale*) = 0, β₄(Rhispanic) = 0, β₅(Rold*) = 0, and β₆(Runemployed) = 0 are rejected at significance level 0.05, since these functions are not totally covered by the 95% SCBs. For VMT and population, the fitted spline curves present a strong deviation from strict linearity. VMT and population are often considered to have a linear relationship with crash frequency in most studies; however, recently a few studies have shown that these relationships might be heterogeneous or nonlinear (Mannering, Shankar, and Bhat 2016). From Figure 7, one sees clear evidence of threshold effects of the logarithms of VMT and population on crash frequencies. For example, there is a constant effect for VMT (population) below −1 and above 10, while a linear effect of some significance exists in between. Our findings are reasonable: when VMT or population is either very small or very big, crash frequency should be constant, whereas for in-between values, crash frequency should increase with VMT or population. The proportion of unemployment may have mixed effects on crash frequency (Leigh and Waldon 1991), as an increase in unemployment may lead to less driving, but also to more aggressive driving due to the mental stress of job loss or fear of job loss. The net effects are found to be negative in Michigan (Wagenaar 1983) and Iowa (Liu and Sharma 2018), but positive in the Tampa Bay area, which may be attributed to different travel modes and cultures. The effect of the proportion of males is significantly positive, which might be attributed to the fact that males travel more than females for business and work (Collins and Tisdell 2002) and are also more likely to drive aggressively than females (Shinar and Compton 2004). The proportion of Hispanics is positively significant, which implies that Hispanics tend to have more crashes.
Several studies have found that Hispanic drivers have higher rates of safety belt nonuse, speeding, invalid licensure, and alcohol involvement in Colorado (Harper et al. 2000), and that Hispanics have more pedestrian crashes in New York City (Ukkusuri, Hasan, and Aziz 2011) and Los Angeles (Loukaitou-Sideris, Liggett, and Sung 2007) because they walk more due to low income. Thus, we believe the same situations may also exist in the study area. The proportion of older people is significant for crash frequency in our study; that is, regions with a higher proportion of older drivers tend to have fewer crashes. One interpretation is that older people tend to travel less, which could reduce traffic crashes within a specific area.
Income is statistically insignificant, which is consistent with another study (Liu and Sharma 2018), which found that travel expenses occupy only a small proportion of income in most census tracts, so income level has little impact on people's travel decisions. The health insurance coverage rate is also insignificant. In terms of commuting characteristics, travel time is insignificant at significance level 0.05, and the proportions of travel modes other than driving are also insignificant.
To compare the SBL and the TPS, we adopt two criteria: the mean squared error (MSE) to measure the model fit, and the mean squared prediction error (MSPE) based on 10-fold cross-validation (CV) to measure the predictive performance. Table 5 provides the MSE and the 10-fold CV MSPE of the SBL and the TPS. Both criteria favor the SBL, which not only gives the best model fit but also provides the most accurate out-of-sample prediction results.
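The 10-fold CV MSPE can be computed with a generic scheme such as the following sketch, where `fit` and `predict` are placeholders for whichever smoother (SBL or TPS) is being evaluated; the helper name `cv_mspe` is our own.

```python
import numpy as np

def cv_mspe(X, y, fit, predict, k=10, seed=0):
    """k-fold cross-validated mean squared prediction error.
    fit(X_train, y_train) returns a fitted model; predict(model, X_test)
    returns predictions.  Both are placeholders for the actual smoother
    (SBL or TPS) under comparison."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # random fold assignment
    folds = np.array_split(idx, k)
    sq_err = []
    for j in range(k):
        test = folds[j]
        train = np.concatenate([folds[i] for i in range(k) if i != j])
        model = fit(X[train], y[train])    # refit on the k-1 training folds
        sq_err.append((y[test] - predict(model, X[test])) ** 2)
    return float(np.concatenate(sq_err).mean())
```

Each observation is predicted exactly once, by a model that never saw it during fitting, so the averaged squared error estimates out-of-sample performance.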
Finally, we conduct a spatial autocorrelation test based on the deviance residuals. For the count responses, the deviance residuals are calculated as d_i = sign(y_i − μ̂_i) [2{y_i log(y_i/μ̂_i) − (y_i − μ̂_i)}]^{1/2}, where μ̂_i is the fitted mean of the ith observation. Figure 8 shows the histogram of these deviance residuals for our method. We then use Moran's I to test for spatial autocorrelation. The Moran's I statistic is calculated as I = {n / (Σ_i Σ_j w_ij)} × Σ_i Σ_j w_ij (d_i − d̄)(d_j − d̄) / Σ_i (d_i − d̄)², where the w_ij's are spatial weights under the basic binary coding: w_ij = 1 if the ith and jth observations are neighbors, and w_ij = 0 otherwise. The test statistic is 0.0098, and the p-value for the Moran's I test is 0.45. Thus, we cannot reject the null hypothesis of no spatial autocorrelation; the spatial distribution of the residuals is consistent with an independent random spatial process.
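Moran's I and a calibration of its null distribution can be computed as in the following sketch. The binary weight matrix `w` is assumed to be built beforehand from the neighborhood structure, and the permutation p-value shown here is one common calibration (a normal approximation to the null distribution of I is another); the helper names are our own.

```python
import numpy as np

def morans_i(r, w):
    """Moran's I for residuals r with spatial weight matrix w:
    I = (n / sum_ij w_ij) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z_i are the centered residuals."""
    n = len(r)
    z = r - r.mean()
    num = n * (w * np.outer(z, z)).sum()
    den = w.sum() * (z ** 2).sum()
    return num / den

def morans_i_pvalue(r, w, n_perm=999, seed=0):
    """Two-sided permutation p-value: reshuffle residuals over locations
    to simulate the null of no spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    i_obs = morans_i(r, w)
    perm = np.array([morans_i(rng.permutation(r), w) for _ in range(n_perm)])
    return (1 + np.sum(np.abs(perm) >= abs(i_obs))) / (n_perm + 1)
```

Values of I near zero (as the 0.0098 reported above) indicate no spatial pattern; strongly positive values indicate clustering of like residuals, and strongly negative values indicate alternation between neighbors.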

Concluding Remarks
In this paper, we couple the ideas of geostatistics with additive modeling under the framework of penalized quasi-likelihood. We have developed a two-stage procedure, accompanied by efficient computational algorithms, to carry out estimation and statistical inference for GGAMs. Our approach is balanced in terms of theory, computation, and interpretation, and it greatly enhances the application of GAMs to spatial data analysis. We do not require the data to be evenly distributed or to lie on a regularly spaced grid. Our estimation is computationally fast and efficient since the first-stage estimation can be formulated as a penalized regression problem. The proposed SCBs measure the effect of covariates after adjusting for the spatial effect; thus, users can gain valuable insights into the accuracy of their estimation of the GGAM. This study can be extended in the following directions. First, we have only analyzed crashes off the state highway system here; crashes on the state highway system should also be analyzed in future studies to construct a full picture of traffic safety. When both crash types are considered, multivariate models may be adopted (Zhao et al. 2017; Ma and Kockelman 2006). Second, it may also be interesting to explore long-term crash trends with multiple years' data, where spatiotemporal correlations should be considered in the modeling (Miaou, Song, and Mallick 2003; Liu and Sharma 2017, 2018; Boulieri et al. 2017).