The package hgwrr is used to calibrate Hierarchical and Geographically Weighted Regression (HGWR) models on spatial data. It requires a spatial hierarchical structure in the data; i.e., samples are grouped by their locations. Every variable is either at the group level or at the sample level. Group-level variables can have fixed effects (globally constant) or spatially weighted effects (varying with location). Sample-level variables can have fixed effects or random effects (varying among groups). We denote the fixed effects as β, the group-level spatially weighted (GLSW) effects as γ, and the sample-level random (SLR) effects as μ. The HGWR model combines these three kinds of effects and estimates them while accounting for spatial heterogeneity.
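The three kinds of effects can be put into a single equation. The following is a sketch of the model form, not a definitive statement: the indices i (group) and j (sample within a group), the group coordinates (u_i, v_i), and the vector notation are assumptions made here for illustration. The papers listed under Further reading give the exact specification.

```latex
y_{ij} = \mathbf{g}_{i}^{\top} \boldsymbol{\gamma}(u_i, v_i)
       + \mathbf{x}_{ij}^{\top} \boldsymbol{\beta}
       + \mathbf{z}_{ij}^{\top} \boldsymbol{\mu}_{i}
       + \epsilon_{ij}
```

Here g_i holds the group-level variables with GLSW effects γ, which vary with the group location; x_ij holds the variables with fixed effects β; and z_ij holds the sample-level variables with SLR effects μ_i, which vary among groups.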
library(hgwrr)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
#> Loading required package: MASS
Usage
Model calibration
To calibrate an HGWR model, use the function hgwr().
hgwr(
  formula, data, ..., bw = "CV",
  kernel = c("gaussian", "bisquared"),
  alpha = 0.01, eps_iter = 1e-6, eps_gradient = 1e-6,
  max_iters = 1e6, max_retries = 1e6,
  ml_type = c("D_Only", "D_Beta"), verbose = 0
)
The following explains some of the important parameters.
formula
This parameter specifies the model form. Recall that the three kinds of effects are GLSW, fixed, and SLR effects. They are specified in different parts of the formula.
response ~ L(GLSW) + fixed + (SLR | group)
In the formula, L() is used to mark some effects as GLSW effects, and ( | group) is used to set the SLR effects and the grouping indicator. Only group-level variables can have GLSW effects.
data
sf objects
From version 0.3-1, this parameter supports sf objects. In this case, no further arguments in ... are required.
Here is an example.
data(wuhan.hp)
m_sf <- hgwr(
  formula = Price ~ L(d.Water + d.Commercial) + BuildingArea + (Floor.High | group),
  data = wuhan.hp,
  bw = 299
)
data.frame objects
If the data is a plain data.frame object, an extra argument coords is required to specify the coordinates of each group. Note that the row order of coords needs to match that of the group variable. The model m_df shown under Results was fitted this way, to mulsam.test$data.
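The call that produced m_df is not included here. The following is a minimal sketch of what it may look like, reconstructed from the formula and data name printed under Results. It assumes that mulsam.test is the simulated data set shipped with hgwrr and that it is a list with a data item and a coords item; the bandwidth is left at its default rather than set to a specific value.

```r
library(hgwrr)

# Assumption: mulsam.test is a list holding a data.frame (`data`)
# and a matrix of group coordinates (`coords`).
data(mulsam.test)

m_df <- hgwr(
  formula = y ~ L(g1 + g2) + x1 + (z1 | group),
  data = mulsam.test$data,
  coords = mulsam.test$coords  # one row per group, in group order
  # bw is left at its default "CV", so the bandwidth is selected automatically
)
```

Because the bandwidth is selected automatically in this sketch, the estimates may differ from those printed under Results if the original call used a fixed bandwidth.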
bw and kernel
Argument bw is the bandwidth used to estimate GLSW effects. It can be either of the following options:
- An integer value representing the number of nearest neighbours.
- "CV", letting the algorithm select one.
Argument kernel is the kernel function used to estimate GLSW effects. Currently, there are only two choices: "gaussian" and "bisquared".
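To make the roles of the bandwidth and the kernel concrete, here is a small base R sketch of how spatial weights are commonly computed in geographically weighted models. It uses the textbook Gaussian and bisquare kernels with an adaptive (nearest-neighbour) bandwidth. The function names, the toy coordinates, and the exact kernel formulas are assumptions for illustration and are not taken from the hgwrr source.

```r
# Textbook GWR-style kernels (illustrative; not hgwrr's internal code).
gaussian_kernel <- function(d, b) exp(-0.5 * (d / b)^2)
bisquare_kernel <- function(d, b) ifelse(d < b, (1 - (d / b)^2)^2, 0)

# Adaptive bandwidth: the distance from a focal group to its k-th nearest neighbour.
adaptive_weights <- function(coords, focal, k, kernel = gaussian_kernel) {
  # Euclidean distance from the focal group to every group
  d <- sqrt(rowSums(sweep(coords, 2, coords[focal, ])^2))
  # Position 1 of the sorted distances is the focal group itself (distance 0)
  b <- sort(d)[k + 1]
  kernel(d, b)
}

set.seed(1)
toy_coords <- matrix(runif(20), ncol = 2)  # 10 hypothetical group locations

w_gauss <- adaptive_weights(toy_coords, focal = 1, k = 5)
w_bisq <- adaptive_weights(toy_coords, focal = 1, k = 5, kernel = bisquare_kernel)
```

With the Gaussian kernel every group receives a positive weight that decays with distance, whereas the bisquare kernel gives zero weight to groups at or beyond the bandwidth distance.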
Results
Printing the object returned by hgwr() shows the estimates of the effects.
m_df
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#> Method: Back-fitting and Maximum likelihood
#> Data: mulsam.test$data
#>
#> Fixed Effects
#> -------------
#> Intercept x1
#> 4.056760 1.967648
#>
#> Group-level Spatially Weighted Effects
#> --------------------------------------
#> Bandwidth: 9.35816 (nearest neighbours)
#>
#> Coefficient estimates:
#> Coefficient Min 1st Quartile Median 3rd Quartile Max
#> Intercept -2.769060 -2.708289 -2.356463 -2.225995 -2.022646
#> g1 0.876505 1.253144 1.702822 1.939969 2.336628
#> g2 1.082775 1.279601 1.424307 1.607909 1.722892
#>
#> Sample-level Random Effects
#> ---------------------------
#> Groups Name Std.Dev. Corr
#> group Intercept 1.032962
#> z1 1.032962 0.000000
#> Residual 1.032962
#>
#> Other Information
#> -----------------
#> Number of Obs: 873
#> Groups: group , 25
The summary() method shows some diagnostic information.
summary(m_df)
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#> Method: Back-fitting and Maximum likelihood
#> Data: mulsam.test$data
#>
#> Parameter Estimates
#> -------------------
#> Fixed effects:
#> Estimated Sd. Err t.val Pr(>|t|)
#> Intercept 4.056760 0.203079 19.976270 0.000000 ***
#> x1 1.967648 0.033827 58.168658 0.000000 ***
#>
#> Bandwidth: 9.35816 (nearest neighbours)
#>
#> GLSW effects:
#> Mean Est. Mean Sd. *** ** * .
#> Intercept -2.421973 0.251700 100.0% 0.0% 0.0% 0.0%
#> g1 1.641343 1.823056 0.0% 0.0% 0.0% 0.0%
#> g2 1.435709 1.506236 0.0% 0.0% 0.0% 0.0%
#>
#> SLR effects:
#> Groups Name Mean Std.Dev. Corr
#> group Intercept 0.000000 1.032962
#> z1 1.869552 1.032962 0.000000
#> Residual 0.088510 1.032962
#>
#>
#> Diagnostics
#> -----------
#> rsquared 0.905066
#> logLik NaN
#> AIC NaN
#>
#> Scaled Residuals
#> ----------------
#> Min 1Q Median 3Q Max
#> -3.408088 -0.576387 0.100854 0.734105 3.036324
#>
#> Other Information
#> -----------------
#> Number of Obs: 873
#> Groups: group , 25
The significance level of spatial heterogeneity in GLSW effects can be tested as follows.
summary(m_df, test_hetero = TRUE)
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#> Method: Back-fitting and Maximum likelihood
#> Data: mulsam.test$data
#>
#> Parameter Estimates
#> -------------------
#> Fixed effects:
#> Estimated Sd. Err t.val Pr(>|t|)
#> Intercept 4.056760 0.203079 19.976270 0.000000 ***
#> x1 1.967648 0.033827 58.168658 0.000000 ***
#>
#> Bandwidth: 9.35816 (nearest neighbours)
#>
#> GLSW effects:
#> Mean Est. Mean Sd. *** ** * .
#> Intercept -2.421973 0.251700 100.0% 0.0% 0.0% 0.0%
#> g1 1.641343 1.823056 0.0% 0.0% 0.0% 0.0%
#> g2 1.435709 1.506236 0.0% 0.0% 0.0% 0.0%
#>
#> SLR effects:
#> Groups Name Mean Std.Dev. Corr
#> group Intercept 0.000000 1.032962
#> z1 1.869552 1.032962 0.000000
#> Residual 0.088510 1.032962
#>
#>
#> Diagnostics
#> -----------
#> rsquared 0.905066
#> logLik NaN
#> AIC NaN
#>
#> Scaled Residuals
#> ----------------
#> Min 1Q Median 3Q Max
#> -3.408088 -0.576387 0.100854 0.734105 3.036324
#>
#> Other Information
#> -----------------
#> Number of Obs: 873
#> Groups: group , 25
Some other methods are also provided, such as coef(), fitted(), and residuals().
head(coef(m_df))
#> Intercept g1 g2 x1 z1
#> 1 0.9143066 2.336628 1.633698 1.967648 1.817139
#> 2 1.1269566 1.932128 1.626517 1.967648 2.305685
#> 3 1.8867179 2.027690 1.659433 1.967648 2.251592
#> 4 1.1245250 2.265663 1.536906 1.967648 1.591036
#> 5 1.7726751 2.219179 1.607909 1.967648 1.698600
#> 6 0.7008420 2.082628 1.421329 1.967648 1.855599
head(fitted(m_df))
#> [1] 3.659871 4.317510 6.929765 1.768491 0.511762 -2.023591
head(residuals(m_df))
#> [1] -0.5654830 -0.7380541 0.9197850 0.5707894 -0.3850239 -0.1648946
Further reading
Model comparison
- The article "HGWR Model and How to Use It" on this site compares HGWR with GWR and HLM on simulated data.
- This short paper compares HGWR, GWR, MGWR, and HLM on simulated data. All the code is available on this site.
Mathematical basis
The following papers give more details on the mathematical basis of the HGWR model.
- Yigong Hu, Richard Harris, Richard Timmerman, and Binbin Lu. A Hierarchical and Geographically Weighted Regression Model and Its Backfitting Maximum Likelihood Estimator (Short Paper). In 12th International Conference on Geographic Information Science (GIScience 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 277, pp. 39:1-39:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) DOI
- Hu, Yigong, Lu, Binbin, Ge, Yong, Dong, Guanpeng, 2022. Uncovering spatial heterogeneity in real estate prices via combined hierarchical linear model and geographically weighted regression. Environment and Planning B: Urban Analytics and City Science. DOI