Information Consistency-Based Measures for Spatial Stratified Heterogeneity
Wenbo Lv
2024-12-17
Source:vignettes/sshicm.Rmd
sshicm.Rmd
1. Introduction to sshicm
package
1.1 The sshicm
package can be used to address following
issues:
Information consistency-based measures of spatial stratified heterogeneity intensity for continuous and nominal variables.
Strength of spatial pattern associations based on information consistency measures.
1.2 Example data in the sshicm
package
baltim data
“baltim” consists of Baltimore home sale prices and hedonics. In total, there are 221 instances in “baltim” data. The explanatory variables are whether it is a detached unit (DWELL), whether it has a patio (PATIO), whether it has a fireplace (FIREPL), whether it has air conditioning (AC), and whether the dwelling is in Baltimore County (CITCOU, while the target variable is the sale price of the home (PRICE).
cinc data
“cinc” is derived from the 2008 Cincinnati Crime + Socio-Demographics dataset. It includes spatial data on 457 objects located on an irregular lattice. The explanatory variables are male population (MALE), female population (FEMALE), median age (MEDIAN_AGE), average family size (AVG_FAMSIZ), and population density (DENSITY), while the target variable is the existence of theft (THEFT_D).
1.3 Functions in the sshicm
package
Regression-style data frame modeling function
A function sshicm()
that yields all results in a single
line, with the type
parameter set to IC
(Continuous) or IN
(Nominal) to specify whether the
dependent variable is a continuous or nominal variable.
2. The principle of measuring spatial stratified heterogeneity based on information consistency
Note: All explanatory variables must be discretized in advance or inherently be discrete nominal variables.
2.1 When the dependent variable is a continuous variable:
\[ I_{C}\left(d,s\right) = \sum_{s_{i} \in S}p\left(s_{i}\right)\frac{ \arctan \left(\textbf{RelE} \left( f_{d_{i}} \mid \mid f \right) \right)}{\pi / 2} \]
where \(d_i\) is the random variable corresponding to the target variable in stratum \(s_i\) , and \(f_{d_i}\) and \(f\) are the density functions of \(d_i\) and \(d\), respectively. Additionally, \(\textbf{RelE} \left( f_{d_{i}} \mid \mid f \right)\) is the relative entropy of \(f_{d_i}\) and \(f\).
\[ \textbf{RelE} \left( f_{d_{i}} \mid \mid f \right) = H \left(f_{d_{i}} , f\right) - H \left(f_{d_{i}}\right) = \sum_{i = 1}^{n} f_{d_{i}} \log \frac{1}{f} - \sum_{i = 1}^{n} f_{d_{i}} \log \frac{1}{f_{d_{i}}} = \sum_{i = 1}^{n} f_{d_{i}} \log \frac{f_{d_{i}}}{f} \]
2.2 When the dependent variable is a nominal variable:
\[ I_{N}\left(d,s\right) = \frac{I \left(d,s\right)}{I \left(d\right)} = \frac{I \left(d\right) - I \left(d \mid s\right)}{I \left(d\right)} = 1 - \frac{\sum_{s_i \in S}\sum_{x \in V_d} p\left(s_i,x\right) \log p\left(x \mid s_i\right)}{\sum_{x \in V_d} p\left(x\right) \log p\left(x\right)} \]
where \(p\left(x\right)\) is the probability of observing \(x\) in \(U\), \(p\left(s_i,x\right)\) is the probability of observing \(s_i\) and \(x\) in \(U\), and \(p\left(x \mid s_i\right)\) is the probability of observing \(x\) given that the stratum is \(s_i\).
3. Examples of the sshicm
package
install.packages("sshicm", dep = TRUE)
baltim = sf::read_sf(system.file("extdata/baltim.gpkg",package = "sshicm"))
sshicm(PRICE ~ .,baltim,type = "IC")
## # A tibble: 5 × 3
## Variable Ic Pv
## <chr> <dbl> <dbl>
## 1 AC 0.223 0
## 2 PATIO 0.162 0.643
## 3 FIREPL 0.135 0.657
## 4 DWELL 0.124 0.716
## 5 CITCOU 0.0898 0.988
cinc = sf::read_sf(system.file("extdata/cinc.gpkg",package = "sshicm"))
sshicm(THEFT_D ~ .,cinc,type = "IN")
## # A tibble: 5 × 3
## Variable In Pv
## <chr> <dbl> <dbl>
## 1 DENSITY 0.776 0.0681
## 2 MEDIAN_AGE 0.228 0.0230
## 3 MALE 0.0367 0
## 4 AVG_FAMSIZ 0.0205 0.00300
## 5 FEMALE 0.00584 0.0200
Reference
Wang, J., Haining, R., Zhang, T., Xu, C., Hu, M., Yin, Q., … Chen, H. (2024). Statistical Modeling of Spatially Stratified Heterogeneous Data. Annals of the American Association of Geographers, 114(3), 499–519. https://doi.org/10.1080/24694452.2023.2289982.
Bai, H., Wang, H., Li, D., & Ge, Y. (2023). Information Consistency-Based Measures for Spatial Stratified Heterogeneity. Annals of the American Association of Geographers, 113(10), 2512–2524. https://doi.org/10.1080/24694452.2023.2223700.
Wang, J., Li, X., Christakos, G., Liao, Y., Zhang, T., Gu, X., & Zheng, X. (2010). Geographical Detectors‐Based Health Risk Assessment and its Application in the Neural Tube Defects Study of the Heshun Region, China. International Journal of Geographical Information Science, 24(1), 107–127. https://doi.org/10.1080/13658810802443457.
Wang, J. F., Zhang, T. L., & Fu, B. J. A measure of spatial stratified heterogeneity. Ecological indicators, 2016. 67, 250-256. https://doi.org/10.1016/j.ecolind.2016.02.052.