Function for determining the optimal spatial data discretization based on SPADE q-statistics.

Usage

cpsd_disc(
  formula,
  data,
  wt,
  discnum = 3:8,
  discmethod = "quantile",
  strategy = 2L,
  increase_rate = 0.05,
  cores = 1,
  seed = 123456789,
  ...
)

Arguments

formula

A formula giving the response variable and the explanatory variables to discretize, e.g. y ~ xa + xb + xc.

data

A data.frame, tibble or sf object of observation data.

wt

The spatial weight matrix.

discnum

(optional) A vector of candidate numbers of classes for discretization. Default is 3:8.

discmethod

(optional) The discretization method(s). Default is "quantile" for all variables. Note that "rpart" uses rpart_disc(); all other methods use sdsfun::discretize_vector().

strategy

(optional) Discretization strategy. When strategy is 1L, the discretization parameters that yield the highest SPADE model q-statistic are chosen. When strategy is 2L, the optimal discretization parameters are selected by combining the q-statistics with a LOESS model. Default is 2L.

increase_rate

(optional) The critical increase rate used when selecting the number of classes. Default is 0.05 (5%).

cores

(optional) Positive integer (default is 1). When cores is greater than 1, multi-core parallel computing is used.

seed

(optional) Random seed number, default is 123456789.

...

(optional) Other arguments passed to sdsfun::discretize_vector() or rpart_disc().
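
The difference between the two strategy settings can be illustrated with a minimal base R sketch. It is not the package's implementation: the q-statistic values are hypothetical, and the stopping rule (take the first class count after which the smoothed relative gain drops below increase_rate) is only an assumption about how a LOESS-based selection might work.

```r
# Hypothetical q-statistics for each candidate class count (illustrative only)
discnum <- 3:8
qvalue  <- c(0.30, 0.38, 0.43, 0.45, 0.46, 0.465)

# Analogue of strategy = 1L: pick the class count with the highest q
k_max <- discnum[which.max(qvalue)]

# Analogue of strategy = 2L (assumed rule): smooth q against the class count
# with LOESS, then stop once adding one more class gains less than increase_rate
increase_rate <- 0.05
# With so few points loess() may emit numerical warnings, so they are suppressed
fit    <- suppressWarnings(stats::loess(qvalue ~ discnum))
smooth <- as.numeric(stats::predict(fit))

# gain[i] is the relative increase when moving from discnum[i] to discnum[i + 1]
gain  <- diff(smooth) / smooth[-length(smooth)]
below <- which(gain < increase_rate)
k_loess <- if (length(below) > 0) discnum[below[1]] else discnum[length(discnum)]

k_max
k_loess
```

When q keeps rising slowly with the class count, as in these made-up values, the highest-q rule selects the largest candidate, whereas a threshold on the smoothed gain can stop earlier.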

Value

A list.

x

names of the discretized variables

k

optimal number of classes for each variable

method

optimal spatial data discretization method

disv

the discretized variables obtained with the optimal discretization parameters

Note

Setting discmethod to "robust" makes the function run significantly slower, so robust discretization is not recommended.

References

Yongze Song & Peng Wu (2021) An interactive detector for spatial associations, International Journal of Geographical Information Science, 35:8, 1676-1701, DOI:10.1080/13658816.2021.1882680

Author

Wenbo Lv lyu.geosocial@gmail.com

Examples

data('sim')
wt = sdsfun::inverse_distance_swm(sf::st_as_sf(sim,coords = c('lo','la')))
cpsd_disc(y ~ xa + xb + xc, data = sim, wt = wt)
#> $x
#> [1] "xa" "xb" "xc"
#> 
#> $k
#> [1] 5 5 7
#> 
#> $method
#> [1] "quantile" "quantile" "quantile"
#> 
#> $disv
#> # A tibble: 80 × 3
#>       xa    xb    xc
#>    <int> <int> <int>
#>  1     1     4     3
#>  2     4     4     6
#>  3     2     5     3
#>  4     1     3     2
#>  5     4     4     5
#>  6     2     4     4
#>  7     4     4     6
#>  8     1     2     2
#>  9     4     3     4
#> 10     5     1     6
#> # ℹ 70 more rows
#>
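
The example above selects the quantile method for all three variables. As a rough illustration of what quantile discretization does, the base R sketch below assigns a made-up numeric vector to k classes of roughly equal size. It uses stats::quantile() and cut() rather than sdsfun::discretize_vector(), whose internals may differ.

```r
# Made-up input values (illustrative only)
x <- c(2.1, 5.4, 3.3, 8.9, 1.2, 7.7, 4.5, 6.0)
k <- 4

# Class boundaries at the 0%, 25%, 50%, 75% and 100% quantiles
breaks <- stats::quantile(x, probs = seq(0, 1, length.out = k + 1))

# Integer class labels; include.lowest keeps the minimum value in class 1
cls <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

cls
# 1 3 2 4 1 4 2 3
```

Each class receives two of the eight values here, which is the balancing behaviour that makes quantile breaks a common default.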