Determines the best univariate discretization for each explanatory variable, based on the geodetector q-statistic.

Usage

gd_bestunidisc(
  formula,
  data,
  discnum = 3:22,
  discmethod = c("sd", "equal", "geometric", "quantile", "natural"),
  cores = 1,
  return_disc = TRUE,
  seed = 123456789,
  ...
)

Arguments

formula

A formula such as y ~ xa + xb + xc, giving the response variable and the explanatory variables to be discretized.

data

A data.frame or tibble of observation data.

discnum

(optional) A vector of candidate numbers of classes for discretization. Default is 3:22.

discmethod

(optional) A vector of candidate discretization methods. Default is c("sd", "equal", "geometric", "quantile", "natural"); the discretization itself is performed by sdsfun.

cores

(optional) A positive integer. Default is 1. When cores is greater than 1, multi-core parallel computing is used.

return_disc

(optional) Whether to return the discretized result obtained with the optimal parameters. Default is TRUE.

seed

(optional) Random seed number, default is 123456789.

...

(optional) Other arguments passed to sdsfun::discretize_vector().

Value

A list holding the optimal parameters among the provided parameter combinations: x, k, method, and disc (only when return_disc is TRUE).

x

the name of the variable that needs to be discretized

k

optimal discretization number

method

optimal discretization method

disc

optimal discretization results (this element is printed as disv in the example output)
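
The selection idea behind this function can be sketched in base R. This is a minimal illustration and not the package's implementation: it assumes the best combination is the one with the largest q-statistic, it covers only the "equal" and "quantile" methods, and the helpers q_stat(), discretize() and best_unidisc() are hypothetical names introduced here.

```r
# Hypothetical helper: q-statistic, the share of the variance of y
# explained by the strata (1 - within-strata SS / total SS).
q_stat <- function(y, strata) {
  sst <- sum((y - mean(y))^2)
  ssw <- sum(tapply(y, strata, function(v) sum((v - mean(v))^2)))
  1 - ssw / sst
}

# Hypothetical helper: cut x into (at most) k classes, returning integer codes.
# Only two methods are sketched; sdsfun supports more.
discretize <- function(x, k, method) {
  breaks <- switch(
    method,
    equal = seq(min(x), max(x), length.out = k + 1),
    quantile = quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE),
    stop("unsupported method: ", method)
  )
  # unique() guards against duplicated quantile breaks
  cut(x, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# Hypothetical helper: try every (k, method) pair and keep the one
# with the largest q-statistic.
best_unidisc <- function(y, x, discnum = 3:6,
                         discmethod = c("equal", "quantile")) {
  grid <- expand.grid(k = discnum, method = discmethod,
                      stringsAsFactors = FALSE)
  grid$q <- mapply(function(k, m) q_stat(y, discretize(x, k, m)),
                   grid$k, grid$method)
  grid[which.max(grid$q), ]
}

# Simulated data, for illustration only
set.seed(42)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
best <- best_unidisc(y, x)
print(best)
```

The result is a one-row data frame holding the selected k, method, and the corresponding q value. gd_bestunidisc() reports the analogous k and method for every variable on the right-hand side of the formula.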

Author

Wenbo Lv <lyu.geosocial@gmail.com>

Examples

data('sim')
gd_bestunidisc(y ~ xa + xb + xc,
               data = sim,
               discnum = 3:6)
#> $x
#> [1] "xa" "xb" "xc"
#> 
#> $k
#> [1] 6 6 6
#> 
#> $method
#> [1] "geometric" "geometric" "geometric"
#> 
#> $disv
#> # A tibble: 80 × 3
#>       xa    xb    xc
#>    <int> <int> <int>
#>  1     2     5     4
#>  2     5     5     5
#>  3     3     5     4
#>  4     3     4     3
#>  5     5     5     5
#>  6     4     5     4
#>  7     5     5     5
#>  8     2     3     2
#>  9     5     4     4
#> 10     6     3     5
#> # ℹ 70 more rows
#>
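
A further call, sketched here as a variation on the example above, restricts the candidate methods and skips the discretized values by setting return_disc = FALSE. It uses only arguments listed in the Usage section; the requireNamespace() guard is added so that the snippet does nothing when gdverse is not installed, and no output is shown because it has not been reproduced here.

```r
# Run only if the gdverse package is available
res <- NULL
if (requireNamespace("gdverse", quietly = TRUE)) {
  data("sim", package = "gdverse")
  res <- gdverse::gd_bestunidisc(
    y ~ xa + xb + xc,
    data = sim,
    discnum = 3:6,                        # candidate numbers of classes
    discmethod = c("quantile", "equal"),  # subset of the default methods
    return_disc = FALSE                   # keep only x, k and method
  )
  print(res)
}
```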