best univariate discretization based on geodetector q-statistic
Source: R/gd_bestunidisc.R
gd_bestunidisc.Rd
Function for determining the best univariate discretization based on geodetector q-statistic.
Usage
gd_bestunidisc(
  formula,
  data,
  discnum = 3:8,
  discmethod = c("sd", "equal", "geometric", "quantile", "natural"),
  cores = 1,
  return_disc = TRUE,
  seed = 123456789,
  ...
)
Arguments
- formula
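A formula for the best univariate discretization, with the response on the left-hand side and the variables to discretize on the right (for example, y ~ xa + xb + xc).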
- data
A data.frame or tibble of observation data.
- discnum
(optional) A vector of candidate numbers of classes for discretization. Default is 3:8.
- discmethod
(optional) A vector of methods for discretization. Default is c("sd","equal","geometric","quantile","natural"), by invoking sdsfun.
- cores
(optional) Positive integer (default is 1). When cores is greater than 1, multi-core parallel computing is used.
- return_disc
(optional) Whether to return the discretized result obtained with the optimal parameters. Default is TRUE.
- seed
(optional) Random seed number. Default is 123456789.
- ...
(optional) Other arguments passed to sdsfun::discretize_vector().
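The idea of searching over discnum and discmethod for the highest q-statistic can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: q_stat(), disc_equal(), disc_quantile(), and best_disc() are hypothetical helper names, and the two base R discretizers only stand in for the methods that sdsfun::discretize_vector() provides. The q-statistic is computed as 1 minus the within-stratum sum of squares divided by the total sum of squares.

```r
# q-statistic: share of the variance of y explained by the strata
# (hypothetical helper, not part of the package)
q_stat <- function(y, strata) {
  # total sum of squares of y
  sst <- sum((y - mean(y))^2)
  # within-stratum sums of squares, added up over all strata
  ssw <- sum(tapply(y, strata, function(v) sum((v - mean(v))^2)))
  1 - ssw / sst
}

# Stand-in discretizers returning integer class codes
disc_equal <- function(x, k) {
  # k equal-width intervals over the (slightly extended) range of x
  cut(x, breaks = k, labels = FALSE)
}
disc_quantile <- function(x, k) {
  brks <- unique(quantile(x, probs = seq(0, 1, length.out = k + 1),
                          names = FALSE))
  cut(x, breaks = brks, include.lowest = TRUE, labels = FALSE)
}

# Try every (k, method) pair and keep the one with the largest q
best_disc <- function(y, x, discnum = 3:8,
                      discmethod = c("equal", "quantile")) {
  grid <- expand.grid(k = discnum, method = discmethod,
                      stringsAsFactors = FALSE)
  grid$q <- mapply(function(k, m) {
    strata <- if (m == "equal") disc_equal(x, k) else disc_quantile(x, k)
    q_stat(y, strata)
  }, grid$k, grid$method)
  grid[which.max(grid$q), ]
}

# Simulated data, for illustration only
set.seed(42)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
best <- best_disc(y, x, discnum = 3:6)
best
```

The returned row holds the selected k and method together with the q value they achieve, which loosely mirrors the k and method elements described under Value.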
Value
A list.
x
the name of the variable that needs to be discretized
k
optimal discretization number
method
optimal discretization method
disv
optimal discretization results (returned when return_disc is TRUE)
Author
Wenbo Lv lyu.geosocial@gmail.com
Examples
data('sim')
gd_bestunidisc(y ~ xa + xb + xc,
               data = sim,
               discnum = 3:6)
#> $x
#> [1] "xa" "xb" "xc"
#>
#> $k
#> [1] 6 6 6
#>
#> $method
#> [1] "geometric" "geometric" "geometric"
#>
#> $disv
#> # A tibble: 80 × 3
#>       xa    xb    xc
#>    <int> <int> <int>
#>  1     2     5     4
#>  2     5     5     5
#>  3     3     5     4
#>  4     3     4     3
#>  5     5     5     5
#>  6     4     5     4
#>  7     5     5     5
#>  8     2     3     2
#>  9     5     4     4
#> 10     6     3     5
#> # ℹ 70 more rows
#>
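Because x, k, and method are parallel vectors, they can be collected into one row per variable for inspection. In this minimal sketch, res is a hand-built list that copies the printed output above (an assumption standing in for an actual call to gd_bestunidisc()), and best_params is a hypothetical name.

```r
# Hand-built stand-in for the list printed above (not a real function call)
res <- list(
  x = c("xa", "xb", "xc"),
  k = c(6, 6, 6),
  method = c("geometric", "geometric", "geometric")
)

# One row per variable: its optimal number of classes and method
best_params <- data.frame(variable = res$x, k = res$k, method = res$method)
best_params
```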