1. The principle of the information-theoretical V-measure
Let \(A\) denote the area of the domain. Consider two different regionalizations of the domain. To keep the discussion clear, we refer to the first as a regionalization and to the second as a partition. The regionalization \(R\) divides the domain into \(n\) regions \(r_i \mid i = 1,\ldots,n\). The partition \(Z\) divides the domain into \(m\) zones \(z_j \mid j = 1,\ldots,m\). In practice, both \(R\) and \(Z\) are integer-type vectors of equal length, each element giving the region or zone label of one unit of the domain. The homogeneity \(h\) of the zones with respect to the regions is defined as:
\[ h = 1 - \sum\limits_{j=1}^m \frac{A_j}{A} \frac{S_j^R}{S^R} \]
where \(S^R = - \sum\limits_{i=1}^n \frac{A_i}{A} \log\frac{A_i}{A}\) and \(S_j^R = - \sum\limits_{i=1}^n \frac{a_{i,j}}{A_j} \log \frac{a_{i,j}}{A_j}\). Here \(a_{i,j}\) is the number of elements for which \(R==i\) and \(Z==j\), \(A_i\) is the number of elements for which \(R==i\), \(A_j\) is the number of elements for which \(Z==j\), and \(A\) is the total number of elements.
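As a small worked illustration (the two vectors are made up for this sketch and do not come from the package), take \(R = (1, 1, 2, 2)\) and \(Z = (1, 1, 1, 2)\), so that \(A = 4\), the regions have sizes \(2\) and \(2\), and the zones have sizes \(3\) and \(1\). Using natural logarithms (the base cancels in the ratio):
\[ S^R = -\left(\tfrac{2}{4}\log\tfrac{2}{4} + \tfrac{2}{4}\log\tfrac{2}{4}\right) = \log 2 \approx 0.6931, \quad S_1^R = -\left(\tfrac{2}{3}\log\tfrac{2}{3} + \tfrac{1}{3}\log\tfrac{1}{3}\right) \approx 0.6365, \quad S_2^R = 0 \]
\[ h = 1 - \left(\tfrac{3}{4}\cdot\tfrac{0.6365}{0.6931} + \tfrac{1}{4}\cdot 0\right) \approx 0.311 \]
The second zone contains a single region, so it contributes no entropy; the first zone mixes both regions, which is what pulls \(h\) below 1.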
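Swapping \(R\) and \(Z\) in the formula above gives the completeness \(c\). Finally, the V-measure is computed as the weighted harmonic mean of \(h\) and \(c\), where \(\beta > 1\) weights completeness more strongly and \(\beta < 1\) weights homogeneity more strongly: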
\[ V_{\beta} = \frac{(1+\beta)hc}{(\beta h) + c} \]
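The formulas above can be sketched in a few lines of base R. This is only an illustration of the definitions, not the implementation used by itmsa: the helper names `entropy`, `homogeneity`, and `v_measure` are hypothetical, and the handling of degenerate cases (a single region, or \(h = c = 0\)) is an assumption made for this sketch.

```r
# Illustrative base-R sketch; helper names are hypothetical, not itmsa's API.

# Shannon entropy of a probability vector, using the convention 0 * log(0) = 0.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# h = 1 - sum_j (A_j / A) * (S_j^R / S^R) for label vectors R and Z.
homogeneity <- function(R, Z) {
  a <- table(R, Z)              # a[i, j]: number of elements with R == i and Z == j
  A <- sum(a)                   # total number of elements
  S_R <- entropy(rowSums(a) / A)
  if (S_R == 0) return(1)       # assumption: a single region is treated as homogeneous
  A_j <- colSums(a)
  S_jR <- apply(a, 2, function(col) entropy(col / sum(col)))
  1 - sum((A_j / A) * (S_jR / S_R))
}

v_measure <- function(R, Z, beta = 1) {
  stopifnot(length(R) == length(Z))
  h <- homogeneity(R, Z)
  cmp <- homogeneity(Z, R)      # completeness: the same formula with R and Z swapped
  # assumption: V is set to 0 when both h and c are 0, to avoid 0 / 0
  v <- if (h + cmp == 0) 0 else (1 + beta) * h * cmp / (beta * h + cmp)
  c(h = h, c = cmp, v = v)
}

# Made-up example vectors
res <- v_measure(c(1, 1, 2, 2), c(1, 1, 1, 2))
print(round(res, 3))
```

For these made-up vectors the sketch gives roughly \(h \approx 0.311\), \(c \approx 0.384\), and \(V_1 \approx 0.344\). Two labelings that differ only by a renaming of labels give \(h = c = V = 1\).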
2. Example
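# Install itmsa and gdverse (gdverse supplies the NTDs example data)
install.packages("itmsa", dependencies = TRUE)
install.packages("gdverse", dependencies = TRUE)
library(itmsa)
# Load the example data and discretize incidence into 5 classes
ntds = gdverse::NTDs
ntds$incidence = sdsfun::discretize_vector(ntds$incidence, 5)
# Compute the V-measure (method = "vm") of incidence against each variable
itm(incidence ~ watershed + elevation + soiltype,
    data = ntds, method = "vm")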
## # A tibble: 3 × 3
## Variable Iv Pv
## <chr> <dbl> <dbl>
## 1 watershed 0.373 0
## 2 elevation 0.365 0
## 3 soiltype 0.213 0