Skip to contents

Methodological Background

Geographical Pattern Causality (GPC) infers causal relations from spatial cross-sectional data by reconstructing a symbolic approximation of the underlying spatial dynamical system.

Let \(x(s)\) and \(y(s)\) denote two spatial cross-sections over spatial units \(s \in \mathcal{S}\).

(1) Spatial Embedding

For each spatial unit \(s_i\), GPC constructs an embedding vector

\[ \mathbf{E}_{x(s_i)} = \big( x(s_{i}^{(1)}), x(s_{i}^{(2)}), \dots, x(s_{i}^{(E\tau)}) \big), \]

where \(s_{i}^{(k)}\) denotes the \(k\)-th spatially lagged value of the spatial unit \(s_i\), determined by embedding dimension \(E\) and lag \(\tau\). This yields two reconstructed state spaces \(\mathcal{M}_x, \mathcal{M}_y \subset \mathbb{R}^E\).

(2) Symbolic Pattern Extraction

Local geometric transitions in each manifold are mapped to symbols

\[ \sigma_x(s_i),; \sigma_y(s_i) \in \mathcal{A}, \]

encoding increasing, decreasing, or non-changing modes. These symbolic trajectories summarize local pattern evolution.

(3) Cross-Pattern Mapping

Causality from \(x \to y\) is assessed by predicting:

\[ \hat{\sigma}_y(s_i) = F\big( \sigma_x(s_j): s_j \in \mathcal{N}_k(s_i) \big), \]

where \(\mathcal{N}_k\) denotes the set of \(k\) nearest neighbors in \(\mathcal{M}_x\). The agreement structure between \(\hat{\sigma}_y(s_i)\) and \(\sigma_y(s_i)\) determines the causal mode:

  • Positive: \(\hat{\sigma}_y = \sigma_y\)
  • Negative: \(\hat{\sigma}_y = -\sigma_y\)
  • Dark: neither agreement nor opposition

(4) Causal Strength

The global causal strength is the normalized consistency of symbol matches:

\[ C_{x \to y} = \frac{1}{|\mathcal{S}|} \sum_{s_i \in \mathcal{S}} \mathbb{I}\big[ \hat{\sigma}_y(s_i) \bowtie \sigma_y(s_i) \big], \]

where \(\bowtie\) encodes positive, negative, or dark matching rules.

Usage examples

Example of spatial vector data

Load the spEDM package and its columbus spatial analysis data:

library(spEDM)

columbus = sf::read_sf(system.file("case/columbus.gpkg", package="spEDM"))
columbus
## Simple feature collection with 49 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 5.874907 ymin: 10.78863 xmax: 11.28742 ymax: 14.74245
## Projected CRS: Undefined Cartesian SRS with unknown unit
## # A tibble: 49 × 7
##    hoval   inc  crime  open plumb discbd                                    geom
##    <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>                               <POLYGON>
##  1  80.5 19.5  15.7   2.85  0.217   5.03 ((8.624129 14.23698, 8.5597 14.74245, …
##  2  44.6 21.2  18.8   5.30  0.321   4.27 ((8.25279 14.23694, 8.282758 14.22994,…
##  3  26.4 16.0  30.6   4.53  0.374   3.89 ((8.653305 14.00809, 8.81814 14.00205,…
##  4  33.2  4.48 32.4   0.394 1.19    3.7  ((8.459499 13.82035, 8.473408 13.83227…
##  5  23.2 11.3  50.7   0.406 0.625   2.83 ((8.685274 13.63952, 8.677577 13.72221…
##  6  28.8 16.0  26.1   0.563 0.254   3.78 ((9.401384 13.5504, 9.434411 13.69427,…
##  7  75    8.44  0.178 0     2.40    2.74 ((8.037741 13.60752, 8.062716 13.60452…
##  8  37.1 11.3  38.4   3.48  2.74    2.89 ((8.247527 13.58651, 8.2795 13.5965, 8…
##  9  52.6 17.6  30.5   0.527 0.891   3.17 ((9.333297 13.27242, 9.671007 13.27361…
## 10  96.4 13.6  34.0   1.55  0.558   4.33 ((10.08251 13.03377, 10.0925 13.05275,…
## # ℹ 39 more rows

The false nearest neighbours (FNN) method helps identify the appropriate minimal embedding dimension for reconstructing the state space of a time series or spatial cross-sectional data.

spEDM::fnn(columbus, "crime", E = 1:10, eps = stats::sd(columbus$crime))
## [spEDM] Output 'E:i' corresponds to the i-th valid embedding dimension.
## [spEDM] Input E values exceeding max embeddable dimension were truncated.
## [spEDM] Please map output indices to original E inputs before interpretation.
##        E:1        E:2        E:3        E:4        E:5        E:6        E:7 
## 0.59183673 0.04081633 0.04081633 0.10204082 0.00000000 0.00000000 0.00000000 
##        E:8 
## 0.00000000

The false nearest neighbours (FNN) ratio decreased to approximately 0.001 when the embedding dimension E reached 7, and remained relatively stable thereafter. Therefore, we adopted \(E = 7\) as the minimal embedding dimension for subsequent parameter search.

Then, search optimal parameters:

# determine the type of causality using correlation
stats::cor.test(columbus$hoval,columbus$crime)
## 
##  Pearson's product-moment correlation
## 
## data:  columbus$hoval and columbus$crime
## t = -4.8117, df = 47, p-value = 1.585e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7366777 -0.3497978
## sample estimates:
##        cor 
## -0.5744867

# since the correlation is -0.574, negative causality is selected as the metric to maximize in the optimal parameter search
spEDM::pc(columbus, "hoval", "crime", E = 5:6, k = 7:10, tau = 1, maximize = "negative")
## The suggested E,k,tau for variable crime is 6, 8 and 1

Run geographical pattern causality analysis

spEDM::gpc(columbus, "hoval", "crime", E = 6, k = 8)
##       type   strength      direction
## 1 positive        NaN hoval -> crime
## 2 negative 0.10339376 hoval -> crime
## 3     dark 0.09591306 hoval -> crime
## 4 positive        NaN crime -> hoval
## 5 negative 0.66263003 crime -> hoval
## 6     dark 0.17201123 crime -> hoval

Convergence diagnostics

crime_convergence = spEDM::gpc(columbus, "hoval", "crime",
                               libsizes = seq(5, 45, by = 5),
                               E = 6, k = 8, progressbar = FALSE)
crime_convergence
##    libsizes     type   strength      direction
## 1        10 positive 0.00000000 hoval -> crime
## 2        15 positive 0.00000000 hoval -> crime
## 3        20 positive 0.00000000 hoval -> crime
## 4        25 positive 0.00000000 hoval -> crime
## 5        30 positive 0.00000000 hoval -> crime
## 6        35 positive 0.00000000 hoval -> crime
## 7        40 positive 0.00000000 hoval -> crime
## 8        45 positive 0.00000000 hoval -> crime
## 9        10 negative 0.00000000 hoval -> crime
## 10       15 negative 0.07911243 hoval -> crime
## 11       20 negative 0.09306701 hoval -> crime
## 12       25 negative 0.11800623 hoval -> crime
## 13       30 negative 0.12405268 hoval -> crime
## 14       35 negative 0.12273355 hoval -> crime
## 15       40 negative 0.11469876 hoval -> crime
## 16       45 negative 0.11305740 hoval -> crime
## 17       10     dark 0.01365380 hoval -> crime
## 18       15     dark 0.02102581 hoval -> crime
## 19       20     dark 0.01839746 hoval -> crime
## 20       25     dark 0.02450954 hoval -> crime
## 21       30     dark 0.04140131 hoval -> crime
## 22       35     dark 0.05119651 hoval -> crime
## 23       40     dark 0.06686337 hoval -> crime
## 24       45     dark 0.07935427 hoval -> crime
## 25       10 positive 0.00000000 crime -> hoval
## 26       15 positive 0.00000000 crime -> hoval
## 27       20 positive 0.00000000 crime -> hoval
## 28       25 positive 0.00000000 crime -> hoval
## 29       30 positive 0.00000000 crime -> hoval
## 30       35 positive 0.00000000 crime -> hoval
## 31       40 positive 0.00000000 crime -> hoval
## 32       45 positive 0.00000000 crime -> hoval
## 33       10 negative 0.00000000 crime -> hoval
## 34       15 negative 0.00000000 crime -> hoval
## 35       20 negative 0.00000000 crime -> hoval
## 36       25 negative 0.00000000 crime -> hoval
## 37       30 negative 0.00000000 crime -> hoval
## 38       35 negative 0.30524907 crime -> hoval
## 39       40 negative 0.58603540 crime -> hoval
## 40       45 negative 0.66263003 crime -> hoval
## 41       10     dark 0.07055205 crime -> hoval
## 42       15     dark 0.08995711 crime -> hoval
## 43       20     dark 0.11123206 crime -> hoval
## 44       25     dark 0.12403194 crime -> hoval
## 45       30     dark 0.13377971 crime -> hoval
## 46       35     dark 0.14469520 crime -> hoval
## 47       40     dark 0.15517056 crime -> hoval
## 48       45     dark 0.16563975 crime -> hoval
plot(crime_convergence, ylimits = c(-0.01,1),
     xlimits = c(9,46), xbreaks = seq(10, 45, 10))
Figure 1. Convergence curves of causal strengths among house value and crime.
Figure 1. Convergence curves of causal strengths among house value and crime.


Example of spatial raster data

Load the spEDM package and its farmland NPP data:

library(spEDM)

npp = terra::rast(system.file("case/npp.tif", package = "spEDM"))
# To save the computation time, we will aggregate the data by 3 times
npp = terra::aggregate(npp, fact = 3, na.rm = TRUE)
npp
## class       : SpatRaster 
## size        : 135, 161, 5  (nrow, ncol, nlyr)
## resolution  : 30000, 30000  (x, y)
## extent      : -2625763, 2204237, 1867078, 5917078  (xmin, xmax, ymin, ymax)
## coord. ref. : CGCS2000_Albers 
## source(s)   : memory
## names       :      npp,        pre,      tem,      elev,         hfp 
## min values  :   187.50,   390.3351, -47.8194, -110.1494,  0.04434316 
## max values  : 15381.89, 23734.5330, 262.8576, 5217.6431, 42.68803711

# Check the validated cell number
nnamat = terra::as.matrix(npp[[1]], wide = TRUE)
nnaindice = which(!is.na(nnamat), arr.ind = TRUE)
dim(nnaindice)
## [1] 6920    2

Determining minimal embedding dimension:

spEDM::fnn(npp, "npp", E = 1:15,
           eps = stats::sd(terra::values(npp[["npp"]]),na.rm = TRUE))
## [spEDM] Output 'E:i' corresponds to the i-th valid embedding dimension.
## [spEDM] Input E values exceeding max embeddable dimension were truncated.
## [spEDM] Please map output indices to original E inputs before interpretation.
##          E:1          E:2          E:3          E:4          E:5          E:6 
## 0.9813070569 0.5309427415 0.1322254335 0.0167630058 0.0017341040 0.0000000000 
##          E:7          E:8          E:9         E:10         E:11         E:12 
## 0.0001445087 0.0000000000 0.0000000000 0.0000000000 0.0002890173 0.0000000000 
##         E:13         E:14 
## 0.0001445087 0.0001445087

At \(E = 6\), the false nearest neighbor ratio stabilizes approximately at 0.0001 and remains constant thereafter. Therefore, \(E = 6\) is selected as minimal embedding dimension for the subsequent GPC analysis.

Then, search optimal parameters:

stats::cor.test(~ pre + npp,
                data = terra::values(npp[[c("pre","npp")]],
                                     dataframe = TRUE, na.rm = TRUE))
## 
##  Pearson's product-moment correlation
## 
## data:  pre and npp
## t = 108.74, df = 6912, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7855441 0.8029426
## sample estimates:
##       cor 
## 0.7944062


g1 = spEDM::pc(npp, "npp", "pre", E = 6:10, k = 12, tau = 1:5, maximize = "positive")
g1
## The suggested E,k,tau for variable pre is 8, 12 and 5

Run geographical pattern causality analysis

spEDM::gpc(npp, "pre", "npp", E = 8, k = 12, tau = 5)
##       type  strength  direction
## 1 positive 0.5587734 pre -> npp
## 2 negative       NaN pre -> npp
## 3     dark 0.4228069 pre -> npp
## 4 positive 0.4426710 npp -> pre
## 5 negative 0.0000000 npp -> pre
## 6     dark 0.3793368 npp -> pre

Convergence diagnostics

npp_convergence = spEDM::gpc(npp, "pre", "npp",
                             libsizes = matrix(rep(seq(10,80,10),2),ncol = 2),
                             E = 8, k = 12, tau = 5, progressbar = FALSE)
npp_convergence
##    libsizes     type   strength  direction
## 1       100 positive 0.11924559 pre -> npp
## 2       400 positive 0.18641533 pre -> npp
## 3       900 positive 0.25999206 pre -> npp
## 4      1600 positive 0.32186441 pre -> npp
## 5      2500 positive 0.39667771 pre -> npp
## 6      3600 positive 0.44540723 pre -> npp
## 7      4900 positive 0.47912447 pre -> npp
## 8      6400 positive 0.52691047 pre -> npp
## 9       100 negative 0.00000000 pre -> npp
## 10      400 negative 0.00000000 pre -> npp
## 11      900 negative 0.00000000 pre -> npp
## 12     1600 negative 0.00000000 pre -> npp
## 13     2500 negative 0.00000000 pre -> npp
## 14     3600 negative 0.00000000 pre -> npp
## 15     4900 negative 0.00000000 pre -> npp
## 16     6400 negative 0.00000000 pre -> npp
## 17      100     dark 0.05477351 pre -> npp
## 18      400     dark 0.10639504 pre -> npp
## 19      900     dark 0.16627407 pre -> npp
## 20     1600     dark 0.23190492 pre -> npp
## 21     2500     dark 0.29044339 pre -> npp
## 22     3600     dark 0.33700220 pre -> npp
## 23     4900     dark 0.37823271 pre -> npp
## 24     6400     dark 0.41256952 pre -> npp
## 25      100 positive 0.08070530 npp -> pre
## 26      400 positive 0.11426176 npp -> pre
## 27      900 positive 0.18021979 npp -> pre
## 28     1600 positive 0.25729359 npp -> pre
## 29     2500 positive 0.33203272 npp -> pre
## 30     3600 positive 0.37662754 npp -> pre
## 31     4900 positive 0.41361082 npp -> pre
## 32     6400 positive 0.42959642 npp -> pre
## 33      100 negative 0.00000000 npp -> pre
## 34      400 negative 0.00000000 npp -> pre
## 35      900 negative 0.00000000 npp -> pre
## 36     1600 negative 0.00000000 npp -> pre
## 37     2500 negative 0.00000000 npp -> pre
## 38     3600 negative 0.00000000 npp -> pre
## 39     4900 negative 0.00000000 npp -> pre
## 40     6400 negative 0.00000000 npp -> pre
## 41      100     dark 0.02628157 npp -> pre
## 42      400     dark 0.06125898 npp -> pre
## 43      900     dark 0.11418591 npp -> pre
## 44     1600     dark 0.17595918 npp -> pre
## 45     2500     dark 0.23454296 npp -> pre
## 46     3600     dark 0.28152186 npp -> pre
## 47     4900     dark 0.32463349 npp -> pre
## 48     6400     dark 0.36526339 npp -> pre
plot(npp_convergence, ylimits = c(-0.01,0.65),
     xlimits = c(0,6500), xbreaks = seq(100, 6400, 500))
Figure 2. Convergence curves of causal strengths among precipitation and NPP.
Figure 2. Convergence curves of causal strengths among precipitation and NPP.