Geographical Convergent Cross Mapping (GCCM)
Wenbo Lv
2025-01-11
Source:vignettes/GCCM.Rmd
GCCM.Rmd
1. The principle of geographical convergent cross mapping (GCCM)
Takens’ theory proves that, for a dynamic system \(\phi\), if its trajectory converges to an attractor manifold \(M\), which are consisted by a bounded and invariant set of states, then the mapping between \(\phi\) and M can be built and time series observations of \(\phi\) can be used to construct \(M\).
According to the generalized embedding theorem, for a compact \(d\)-dimensional manifold \(M\) and a set of observation functions \(\left<h_1,h_2,\ldots,h_L\right>\), the map \(\psi_{\phi,h} = \left<h_1\left(x\right),h_2\left(x\right),\ldots,h_L\left(x\right)\right>\) is an embedding of \(M\) with \(L = 2d + 1\). Here embedding means a one-to-one map resolving all singularities of the original manifold. The elements \(h_i\) can be lags of observations from single time series observations, lags of observations from multiple time series, or multiple observation functions. The first two constructions are only special cases of the third one.
By taking the measured values at one specific unit and its neighbors (named as spatial lags in spatial statistics) as a set of observation functions, \(\psi_{\phi,h} \left(x,s\right) = \left<h_s\left(x\right),h_{s\left(1\right)}\left(x\right),\ldots,h_{s\left(L-1\right)}\left(x\right)\right>\) is a embedding, where \(s\) is the focal unit currently under investigation and \(s\left(i\right)\) is its \(i\)-th order of spatial lags. \(h_s\left(x\right)\) and \(h_{s\left(i\right)}\left(x\right)\) are their observation functions respectively. (Hereinafter, we will use \(\psi \left(x,s\right)\) to present \(\psi_{\phi,h} \left(x,s\right)\) for short). For two spatial variables \(X\) and \(Y\) on the same set of spatial units, their values and spatial lags can be regarded as observation functions reading values from each spatial unit. As the spatial lags in each order contain more than one spatial units, the observation function can be set as the mean of the spatial units or other summary functions considering the spatial direction, to assure the one-to-one mapping of the original manifold \(M\).
The cross-mapping prediction is defined as:
\[ \hat{Y}_s \mid M_x = \sum\limits_{i=1}^{L+1} \left(\omega_{si}Y_{si} \mid M_x \right) \]
where \(s\) represents a spatial unit at which the value of \(Y\) needs to be predicted, \(\hat{Y}_s\) is the prediction result, \(L\) is the number of dimensions of the embedding, \(si\) is the spatial unit used in the prediction, \(Y_{si}\) is the observation value at \(si\) and simultaneously the first component of a state in \(M_y\), noted as \(\psi\left(y,s_i\right)\). In further, \(\psi\left(y,s_i\right)\) is determined by its one-to-one mapping point \(\psi\left(x,s_i\right)\), which is in turn one of the \(L+1\) nearest neighbors of the focal state in \(M_x\). \(\omega_{si}\) is the corresponding weight defined as:
\[ \omega_{si} \mid M_x = \frac{weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)}{\sum_{i=1}^{L+1}weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)} \] where \(weight \left(\ast,\ast\right)\) is the weight function between two states in the shadow manifold, defined as:
\[ weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right) = \exp \left(- \frac{dis \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)}{dis \left(\psi\left(x,s_1\right),\psi\left(x,s\right)\right)} \right) \]
where \(\exp\) is the exponential function and \(dis \left(\ast,\ast\right)\) represents the distance function between two states in the shadow manifold defined as:
\[ dis \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right) = \frac{1}{L} \left(\left|h_{si}\left(x\right)-h_{s}\left(x\right)\right| + \sum_{k=1}^{L-1}abs \left[h_{si\left(k\right)}\left(x\right),h_{s\left(k\right)}\left(x\right)\right]\right) \] Note that the absolute value distance is used here.
The skill of cross-mapping prediction is measured by the Pearson correlation coefficient between the true observations and corresponding predictions:
\[ \rho = \frac{Cov\left(Y,\hat{Y}\right)}{\sqrt{Var\left(Y\right) Var\left(\hat{Y}\right)}} \]
The prediction skill \(\rho\) varies by setting different sizes of libraries, which means the quantity of observations used in reconstruction of the shadow manifold. We can use the convergence of \(\rho\) to infer the causal associations. For GCCM, the convergence means that \(\rho\) increases with the size of libraries and is statistically significant when the library becomes largest. And the confidence interval of \(\rho\) can be estimated based the \(z\)-statistics with the normal distribution:
\[ t = \rho \sqrt{\frac{n-2}{1-\rho^2}} \] where \(n\) is the number of observations to be predicted, and
\[ z = \frac{1}{2} \ln \left(\frac{1+\rho}{1-\rho}\right) \]
2. Examples
2.2 An example of lattice data about county-level population density
Load data and package:
popd_nb = spdep::read.gal(system.file("extdata/popdensity_nb.gal",
package = "spEDM"))
## Warning in spdep::read.gal(system.file("extdata/popdensity_nb.gal", package = "spEDM")):
## neighbour object has 4 sub-graphs
popd_nb
## Neighbour list object:
## Number of regions: 2806
## Number of nonzero links: 15942
## Percentage nonzero weights: 0.2024732
## Average number of links: 5.681397
## 4 disjoint connected subgraphs
popdensity = readr::read_csv(system.file("extdata/popdensity.csv",
package = "spEDM"))
## Rows: 2806 Columns: 7
## ── Column specification ───────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (7): x, y, popDensity, DEM, Tem, Pre, slop
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
popdensity
## # A tibble: 2,806 × 7
## x y popDensity DEM Tem Pre slop
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 117. 30.5 780. 8 17.4 1528. 0.452
## 2 117. 30.6 395. 48 17.2 1487. 0.842
## 3 117. 30.8 261. 49 16.0 1456. 3.56
## 4 116. 30.1 258. 23 17.4 1555. 0.932
## 5 116. 30.5 211. 101 16.3 1494. 3.34
## 6 117. 31.0 386. 10 16.6 1382. 1.65
## 7 117. 30.2 350. 23 17.5 1569. 0.346
## 8 117. 30.7 470. 22 17.1 1493. 1.88
## 9 117. 30.6 1226. 11 17.4 1526. 0.208
## 10 116. 30.9 137. 598 13.9 1458. 5.92
## # ℹ 2,796 more rows
popd_sf = sf::st_as_sf(popdensity, coords = c("x","y"), crs = 4326)
popd_sf
## Simple feature collection with 2806 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 74.9055 ymin: 18.2698 xmax: 134.269 ymax: 52.9346
## Geodetic CRS: WGS 84
## # A tibble: 2,806 × 6
## popDensity DEM Tem Pre slop geometry
## * <dbl> <dbl> <dbl> <dbl> <dbl> <POINT [°]>
## 1 780. 8 17.4 1528. 0.452 (116.912 30.4879)
## 2 395. 48 17.2 1487. 0.842 (116.755 30.5877)
## 3 261. 49 16.0 1456. 3.56 (116.541 30.7548)
## 4 258. 23 17.4 1555. 0.932 (116.241 30.104)
## 5 211. 101 16.3 1494. 3.34 (116.173 30.495)
## 6 386. 10 16.6 1382. 1.65 (116.935 30.9839)
## 7 350. 23 17.5 1569. 0.346 (116.677 30.2412)
## 8 470. 22 17.1 1493. 1.88 (117.066 30.6514)
## 9 1226. 11 17.4 1526. 0.208 (117.171 30.5558)
## 10 137. 598 13.9 1458. 5.92 (116.208 30.8983)
## # ℹ 2,796 more rows
Run GCCM:
startTime = Sys.time()
pd_res = gccm(data = popd_sf,
cause = "Pre",
effect = "popDensity",
libsizes = seq(10, 2800, by = 100),
E = 3,
k = 4,
nb = popd_nb,
progressbar = FALSE)
endTime = Sys.time()
print(difftime(endTime,startTime, units ="mins"))
## Time difference of 19.59433 mins
pd_res
## libsizes popDensity->Pre Pre->popDensity
## 1 10 0.01130395 0.06607465
## 2 110 0.03306905 0.22689926
## 3 210 0.04821180 0.28196395
## 4 310 0.06475999 0.30785228
## 5 410 0.07922017 0.33124862
## 6 510 0.09428975 0.36226576
## 7 610 0.10951603 0.39693375
## 8 710 0.12249230 0.42302102
## 9 810 0.13594244 0.44164444
## 10 910 0.14767067 0.45689198
## 11 1010 0.15863133 0.46853529
## 12 1110 0.16852300 0.47657811
## 13 1210 0.17808456 0.48683486
## 14 1310 0.18761109 0.49750321
## 15 1410 0.19697038 0.50708625
## 16 1510 0.20507772 0.51660029
## 17 1610 0.21264081 0.52744922
## 18 1710 0.21969130 0.53845530
## 19 1810 0.22618694 0.54912681
## 20 1910 0.23207300 0.55982092
## 21 2010 0.23780738 0.57043510
## 22 2110 0.24300452 0.58089481
## 23 2210 0.24773394 0.59104138
## 24 2310 0.25208726 0.60092354
## 25 2410 0.25628422 0.61050007
## 26 2510 0.26039117 0.62015838
## 27 2610 0.26436854 0.62986485
## 28 2710 0.26826114 0.63868014
Visualize the result:
2.3 An example of grid data about farmland NPP
Load data and package:
npp = terra::rast(system.file("extdata/npp.tif", package = "spEDM"))
npp
## class : SpatRaster
## dimensions : 404, 483, 3 (nrow, ncol, nlyr)
## resolution : 10000, 10000 (x, y)
## extent : -2625763, 2204237, 1877078, 5917078 (xmin, xmax, ymin, ymax)
## coord. ref. : CGCS2000_Albers
## source : npp.tif
## names : npp, pre, tem
## min values : 164.00, 384.3409, -47.8194
## max values : 16606.33, 23878.3555, 263.6938
terra::plot(npp, nc = 3,
mar = rep(0.1,4),
oma = rep(0.1,4),
axes = FALSE,
legend = FALSE)
To save the computation time, we will aggregate the data by 3 times and select 3000 non-NA pixels to predict:
npp = terra::aggregate(npp, fact = 3, na.rm = TRUE)
terra::global(npp,"isNA")
## isNA
## npp 14815
## pre 14766
## tem 14766
terra::ncell(npp)
## [1] 21735
namat = terra::as.matrix(!is.na(npp[[1]]), wide = TRUE)
pred = terra::rowColFromCell(npp,which(namat))
dim(pred)
## [1] 6920 2
set.seed(42)
indices = sample(nrow(pred), size = 3000, replace = FALSE)
pred = pred[indices,]
Run GCCM:
startTime = Sys.time()
npp_res = gccm(data = npp,
cause = "pre",
effect = "npp",
libsizes = seq(10,130,5),
E = 2,
k = 5,
RowCol = pred,
progressbar = FALSE)
endTime = Sys.time()
print(difftime(endTime,startTime, units ="mins"))
## Time difference of 24.49013 mins
npp_res
## libsizes npp->pre pre->npp
## 1 10 0.049237491 0.07312462
## 2 15 0.063939456 0.10307461
## 3 20 0.080009483 0.12935898
## 4 25 0.097903289 0.14718470
## 5 30 0.104758061 0.15943448
## 6 35 0.111686523 0.17026742
## 7 40 0.119169359 0.18985868
## 8 45 0.132677547 0.21607528
## 9 50 0.148968939 0.24764600
## 10 55 0.165193950 0.28759045
## 11 60 0.173868473 0.30697776
## 12 65 0.179777011 0.31209341
## 13 70 0.157231842 0.28574340
## 14 75 0.108711472 0.24817243
## 15 80 0.058977718 0.22701677
## 16 85 -0.003501071 0.23364337
## 17 90 -0.022853416 0.23577169
## 18 95 -0.042780283 0.34364441
## 19 100 -0.034298266 0.72561637
## 20 105 -0.070605553 0.77999128
Visualize the result:
plot(npp_res,xlimits = c(9, 101),ylimits = c(-0.05,1))
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range (`geom_line()`).