Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct the cluster label of localG() #170

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

microly
Copy link

@microly microly commented Oct 19, 2024

The cluster label of localG() should be based on the z-score other than x.

(1)G
xi is not included in the G statistic of location i, so its cluster label has nothing to do with xi.

(2)Gstar
xi dose not have decisive impact on the Cluster label of location i.
For example, when the Gstar statistic of location i have significantly high value, but xi is slightly lower than mean(x), its cluster label should be "High" but not "Low".

(3)Rey et al.(2023) confirm that the cluster label of localG() should be based on the z-score other than x (p174-175) .
In p174, they say "When (G statistics) standardized, positive values imply clustering of high values, while negative implies grouping of low values."
The codes on p174-175 can also be referred.

Rey, S., Arribas-Bel, D., & Wolf, L.J. (2023). Geographic Data Science with Python (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429292507

p174
p174

p175
p175

@rsbivand
Copy link
Member

rsbivand commented Oct 21, 2024

Please change your PR, and note that this implementation defines the quadrants from the input variable, not the standard variate of the local indicator. Indeed, GeoDa takes the mean of the local G values as the starting point: https://github.com/GeoDaCenter/libgeoda/blob/b512f742ff95fba89bbc9958972667c7643edba5/sa/UniG.cpp#L54-L110. It should not replace ´mean(x)´ by 0, but add extra vectors (maybe one for pysal based on standard deviate values, and one for Geoda based on the values of G), as in:

spdep/R/localmoran.R

Lines 47 to 56 in dbd3074

quadr_ps <- interaction(cut(z, c(-Inf, 0, Inf), labels=lbs),
cut(lz, c(-Inf, 0, Inf), labels=lbs), sep="-")
lx <- lag.listw(listw, x, zero.policy=zero.policy, NAOK=NAOK)
lxx <- mean(lx, na.rm=NAOK)
quadr <- interaction(cut(x, c(-Inf, xx, Inf), labels=lbs),
cut(lx, c(-Inf, lxx, Inf), labels=lbs), sep="-")
xmed <- median(x, na.rm=NAOK)
lxmed <- median(lx, na.rm=NAOK)
quadr_med <- interaction(cut(x, c(-Inf, xmed, Inf), labels=lbs),
cut(lx, c(-Inf, lxmed, Inf), labels=lbs), sep="-")
and

spdep/R/localmoran.R

Lines 126 to 127 in dbd3074

attr(res, "quadr") <- data.frame(mean=quadr, median=quadr_med,
pysal=quadr_ps)
, and check that

spdep/R/hotspotmap.R

Lines 63 to 75 in dbd3074

hotspot.localG <- function(obj, Prname, cutoff=0.005, p.adjust="fdr", droplevels=TRUE, ...) {
stopifnot(!is.null(attr(obj, "internals")))
if (is.null(attr(obj, "cluster"))) stop("object has no cluster attribute")
res <- attr(obj, "cluster")
if (!is.factor(res)) stop("cluster is not a factor")
x <- attr(obj, "internals")
nms <- colnames(x)
i <- match(Prname, nms)
if (is.na(i)) stop("Prname not in column names: ", paste(nms, collapse=" "))
is.na(res) <- p.adjust(x[,i], p.adjust) >= cutoff
if (droplevels) res <- droplevels(res)
res
}
uses the quadrant.type argument as in

spdep/R/hotspotmap.R

Lines 9 to 21 in dbd3074

hotspot.localmoran <- function(obj, Prname, cutoff=0.005, quadrant.type="mean", p.adjust="fdr", droplevels=TRUE, ...) {
quadrant.type <- match.arg(quadrant.type, c("mean", "median", "pysal"))
if (is.null(attr(obj, "quadr"))) stop("object has no quadr attribute")
res <- attr(obj, "quadr")[[quadrant.type]]
if (!is.factor(res)) stop("chosen quadrant type is not a factor")
nms <- colnames(obj)
i <- match(Prname, nms)
if (is.na(i)) stop("Prname not in column names: ", paste(nms, collapse=" "))
if (is.null(attr(obj, "quadr"))) stop("object has no quadr attribute")
is.na(res) <- p.adjust(obj[,i], p.adjust) >= cutoff
if (droplevels) res <- droplevels(res)
res
}
.

Please also provide a full reproducible example using a standard data set, also with rgeoda and PySAL/esda through reticulate, as for example in https://github.com/rsbivand/gaSI21/blob/cd74f88877d7c48320e502bed7e06a9263275ec6/GA_bivand_SI_2.R#L302. Please note the versions of the python components used.

@microly
Copy link
Author

microly commented Oct 22, 2024

Thanks for your kind reply.
I will finish this as soon as I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants