PR for issue 177 (simulation covariate prevalence) #178

Merged
merged 3 commits into from
Mar 21, 2025

@ob325 ob325 commented Feb 28, 2025

Adds an optional minCellCount parameter to createCohortMethodDataSimulationProfile. The default value is 10, on the principle that it is better to protect the user from accidental disclosure of PHI. Also adds a non-exported function, .truncateSimulationProfile, that accepts a CohortDataSimulationProfile object and returns a modified copy in which small cell counts are reset to zero.

Example use:

> x <- list()
> x$metaData <- list()
> x$metaData$populationSize <- 135000
> x$covariatePrevalence <- data.frame(covariateId = 1:1000, prevalence = runif(n = 1000, min = 0, max = .1))
> class(x) <- "CohortDataSimulationProfile"
> lapply(x, head)
$metaData
$metaData$populationSize
[1] 135000


$covariatePrevalence
  covariateId   prevalence
1           1 0.0005396071
2           2 0.0383829203
3           3 0.0393560038
4           4 0.0885898456
5           5 0.0140272565
6           6 0.0441554085

> 
> x0 <- .truncateSimulationProfile(x, 0)
Warning message:
In .truncateSimulationProfile(x, 0) :
  No truncation was done on low-prevalence covariates. Object may include low cell counts that enable identification of persons.
> identical(x, x0)
[1] TRUE
> x1 <- .truncateSimulationProfile(x, 100)
Before truncating simulation profile, lowest non-zero covariate prevalence is 0.00007826 (11 / 135000)
After truncating simulation profile, lowest non-zero covariate prevalence is 0.00077438 (105 / 135000)
> lapply(x1, head)
$metaData
$metaData$populationSize
[1] 135000


$covariatePrevalence
  covariateId prevalence
1           1 0.00000000
2           2 0.03838292
3           3 0.03935600
4           4 0.08858985
5           5 0.01402726
6           6 0.04415541

> 
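For readers who want to see the thresholding rule the session above exercises, here is a minimal sketch. It is not the code merged in this PR: truncateProfileSketch is a hypothetical stand-in for .truncateSimulationProfile, and both the strict "less than" comparison and the use of unrounded implied counts (prevalence times populationSize) are assumptions.

```r
# Hypothetical stand-in for the non-exported .truncateSimulationProfile();
# the implementation merged in this PR may differ in its details.
truncateProfileSketch <- function(profile, minCellCount = 5) {
  if (minCellCount <= 0) {
    # Mirrors the behavior shown above: warn and return the object unchanged.
    warning("No truncation was done on low-prevalence covariates.")
    return(profile)
  }
  n <- profile$metaData$populationSize
  prevalence <- profile$covariatePrevalence$prevalence
  # Implied cell count = prevalence * population size.
  # Assumption: a non-zero count strictly below the threshold is reset to zero.
  tooSmall <- prevalence > 0 & prevalence * n < minCellCount
  profile$covariatePrevalence$prevalence[tooSmall] <- 0
  profile
}

# First three rows of the example profile shown above.
x <- list(
  metaData = list(populationSize = 135000),
  covariatePrevalence = data.frame(
    covariateId = 1:3,
    prevalence = c(0.0005396071, 0.0383829203, 0.0393560038)
  )
)
class(x) <- "CohortDataSimulationProfile"

x1 <- truncateProfileSketch(x, minCellCount = 100)
x0 <- suppressWarnings(truncateProfileSketch(x, minCellCount = 0))
```

With a population of 135000, covariate 1 implies roughly 73 persons (0.0005396071 * 135000), which is below the threshold of 100, so its prevalence is reset to zero. The other two covariates imply several thousand persons each and are left unchanged, matching the output shown above.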

@msuchard
Member

@ob325 -- default truncation across the OHDSI tool-stack appears to be 5 and not 10.

@ob325
Author

ob325 commented Feb 28, 2025

@msuchard just made new commit with default = 5

@schuemie schuemie merged commit 1f14c08 into OHDSI:develop Mar 21, 2025
1 of 4 checks passed