Skip to content

Commit

Permalink
Merge pull request #111 from ropensci/0.9.0-cran
Browse files Browse the repository at this point in the history
0.9.2
  • Loading branch information
sigmafelix authored Mar 2, 2025
2 parents 96391c4 + e1f0225 commit 7725d31
Show file tree
Hide file tree
Showing 9 changed files with 176 additions and 192 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: chopin
Title: Computation of Spatial Data by Hierarchical and Objective Partitioning of Inputs for Parallel Processing
Version: 0.9.1
Version: 0.9.2
Authors@R: c(
person("Insang", "Song", email = "geoissong@gmail.com", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-8732-3256")),
Expand Down
70 changes: 37 additions & 33 deletions R/scale_process.R
Original file line number Diff line number Diff line change
Expand Up @@ -816,58 +816,62 @@ par_multirasters <-
}


#' Map arguments to the desired names
#' @description This function maps the arguments of a target function
#' to the desired names. Users will use a named list `name_match` to
#' standardize the argument names, at least x and y, to the target function.
#' Map specified arguments to others in literals
#'
#' This function creates a new function that wraps an existing function,
#' remapping the argument names based on a user-specified literal mapping.
#' Specifically, arguments passed to the new function with
#' names on the left-hand side of the mapping are renamed to
#' the corresponding right-hand side names before being passed to
#' the original function. Users map two arguments without `x` and/or `y` to
#' standardize the argument names, x and y, to the target function.
#' This function is particularly useful to parallelize functions for spatial
#' data outside `sf` and `terra` packages that do not have arguments
#' named x and/or y. `par_*` functions could detect such functions by
#' wrapping nonstandardized functions to parallelize the computation.
#' @param fun A function to map arguments.
#' @param arg_map named character vector.
#' `c(x = "a", y = "i")` will map `a` and `i` in `fun` to
#' `x` and `y`, respectively.
#' @note `arg_map` should be defined carefully according to the characteristics
#' of `fun`. After mapping `x` and `y`, the resultant function will fail
#' if there remain arguments without default. It is recommended to map all
#' the arguments in `fun` to avoid any side effects.
#' @returns Function with arguments mapped.
#'
#' @param fun A function to be wrapped.
#' @param ... A set of named arguments representing the mapping from
#' the new function's argument names (left-hand side) to the original
#' function's argument names (right-hand side).
#' For example, \code{x = group, y = score} maps argument
#' \code{x} to \code{group} and \code{y} to \code{score}.
#'
#' @return A new function that accepts the remapped arguments and
#' calls the original function.
#'
#' @examples
#' cov_map <- arg_mapping <- c(x = "a", y = "b", z = "c", w = "d")
#' # Example original function
#' f1 <- function(a, b, c, d) {
#' return(a + b + c + d)
#' # Define an original function that expects arguments 'group' and 'score'
#' original_fun <- function(group, score, home = FALSE) {
#' list(group = group, score = score, home = home)
#' }
#' # Mapping of new argument names to original argument names
#' arg_mapping <- c(x = "a", y = "b", z = "c", w = "d")
#' f2 <- par_convert_f(f1, arg_mapping)
#'
#' # demonstrate f2 with the mapped arguments
#' f2(x = 1, y = 2, z = -1, w = 10)
#' # Create a new function that maps 'x' to 'group' and 'y' to 'score'
#' new_fun <- par_convert_f(original_fun, x = group, y = score)
#'
#' # Call the new function using the new argument names
#' result <- new_fun(x = 10, y = 20)
#' print(result)
#'
#' @export
par_convert_f <- function(fun, arg_map) {
par_convert_f <- function(fun, ...) {
arg_map_expr <- as.list(match.call(expand.dots = FALSE)$...)
arg_map <- lapply(arg_map_expr, function(x) deparse(x))

# Create a new function with the mapped arguments
# Create the new function that performs the remapping
new_fun <- function(...) {
# Capture the arguments passed to the new function
args_in <- list(...)

# Initialize an empty list for mapped arguments
mapped_args <- list()

# Loop through each argument in args_in
for (arg_name in names(args_in)) {
if (arg_name %in% names(arg_map)) {
# If the argument name is in arg_map, map it
mapped_args[[arg_map[[arg_name]]]] <- args_in[[arg_name]]
# Replace the argument name with the mapped name.
new_name <- arg_map[[arg_name]]
mapped_args[[new_name]] <- args_in[[arg_name]]
} else {
# Otherwise, keep the original argument name
mapped_args[[arg_name]] <- args_in[[arg_name]]
}
}

# Call the original function with the mapped arguments
do.call(fun, mapped_args)
}
return(new_fun)
Expand Down
19 changes: 18 additions & 1 deletion README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -251,6 +251,8 @@ ncpoints_srtm <-
radius = 1e4L
)
future::plan(future::sequential)
mirai::daemons(0L)
```


Expand All @@ -274,7 +276,6 @@ plot(ncpoints_m[, "mean_par"], main = "Multi-thread", pch = 19, cex = 0.33)
The same workflow operates on `mirai` dispatchers.

```{r demo-par-grid-mirai}
future::plan(future::sequential)
mirai::daemons(n = 4L, dispatcher = "process")
system.time(
Expand Down Expand Up @@ -491,6 +492,22 @@ all.equal(resj$distance.x, resj$distance.y)
Users should be mindful of caveats in the parallelization of nearest feature search, which may result in no or excess distance depending on the distribution of the target dataset to which the nearest feature is searched. For example, when one wants to calculate the nearest interstate from rural homes with fine grids, some grids may have no interstates then homes in such grids will not get any distance to the nearest interstate. Such problems can be avoided by choosing `nx`, `ny`, and `padding` values in `par_pad_grid()` meticulously.


## Map function arguments for `chopin` parallelization

If users' custom function with two main raster/vector arguments whose names are not `x` or `y`, users can use `par_convert_f` to map two arguments to `x` and `y`.

```r
sf_intersection_buffer <- function(a, b, buffer_dist = 100) {
intersected <- st_intersection(a, b)
buffered <- st_buffer(intersected, dist = buffer_dist)
return(buffered)
}

# Convert the function to remap arguments dynamically
custom_intersect_buffer <- par_convert_f(sf_intersection_buffer, x = "a", y = "b", dist = "buffer_dist")

```

## Caveats

### Why parallelization is slower than the ordinary function run?
Expand Down
80 changes: 51 additions & 29 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,7 @@ library(dplyr)
library(sf)
#> Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
library(terra)
#> terra 1.7.83
#> terra 1.8.21
library(future)
library(future.mirai)
library(mirai)
Expand Down Expand Up @@ -266,7 +266,7 @@ system.time(
)
)
#> user system elapsed
#> 5.979 0.058 5.786
#> 5.716 0.131 5.568
```

#### Generate regular grid computational regions
Expand Down Expand Up @@ -355,7 +355,7 @@ system.time(
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Task at CGRIDID: 4 is successfully dispatched.
#> user system elapsed
#> 1.447 0.154 9.714
#> 1.388 0.292 8.105

ncpoints_srtm <-
extract_at(
Expand All @@ -364,6 +364,10 @@ ncpoints_srtm <-
id = "pid",
radius = 1e4L
)

future::plan(future::sequential)
mirai::daemons(0L)
#> [1] 0
```

``` r
Expand Down Expand Up @@ -394,7 +398,6 @@ plot(ncpoints_m[, "mean_par"], main = "Multi-thread", pch = 19, cex = 0.33)
The same workflow operates on `mirai` dispatchers.

``` r
future::plan(future::sequential)
mirai::daemons(n = 4L, dispatcher = "process")
#> [1] 4

Expand All @@ -413,8 +416,10 @@ system.time(
#> ℹ SpatRaster class input is detected.
#> Attempt to track the data source file path...
#> ℹ Input is not a character.
#> ■■■■■■■■■ 25% | ETA: 18s
#> ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
#> user system elapsed
#> 0.847 0.132 7.721
#> 0.878 0.186 9.461

# remove mirai::daemons
mirai::daemons(0L)
Expand Down Expand Up @@ -449,14 +454,14 @@ download.file(hierarchy_url, hierarchy_path, mode = "wb")

nc_data <- hierarchy_path
nc_county <- sf::st_read(nc_data, layer = "county")
#> Reading layer `county' from data source `/tmp/RtmpBBOR7A/nc_hierarchy.gpkg' using driver `GPKG'
#> Reading layer `county' from data source `/tmp/RtmppW2oli/nc_hierarchy.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 1054155 ymin: 1341756 xmax: 1838923 ymax: 1690176
#> Projected CRS: NAD83 / Conus Albers
nc_tracts <- sf::st_read(nc_data, layer = "tracts")
#> Reading layer `tracts' from data source `/tmp/RtmpBBOR7A/nc_hierarchy.gpkg' using driver `GPKG'
#> Reading layer `tracts' from data source `/tmp/RtmppW2oli/nc_hierarchy.gpkg' using driver `GPKG'
#> Simple feature collection with 2672 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
Expand Down Expand Up @@ -484,7 +489,7 @@ system.time(
)
)
#> user system elapsed
#> 0.710 0.000 0.679
#> 0.696 0.000 0.669

# hierarchical parallelization
system.time(
Expand Down Expand Up @@ -803,7 +808,7 @@ system.time(
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Your input function at 37047 is dispatched.
#> user system elapsed
#> 1.449 0.178 7.734
#> 1.515 0.225 8.100
```

### `par_multirasters()`: parallelize over multiple rasters
Expand All @@ -830,9 +835,9 @@ terra::writeRaster(ncelev, file.path(tdir, "test5.tif"), overwrite = TRUE)
# check if the raster files were exported as expected
testfiles <- list.files(tdir, pattern = "*.tif$", full.names = TRUE)
testfiles
#> [1] "/tmp/RtmpBBOR7A/nc_srtm15_otm.tif" "/tmp/RtmpBBOR7A/test1.tif"
#> [3] "/tmp/RtmpBBOR7A/test2.tif" "/tmp/RtmpBBOR7A/test3.tif"
#> [5] "/tmp/RtmpBBOR7A/test4.tif" "/tmp/RtmpBBOR7A/test5.tif"
#> [1] "/tmp/RtmppW2oli/nc_srtm15_otm.tif" "/tmp/RtmppW2oli/test1.tif"
#> [3] "/tmp/RtmppW2oli/test2.tif" "/tmp/RtmppW2oli/test3.tif"
#> [5] "/tmp/RtmppW2oli/test4.tif" "/tmp/RtmppW2oli/test5.tif"
```

``` r
Expand All @@ -849,35 +854,35 @@ system.time(
)
#> ℹ Input is not a character.
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Your input function at /tmp/RtmpBBOR7A/nc_srtm15_otm.tif is dispatched.
#> ℹ Your input function at /tmp/RtmppW2oli/nc_srtm15_otm.tif is dispatched.
#>
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Your input function at /tmp/RtmpBBOR7A/test1.tif is dispatched.
#> ℹ Your input function at /tmp/RtmppW2oli/test1.tif is dispatched.
#>
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Your input function at /tmp/RtmpBBOR7A/test2.tif is dispatched.
#> ℹ Your input function at /tmp/RtmppW2oli/test2.tif is dispatched.
#>
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Your input function at /tmp/RtmpBBOR7A/test3.tif is dispatched.
#> ℹ Your input function at /tmp/RtmppW2oli/test3.tif is dispatched.
#>
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Your input function at /tmp/RtmpBBOR7A/test4.tif is dispatched.
#> ℹ Your input function at /tmp/RtmppW2oli/test4.tif is dispatched.
#>
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Your input function at /tmp/RtmpBBOR7A/test5.tif is dispatched.
#> ℹ Your input function at /tmp/RtmppW2oli/test5.tif is dispatched.
#> user system elapsed
#> 1.418 0.107 3.952
#> 1.362 0.190 2.375
knitr::kable(head(res))
```

| mean | base_raster |
|----------:|:----------------------------------|
| 136.80203 | /tmp/RtmpBBOR7A/nc_srtm15_otm.tif |
| 189.76170 | /tmp/RtmpBBOR7A/nc_srtm15_otm.tif |
| 231.16968 | /tmp/RtmpBBOR7A/nc_srtm15_otm.tif |
| 98.03845 | /tmp/RtmpBBOR7A/nc_srtm15_otm.tif |
| 41.23463 | /tmp/RtmpBBOR7A/nc_srtm15_otm.tif |
| 270.96933 | /tmp/RtmpBBOR7A/nc_srtm15_otm.tif |
| 136.80203 | /tmp/RtmppW2oli/nc_srtm15_otm.tif |
| 189.76170 | /tmp/RtmppW2oli/nc_srtm15_otm.tif |
| 231.16968 | /tmp/RtmppW2oli/nc_srtm15_otm.tif |
| 98.03845 | /tmp/RtmppW2oli/nc_srtm15_otm.tif |
| 41.23463 | /tmp/RtmppW2oli/nc_srtm15_otm.tif |
| 270.96933 | /tmp/RtmppW2oli/nc_srtm15_otm.tif |

``` r

Expand Down Expand Up @@ -920,7 +925,7 @@ pnts <- sf::st_as_sf(pnts)
pnts$pid <- sprintf("RPID-%04d", seq(1, 5000))
rd1 <- sf::st_read(ncrd1_path)
#> Reading layer `ncroads_first' from data source
#> `/tmp/RtmpBBOR7A/ncroads_first.gpkg' using driver `GPKG'
#> `/tmp/RtmppW2oli/ncroads_first.gpkg' using driver `GPKG'
#> Simple feature collection with 620 features and 4 fields
#> Geometry type: MULTILINESTRING
#> Dimension: XY
Expand Down Expand Up @@ -972,11 +977,11 @@ system.time(
restr <- terra::nearest(x = terra::vect(pntst), y = terra::vect(rd1t))
)
#> user system elapsed
#> 0.471 0.000 0.449
#> 0.513 0.003 0.475

pnt_path <- file.path(tdir, "pntst.gpkg")
sf::st_write(pntst, pnt_path)
#> Writing layer `pntst' to data source `/tmp/RtmpBBOR7A/pntst.gpkg' using driver `GPKG'
#> Writing layer `pntst' to data source `/tmp/RtmppW2oli/pntst.gpkg' using driver `GPKG'
#> Writing 5000 features with 1 fields and geometry type Point.

# we use four threads that were configured above
Expand Down Expand Up @@ -1022,7 +1027,7 @@ system.time(
#> ℹ Input is a character. Trying to read with terra .
#> ℹ Task at CGRIDID: 8 is successfully dispatched.
#> user system elapsed
#> 0.077 0.001 0.384
#> 0.077 0.001 0.450
```

- We will compare the results from the single-thread and multi-thread
Expand All @@ -1043,6 +1048,23 @@ interstates then homes in such grids will not get any distance to the
nearest interstate. Such problems can be avoided by choosing `nx`, `ny`,
and `padding` values in `par_pad_grid()` meticulously.

## Map function arguments for `chopin` parallelization

If users’ custom function with two main raster/vector arguments whose
names are not `x` or `y`, users can use `par_convert_f` to map two
arguments to `x` and `y`.

``` r
sf_intersection_buffer <- function(a, b, buffer_dist = 100) {
intersected <- st_intersection(a, b)
buffered <- st_buffer(intersected, dist = buffer_dist)
return(buffered)
}

# Convert the function to remap arguments dynamically
custom_intersect_buffer <- par_convert_f(sf_intersection_buffer, x = "a", y = "b", dist = "buffer_dist")
```

## Caveats

### Why parallelization is slower than the ordinary function run?
Expand Down
4 changes: 2 additions & 2 deletions codemeta.json
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
"codeRepository": "https://github.com/ropensci/chopin",
"issueTracker": "https://github.com/ropensci/chopin/issues",
"license": "https://spdx.org/licenses/MIT",
"version": "0.9.1",
"version": "0.9.2",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
Expand Down Expand Up @@ -370,7 +370,7 @@
},
"SystemRequirements": "netcdf"
},
"fileSize": "2440.32KB",
"fileSize": "2440.709KB",
"releaseNotes": "https://github.com/ropensci/chopin/blob/master/NEWS.md",
"readme": "https://github.com/ropensci/chopin/blob/main/README.md",
"contIntegration": "https://github.com/ropensci/chopin/actions/workflows/check-standard.yaml",
Expand Down
Loading

0 comments on commit 7725d31

Please sign in to comment.