Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: timezone conversion and assertion #170

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

atsyplenkov
Copy link
Contributor

Hey @etiennebacher 👋

I decided to start by adding more support to lubridate package. I can see that it is a bit less structured than the other code in your package, so I am unsure how to best write it. If you are happy with the following approach, I will continue filling tidypolars with lubridate functions.

P.S. Note that I have added the check_timezone function, which throws an error earlier if the timezone code is not supported by polars. TIL that some polars functions allow a NULL timezone, while others do not.

@etiennebacher
Copy link
Owner

Thanks, just so you know I won't be able to review this for a few days

@etiennebacher
Copy link
Owner

etiennebacher commented Feb 1, 2025

Hi @atsyplenkov, the PR looks good but I'd like the testing to be a bit more thorough. I have been adding more property-based testing to tidypolars, including for lubridate functions. Those can be found in test-funs-date.R while regular unit tests are in test-funs_date.R (I know, those names could be made more explicit).

If you're not familiar with property-based testing in R, I have a blog post on this: https://www.etiennebacher.com/posts/2024-10-01-using-property-testing-in-r/

For example, for force_tz(), here's what it looks like (combined with parametric testing using patrick):

patrick::with_parameters_test_that(
  "changing timezone works", {
    for_all(
      tests = 20,
      datetime = posixct_(any_na = TRUE),
      property = function(datetime) {
        test_df <- data.frame(x1 = ymd_hms(datetime, tz = "UTC"))
        test <- pl$DataFrame(x1 = ymd_hms(datetime, tz = "UTC"))

        expect_equal_or_both_error(
          mutate(
            test,
            x1 = force_tz(x1, tz)
          ),
          mutate(
            test_df,
            x1 = force_tz(x1, tz)
          ),
          tolerance = 1e-6
        )
      }
    )
  },
  tz = list("Pacific/Auckland", "foobar", "", NA, c("Pacific/Auckland", "Africa/Abidjan"))
)

Running this may already uncover some bugs in the current implementation. It's more tedious to write and to pass but when it does then we can be confident that tidypolars and lubridate give the same results for a wide array of cases.

Could you rework the tests to include this type of testing? Don't hesitate to ask if you run into some issues while doing this. Note that if you need to convert some polars expressions to R values for the tzone argument then you can use polars_expr_to_r().

@atsyplenkov
Copy link
Contributor Author

Thanks for the review. I am not familiar with property-based testing, but your blog post looks promising. I'll revise the PR during this week and come back to you

@atsyplenkov
Copy link
Contributor Author

@etiennebacher I cannot yet write the property-based testing yet, because the current implementation of with_tz and force_tz don't work when tzone argument passed externally, i.e.:

library(tidyverse)
devtools::load_all()

TimeZone <- base::OlsonNames()[100]
TimeZone
#> "America/Creston"

test_df <- data.frame(
  dt_utc = c(
    ymd_hms(
      c(
        "2012-03-26 12:00:00",
        "2020-01-01 12:00:00",
        "2023-12-14 12:00:00",
        NA
      ),
      tz = "UTC"
    )
  )
)

as_polars_df(test_df) |>
  mutate(
    dt_utc = with_tz(dt_utc, "America/Creston")
  )

#> shape: (4, 1)
#> ┌───────────────────────────────┐
#> │ dt_utc                        │
#> │ ---                           │
#> │ datetime[ms, America/Creston] │
#> ╞═══════════════════════════════╡
#> │ 2012-03-26 05:00:00 MST       │
#> │ 2020-01-01 05:00:00 MST       │
#> │ 2023-12-14 05:00:00 MST       │
#> │ null                          │
#> └───────────────────────────────┘

as_polars_df(test_df) |>
  mutate(
    dt_utc = with_tz(dt_utc, TimeZone)
  )

#> Error in `mutate()`:
#>   ! Error while running function `with_tz()` in Polars.
#>   ✖ Invalid 'x' type in 'x && y'
#>   Hide Traceback
#>        ▆
#>     1. ├─dplyr::mutate(...)
#>     2. └─tidypolars:::mutate.RPolarsDataFrame(as_polars_df(test_df), dt_utc = with_tz(dt_utc, TimeZone))
#>     3.   └─tidypolars:::translate_dots(...) at tidypolars/R/mutate.R:117:3
#>     4.     └─base::lapply(...) at tidypolars/R/utils-expr.R:36:3
#>     5.       └─tidypolars (local) FUN(X[[i]], ...)
#>     6.         └─tidypolars:::translate_expr(...) at tidypolars/R/utils-expr.R:46:5
#>     7.           └─tidypolars:::translate(...) at tidypolars/R/utils-expr.R:145:5
#>     8.             └─base::tryCatch(...) at tidypolars/R/utils-expr.R:622:7
#>     9.               └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>    10.                 └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>    11.                   └─value[[3L]](cond)
#>    12.                     └─rlang::abort(...) at tidypolars/R/utils-expr.R:638:13

I cannot wrap my head around on how to solve this issue.

@etiennebacher
Copy link
Owner

etiennebacher commented Feb 7, 2025

The issue is that when we handle tz in check_timezone(), it has already been converted to a polars expression, e.g. pl$lit(tz). Therefore we cannot use length(tz) for instance. The solution is to convert this back to an R expression using polars_expr_to_r(). This is a bit unfortunate but I don't have better solution for now because some function arguments need to be translated (e.g. x in mean(x, na.rm = TRUE)) but others don't (e.g. na.rm in mean(x, na.rm = TRUE)), and I don't have a way of knowing this in advance.

I also just realized that lubridate allows several timezones in a single column, for instance by passing tz = c("Europe/London", "Pacific/Auckland"), but Polars doesn't. I have added a message for this, and it should be mentioned in the table notes.

This now works:

patrick::with_parameters_test_that(
  "changing timezone works", {
    for_all(
      tests = 20,
      datetime = posixct_(any_na = TRUE),
      property = function(datetime) {
        test_df <- data.frame(x1 = ymd_hms(datetime, tz = "UTC"))
        test <- pl$DataFrame(x1 = ymd_hms(datetime, tz = "UTC"))

        expect_equal_or_both_error(
          mutate(
            test,
            x1 = force_tz(x1, tz)
          ),
          mutate(
            test_df,
            x1 = force_tz(x1, tz)
          ),
          tolerance = 1e-6
        )
      }
    )
  },
  tz = list("Pacific/Auckland", "foobar", "", NA)
)

Can you also:

  • add tests for the various error messages in check_timezone()
  • add tests for with_tz()
  • add those two functions in the table of translated functions in the vignette linked above, mentioning the caveat regarding multiple timezones
  • add an item in NEWS

?

@atsyplenkov
Copy link
Contributor Author

Thanks! I'll do it.

@etiennebacher
Copy link
Owner

etiennebacher commented Feb 19, 2025

@atsyplenkov just FYI I plan on doing a new release before mid-March so let me know if you need some help to finish this PR before that.

@atsyplenkov
Copy link
Contributor Author

@etiennebacher Hey, sorry for taking so long. I have a very tight deadline at work and couldn't allocate enough time to finish this PR. I should have some time after the 25th of February, and I can submit my revisions then.

@etiennebacher
Copy link
Owner

No worries, I'm not in a rush at all, I just wanted to inform you in case those functions are important in some of your project . Take your time, good luck with work 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants