Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZonedDateTime can parse its own string representation #483

Merged
merged 16 commits into from
Feb 3, 2025

Conversation

GHTaarn
Copy link
Contributor

@GHTaarn GHTaarn commented Jan 25, 2025

This PR solves issue #470 in a way that formally preserves backwards compatibility of TimeZones.
The first commit is what solves the issue. The second commit is optional, but given that Dates.default_format(::Type{ZonedDateTime}) is no longer used in the TimeZones package, not public and not documented anywhere, I feel that it would be most correct to delete it as this would still formally preserve backwards compatibility.


Copy link

codecov bot commented Jan 25, 2025

Codecov Report

Attention: Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.72%. Comparing base (2bc8f50) to head (4f6add7).
Report is 21 commits behind head on master.

Files with missing lines Patch % Lines
src/parse.jl 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #483      +/-   ##
==========================================
- Coverage   92.79%   92.72%   -0.07%     
==========================================
  Files          39       39              
  Lines        1818     1843      +25     
==========================================
+ Hits         1687     1709      +22     
- Misses        131      134       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@omus
Copy link
Member

omus commented Jan 27, 2025

I took a quick look and I think this is a good approach to move things in the right direction. I'll do a full review this week.

@omus omus self-requested a review January 27, 2025 20:42
@iamed2
Copy link
Member

iamed2 commented Jan 30, 2025

Is it possible to have a solution that removes the need for this (allow parsing variable-length fractional settings like parsing DateTime does)?

Also, keep in mind years that are not four digits long will change the length of the input.

@GHTaarn
Copy link
Contributor Author

GHTaarn commented Jan 31, 2025

Thank you for your feedback @iamed2 , very good point about non 4 digit years, I have fixed this so that this aspect of the ISO standard is now supported. I felt that the code looked nicer if I used tryparse. The 3 argument version that I used is defined in Dates and is undocumented, I hope that this is okay since TimeZones already uses other undocumented features of the Dates package. I am considering submitting a PR suggesting some documentation for this method.

I cannot see an easy solution to allow the fractional seconds to have variable length like in Dates.DateTime and Dates.Time, the best solutions that I can see would be either for Julia to enhance Dates.DateFormat to be able to handle this or for TimeZones to use a custom format type for ZonedDateTime.

@omus
Copy link
Member

omus commented Jan 31, 2025

After really getting into the code I noticed a couple of things which I went ahead and changed here:

  1. The parse function is intended to be the primary way for parsing a string into a ZonedDateTime. I've done some refactoring to reflect that.
  2. The default format now enforces a literal T is found. This is mainly precautionary in case another package registers T as a character to parse. This also reflects what is used for Dates.ISODateTimeFormat.
  3. I've updated the heuristic determining which format to use to be based upon if a period can be found in the string. That should be faster than parsing the string twice.
  4. I've added more exhaustive tests here to ensure the underlying Dates parsing machinery doesn't silently break this functionality.

end

Dates.default_format(::Type{ZonedDateTime}) = ISOZonedDateTimeFormat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said this isn't documented anywhere so it's technically not public. I think it's safest to deprecate this in the off chance anyone is using this.

@GHTaarn
Copy link
Contributor Author

GHTaarn commented Jan 31, 2025

@omus : It is late now, I will have a look a your changes tomorrow

@omus
Copy link
Member

omus commented Jan 31, 2025

@omus : It is late now, I will have a look a your changes tomorrow

Sounds good. Feel free to review next week instead if you prefer. I plan on merging this on Monday and making a new release.

@omus
Copy link
Member

omus commented Jan 31, 2025

Is it possible to have a solution that removes the need for this (allow parsing variable-length fractional settings like parsing DateTime does)?

IIRC the variable-length fractional parsing used by DateTime requires that it the variable portion occurs at the end of the string. We would need to make modifications to Dates.jl for that or roll our own solution here. I'm not opposed to doing that but it's a larger lift that what we have here.

@omus omus changed the title Enable ZonedDateTime to parse its own string representation ZonedDateTime can parse its own string representation Jan 31, 2025
@GHTaarn
Copy link
Contributor Author

GHTaarn commented Feb 2, 2025

1. The `parse` function is intended to be the primary way for parsing a string into a `ZonedDateTime`. I've done some refactoring to reflect that.

2. The default format now enforces a literal `T` is found. This is mainly precautionary in case another package registers `T` as a character to parse. This also reflects what is used for `Dates.ISODateTimeFormat`.

I am perfectly fine with these two.

3. I've updated the heuristic determining which format to use to be based upon if a period can be found in the string. That should be faster than parsing the string twice.

I feel that this change makes the code more vulnerable to changes in ISOZonedDateTimeFormat, e.g. if the ISO standard were to change in the future and allow '.' characters in other places like allowing milliseconds in the offset. I ran some benchmarks and did not see any performance improvement from this change. I also searched the Julia documentation and found the 3 argument tryparse mentioned here, so we should be able to rely on it being present in future versions of Julia. For these reasons, I prefer the tryparse based solution that I proposed.

4. I've added more exhaustive tests here to ensure the underlying Dates parsing machinery doesn't silently break this functionality.

Yes, this is certainly an improvement. I felt that it would be nice to have a year where the ISO standard really applies, a millisecond value ending with a 0 and at least one time zone that is not UTC, so I added these to the testset.

Finally, I added some docstrings for the ZonedDateTime parse constructors and an extra TimeZones specific docstring for Dates.DateFormat. The latter is of course not really in the scope of this PR and I don't know if this is an acceptable way to do this, I just felt that it would be nice to have this info in a docstring.

Your decision to deprecate rather than remove Dates.default_format is also fine with me.

@omus
Copy link
Member

omus commented Feb 3, 2025

3. I've updated the heuristic determining which format to use to be based upon if a period can be found in the string. That should be faster than parsing the string twice.

I feel that this change makes the code more vulnerable to changes in ISOZonedDateTimeFormat, e.g. if the ISO standard were to change in the future and allow '.' characters in other places like allowing milliseconds in the offset. I ran some benchmarks and did not see any performance improvement from this change. I also searched the Julia documentation and found the 3 argument tryparse mentioned here, so we should be able to rely on it being present in future versions of Julia. For these reasons, I prefer the tryparse based solution that I proposed.

There is a noticeable performance difference although it's not huge:

julia> using TimeZones, BenchmarkTools

julia> using TimeZones: ISOZonedDateTimeFormat, ISOZonedDateTimeNoMillisecondFormat

julia> function p1(str)
           return if contains(str, '.')
               parse(ZonedDateTime, str, ISOZonedDateTimeFormat)
           else
               parse(ZonedDateTime, str, ISOZonedDateTimeNoMillisecondFormat)
           end
       end
p1 (generic function with 1 method)

julia> function p2(str)
           res = tryparse(ZonedDateTime, str, ISOZonedDateTimeFormat)
           return res !== nothing ? res : parse(ZonedDateTime, str, ISOZonedDateTimeNoMillisecondFormat)
       end
p2 (generic function with 1 method)

julia> @btime p1("2000-01-02T03:04:05-07:00")
  938.524 ns (26 allocations: 1.07 KiB)
2000-01-02T03:04:05-07:00

julia> @btime p1("2000-01-02T03:04:05-07:00")
  932.524 ns (26 allocations: 1.07 KiB)
2000-01-02T03:04:05-07:00

julia> @btime p2("2000-01-02T03:04:05-07:00")
  966.650 ns (26 allocations: 1.07 KiB)
2000-01-02T03:04:05-07:00

julia> @btime p1.(fill("2000-01-02T03:04:05-07:00",1000));
  925.000 μs (26006 allocations: 1.10 MiB)

julia> @btime p2.(fill("2000-01-02T03:04:05-07:00",1000));
  958.959 μs (26006 allocations: 1.10 MiB)

You are correct that the tryparse approach is more adaptable to changes to ISOZonedDateTimeFormat but as the original format is based off of an ISO standard we are fairly safe to say it's not going to substantially change. One aspect of the tryparse approach that concerns me is that if we did change the format constant and it would start failing by using tryparse we wouldn't see any error message when failing to parse the first format which could lead to confusion for end users.

No matter the approach we take to parsing this dual format the test suite that is in place should ensure reliable parsing so changes to the ISOZonedDateTimeFormat which break parsing or the . heuristic will not go unnoticed.

src/types/zoneddatetime.jl Outdated Show resolved Hide resolved
src/parse.jl Outdated Show resolved Hide resolved
@GHTaarn
Copy link
Contributor Author

GHTaarn commented Feb 3, 2025

@omus Regarding Base.parse(::Type{ZonedDateTime}, ::AbstractString): Let us keep your version with contains.

@omus
Copy link
Member

omus commented Feb 3, 2025

Will merge after CI completes

@omus omus merged commit b333ce1 into JuliaTime:master Feb 3, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants