Tolerance applies in a surprising way to large integers #138

Open
hadley opened this issue Mar 12, 2022 · 8 comments
Labels
feature a feature request or enhancement

Comments

@hadley
Member

hadley commented Mar 12, 2022

library(waldo)

dt <- as.POSIXct("2016-07-18 16:06:00", tz = "UTC")
compare(dt, dt + 5)
#> `old`: "2016-07-18 16:06:00"
#> `new`: "2016-07-18 16:06:05"
compare(1468857960, 1468857965)
#> `old`: 1468857960
#> `new`: 1468857965

(tol <- testthat::testthat_tolerance())
#> [1] 1.490116e-08
compare(dt, dt + 5, tolerance = tol)
#> ✓ No differences
compare(1468857960, 1468857965, tolerance = tol)
#> ✓ No differences

Created on 2022-03-12 by the reprex package (v2.0.1)

Compare with all.equal(), which has different behaviour for POSIXct:

library(waldo)

dt <- as.POSIXct("2016-07-18 16:06:00", tz = "UTC")
all.equal(dt, dt + 5)
#> [1] "Mean absolute difference: 5"
all.equal(1468857960, 1468857965)
#> [1] TRUE

Created on 2022-03-12 by the reprex package (v2.0.1)

I don't think the behaviour is correct in either case: the purpose of tolerance is to ignore minor differences that arise due to floating point error, not to ignore differences in integers that can be exactly represented in floating point.
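To make the surprise concrete: waldo (like all.equal()) applies the tolerance as a *relative* difference, so the 5-second gap is measured against the size of the values themselves and disappears. A quick back-of-the-envelope check:

```r
# The absolute difference is 5, but relative to values of ~1.5e9 it is tiny:
rel_diff <- abs(1468857965 - 1468857960) / 1468857960
rel_diff          # ~3.4e-09, far below the default tolerance
rel_diff < 1.490116e-08  # testthat's default tolerance, from above
#> [1] TRUE
```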

cc @MichaelChirico

@hadley
Member Author

hadley commented Mar 12, 2022

I'm not sure how to resolve this; the all.equal() behaviour doesn't seem correct to me, but I don't know how to derive the correct behaviour.

@MichaelChirico
Contributor

MichaelChirico commented Mar 13, 2022

IMO the all.equal() behavior is correct; cf.

dt1 <- as.POSIXct("2016-07-18 16:06:00", tz = "UTC")
dt2 <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
all.equal(dt1, dt1 + 5)
# [1] "Mean absolute difference: 5"
all.equal(as.numeric(dt1), as.numeric(dt1 + 5))
# [1] TRUE
all.equal(dt2, dt2 + 5)
# [1] "Mean absolute difference: 5"
all.equal(as.numeric(dt2), as.numeric(dt2 + 5))
# [1] "Mean absolute difference: 5"

Time-since-UTC-epoch is an implementation detail to which we shouldn't over-anchor ourselves. Any timestamps that are 5 seconds apart should have the same output of compare().

@hadley
Member Author

hadley commented Mar 13, 2022

I agree that any timestamps that are (say) >0.01s apart should compare as different. But I'd think that same principle should also apply to integers (regardless of whether they're integers or doubles under the hood). Why should the tolerance be absolute only for POSIXct?

@MichaelChirico
Contributor

Because time is "interval data", for which 0 is arbitrary; temperature should similarly get its own all.equal() method if there were, say, c("fahrenheit", "temperature") and c("celsius", "temperature") classes.

(I'm struggling to find something authoritative-looking about this, so I'm not sure the term is universal, but here's one source.)

For unclassed numerics, it's a significant figures issue, right?

@hadley
Member Author

hadley commented Mar 14, 2022

I don't think it's about significant digits so much as floating point differences like sqrt(2)^2 == 2 returning FALSE. There's a similar issue when LAPACK or other low-level linear algebra routines return very slightly different results on different platforms due to slight implementation differences.
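That sqrt(2)^2 example is exactly the kind of discrepancy a tolerance is meant to absorb:

```r
# Classic floating point drift: squaring a square root doesn't round-trip exactly
sqrt(2)^2 == 2
#> [1] FALSE
sqrt(2)^2 - 2
#> [1] 4.440892e-16
all.equal(sqrt(2)^2, 2)  # the tolerance absorbs the drift, as intended
#> [1] TRUE
```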

My intuition (which might be wrong) is that the tolerance should really only affect very small and very large numbers (i.e. where the gap between adjacent representable floating point numbers is relatively large), but numbers in the middle shouldn't be affected. Maybe I'm expecting the tolerance to be applied to the mantissa (assuming the exponents are equal)?

@hadley
Member Author

hadley commented Mar 14, 2022

I think I'm maybe thinking of ULP-based comparison? https://bitbashing.io/comparing-floats.html has a good discussion.

https://codingnest.com/the-little-things-comparing-floating-point-numbers/ is also useful because it describes what Catch2 (a C++ testing framework) does.
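A rough sketch of the idea (not waldo code, and not a proper bit-level ULP implementation, which would compare the underlying bit patterns directly): estimate the spacing of representable doubles at the magnitude of the inputs, then express the difference in those units. The `ulp_diff()` helper below is hypothetical, for illustration only.

```r
# Approximate number of representable doubles between x and y, using the
# spacing of doubles near their magnitude (2^exponent * machine epsilon).
ulp_diff <- function(x, y) {
  if (x == y) return(0)
  m <- max(abs(x), abs(y))
  spacing <- 2^floor(log2(m)) * .Machine$double.eps  # ULP size near m
  abs(x - y) / spacing
}

ulp_diff(sqrt(2)^2, 2)            # ~1 ULP: floating point noise
ulp_diff(1468857960, 1468857965)  # ~2e7 ULPs: a genuine difference
```

Under a scheme like this, the two POSIXct values 5 seconds apart would compare as different even though their relative difference is tiny.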

@hadley
Member Author

hadley commented Mar 14, 2022

I think overall this implies that waldo's current approach to tolerance is overly simplistic and will need to be redesigned (in conjunction with testthat) to allow different classes to use different tolerances. That will require some thought, so I think I'll at least get the current release out before coming back to this.

@MichaelChirico
Contributor

That makes more sense to me, thanks for elaborating & thanks for the links.

One lingering thing that gives me pause is a potential "idempotency" issue.

I'm thinking of a case where large (x, y) get generated by code as floats (and so are subject to floating point imprecision), but then get serialized to CSV and now look like integers to CSV readers. Should waldo::compare(x, y) be stable to that serialization?

(I'm not 100% sure it's a relevant concern -- it could be that the imprecision happens only at the sub-integer level for numbers in integer range.)

@hadley hadley added the feature a feature request or enhancement label Mar 23, 2023