Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recover after missing/extra element in sequence #187

Closed
wants to merge 1 commit into from
Closed

recover after missing/extra element in sequence #187

wants to merge 1 commit into from

Conversation

philomates
Copy link
Collaborator

addresses #177

before:

expected: (match? [1 2 3 4] [1 3 4])
  actual: [1 (mismatch (expected 2) (actual 3)) (mismatch (expected 3) (actual 4)) (missing 4)]

after:

expected: (match? [1 2 3 4] [1 3 4])
actual: [1 (expected 2) 3 4]

before:

```
expected: (match? [1 2 3 4] [1 3 4])
  actual: [1 (mismatch (expected 2) (actual 3)) (mismatch (expected 3) (actual 4)) (missing 4)]
```

after:

```
expected: (match? [1 2 3 4] [1 3 4])
actual: [1 (expected 2) 3 4]
```
Comment on lines +305 to +309
(map vector
(repeat matchers)
(insert-all-combinations actuals
(repeat (- matchers-count (count actuals))
::missing)))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might make computing mismatches for sequences a slower.
One way to deal with it would be to create a dedicated matcher that enables this logic. I don't like this because it creates more options for users and this new mismatch behavior is probably something people want by default.

Another way to limit how much it could potentially slow things down would be to cap how many combinations are considered. Something like a (take 30 ..) around the (insert-all-combinations ..) thing here. (probably bound to a dynamic var for configuration)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on this @dchelimsky and also @aredington, given your work on #166 ?

@lread
Copy link
Contributor

lread commented Dec 7, 2022

Super cool @philomates!

I've played with this PR in a scratch project to get a feel for it when used in tests.
I tried extra/missing leading/trailing, and nested. It all seems to hang together very well (with one minor comment to follow).

The terminology of missing and unexpected work very nicely and clearly conveys what is awry.

Here's my little playground:

(ns mc-recover.mc-recover-test
  (:require [clojure.test :as t :refer [deftest is testing]]
            [matcher-combinators.test]))

(deftest mc-recover
  ;; playing around with matcher-combinators seq recovery
  ;; very cool!
  ;; all assertions will fail.
  (is (match? [1 2] [1 2 3 4]) "extra trailing")
  (is (match? [1 2] [9 0 1 2]) "extra leading")
  (is (match? [1 2] [1 98 99 2]) "extra internal")
  (is (match? [1 2] [9 8 1 98 99 2 3 4]) "extra leading, internal, trailing")
  (is (match? [] [1 2 3]) "all extra")

  (is (match? [1 2 3] [1 2]) "missing trailing")
  (is (match? [1 2 3] [2 3]) "mising leading")
  (is (match? [1 99 2] [1 2]) "missing internal")
  (is (match? [0 1 2 99 98 97 3 4 5] [2 99 3]) "missing leading, internal, trailing")
  (is (match? [1 2 3] []) "all missing")

  (is (match? [1 82 3 4 2 7 9 10] [1 2 10 11]) "recover multiple mismatches")

  (is (match? [1 [2 3 4] 5] [1 [2 3] 5]) "nested mismatch")

  (is (match? [1 2 3] (list 1 3)) "mismatch on different seq types")

  (is (match? [:x :y [:a :b :c] :z] [:x [:a :b :c]]) "recovery works on nested element")

  (is (match? [#"some.*" :y 3 #"other.*" 5] ["somestring" :y 4 "nope" 5]) "other matchers still work")

  (is (match? {:a {:b [3 4 5]}} {:a {:b [3 4 5 6]}}) "nested mismatch in a map"))

And the results it produced with the cognitect test runner:

$ clojure -M:test
Running tests in #{"test"}

Testing mc-recover.mc-recover-test

FAIL in (mc-recover) (mc_recover_test.clj:9)
extra trailing
expected: (match? [1 2] [1 2 3 4])
  actual: [1 2 (unexpected 3) (unexpected 4)]


FAIL in (mc-recover) (mc_recover_test.clj:10)
extra leading
expected: (match? [1 2] [9 0 1 2])
  actual: [(unexpected 9) (unexpected 0) 1 2]


FAIL in (mc-recover) (mc_recover_test.clj:11)
extra internal
expected: (match? [1 2] [1 98 99 2])
  actual: [1 (unexpected 98) (unexpected 99) 2]


FAIL in (mc-recover) (mc_recover_test.clj:12)
extra leading, internal, trailing
expected: (match? [1 2] [9 8 1 98 99 2 3 4])
  actual: [(unexpected 9)
 (unexpected 8)
 1
 (unexpected 98)
 (unexpected 99)
 2
 (unexpected 3)
 (unexpected 4)]


FAIL in (mc-recover) (mc_recover_test.clj:13)
all extra
expected: (match? [] [1 2 3])
  actual: [(unexpected 1)
 (unexpected 2)
 (unexpected 3)]


FAIL in (mc-recover) (mc_recover_test.clj:15)
missing trailing
expected: (match? [1 2 3] [1 2])
  actual: [1 2 (missing 3)]


FAIL in (mc-recover) (mc_recover_test.clj:16)
mising leading
expected: (match? [1 2 3] [2 3])
  actual: [(missing 1) 2 3]


FAIL in (mc-recover) (mc_recover_test.clj:17)
missing internal
expected: (match? [1 99 2] [1 2])
  actual: [1 (missing 99) 2]


FAIL in (mc-recover) (mc_recover_test.clj:18)
missing leading, internal, trailing
expected: (match? [0 1 2 99 98 97 3 4 5] [2 99 3])
  actual: [(missing 0)
 (missing 1)
 2
 99
 (missing 98)
 (missing 97)
 3
 (missing 4)
 (missing 5)]


FAIL in (mc-recover) (mc_recover_test.clj:19)
all missing
expected: (match? [1 2 3] [])
  actual: [(missing 1) (missing 2) (missing 3)]


FAIL in (mc-recover) (mc_recover_test.clj:21)
recover multiple mismatches
expected: (match? [1 82 3 4 2 7 9 10] [1 2 10 11])
  actual: [1
 (missing 82)
 (missing 3)
 (missing 4)
 2
 (missing 7)
 (mismatch (expected 9) (actual 10))
 (mismatch (expected 10) (actual 11))]


FAIL in (mc-recover) (mc_recover_test.clj:23)
nested mismatch
expected: (match? [1 [2 3 4] 5] [1 [2 3] 5])
  actual: [1 [2 3 (missing 4)] 5]


FAIL in (mc-recover) (mc_recover_test.clj:25)
mismatch on different seq types
expected: (match? [1 2 3] (list 1 3))
  actual: (1 (missing 2) 3)


FAIL in (mc-recover) (mc_recover_test.clj:27)
recovery works on nested element
expected: (match? [:x :y [:a :b :c] :z] [:x [:a :b :c]])
  actual: [:x (missing :y) [:a :b :c] (missing :z)]


FAIL in (mc-recover) (mc_recover_test.clj:29)
other matchers still work
expected: (match? [#"some.*" :y 3 #"other.*" 5] ["somestring" :y 4 "nope" 5])
  actual: ["somestring"
 :y
 (mismatch (expected 3) (actual 4))
 (mismatch (expected #"other.*") (actual "nope"))
 5]


FAIL in (mc-recover) (mc_recover_test.clj:31)
nested mismatch in a map
expected: (match? {:a {:b [3 4 5]}} {:a {:b [3 4 5 6]}})
  actual: {:a {:b [3 4 5 (unexpected 6)]}}


Ran 1 tests containing 16 assertions.
16 failures, 0 errors.

The only case that surprised me slightly was this one:

FAIL in (mc-recover) (mc_recover_test.clj:21)
recover multiple mismatches
expected: (match? [1 82 3 4 2 7 9 10] [1 2 10 11])
  actual: [1
 (missing 82)
 (missing 3)
 (missing 4)
 2
 (missing 7)
 (mismatch (expected 9) (actual 10))
 (mismatch (expected 10) (actual 11))]

I was thinking that maybe we'd get a missing for 9, a match for 10, and an unexpected for 11.
But maybe this is not practical? Dunno. What you've come up with is already a vast improvement over the existing diffs for sequences.

The formatting of actual, when actual is multiple lines is, I'm pretty sure, a pre-existing and separate issue.

@lread
Copy link
Contributor

lread commented Dec 7, 2022

For comparison, that last example when diffed with deep-diff2:

❯ clj -Sdeps '{:deps {lambdaisland/deep-diff2 {:mvn/version "2.7.169"}}}'

Clojure 1.11.1
user=> (require '[lambdaisland.deep-diff2 :as ddiff])
nil
user=> (ddiff/pretty-print (ddiff/diff [1 82 3 4 2 7 9 10] [1 2 10 11]))
[1 -82 -3 -4 2 -7 -9 10 +11]
nil

Here's a screenshot for the colouring:
image

It seems to behave the way I would have expected, maybe we can do the same after all.

@philomates
Copy link
Collaborator Author

Thanks for the comprehensive testing @lread

for

(is (match? [1 82 3 4 2 7 9 10] [1 2 10 11]) "recover multiple mismatches")

the current behavior gives 6 mismatches and 2 matches, for a total length of 8 elements

[...
(missing 7)
(mismatch (expected 9) (actual 10))
(mismatch (expected 10) (actual 11))]

the deepdiff version gives

[... 
(missing 7)
(missing 9)
10
(unexpected 11)]

which has 3 matches and 6 mismatches, for a total length of 9 elements.
So it sort of depends on how you want to weigh matches vs mismatches vs total sequence size. I don't think it would be too difficult to adapt the behavior to be like deepdiff but I'm not sure yet if it is necessarily the right thing

@lread
Copy link
Contributor

lread commented Dec 12, 2022

Yeah, good point, there there is no absolute right or wrong.

I prefer a diff which makes some attempt to re-sync after a mismatch.

But that might be my subjective preference.

It could be interesting to learn more about deep-diff2's rationale and intent with regards to re-sync. Deep-diff2 uses clj-diff, which describes its diffing algorithm use here.

I don't know much about diff algorithms, but I think an "optimal" one produces the smallest number of diffs, which seems related.

@dchelimsky
Copy link
Contributor

I think that (missing 9) and (unexpected 11) is easier to grok. @philomates you said

I don't think it would be too difficult to adapt the behavior to be like deepdiff

We could

  • leave this as/is
  • wait for you to address that in this PR
  • merge this as/is and address that in a follow up PR

I'm happy with any of this, as eve without the deepdiff alignment, this is a lot better as/is. I think the best option is the last one, if you're good with that.

@philomates
Copy link
Collaborator Author

I'm fine with merging as is and doing a follow-up to improve the results to better match deepdiff.
any thoughts on the performance considerations? #187 (comment)

@philomates philomates closed this by deleting the head repository May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants