recover after missing/extra element in sequence #187

philomates · 2022-11-29T11:19:50Z

addresses #177

before:

expected: (match? [1 2 3 4] [1 3 4])
  actual: [1 (mismatch (expected 2) (actual 3)) (mismatch (expected 3) (actual 4)) (missing 4)]

after:

expected: (match? [1 2 3 4] [1 3 4])
actual: [1 (expected 2) 3 4]

before: ``` expected: (match? [1 2 3 4] [1 3 4]) actual: [1 (mismatch (expected 2) (actual 3)) (mismatch (expected 3) (actual 4)) (missing 4)] ``` after: ``` expected: (match? [1 2 3 4] [1 3 4]) actual: [1 (expected 2) 3 4] ```

philomates · 2022-11-29T11:23:41Z

src/cljc/matcher_combinators/core.cljc

+          (map vector
+               (repeat matchers)
+               (insert-all-combinations actuals
+                                        (repeat (- matchers-count (count actuals))
+                                                ::missing)))


This might make computing mismatches for sequences a slower.
One way to deal with it would be to create a dedicated matcher that enables this logic. I don't like this because it creates more options for users and this new mismatch behavior is probably something people want by default.

Another way to limit how much it could potentially slow things down would be to cap how many combinations are considered. Something like a (take 30 ..) around the (insert-all-combinations ..) thing here. (probably bound to a dynamic var for configuration)

thoughts on this @dchelimsky and also @aredington, given your work on #166 ?

lread · 2022-12-07T21:04:50Z

Super cool @philomates!

I've played with this PR in a scratch project to get a feel for it when used in tests.
I tried extra/missing leading/trailing, and nested. It all seems to hang together very well (with one minor comment to follow).

The terminology of missing and unexpected work very nicely and clearly conveys what is awry.

Here's my little playground:

(ns mc-recover.mc-recover-test
  (:require [clojure.test :as t :refer [deftest is testing]]
            [matcher-combinators.test]))

(deftest mc-recover
  ;; playing around with matcher-combinators seq recovery
  ;; very cool!
  ;; all assertions will fail.
  (is (match? [1 2] [1 2 3 4]) "extra trailing")
  (is (match? [1 2] [9 0 1 2]) "extra leading")
  (is (match? [1 2] [1 98 99 2]) "extra internal")
  (is (match? [1 2] [9 8 1 98 99 2 3 4]) "extra leading, internal, trailing")
  (is (match? [] [1 2 3]) "all extra")

  (is (match? [1 2 3] [1 2]) "missing trailing")
  (is (match? [1 2 3] [2 3]) "mising leading")
  (is (match? [1 99 2] [1 2]) "missing internal")
  (is (match? [0 1 2 99 98 97 3 4 5] [2 99 3]) "missing leading, internal, trailing")
  (is (match? [1 2 3] []) "all missing")

  (is (match? [1 82 3 4 2 7 9 10] [1 2 10 11]) "recover multiple mismatches")

  (is (match? [1 [2 3 4] 5] [1 [2 3] 5]) "nested mismatch")

  (is (match? [1 2 3] (list 1 3)) "mismatch on different seq types")

  (is (match? [:x :y [:a :b :c] :z] [:x [:a :b :c]]) "recovery works on nested element")

  (is (match? [#"some.*" :y 3 #"other.*" 5] ["somestring" :y 4 "nope" 5]) "other matchers still work")

  (is (match? {:a {:b [3 4 5]}} {:a {:b [3 4 5 6]}}) "nested mismatch in a map"))

And the results it produced with the cognitect test runner:

$ clojure -M:test
Running tests in #{"test"}

Testing mc-recover.mc-recover-test

FAIL in (mc-recover) (mc_recover_test.clj:9)
extra trailing
expected: (match? [1 2] [1 2 3 4])
  actual: [1 2 (unexpected 3) (unexpected 4)]


FAIL in (mc-recover) (mc_recover_test.clj:10)
extra leading
expected: (match? [1 2] [9 0 1 2])
  actual: [(unexpected 9) (unexpected 0) 1 2]


FAIL in (mc-recover) (mc_recover_test.clj:11)
extra internal
expected: (match? [1 2] [1 98 99 2])
  actual: [1 (unexpected 98) (unexpected 99) 2]


FAIL in (mc-recover) (mc_recover_test.clj:12)
extra leading, internal, trailing
expected: (match? [1 2] [9 8 1 98 99 2 3 4])
  actual: [(unexpected 9)
 (unexpected 8)
 1
 (unexpected 98)
 (unexpected 99)
 2
 (unexpected 3)
 (unexpected 4)]


FAIL in (mc-recover) (mc_recover_test.clj:13)
all extra
expected: (match? [] [1 2 3])
  actual: [(unexpected 1)
 (unexpected 2)
 (unexpected 3)]


FAIL in (mc-recover) (mc_recover_test.clj:15)
missing trailing
expected: (match? [1 2 3] [1 2])
  actual: [1 2 (missing 3)]


FAIL in (mc-recover) (mc_recover_test.clj:16)
mising leading
expected: (match? [1 2 3] [2 3])
  actual: [(missing 1) 2 3]


FAIL in (mc-recover) (mc_recover_test.clj:17)
missing internal
expected: (match? [1 99 2] [1 2])
  actual: [1 (missing 99) 2]


FAIL in (mc-recover) (mc_recover_test.clj:18)
missing leading, internal, trailing
expected: (match? [0 1 2 99 98 97 3 4 5] [2 99 3])
  actual: [(missing 0)
 (missing 1)
 2
 99
 (missing 98)
 (missing 97)
 3
 (missing 4)
 (missing 5)]


FAIL in (mc-recover) (mc_recover_test.clj:19)
all missing
expected: (match? [1 2 3] [])
  actual: [(missing 1) (missing 2) (missing 3)]


FAIL in (mc-recover) (mc_recover_test.clj:21)
recover multiple mismatches
expected: (match? [1 82 3 4 2 7 9 10] [1 2 10 11])
  actual: [1
 (missing 82)
 (missing 3)
 (missing 4)
 2
 (missing 7)
 (mismatch (expected 9) (actual 10))
 (mismatch (expected 10) (actual 11))]


FAIL in (mc-recover) (mc_recover_test.clj:23)
nested mismatch
expected: (match? [1 [2 3 4] 5] [1 [2 3] 5])
  actual: [1 [2 3 (missing 4)] 5]


FAIL in (mc-recover) (mc_recover_test.clj:25)
mismatch on different seq types
expected: (match? [1 2 3] (list 1 3))
  actual: (1 (missing 2) 3)


FAIL in (mc-recover) (mc_recover_test.clj:27)
recovery works on nested element
expected: (match? [:x :y [:a :b :c] :z] [:x [:a :b :c]])
  actual: [:x (missing :y) [:a :b :c] (missing :z)]


FAIL in (mc-recover) (mc_recover_test.clj:29)
other matchers still work
expected: (match? [#"some.*" :y 3 #"other.*" 5] ["somestring" :y 4 "nope" 5])
  actual: ["somestring"
 :y
 (mismatch (expected 3) (actual 4))
 (mismatch (expected #"other.*") (actual "nope"))
 5]


FAIL in (mc-recover) (mc_recover_test.clj:31)
nested mismatch in a map
expected: (match? {:a {:b [3 4 5]}} {:a {:b [3 4 5 6]}})
  actual: {:a {:b [3 4 5 (unexpected 6)]}}


Ran 1 tests containing 16 assertions.
16 failures, 0 errors.

The only case that surprised me slightly was this one:

FAIL in (mc-recover) (mc_recover_test.clj:21)
recover multiple mismatches
expected: (match? [1 82 3 4 2 7 9 10] [1 2 10 11])
  actual: [1
 (missing 82)
 (missing 3)
 (missing 4)
 2
 (missing 7)
 (mismatch (expected 9) (actual 10))
 (mismatch (expected 10) (actual 11))]

I was thinking that maybe we'd get a missing for 9, a match for 10, and an unexpected for 11.
But maybe this is not practical? Dunno. What you've come up with is already a vast improvement over the existing diffs for sequences.

The formatting of actual, when actual is multiple lines is, I'm pretty sure, a pre-existing and separate issue.

lread · 2022-12-07T21:36:03Z

For comparison, that last example when diffed with deep-diff2:

❯ clj -Sdeps '{:deps {lambdaisland/deep-diff2 {:mvn/version "2.7.169"}}}'

Clojure 1.11.1
user=> (require '[lambdaisland.deep-diff2 :as ddiff])
nil
user=> (ddiff/pretty-print (ddiff/diff [1 82 3 4 2 7 9 10] [1 2 10 11]))
[1 -82 -3 -4 2 -7 -9 10 +11]
nil

Here's a screenshot for the colouring:

It seems to behave the way I would have expected, maybe we can do the same after all.

philomates · 2022-12-12T13:29:26Z

Thanks for the comprehensive testing @lread

for

(is (match? [1 82 3 4 2 7 9 10] [1 2 10 11]) "recover multiple mismatches")

the current behavior gives 6 mismatches and 2 matches, for a total length of 8 elements

[...
(missing 7)
(mismatch (expected 9) (actual 10))
(mismatch (expected 10) (actual 11))]

the deepdiff version gives

[... 
(missing 7)
(missing 9)
10
(unexpected 11)]

which has 3 matches and 6 mismatches, for a total length of 9 elements.
So it sort of depends on how you want to weigh matches vs mismatches vs total sequence size. I don't think it would be too difficult to adapt the behavior to be like deepdiff but I'm not sure yet if it is necessarily the right thing

lread · 2022-12-12T14:11:21Z

Yeah, good point, there there is no absolute right or wrong.

I prefer a diff which makes some attempt to re-sync after a mismatch.

But that might be my subjective preference.

It could be interesting to learn more about deep-diff2's rationale and intent with regards to re-sync. Deep-diff2 uses clj-diff, which describes its diffing algorithm use here.

I don't know much about diff algorithms, but I think an "optimal" one produces the smallest number of diffs, which seems related.

dchelimsky · 2023-01-31T14:28:23Z

I think that (missing 9) and (unexpected 11) is easier to grok. @philomates you said

I don't think it would be too difficult to adapt the behavior to be like deepdiff

We could

leave this as/is
wait for you to address that in this PR
merge this as/is and address that in a follow up PR

I'm happy with any of this, as eve without the deepdiff alignment, this is a lot better as/is. I think the best option is the last one, if you're good with that.

philomates · 2023-01-31T15:46:48Z

I'm fine with merging as is and doing a follow-up to improve the results to better match deepdiff.
any thoughts on the performance considerations? #187 (comment)

recover after missing/extra element in sequence

85970c8

before: ``` expected: (match? [1 2 3 4] [1 3 4]) actual: [1 (mismatch (expected 2) (actual 3)) (mismatch (expected 3) (actual 4)) (missing 4)] ``` after: ``` expected: (match? [1 2 3 4] [1 3 4]) actual: [1 (expected 2) 3 4] ```

philomates commented Nov 29, 2022

View reviewed changes

philomates mentioned this pull request Nov 29, 2022

Consider recovering after missing/extra element in sequence #177

Open

philomates closed this by deleting the head repository May 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recover after missing/extra element in sequence #187

recover after missing/extra element in sequence #187

philomates commented Nov 29, 2022

philomates Nov 29, 2022

philomates Dec 12, 2022

lread commented Dec 7, 2022

lread commented Dec 7, 2022

philomates commented Dec 12, 2022

lread commented Dec 12, 2022

dchelimsky commented Jan 31, 2023

philomates commented Jan 31, 2023

recover after missing/extra element in sequence #187

recover after missing/extra element in sequence #187

Conversation

philomates commented Nov 29, 2022

philomates Nov 29, 2022

Choose a reason for hiding this comment

philomates Dec 12, 2022

Choose a reason for hiding this comment

lread commented Dec 7, 2022

lread commented Dec 7, 2022

philomates commented Dec 12, 2022

lread commented Dec 12, 2022

dchelimsky commented Jan 31, 2023

philomates commented Jan 31, 2023