Flaky test `TestSearchTagValuesV2` #4578

ndk · 2025-01-18T16:30:35Z

TestSearchTagValuesV2/second_batch_with_incomplete_query_-_name sometimes fails, and it appears this may be tied to the timing of block compaction.
Example of a failing run: https://github.com/grafana/tempo/actions/runs/12831739931/job/35782895220

15:18:40 tempo: level=info ts=2025-01-17T15:18:40.666072638Z caller=handler.go:134 tenant=single-tenant method=GET traceID= url="/api/v2/search/tag/span.x/values?q=%7B+name%3D%22baz%22+%26%26+span.x+%3D+%7D" duration=860.231µs response_size=113 status=200
    api_test.go:594: 
        	Error Trace:	/home/runner/work/tempo/tempo/integration/e2e/api_test.go:594
        	            				/home/runner/work/tempo/tempo/integration/e2e/api_test.go:409
        	Error:      	Not equal: 
        	            	expected: []e2e.TagValue{e2e.TagValue{Type:"string", Value:"qux"}}
        	            	actual  : []e2e.TagValue{e2e.TagValue{Type:"string", Value:"bar"}, e2e.TagValue{Type:"string", Value:"qux"}}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,6 @@
        	            	-([]e2e.TagValue) (len=1) {
        	            	+([]e2e.TagValue) (len=2) {
        	            	+ (e2e.TagValue) {
        	            	+  Type: (string) (len=6) "string",
        	            	+  Value: (string) (len=3) "bar"
        	            	+ },

Observations
https://github.com/grafana/tempo/blob/main/modules/ingester/instance_search.go#L462

Successful runs have len(instance.completeBlocks)=0 before the test:

searchBlock("headBlock")
searchBlock("completingBlocks")

Failed runs have len(instance.completeBlocks)=1 before the test:

searchBlock("headBlock")
searchBlockWithCache("completeBlocks")

It seems that if block compaction triggers after the test traces are added but before querying tags, the test fails. When searchBlockWithCache runs, it appears to add entries from the block without filtering, which might be causing the unexpected extra result ("bar").

Example Logs

Successful run

// Testing traces are already ingested
// The first test
instance.SearchTagValuesV2 ->>>> req.Query=""
instance.SearchTagValuesV2 ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=0
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="completingBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
// The second test
instance.SearchTagValuesV2 ->>>> req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=0
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="completingBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
//
instance.CompleteBlock ->>>> append completeBlocks, before len(i.completeBlocks) = 0
instance.CompleteBlock ->>>> completeBlocks is updated, now len(i.completeBlocks) = 1

Failed run

// Testing traces are already ingested
instance.CompleteBlock ->>>> append completeBlocks, before len(i.completeBlocks) = 0
instance.CompleteBlock ->>>> completeBlocks is updated, now len(i.completeBlocks) = 1
// The first test
instance.SearchTagValuesV2 ->>>> req.Query=""
instance.SearchTagValuesV2 ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=1
instance.SearchTagValuesV2 - searchBlockWithCache ->>>> spanName="completeBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
LocalBlock.SearchTagValuesV2 ->>>> "x"
// The second test
instance.SearchTagValuesV2 ->>>> req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=1
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 - searchBlockWithCache ->>>> spanName="completeBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"

Conclusion
Based on these observations, it looks like the test may fail when the data is queried from a block in the completeBlocks list - possibly because the block compaction step happens too soon and searchBlockWithCache does not filter entries correctly. I might be missing some implementation details, but wanted to report this in case it indicates a bug.

The text was updated successfully, but these errors were encountered:

stoewer · 2025-02-06T07:50:22Z

Thank you for reporting this and for the detailed description.

I can confirm that the test is flaky on my local setup as well, failing approximately 1 out of 10 times. Additionally, I observed that when TestSearchTagValuesV2/second_batch_with_incomplete_query_-_name fails, TestSearchTagValuesV2/bad_query_-_unfiltered_results also tends to fail:

    --- FAIL: TestSearchTagValuesV2/bad_query_-_unfiltered_results (0.00s)
        api_test.go:850: 
            	Error Trace:	/home/astoewer/Develop/tempo/integration/e2e/api_test.go:850
            	            				/home/astoewer/Develop/tempo/integration/e2e/api_test.go:595
            	            				/home/astoewer/Develop/tempo/integration/e2e/api_test.go:409
            	Error:      	"30" is not greater than "300"
            	Test:       	TestSearchTagValuesV2/bad_query_-_unfiltered_results

stoewer added type/bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed labels Feb 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flaky test `TestSearchTagValuesV2` #4578

Flaky test `TestSearchTagValuesV2` #4578

ndk commented Jan 18, 2025 •

edited

Loading

stoewer commented Feb 6, 2025

Flaky test TestSearchTagValuesV2 #4578

Flaky test TestSearchTagValuesV2 #4578

Comments

ndk commented Jan 18, 2025 • edited Loading

stoewer commented Feb 6, 2025

Flaky test `TestSearchTagValuesV2` #4578

Flaky test `TestSearchTagValuesV2` #4578

ndk commented Jan 18, 2025 •

edited

Loading