Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test TestSearchTagValuesV2 #4578

Open
ndk opened this issue Jan 18, 2025 · 1 comment
Open

Flaky test TestSearchTagValuesV2 #4578

ndk opened this issue Jan 18, 2025 · 1 comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed type/bug Something isn't working

Comments

@ndk
Copy link
Contributor

ndk commented Jan 18, 2025

TestSearchTagValuesV2/second_batch_with_incomplete_query_-_name sometimes fails, and it appears this may be tied to the timing of block compaction.
Example of a failing run: https://github.com/grafana/tempo/actions/runs/12831739931/job/35782895220

15:18:40 tempo: level=info ts=2025-01-17T15:18:40.666072638Z caller=handler.go:134 tenant=single-tenant method=GET traceID= url="/api/v2/search/tag/span.x/values?q=%7B+name%3D%22baz%22+%26%26+span.x+%3D+%7D" duration=860.231µs response_size=113 status=200
    api_test.go:594: 
        	Error Trace:	/home/runner/work/tempo/tempo/integration/e2e/api_test.go:594
        	            				/home/runner/work/tempo/tempo/integration/e2e/api_test.go:409
        	Error:      	Not equal: 
        	            	expected: []e2e.TagValue{e2e.TagValue{Type:"string", Value:"qux"}}
        	            	actual  : []e2e.TagValue{e2e.TagValue{Type:"string", Value:"bar"}, e2e.TagValue{Type:"string", Value:"qux"}}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,6 @@
        	            	-([]e2e.TagValue) (len=1) {
        	            	+([]e2e.TagValue) (len=2) {
        	            	+ (e2e.TagValue) {
        	            	+  Type: (string) (len=6) "string",
        	            	+  Value: (string) (len=3) "bar"
        	            	+ },

Observations
https://github.com/grafana/tempo/blob/main/modules/ingester/instance_search.go#L462

  • Successful runs have len(instance.completeBlocks)=0 before the test:
searchBlock("headBlock")
searchBlock("completingBlocks")
  • Failed runs have len(instance.completeBlocks)=1 before the test:
searchBlock("headBlock")
searchBlockWithCache("completeBlocks")

It seems that if block compaction triggers after the test traces are added but before querying tags, the test fails. When searchBlockWithCache runs, it appears to add entries from the block without filtering, which might be causing the unexpected extra result ("bar").

Example Logs

Successful run
// Testing traces are already ingested
// The first test
instance.SearchTagValuesV2 ->>>> req.Query=""
instance.SearchTagValuesV2 ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=0
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="completingBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
// The second test
instance.SearchTagValuesV2 ->>>> req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=0
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="completingBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
//
instance.CompleteBlock ->>>> append completeBlocks, before len(i.completeBlocks) = 0
instance.CompleteBlock ->>>> completeBlocks is updated, now len(i.completeBlocks) = 1
Failed run
// Testing traces are already ingested
instance.CompleteBlock ->>>> append completeBlocks, before len(i.completeBlocks) = 0
instance.CompleteBlock ->>>> completeBlocks is updated, now len(i.completeBlocks) = 1
// The first test
instance.SearchTagValuesV2 ->>>> req.Query=""
instance.SearchTagValuesV2 ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=1
instance.SearchTagValuesV2 - searchBlockWithCache ->>>> spanName="completeBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
LocalBlock.SearchTagValuesV2 ->>>> "x"
// The second test
instance.SearchTagValuesV2 ->>>> req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=1
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 - searchBlockWithCache ->>>> spanName="completeBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"

Conclusion
Based on these observations, it looks like the test may fail when the data is queried from a block in the completeBlocks list - possibly because the block compaction step happens too soon and searchBlockWithCache does not filter entries correctly. I might be missing some implementation details, but wanted to report this in case it indicates a bug.

@stoewer
Copy link
Contributor

stoewer commented Feb 6, 2025

Thank you for reporting this and for the detailed description.

I can confirm that the test is flaky on my local setup as well, failing approximately 1 out of 10 times. Additionally, I observed that when TestSearchTagValuesV2/second_batch_with_incomplete_query_-_name fails, TestSearchTagValuesV2/bad_query_-_unfiltered_results also tends to fail:

    --- FAIL: TestSearchTagValuesV2/bad_query_-_unfiltered_results (0.00s)
        api_test.go:850: 
            	Error Trace:	/home/astoewer/Develop/tempo/integration/e2e/api_test.go:850
            	            				/home/astoewer/Develop/tempo/integration/e2e/api_test.go:595
            	            				/home/astoewer/Develop/tempo/integration/e2e/api_test.go:409
            	Error:      	"30" is not greater than "300"
            	Test:       	TestSearchTagValuesV2/bad_query_-_unfiltered_results

@stoewer stoewer added type/bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed labels Feb 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants