Dropping label to resolve a high cardinality metric series? #8070
-
I see a lot of discussion online about this, but it all seems to miss two points: block identification (q1) and metric collision/deduplication (q2).

Background: We have a metric, "order_total", that was impacted by a high-cardinality label, "store_number", which added thousands of unique label values to the metric's series. We have since removed the "store_number" label from the backend metric that Prometheus scrapes. However, when we query this metric in Thanos over the date ranges that were affected, resource usage on the querier gets high enough to crash the querier entirely.

Two questions on this matter:
Thanks in advance for any replies.
-
@mwalz0 Hello there! I'm here to help you with your issue. I can assist you in solving bugs, answering questions, and becoming a contributor.
-
Figured this out. In my setup, the Thanos compactor compacts the metrics uploaded by the sidecar into two-week blocks, replicated 3x (once for each Prometheus replica we run). I believe two weeks is the default maximum block time range, and it is important to know this when identifying your bad blocks. I wasn't very familiar with how the compactor handles block downsampling and retention, so I had to learn that along the way.
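For anyone trying to do the same block identification, here is a rough Python sketch of the idea, not my exact procedure. It assumes you have already collected each block's `meta.json` (which carries `ulid`, `minTime`, and `maxTime` in Unix milliseconds) and keeps every block whose time range overlaps the affected window. The block IDs, the `replica` label, and the dates below are made up for illustration.

```python
from datetime import datetime, timezone


def to_ms(dt):
    """Convert an aware datetime to Unix milliseconds, the unit meta.json uses."""
    return int(dt.timestamp() * 1000)


def overlapping_blocks(metas, start_ms, end_ms):
    """Return IDs of blocks whose [minTime, maxTime) overlaps [start_ms, end_ms)."""
    return [
        m["ulid"]
        for m in metas
        if m["minTime"] < end_ms and m["maxTime"] > start_ms
    ]


def day(month, dom):
    """Helper for the example: midnight UTC on a day in 2024."""
    return datetime(2024, month, dom, tzinfo=timezone.utc)


# Hypothetical two-week windows, one block per window per Prometheus replica.
windows = [(day(1, 4), day(1, 18)), (day(1, 18), day(2, 1)), (day(2, 1), day(2, 15))]

metas = []
for replica in ("a", "b", "c"):
    for i, (start, end) in enumerate(windows):
        metas.append(
            {
                # Placeholder ID; real blocks are named by a ULID.
                "ulid": f"block-{replica}-{i}",
                "minTime": to_ms(start),
                "maxTime": to_ms(end),
                # Assumed external label that distinguishes the replicas.
                "thanos": {"labels": {"replica": replica}},
            }
        )

# Assumed window during which the high-cardinality label was being scraped.
bad = overlapping_blocks(metas, to_ms(day(1, 20)), to_ms(day(1, 25)))
print(bad)  # ['block-a-1', 'block-b-1', 'block-c-1']
```

Note that one affected window maps to one block per replica, so with three replicas there are three blocks to deal with for each two-week range.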
Hope this can help someone else, because it was a big headache for me until I got it working. 🦦
thanos bucket rewrite --no-dry-run --delete-…