Influxdb 3: Partition size and querier vs ingester performance #24396

m-v-w · 2023-09-28T20:53:33Z

I've been experimenting with the performance of InfluxDB IOx on a dataset that has a high but bounded (~3M) cardinality "key" tag and a few double-valued fields. Queries filter on the key and a time range, sometimes with time aggregation. The question that arises is how to choose the table's partition-key optimally.

Using the standard partition-key "%Y-%m-%d", the ingestion performance is excellent: more than 500K rows/s on a 32 vCPU instance. But querying a given key over a longer time frame (multiple days) performs very poorly, causes "fan_out" warnings, and ultimately ends in OOM errors.

I tried to address the issue by grouping keys together: I added a "cluster-key" tag to the data and included it in the partition-key. As long as the cardinality of this "cluster-key" per ingester is low, the ingestion performance is still good and the query performance is greatly improved.
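To make the grouping concrete, here is a minimal sketch of one way to derive such a "cluster-key" on the client side, by hashing each key into a fixed number of buckets. The bucket count, the `cNNN` naming, and the `day|cluster` label are assumptions for illustration only; the actual partition-key string format is whatever the server produces from its template.

```python
import hashlib

NUM_CLUSTERS = 16  # assumed bucket count, chosen only for illustration


def cluster_key(key: str, num_clusters: int = NUM_CLUSTERS) -> str:
    """Map a high-cardinality key to one of a small, fixed set of cluster labels."""
    # A cryptographic hash is used so the mapping is stable across processes
    # and restarts (Python's built-in hash() is salted per process).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_clusters
    return f"c{bucket:03d}"


def partition_label(key: str, day: str) -> str:
    """Illustrative label for the (day, cluster-key) partition a row falls into."""
    return f"{day}|{cluster_key(key)}"
```

Each row would then carry `cluster-key=<label>` as an additional tag, and a query for one key can also filter on that key's cluster label so that only the matching partitions are considered.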

But the overall performance, compared to older versions of InfluxDB, is still not great. There seems to be a fundamental trade-off between ingester and querier performance that depends on the number of partitions per day.
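The trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes keys are spread evenly over the clusters and that a single-key query touches exactly one partition per day. Only the ~3M key count comes from the setup described above.

```python
TOTAL_KEYS = 3_000_000  # approximate key cardinality of the dataset


def tradeoff(num_clusters: int, days: int) -> dict:
    """Rough model of partition counts for a given number of clusters per day."""
    return {
        # Partitions the ingesters must keep open each day.
        "partitions_per_day": num_clusters,
        # Distinct keys mixed together in each partition (even spread assumed).
        "keys_per_partition": TOTAL_KEYS // num_clusters,
        # Partitions a single-key query over `days` days has to read.
        "partitions_read_by_query": days,
    }
```

With one cluster (the plain daily partition-key) every partition read by the query contains all ~3M keys; with 100 clusters each contains roughly 30K keys, at the price of 100 partitions per day on the ingest side.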

Feature Suggestion:
Would it make sense to add a secondary partition-sub-key that is ignored by the ingester but respected by the compactor and querier? This would allow full ingestion performance while separating the compacted data to a finer degree, which should improve query performance when filtering for a specific sub-key.
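A toy model of the suggested behaviour, to illustrate the idea only. It does not reflect how the IOx ingester or compactor are implemented, and the field names `day` and `sub_key` are hypothetical.

```python
from collections import defaultdict


def ingest(rows):
    """Ingester view: group rows by the primary (daily) partition-key only."""
    parts = defaultdict(list)
    for row in rows:
        parts[row["day"]].append(row)
    return parts


def compact(parts):
    """Compactor view: split each daily partition further by the sub-key."""
    out = defaultdict(list)
    for day, rows in parts.items():
        for row in rows:
            out[(day, row["sub_key"])].append(row)
    return out


def query(compacted, days, sub_key):
    """Querier view: read only the (day, sub_key) pieces that can match."""
    return [row for day in days for row in compacted.get((day, sub_key), [])]
```

In this model the ingester handles one partition per day regardless of how many sub-keys exist, while a query that filters on a sub-key skips all compacted data belonging to other sub-keys.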
