I've been experimenting with the performance of InfluxDB IOx on a dataset that has a "key" tag with high but bounded cardinality (~3M distinct values) and a few double-valued fields. Queries filter on the key and a time range, sometimes with time aggregation. The question that arises is how to choose the partition-key of the table optimally.
Using the standard partition-key "%Y-%m-%d", the ingestion performance is excellent: > 500K rows/s on a 32 vCPU instance. But querying a given key over a longer time frame (multiple days) performs very poorly, causes "fan_out" warnings, and ultimately ends in OOM errors.
I tried to address the issue by grouping keys together: I added a "cluster-key" tag to the data and included it in the partition-key. As long as the cardinality of this "cluster-key" per ingester is low, ingestion performance remains good and query performance improves greatly.
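For illustration, one way such a "cluster-key" could be derived on the client side is sketched below. This is only a sketch under assumptions: the hash-based grouping, the bucket count of 16, and the helper names `cluster_key` and `to_line_protocol` are hypothetical and not necessarily how the grouping was done here. It also assumes that keys and field names contain no characters that need escaping in line protocol.

```python
import hashlib

# Assumed bucket count; it has to stay low per ingester to keep ingestion fast.
NUM_CLUSTERS = 16


def cluster_key(key: str, num_clusters: int = NUM_CLUSTERS) -> str:
    """Map a high-cardinality key to one of a few stable buckets (hypothetical scheme)."""
    # A cryptographic hash is used only because it is stable across processes,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_clusters
    return f"c{bucket:03d}"


def to_line_protocol(measurement: str, key: str, fields: dict, ts_ns: int) -> str:
    """Build one line-protocol row that carries the extra "cluster-key" tag."""
    tags = f"cluster-key={cluster_key(key)},key={key}"
    # All values are written as floats, matching the double fields in the dataset.
    field_str = ",".join(
        f"{name}={float(value)}" for name, value in sorted(fields.items())
    )
    return f"{measurement},{tags} {field_str} {ts_ns}"


if __name__ == "__main__":
    print(to_line_protocol("metrics", "sensor-42", {"a": 1.5, "b": 2}, 1700000000000000000))
```

A query for a single key would then also filter on the matching "cluster-key" value, computed the same way, so that only the partitions of that bucket have to be read.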
But the overall performance is still not great compared to older versions of InfluxDB. There seems to be a fundamental trade-off between ingester and querier performance that depends on the number of partitions per day.
Feature Suggestion:
Would it make sense to add a secondary partition-sub-key that is ignored by the ingester but respected by the compactor and querier? This would allow full ingestion performance while separating the compacted data to a finer degree, which should improve query performance when filtering for a specific sub-key.