Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update iceberg writer and add more metrics #13893

Merged
merged 2 commits into from
Dec 15, 2023
Merged

feat: update iceberg writer and add more metrics #13893

merged 2 commits into from
Dec 15, 2023

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Dec 11, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Close #12352

The icelake introduce new writer interface, this PR update to use them and add more metrics:

  1. partition num
  2. position delete cache num
  3. unflushed data file

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
    - [ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
    - [ ] My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
    - [ ] My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)

- [ ] My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

- [ ] My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Copy link

codecov bot commented Dec 11, 2023

Codecov Report

Attention: 279 lines in your changes are missing coverage. Please review.

Comparison is base (8b2b3f7) 67.99% compared to head (65fce4a) 68.01%.

Files Patch % Lines
src/connector/src/sink/iceberg/mod.rs 0.00% 119 Missing ⚠️
...k/iceberg/prometheus/monitored_base_file_writer.rs 0.00% 44 Missing ⚠️
...erg/prometheus/monitored_position_delete_writer.rs 0.00% 39 Missing ⚠️
...k/iceberg/prometheus/monitored_partition_writer.rs 0.00% 37 Missing ⚠️
.../sink/iceberg/prometheus/monitored_write_writer.rs 0.00% 28 Missing ⚠️
src/stream/src/executor/monitor/streaming_stats.rs 74.46% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13893      +/-   ##
==========================================
+ Coverage   67.99%   68.01%   +0.02%     
==========================================
  Files        1536     1540       +4     
  Lines      265403   265453      +50     
==========================================
+ Hits       180462   180556      +94     
+ Misses      84941    84897      -44     
Flag Coverage Δ
rust 68.01% <12.53%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

src/common/src/metrics/guarded_metrics.rs Outdated Show resolved Hide resolved
src/stream/src/executor/monitor/streaming_stats.rs Outdated Show resolved Hide resolved
Comment on lines +421 to +420
.deref()
.clone(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By deref and clone here, the ref count for label cannot work correctly.

This appears to be unsound. cc @wenym1 Is there a way to prevent this kind of leaking? Or maybe it's another motivation for us to migrate the prometheus library. #13086

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PrometheusWriterBuilder is defined in icelake, which means we are not able to choose the metric type we expect. If it's using the raw type of metric, there is no way to deal with the issue, even if we migrate to another prometheus library. The only way seems to be forking the current library to embed the current label guard logic in the internal implementation of current library and add a crate patch to the cargo.toml.

@ZENOTME @liurenjie1024 It is possible to let PrometheusWriterBuilder define some metric trait and pass a box dyn trait to it so that it can support customized metric report logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we should the promtheus writer builder in risinwave?

Copy link
Contributor Author

@ZENOTME ZENOTME Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to let PrometheusWriterBuilder define some metric trait and pass a box dyn trait to it so that it can support customized metric report logic?

Yes, it's possible.

By deref and clone here, the ref count for label cannot work correctly.

Actually the semantic here is more like to extract the ref of metrics. The ref count don't need to inc if we can express the ref semantic which means the life time of the ref metrics is subrange to the metrics. So another solution maybe :

  1. label metrics have an interface to expose the ref of raw metrics.
  2. PrometheusWriterBuilder can take a ref of raw metrics. (I'm not sure whether it can

Or we should the promtheus writer builder in risinwave?

Yes. We can also implement our own Prometheus writer builder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be unsound. cc @wenym1 Is there a way to prevent this kind of leaking?

It seems hard to prevent because Deref makes it convenient to use the inner type. Except we expose the inner type op (add, set) manually.

impl<T, const N: usize> Deref for LabelGuardedMetric<T, N> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.inner
    }
}

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 4577 files.

Valid Invalid Ignored Fixed
2009 5 2563 0
Click to see the invalid file list
  • src/connector/src/sink/iceberg/prometheus/base_file_metrics_writer.rs
  • src/connector/src/sink/iceberg/prometheus/mod.rs
  • src/connector/src/sink/iceberg/prometheus/partition_metrics_writer.rs
  • src/connector/src/sink/iceberg/prometheus/position_delete_metrics_writer.rs
  • src/connector/src/sink/iceberg/prometheus/write_metrics_writer.rs


async fn write(&mut self, record: RecordBatch) -> Result<()> {
self.write_qps.inc();
let _ = self.write_latency.start_timer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable named with _ will be dropped immediately, in which case we will fail to measure the time. We should name it as _timer to avoid this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

// See the License for the specific language governing permissions and
// limitations under the License.

pub mod base_file_metrics_writer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our naming convention we usually name it as monitored_xxx_writer;

.iceberg_position_delete_cache_num
.clone(),
);
let euality_delete_builder =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo, should be equality?

Copy link
Contributor

@tabVersion tabVersion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally LGTM

[
panels.target(
f"{metric('iceberg_partition_num')}",
"{{executor_id}} @ {{sink_id}}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the exec id in RW? I suggest using actor id instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the exec id in RW? I suggest using actor id instead

I think it means

pub identity: String,
, because the sink metrics is per executor, executor id is used to identify it.🤔

Why not use, cc @wenym1

pub executor_id: u64,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be compatible with the code before I introduce this refactor, because before I introduce this refactor, the sink metric was using the executor id to form a identity string as the label.

I think we can change to directly using the actor id or executor id, but we should remember to change the grafana dashboard at the meantime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's refactor them all in a separate PR.

@ZENOTME ZENOTME force-pushed the zj/iceberg branch 2 times, most recently from 4d92681 to 360cbdc Compare December 12, 2023 07:02
Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

// See the License for the specific language governing permissions and
// limitations under the License.

pub mod monitor_base_file_writer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

monitor -> monitored

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, the struct defined in these mods can be renamed to MonitoredXxxWriter as well.

}

pub struct FanoutPartitionedWriterWithMetrics<B: IcebergWriterBuilder>
where
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this where necessary?

I think there should already be type R: IcebergWriter trait bound on the IcebergWriterBuilder trait definition. If not, I think we should add the trait bound in the trait definition instead of here in the impl.

@ZENOTME ZENOTME added this pull request to the merge queue Dec 15, 2023
Merged via the queue into main with commit 821792a Dec 15, 2023
8 checks passed
@ZENOTME ZENOTME deleted the zj/iceberg branch December 15, 2023 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add observability for iceberg sink.
6 participants