Sinks #575
purplefox started this conversation in Design discussions
Replies: 1 comment
This looks great. Will a message be published to the topic when any field in the query is updated? It could be useful to have a message published that includes fields that are not part of the change detection. Honestly though, that is a stretch and I think the feature as written would be extremely valuable.
Sinks
Motivation
We wish to pass a stream of updates from a source or materialized view into a Kafka topic.
For example, we might have a materialized view that maintains a customer's account balance:
```sql
select customer_id, sum(amount) as balance from transactions group by customer_id
```
When a change occurs in the materialized view, we want to emit an event to Kafka representing the latest value of the balance for the customer, or, if the row has been deleted, the key with a null value (a tombstone).
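For illustration, assuming a JSON encoding and the column names from the query above (the actual wire format is not specified here), the emitted records might look like:

```
key: {"customer_id": 42}   value: {"balance": 150.75}   (balance updated)
key: {"customer_id": 42}   value: null                  (row deleted: tombstone)
```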
It should be possible to configure the sink to emit events for a key at most once every x seconds (or milliseconds, etc.). Some MVs might update very frequently, and we might not want to emit an event for every change. For example, a stock price might change thousands of times per second, but we may only want to send the latest price every 5 seconds.
Declaration
Sinks will be declared as follows: the columns selected for a sink are defined using a SQL query. The query cannot use any aggregation or join, but it can take a source or materialized view as its input, and it can use any of the supported SQL expressions.
The `with` configuration is similar to a source; `fieldInjectors` are similar to `columnSelectors` in a source. There must be one field injector for each column selected in the SQL statement. The injector determines where in the resulting Kafka message the value will be injected.
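As a concrete illustration, a declaration might look like the following hypothetical sketch assembled from the description above. The statement shape, the property names (`brokername`, `topicname`, `delay`), and the injector syntax (`k.` for the message key, `v.` for the message value) are all assumptions, not confirmed syntax:

```sql
create sink customer_balances as
select customer_id, balance from customer_balance_mv
with (
    brokername = "mybroker",          -- hypothetical: which broker to publish to
    topicname = "customer.balances",  -- hypothetical: destination Kafka topic
    delay = "5s",                     -- hypothetical: minimum period between emissions per key
    fieldinjectors = (
        "k.customer_id",              -- inject the first selected column into the message key
        "v.balance"                   -- inject the second selected column into the message value
    )
)
```

Note that there is one field injector per selected column, as required above.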
Implementation
A sink will be implemented as a simple DAG, similar to how we create a materialized view, with the results of the DAG going into a table in storage, like a materialized view.
However, we will scan the state of the sink every `delay` period, construct Kafka messages from the rows, and send them to Kafka. When we have received an acknowledgement from Kafka that a message has been received, the corresponding row can be deleted from the sink table. This way we can guarantee no loss of changes to Kafka in the event of system failure and recovery. We will have an upper limit on the number of outstanding changes in the sink table before message processing is blocked, to avoid the sink table growing without bound.
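The following is a minimal sketch of that flush loop, in Go only for concreteness; `SinkTable`, `Producer`, and every other name here are hypothetical abstractions, not the actual design, and the upper bound on outstanding changes is omitted for brevity:

```go
// A minimal sketch of the sink flush loop described above. All names are
// hypothetical abstractions, not the actual implementation.
package sink

import (
	"context"
	"time"
)

// Row is one pending change in the sink table: a key and its latest value.
// A nil Value encodes a deletion, sent to Kafka as a tombstone.
type Row struct {
	Key   []byte
	Value []byte
}

// SinkTable abstracts the storage table backing the sink.
type SinkTable interface {
	Scan(limit int) ([]Row, error) // read up to limit pending rows
	Delete(keys [][]byte) error    // remove rows that Kafka has acknowledged
}

// Producer abstracts the Kafka client. Send returns only once the broker
// has acknowledged the message.
type Producer interface {
	Send(key, value []byte) error
}

// Flush scans the sink table every delay period, sends the pending rows to
// Kafka, and deletes a row only after its message is acknowledged, so no
// change is lost if the system fails and recovers.
func Flush(ctx context.Context, table SinkTable, prod Producer, delay time.Duration, batchSize int) error {
	ticker := time.NewTicker(delay)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			rows, err := table.Scan(batchSize)
			if err != nil {
				return err
			}
			acked := make([][]byte, 0, len(rows))
			for _, r := range rows {
				if err := prod.Send(r.Key, r.Value); err != nil {
					break // unsent rows stay in the table and are retried next tick
				}
				acked = append(acked, r.Key)
			}
			if err := table.Delete(acked); err != nil {
				return err
			}
		}
	}
}
```

Because a row is deleted only after Kafka acknowledges its message, a crash between send and delete can at worst cause a message to be sent twice, never lost, which matches the guarantee described above.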