Analyze cause of duplicate events in db #333
Comments
Next steps: add logging to the events component and deploy it to yt01, so that the id of every event sent to the storage endpoint is logged.
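A minimal sketch of what that logging makes possible, assuming the events component can wrap its call to the storage endpoint. `post_with_logging`, `ids_sent_more_than_once` and the `send` callable are hypothetical names, not the component's real API. The id is logged *before* the call, so a send that later fails or times out is still recorded.

```python
import logging
from collections import Counter
from typing import Callable, Dict, Iterable

log = logging.getLogger("events")


def post_with_logging(event_id: str, send: Callable[[str], None]) -> None:
    # Hypothetical wrapper around the call to the storage endpoint.
    # Logging first means a request that times out still leaves a trace.
    log.info("posting event %s to storage", event_id)
    send(event_id)


def ids_sent_more_than_once(logged_ids: Iterable[str]) -> Dict[str, int]:
    # Count how often each id appears in the send log and keep only repeats.
    counts = Counter(logged_ids)
    return {event_id: n for event_id, n in counts.items() if n > 1}
```

If the ids that are duplicated in the db also show up more than once in this log, the duplicates come from repeated sends. If they appear only once, the duplication would have to happen behind the storage endpoint.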
The last duplicate in tt02 was created at "2023-06-05 12:32:57.818753+00". Were any changes implemented around this time, @SandGrainOne?
The last duplicates in production were created at "2023-06-12 08:16:24.930741+00", which roughly matches the deployment schedule, I think.
@acn-sbuad Can you check whether there have been any duplicates since this was last checked? If there are none, maybe this can be closed?
8 events have been duplicated in the last 90 days. @annerisbakk FYI
FYI: As of 2024-08-05, there were 71 events with 2 or more entries during the past 90 days. This issue is still relevant.
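The count above ("events with 2 or more entries during the past 90 days") can be expressed as a small function. This is a sketch only: it assumes each stored row can be reduced to a `(cloud event id, creation time)` pair, and `duplicated_event_ids` is a hypothetical helper, not an existing query in the repository. The 90-day window mirrors the retention period, since older rows are deleted anyway.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Assumed row shape: (cloud event id, time the row was created).
Row = Tuple[str, datetime]


def duplicated_event_ids(
    rows: Iterable[Row], now: datetime, window_days: int = 90
) -> Dict[str, int]:
    # Only rows inside the window count, matching the 90-day retention.
    cutoff = now - timedelta(days=window_days)
    counts = Counter(event_id for event_id, created in rows if created >= cutoff)
    # Keep ids stored two or more times.
    return {event_id: n for event_id, n in counts.items() if n >= 2}
```

`len(duplicated_event_ids(rows, now))` then gives the number of duplicated events for the period.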
Continued in #573.
Things to check/understand (read docs?):
I can't find any duplicates in prod or tt02. (Data older than 90 days is deleted.) Should we postpone further analysis until the problem is observed again?
Description
Analyze
Additional Information
No response
Tasks
There are no elements in the poison queues, but there are a number of duplicates in the db.
Hypothesis
The inbound endpoint in Events is not needed if the function can push elements directly to the queue.
Conclusion: This did not fix the problem of duplicates in the database, but it does save us one Key Vault lookup per processed cloud event. I can't quite remember why we implemented it like this in the first place. Is there a reason the function cannot return the cloud event directly to the next queue?
==> Using an output binding directly in the function resulted in some lost events. We need to find out whether we can change the function config.
[return: Queue("events-inbound", Connection = "QueueStorage")]
Hypothesis
Duplicates occur due to exhaustion of connections to Key Vault.
Conclusion: The connection to Key Vault fails far more often than we see duplicates, so connection exhaustion alone does not explain them.
Hypothesis 05.08.24
Duplicates are largely created during deployments.
Defining a preStop hook in the Helm deployment could postpone the shutdown process, potentially allowing the pod to complete all ongoing requests before it is shut down.

Functions log:
// PostInbound event with id 661cc13f-9b21-4af2-9639-6de0e845aead failed with status code GatewayTimeout
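One possible reading of the GatewayTimeout logged above, offered as an assumption rather than a confirmed cause: if the storage write commits but the response is lost while the pod shuts down, the function sees a failure, the message is processed again, and the same event id is stored twice. The sketch below simulates only that sequence. `FakeStorage`, `GatewayTimeout` and `deliver` are hypothetical stand-ins, not the real Events or Functions code, and the retry loop stands in for the queue re-delivering the message.

```python
from typing import List


class GatewayTimeout(Exception):
    """Stand-in for the 504 GatewayTimeout seen by the function."""


class FakeStorage:
    """Hypothetical storage endpoint that commits the row, then may lose the response."""

    def __init__(self, drop_first_response: bool) -> None:
        self.rows: List[str] = []
        self._drop = drop_first_response

    def post_inbound(self, event_id: str) -> None:
        self.rows.append(event_id)  # the insert is committed...
        if self._drop:
            self._drop = False
            raise GatewayTimeout(event_id)  # ...but the caller never sees success


def deliver(storage: FakeStorage, event_id: str, max_attempts: int = 3) -> int:
    """Retry on GatewayTimeout and return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            storage.post_inbound(event_id)
            return attempt
        except GatewayTimeout:
            continue  # treated as a failure, so the event is sent again
    raise RuntimeError(f"gave up on {event_id}")
```

With `drop_first_response=True`, `deliver` succeeds on the second attempt and the id ends up in `rows` twice. If this is what happens during deploys, a preStop hook would help only insofar as it lets in-flight requests return before the pod stops.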
Docs

Acceptance Criteria
No response