Fixes #37137 - Use insert_all when creating available module streams #10878

@jeremylenz jeremylenz commented Feb 1, 2024

What are the changes introduced in this pull request?

During a profile upload, a host's module streams are updated in Katello's database.

In #10412 we changed to using create_or_find_by! to create those records. This solved an issue where concurrent host registrations would sometimes fail due to a PostgreSQL error:

ERROR:  duplicate key value violates unique constraint "katello_available_module_streams_name_stream_context"

However, these errors are still written to the PostgreSQL logs. They are harmless, but log files can fill up quickly: one error is logged for every module stream creation attempt, for every host with module streams, not just at registration but also during every profile upload. As a result, a log file can quickly grow to 200,000 lines of these errors alone.

With this change, we're now using insert_all to insert all of the records in a single SQL query, which should avoid the race condition.
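
As a rough illustration of the bulk approach (a sketch, not the code merged in this PR): the uploaded list can be reduced to unique (name, stream, context) rows in plain Ruby and handed to Rails' `insert_all` in one call. `build_stream_rows` is a hypothetical helper name, the second sample stream and the `status` key are made-up sample data, and the `insert_all` call appears only as a comment because it needs a Rails environment. The `unique_by` columns are assumed to match the unique constraint named in the error above.

```ruby
# Hypothetical helper: turn the uploaded module stream hashes into
# de-duplicated attribute rows suitable for a single bulk insert.
def build_stream_rows(module_streams)
  module_streams
    .map { |ms| { name: ms["name"], stream: ms["stream"], context: ms["context"] } }
    .uniq
end

# Sample upload data; "status" is an illustrative extra key that is dropped.
uploaded = [
  { "name" => "virt", "stream" => "rhel", "context" => "30b713e6", "status" => "enabled" },
  { "name" => "virt", "stream" => "rhel", "context" => "30b713e6", "status" => "enabled" },
  { "name" => "ruby", "stream" => "3.1", "context" => "00000000", "status" => "default" }
]

rows = build_stream_rows(uploaded)

# In a Rails environment, the rows could then be written with one statement.
# On PostgreSQL, insert_all skips rows that conflict with the unique index
# instead of raising (and logging) a duplicate-key error:
#   AvailableModuleStream.insert_all(rows, unique_by: %i[name stream context])
```

De-duplicating before the call keeps the statement small; the database constraint remains the final arbiter for rows that other hosts insert concurrently.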

Considerations taken when implementing this change?

This required some major refactoring, because:

  1. import_module_streams takes module stream data in a list of Hashes, but those don't include Katello's ID. (There's no way they could, as this info is coming from sub-man/dnf.)
  2. In our previous implementation, we relied on the ids returned from create_or_find_by etc. to pass on to the next method, sync_available_module_stream_associations. sync_available_module_stream_associations requires a Hash indexed by database ID.
  3. Previously, the data passed to sync_available_module_stream_associations was all of the host's available module streams, indexed by ID, including the ones we just newly added. Thus, new_available_module_streams was a bit of a misnomer. I did some refactoring to recreate this format of the indexed hash.
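
The indexed-hash format described in point 3 can be sketched in plain Ruby as follows. This is only an illustration under assumptions: `index_streams_by_id` is a hypothetical helper, and `db_rows` stands in for whatever records a follow-up query returns after the bulk insert (here, simple hashes with made-up `id` values).

```ruby
# Hypothetical helper: match each uploaded module stream to its database
# row by (name, stream, context) and return a Hash keyed by database ID.
def index_streams_by_id(module_streams, db_rows)
  ids_by_key = db_rows.to_h do |row|
    [[row[:name], row[:stream], row[:context]], row[:id]]
  end

  module_streams.each_with_object({}) do |ms, indexed|
    id = ids_by_key[[ms["name"], ms["stream"], ms["context"]]]
    indexed[id] = ms if id # skip anything the query did not return
  end
end

# Uploaded data carries no database IDs (sample values).
uploaded = [
  { "name" => "virt", "stream" => "rhel", "context" => "30b713e6" },
  { "name" => "ruby", "stream" => "3.1", "context" => "00000000" }
]

# Stand-in for rows fetched after the insert; the IDs are made up.
db_rows = [
  { id: 11, name: "virt", stream: "rhel", context: "30b713e6" },
  { id: 12, name: "ruby", stream: "3.1", context: "00000000" }
]

indexed = index_streams_by_id(uploaded, db_rows)
```

Because the lookup is keyed on the same three columns as the unique constraint, each uploaded stream maps to at most one database row.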

What are the testing steps for this pull request?

  1. tail -f /var/lib/pgsql/data/log/postgresql-Thu.log (replace Thu with whatever day it is today)
  2. Register a host (without this patch) - Notice that the file fills up with error logs
2024-02-01 21:57:16 UTC STATEMENT:  INSERT INTO "katello_available_module_streams" ("name", "stream", "context") VALUES ($1, $2, $3) RETURNING "id"
2024-02-01 21:57:16 UTC ERROR:  duplicate key value violates unique constraint "katello_available_module_streams_name_stream_context"
2024-02-01 21:57:16 UTC DETAIL:  Key (name, stream, context)=(virt, rhel, 30b713e6) already exists.

You can also check how many logs there are:

grep 'duplicate key value violates unique constraint "katello_available_module_streams_name_stream_context"' /var/lib/pgsql/data/log/postgresql-Thu.log | wc -l
---
227950
---
  3. Enable or disable a module stream - Notice that more error logs are created.
  4. Check out this PR
  5. Enable or disable a module stream - No new logs should be created.
  6. Register a host - No new logs should be created.

@jeremylenz jeremylenz force-pushed the 37137-first-or-create-or-create-or-find-by-oh-my branch from e69483b to 4b73412 Compare February 2, 2024 14:44
@jeremylenz
Member Author

fixed rubocop

@chris1984
Member

Your PR is having the same test failures as mine is

@jeremylenz
Member Author

> Your PR is having the same test failures as mine is

Yeah, apparently #10861 has the fixes. We may have to ignore those for a bit.

@jeremylenz
Member Author

[test katello]

@sbernhard
Member

Ready?

@jeremylenz
Member Author

Yes, this should be ready for review again 👍

stream: module_stream["stream"])
else
stream = AvailableModuleStream.where(name: module_stream["name"],
context: module_stream["context"],
Member

Is the indentation broken?

@@ -348,11 +348,21 @@ def import_enabled_repositories(repos)
end

def import_module_streams(module_streams)
# create_or_find_by avoids race conditions during concurrent registrations but clogs postgres logs with harmless errors.
Member

Instead of two different implementations, would it be possible to catch the "postgres logs with harmless errors" and handle them accordingly?

Member Author

No, that's the way the race condition is avoided - by letting the Postgres constraint handle it rather than ruby/rails.

Member

OK, makes sense.
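
To make the distinction in this thread concrete, here is a small in-memory simulation (plain Ruby, no Rails or Postgres) of the two orderings. `FakeTable` and `DuplicateKey` are hypothetical stand-ins for a table with a unique constraint and for the database's unique-violation error. The two methods only mimic the order of operations of `first_or_create` (find, then insert) and `create_or_find_by` (insert, then find on conflict), not their real implementations.

```ruby
class DuplicateKey < StandardError; end

# Hypothetical stand-in for a table with a unique constraint on its key.
class FakeTable
  attr_reader :rejected_inserts

  def initialize
    @rows = {}
    @rejected_inserts = 0 # each rejection stands for one logged error
  end

  def find(key)
    @rows[key]
  end

  def insert(key)
    if @rows.key?(key)
      @rejected_inserts += 1
      raise DuplicateKey, "duplicate key #{key.inspect}"
    end
    @rows[key] = { id: @rows.size + 1, key: key }
  end
end

# Find-then-insert: quiet when the row exists, but another writer could
# insert the same key between the two steps.
def find_then_insert(table, key)
  table.find(key) || table.insert(key)
end

# Insert-then-find: the constraint arbitrates, at the cost of one rejected
# insert (and one error log line) whenever the row already exists.
def insert_then_find(table, key)
  table.insert(key)
rescue DuplicateKey
  table.find(key)
end

table = FakeTable.new
key = %w[virt rhel 30b713e6]

first  = insert_then_find(table, key) # row is created
second = insert_then_find(table, key) # insert rejected, then row is found
third  = find_then_insert(table, key) # row is found, no insert attempted
```

In this simulation only the second call increments `rejected_inserts`, which mirrors why the constraint-based approach is safe under concurrency but noisy in the logs.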

Member

@ianballou ianballou left a comment

Why won't we hit PG::UniqueViolation errors during concurrent package profile uploads? Is there a difference in concurrency between 10 systems registering at the same time and 10 systems dnf updating at the same time?

# create_or_find_by avoids race conditions during concurrent registrations but clogs postgres logs with harmless errors.
# So we'll use create_or_find_by! during registration and first_or_create! otherwise.
registered_time = subscription_facet&.registered_at
use_create_or_find_by = registered_time.nil? || registered_time > 1.minute.ago
streams = {}
module_streams.each do |module_stream|
Member

I know this code is old, but I wonder if now would be a good time to switch to insert_all or upsert_all. It feels odd to do so many creations one postgres insert at a time in a Ruby loop.

Bulk insertion lets you create all entities in a single Postgres insert call. It takes into account duplicates too from what I remember. You can pass in what the records should be unique_by (in this case name, stream, and context).

There might be a catch, but I think it's worth a thought.

https://apidock.com/rails/v6.0.0/ActiveRecord/Persistence/ClassMethods/insert_all
https://apidock.com/rails/v6.0.0/ActiveRecord/Persistence/ClassMethods/upsert_all

We use them in a couple places:

https://github.com/search?q=repo%3AKatello%2Fkatello%20insert_all&type=code
https://github.com/search?q=repo%3AKatello%2Fkatello+upsert_all&type=code

@jeremylenz
Member Author

We haven't had any complaints of that, for whatever reason. It could be because most package profile uploads are from rhsmcertd checkins which are staggered automatically with random delay intervals to prevent too many concurrent requests, whereas registrations can happen at any time.

@jeremylenz jeremylenz force-pushed the 37137-first-or-create-or-create-or-find-by-oh-my branch from 4b73412 to 4dfde98 Compare February 20, 2024 23:54
@jeremylenz jeremylenz changed the title from "Fixes #37137 - Revert to old method of creating module streams after registration" to "Fixes #37137 - Use upsert_all when creating available module streams" Feb 21, 2024
@jeremylenz jeremylenz force-pushed the 37137-first-or-create-or-create-or-find-by-oh-my branch from 4dfde98 to 8123b07 Compare February 27, 2024 20:24
@jeremylenz jeremylenz changed the title from "Fixes #37137 - Use upsert_all when creating available module streams" to "Fixes #37137 - Use insert_all when creating available module streams" Feb 27, 2024
@jeremylenz jeremylenz force-pushed the 37137-first-or-create-or-create-or-find-by-oh-my branch from 8123b07 to 65430af Compare February 27, 2024 20:54
@jeremylenz
Member Author

fixed rubocop

@ianballou
Member

Gonna test first then code review.

Member

@ianballou ianballou left a comment

This is working great! So far I'm not seeing any issues with the code.

I'll give it a final review once the dev logging is cleaned up.

@jeremylenz jeremylenz force-pushed the 37137-first-or-create-or-create-or-find-by-oh-my branch from 5aad7dc to f0bad71 Compare February 28, 2024 14:28
@jeremylenz jeremylenz force-pushed the 37137-first-or-create-or-create-or-find-by-oh-my branch from e3caee8 to 1f8225e Compare February 28, 2024 16:09
Member

@ianballou ianballou left a comment

Still working fine for me! Ruby tests are passing now too.

@jeremylenz jeremylenz merged commit 7458f70 into Katello:master Feb 28, 2024
13 of 16 checks passed
sbernhard pushed a commit to ATIX-AG/katello that referenced this pull request Feb 29, 2024
…atello#10878)

* Fixes #37137 - Use insert_all when creating available module streams

(cherry picked from commit 7458f70)
qcjames53 pushed a commit to qcjames53/katello that referenced this pull request Mar 7, 2024
…atello#10878)

* Fixes #37137 - Use insert_all when creating available module streams

(cherry picked from commit 7458f70)
qcjames53 added a commit that referenced this pull request Mar 7, 2024
* Release 4.12.0-rc1 (#10909)

* Fixes #37137 - Use insert_all when creating available module streams (#10878)

* Fixes #37137 - Use insert_all when creating available module streams

(cherry picked from commit 7458f70)

* Fixes #37178 - use safe navigation for operatingsystem (#10897)

(cherry picked from commit 664488d)

* Refs #37202 - display name according to setting in host collection

(cherry picked from commit de39a36)

* Fixes #37192 - Use with_enabled_email scope (#10901)

* Fixes #37192 - Use with_enabled_email scope

Instead of typing out the same query explicitly.

* Refs #37192 - Bump required foreman version

(cherry picked from commit 1ace8e5)

* Refs #37203 - display name according to setting in errata host list

(cherry picked from commit 2775554)

* Fixes #37214 - Change 'default' for limit to environment checkbox in activation-key and content-host

(cherry picked from commit 9d0a338)

* Fixes #37108 - Preload content_view_components (#10864)

(cherry picked from commit d98c17b)

* Fixes #36976 - Too many audit records slow down CV loading (#10911)

(cherry picked from commit 30c6977)

* Refs #37201 - display short host name in activation keys menu

(cherry picked from commit 8164043)

* Fixes #37187 - Update ACS refresh Pulp fixtures + fix repository_test test

(cherry picked from commit 99c7857)

* Fixes #37198 - Allow installedDeb package attributes in safemode

(cherry picked from commit b5d785f)

* Fixes #37197 - Kickstart repository correctly listed on hostgroup

(cherry picked from commit 35c49b1)

* Fixes #37169 - Managing a Hosts Repository Sets does not behave as expected (#10905)

(cherry picked from commit 6420cf8)

---------

Co-authored-by: Jeremy Lenz <jlenz@redhat.com>
Co-authored-by: Matěj Mudra <dyrkon603@gmail.com>
Co-authored-by: Adam Růžička <adamruzicka@users.noreply.github.com>
Co-authored-by: Manisha Singhal <singhal@atix.de>
Co-authored-by: Bernhard Suttner <sbernhard@users.noreply.github.com>
Co-authored-by: Samir Jha <sjha4@ncsu.edu>
Co-authored-by: ianballou <ianballou67@gmail.com>
Co-authored-by: Bernhard Suttner <suttner@atix.de>
Co-authored-by: Partha Aji <paji@redhat.com>
Co-authored-by: Thorben <99347625+Thorben-D@users.noreply.github.com>
sbernhard pushed a commit to ATIX-AG/katello that referenced this pull request Mar 8, 2024
…atello#10878)

* Fixes #37137 - Use insert_all when creating available module streams

(cherry picked from commit 7458f70)
qcjames53 pushed a commit to qcjames53/katello that referenced this pull request Mar 8, 2024
…atello#10878)

* Fixes #37137 - Use insert_all when creating available module streams

(cherry picked from commit 7458f70)
qcjames53 added a commit that referenced this pull request Mar 8, 2024
* Fixes #37187 - Update ACS refresh Pulp fixtures + fix repository_test test

(cherry picked from commit 99c7857)

* Fixes #37137 - Use insert_all when creating available module streams (#10878)

* Fixes #37137 - Use insert_all when creating available module streams

(cherry picked from commit 7458f70)

* Fixes #37178 - use safe navigation for operatingsystem (#10897)

(cherry picked from commit 664488d)

* Refs #37202 - display name according to setting in host collection

(cherry picked from commit de39a36)

* Refs #37201 - display short host name in activation keys menu

(cherry picked from commit 8164043)

* Refs #37203 - display name according to setting in errata host list

(cherry picked from commit 2775554)

* Fixes #37214 - Change 'default' for limit to environment checkbox in activation-key and content-host

(cherry picked from commit 9d0a338)

* Fixes #37198 - Allow installedDeb package attributes in safemode

(cherry picked from commit b5d785f)

* Fixes #37108 - Preload content_view_components (#10864)

(cherry picked from commit d98c17b)

* Fixes #37197 - Kickstart repository correctly listed on hostgroup

(cherry picked from commit 35c49b1)

* Fixes #36976 - Too many audit records slow down CV loading (#10911)

(cherry picked from commit 30c6977)

* Fixes #37169 - Managing a Hosts Repository Sets does not behave as expected (#10905)

(cherry picked from commit 6420cf8)

---------

Co-authored-by: ianballou <ianballou67@gmail.com>
Co-authored-by: Jeremy Lenz <jlenz@redhat.com>
Co-authored-by: Matěj Mudra <dyrkon603@gmail.com>
Co-authored-by: Manisha Singhal <singhal@atix.de>
Co-authored-by: Bernhard Suttner <suttner@atix.de>
Co-authored-by: Bernhard Suttner <sbernhard@users.noreply.github.com>
Co-authored-by: Partha Aji <paji@redhat.com>
Co-authored-by: Samir Jha <sjha4@ncsu.edu>
Co-authored-by: Thorben <99347625+Thorben-D@users.noreply.github.com>
qcjames53 pushed a commit to qcjames53/katello that referenced this pull request Mar 8, 2024
…atello#10878)

* Fixes #37137 - Use insert_all when creating available module streams

(cherry picked from commit 7458f70)
qcjames53 added a commit that referenced this pull request Mar 8, 2024
* Fixes #37187 - Update ACS refresh Pulp fixtures + fix repository_test test

(cherry picked from commit 99c7857)

* Fixes #37137 - Use insert_all when creating available module streams (#10878)

* Fixes #37137 - Use insert_all when creating available module streams

(cherry picked from commit 7458f70)

* Fixes #37178 - use safe navigation for operatingsystem (#10897)

(cherry picked from commit 664488d)

* Refs #37202 - display name according to setting in host collection

(cherry picked from commit de39a36)

* Refs #37201 - display short host name in activation keys menu

(cherry picked from commit 8164043)

* Refs #37203 - display name according to setting in errata host list

(cherry picked from commit 2775554)

* Fixes #37214 - Change 'default' for limit to environment checkbox in activation-key and content-host

(cherry picked from commit 9d0a338)

* Fixes #37198 - Allow installedDeb package attributes in safemode

(cherry picked from commit b5d785f)

* Fixes #37108 - Preload content_view_components (#10864)

(cherry picked from commit d98c17b)

* Fixes #37197 - Kickstart repository correctly listed on hostgroup

(cherry picked from commit 35c49b1)

* Fixes #36976 - Too many audit records slow down CV loading (#10911)

(cherry picked from commit 30c6977)

* Fixes #37169 - Managing a Hosts Repository Sets does not behave as expected (#10905)

(cherry picked from commit 6420cf8)

* Refs #37214 - Use Limit to Environment by default

(cherry picked from commit 8bedda9)

---------

Co-authored-by: ianballou <ianballou67@gmail.com>
Co-authored-by: Jeremy Lenz <jlenz@redhat.com>
Co-authored-by: Matěj Mudra <dyrkon603@gmail.com>
Co-authored-by: Manisha Singhal <singhal@atix.de>
Co-authored-by: Bernhard Suttner <suttner@atix.de>
Co-authored-by: Bernhard Suttner <sbernhard@users.noreply.github.com>
Co-authored-by: Partha Aji <paji@redhat.com>
Co-authored-by: Samir Jha <sjha4@ncsu.edu>
Co-authored-by: Thorben <99347625+Thorben-D@users.noreply.github.com>