RFC: Back osbuild-composer with a PostgreSQL database #33
-
Quick thought: If there is a comprehensive test suite for the interface both backends are implementing ( |
-
@larskarlitski Good write-up on this. Adding a database backend makes sense, especially as we hope to increase redundancy and availability. Having a (mostly) stateless application also makes it much easier to manage in OpenShift.
-
This doesn't seem like a big drawback to me. I can imagine it being simple to deploy a default setup with the A related drawback I can imagine is needing to maintain both backends. The requirements are rather simple now so it doesn't seem like much of an issue, but it could become cumbersome. I'm trying to think of a scenario where there's a need for a feature that requires some not-so-straightforward storage implementation and we end up needing to solve it twice: we would need to consider different performance implications, or even the feasibility, for the two very different backends.
-
Love it, the drawbacks are very small in comparison with the benefits.
-
Another potential issue I see now is that we rely on the order of certain things in the fsjobqueue, dependencies in particular. It's always assumed that the first dependency ( I think there might be a slightly similar issue (though not a DB issue exactly) with dequeueing. When the |
-
It's not clear to me what the motivation is, probably because I don't have the current worker architecture loaded into my head. What other system would need access to the job queue and why? Is this for workers to pick up jobs?
Assuming there is a good reason for it that I'm missing, I do like having 2 backends. Maintaining a file-based job queue makes it easier for potential users to set up a system to explore, without needing to build out infrastructure just to see if they like it, until they are ready to scale up to using a database.
Using a SQL database instead of a key-value store is probably a good idea. It keeps the options open, even though it can be slower.
Some other things to think about: encrypted communications between the db and the clients; client auth (setup, maintenance, removal of old accounts); overall database configuration (pgsql can be optimized in a wide variety of ways; some db admins are hands-on, some are not). It also means more moving parts to debug when something goes wrong, e.g., unreliable network connections, DNS issues, etc.
If you are looking for Go libraries and only need to support PostgreSQL, take a look at pgx and pgxpool from https://pkg.go.dev/github.com/jackc/ which have better support for PostgreSQL features than the more generic database/sql interface.
-
So after reading up on psql notify/listen/unlisten and I guess just as a note: currently for clouddot we'd only store an aws id not owned by us. However in future this might change if we can't find similar solutions for other cloud platforms we want to support. A database like this is slightly less ephemeral than some files on a filesystem inside a container, so at some point we might need to add support for tokenizing or something like that. Not sure it would/should happen on the level of the jobqueue though. |
-
You currently rely on jobs being manually cancelled, don't you? That is, you rely on the Instead of tracking claimed entries, you can just use SQL transactions. That is, you can simply change |
-
So after exploring it a little and implementing a heartbeat in the worker server, I realized that just a psql jobqueue isn't enough to scale composer. There's actually quite a bit of state in the workerserver itself with the map of running jobIds to generated tokens. The latter is a problem in and of itself, for example:
So we need to either move this indirection into the queue (thus defeating the point of the indirection, I think), or remove the token->id indirection altogether, but I'm not sure if that's possible? Is there a reason we can't upload to Either way, fixing the worker server first takes priority over the dbqueue imo.
-
RFC: Back osbuild-composer with a PostgreSQL database
Motivation
osbuild-composer currently uses the file system to persist its state (using package fsjobqueue). This has served it well for its main use-case of running on a single machine with local access via the weldr API. It's easy to inspect the state on disk and it doesn't require running any additional services.
It has drawbacks when deploying it, though. Attaching volumes to stateless VMs or containers is cumbersome and error-prone. It also means that there can only be one instance of osbuild-composer accessing the state at the same time, and that no other services can access the data directly (e.g., for monitoring or reporting). Most importantly, it doesn't scale beyond a few hundred jobs.
Those are all problems that various databases have been solving for ages. Persisting osbuild-composer's state to one of them would enable people to use it at scale, first of all our own Image Builder service.
Release Note
This is what the release note could say if this feature is implemented:
osbuild-composer's job queue can now optionally be backed by a PostgreSQL database, allowing it to scale to many more jobs and to run multiple instances against the same state. README.md contains documentation about how to set up a database and configure osbuild-composer to access it.
Approach
Write a PostgreSQL backend to the JobQueue interface, using its built-in SQL extensions to implement a queue that can be concurrently accessed from multiple services.
PostgreSQL has a reputation for being very stable, while also offering extensions that make it more useful than standard SQL. In particular, it makes it possible to implement a queue without resorting to an additional queuing service. Thus, only supporting this one database is worthwhile. PostgreSQL is available in all of osbuild-composer's target environments: RHEL, Fedora, and Amazon's RDS service.
Note that solely implementing JobQueue is not enough to back all of weldr's state by a database (it serializes an additional state.json). It's possible to write this to a database as well, but it is considerably more work for a questionable amount of benefit - the local use case is served quite well with everything on the filesystem.
Schema Versioning
The aim is to keep all state for the Image Builder service in this database, with many stateless services accessing it. At first, these services will be multiple instances of osbuild-composer, potentially at different versions.
Thus, the database schema has to be treated with the same API stability considerations as other APIs, because it is not private to (and versioned with) a single service. In particular, it needs to stay backwards-compatible with all services that are still accessing it. For a start, only allowing accretive, non-breaking changes is enough, as long as there's a way to allow for breaking changes in the future.
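One possible way to leave room for future breaking changes, sketched here purely as an assumption and not as part of the proposal, is to record a two-part schema version in the database and have each service check it at startup. The `schemaVersion` type and `compatible` function below are hypothetical names.

```go
package main

import "fmt"

// schemaVersion is a hypothetical version record stored in the database.
type schemaVersion struct {
	Major int // bumped only for breaking changes
	Minor int // bumped for accretive changes, e.g. a new table or nullable column
}

// compatible reports whether a service that was written against
// `required` can safely use a database that is at `actual`.
// Accretive changes never break older services, so any newer minor
// version is acceptable; a different major version is not.
func compatible(required, actual schemaVersion) bool {
	return actual.Major == required.Major && actual.Minor >= required.Minor
}

func main() {
	db := schemaVersion{Major: 1, Minor: 3}

	// An older service instance running against a newer, accretive schema.
	fmt.Println(compatible(schemaVersion{Major: 1, Minor: 2}, db))

	// A service that depends on a breaking change the database lacks.
	fmt.Println(compatible(schemaVersion{Major: 2, Minor: 0}, db))
}
```

With a check like this, multiple osbuild-composer versions can share one database as long as only the minor version moves.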
Testing
To ensure osbuild-composer is not regressing, move the tests for fsjobqueue that are really testing properties of the JobQueue interface into a shared package, from where they can be run against both fsjobqueue and the new database-backed queue.
Ideally, all integration tests also run against a version of osbuild-composer that is configured with a database backend. Most importantly, the ones exercising the composer and composer-koji APIs should.
Drawbacks
Having two backends to JobQueue, thus giving users an additional choice to make, complicates deploying Image Builder for them. However, the ease of use of the filesystem-based backend in the local (weldr) scenario might make up for this.
Relatedly, this means new features (or any changes) have to be implemented in both backends. This might not be that large of an issue, because the interface is meant to be fairly stable (with all logic in the jobs themselves).
Alternatives
this are in section "Motivation".)
(Arguments for using PostgreSQL are in section "Approach".)
Prototype
An early prototype of this work is at larskarlitski/osbuild-composer/tree/db.