Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(database): indices for columns used in 'where' and 'order by' clauses #3880

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

iainsproat
Copy link
Contributor

@iainsproat iainsproat commented Jan 23, 2025

Description & motivation

We analyzed a running database and identified significant tables where indices were involved in a low proportion of queries. This PR adds indices to select table columns to aim to improve the index hit ratio and overall performance.

Existing indices can be found with:

select *
from pg_indexes
WHERE schemaname='public'
AND tablename='automation_triggers';

Changes:

  • migration to add indices for columns in select tables

To-do before merge:

Screenshots:

Validation of changes:

Checklist:

  • My pull request follows the guidelines in the Contributing guide?
  • My pull request does not duplicate any other open Pull Requests for the same update/change?
  • My commits are related to the pull request and do not amend unrelated code or documentation.
  • My code follows a similar style to existing code.
  • I have added appropriate tests.
  • I have updated or added relevant documentation.

References

Copy link

linear bot commented Jan 23, 2025

@iainsproat iainsproat changed the title chore(database): indices for columns used in 'where' and 'order by' c… chore(database): indices for columns used in 'where' and 'order by' clauses Jan 23, 2025
@iainsproat iainsproat marked this pull request as ready for review January 24, 2025 09:30
@iainsproat
Copy link
Contributor Author

iainsproat commented Jan 24, 2025

knex doesn't seem to add Indices concurrently knex/knex#3913
Need to amend creation of index to use a raw sql, as so: https://github.com/backstage/backstage/pull/22594/files

@iainsproat iainsproat marked this pull request as draft January 24, 2025 09:35
})
await knex.schema.alterTable('commits', (table) => {
// table.index('id') // index should already be available due to this being the primary key
table.index('author')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these really need to be separate indices, instead of composite ones? it was at least the case in MySQL that if you had an index like "x, y, z", then it would be used even if you filter by only "x" or "x" and "y", as long as the order is the same (so it wouldn't work when just filtering by "z")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

important to think about this, cause they do take up space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@fabis94 fabis94 Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's even better than what I expected then, I didn't think there would be a speedup on just filtering "y" at all. But my point is that we need to look at what the actual filter conditions are, and if "y" is usually used together with "x", then there's no need to make "y" have its own index

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just having X and Y as separate indices could even be worse than having a "X, Y" index, if both are filtered together. It's just going to use the X or the Y index, where "X, Y" would be the better one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPaginatedProjectModelsBaseQueryFactory, getModelTreeItemsFilteredBaseQueryFactory, getUserAuthoredCommitCountsFactory query on the author column alone

@fabis94
Copy link
Contributor

fabis94 commented Jan 24, 2025

surprising that so many tables don't have appropriate indices. we need to be more attentive to new table creating migrations, indices should be planned out from the very beginning, if possible

@iainsproat
Copy link
Contributor Author

surprising that so many tables don't have appropriate indices. we need to be more attentive to new table creating migrations, indices should be planned out from the very beginning, if possible

There's a lot more in the list linked in the Linear issue, but those were much smaller tables (at the moment). I'll do the work of sweeping them up in some follow up PRs.

Copy link
Contributor Author

@iainsproat iainsproat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all the indices, and there are indeed some candidates for being composite indices instead. Will revise.

export async function up(knex: Knex): Promise<void> {
await knex.schema.alterTable('stream_commits', (table) => {
// table.index('streamId') // index should already be available due to this being the first item in the composite primary key
table.index('commitId')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCommitStreamsFactory & markCommitStreamUpdatedFactory has a 'where' statement on this column alone

})
await knex.schema.alterTable('branch_commits', (table) => {
// table.index('branchId') // index should already be available due to this being the first item in the composite primary key
table.index('commitId')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

markCommitBranchUpdatedFactory, moveCommitsToBranchFactory, getCommitBranchesFactory, switchCommitBranchFactory has a 'where' statement on this column alone

})
await knex.schema.alterTable('refresh_tokens', (table) => {
// table.index('id') // index should already be available due to this being the primary key
table.index('userId')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleteExistingAuthTokensFactory has a 'where' statement on this column alone. (Though I'm confused as to what the statement in that factory is actually doing, it appears to query and then not use the result)

await knex.schema.alterTable('commits', (table) => {
// table.index('id') // index should already be available due to this being the primary key
table.index('author')
table.index('sourceApplication')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPaginatedProjectModelsBaseQueryFactory, getModelTreeItemsFilteredBaseQueryFactory, getStreamsSourceAppsFactory query on the sourceApplication column alone

})
await knex.schema.alterTable('commits', (table) => {
// table.index('id') // index should already be available due to this being the primary key
table.index('author')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPaginatedProjectModelsBaseQueryFactory, getModelTreeItemsFilteredBaseQueryFactory, getUserAuthoredCommitCountsFactory query on the author column alone

//table.index('id') // index should already be available due to this being the primary key
// table.index('gendoGenerationId') // already explicitly indexed
table.index('versionId')
table.index('createdAt') // used in an ORDER BY clause. https://www.postgresql.org/docs/current/indexes-ordering.html
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLatestVersionRenderRequestsFactory and getVersionRenderRequestFactory include this column in an ORDER BY. It is always used alongside versionId, so could be a composite index.

})
await knex.schema.alterTable('pwdreset_tokens', (table) => {
// table.index('id') // index should already be available due to this being the primary key
table.index('email')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

baseQueryFactory includes this column, and is used alone in deleteTokensFactory

await knex.schema.alterTable('pwdreset_tokens', (table) => {
// table.index('id') // index should already be available due to this being the primary key
table.index('email')
table.index('createdAt')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPendingTokenFactory includes this column, but alongside email so could be the secondary in a composite index.

})
await knex.schema.alterTable('automation_triggers', (table) => {
// table.index('automationRevisionId') // index should already be available due to this being the first item in the composite primary key
table.index('triggerType')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getActiveTriggerDefinitionsFactory includes this column, alongside triggeringId
getAutomationTriggerDefinitionsFactory includes this column, alongside automationRevisionId.

await knex.schema.alterTable('automation_triggers', (table) => {
// table.index('automationRevisionId') // index should already be available due to this being the first item in the composite primary key
table.index('triggerType')
table.index('triggeringId')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getActiveTriggerDefinitionsFactory includes this column, but only alongside triggerType

@iainsproat iainsproat marked this pull request as ready for review February 5, 2025 09:39
@iainsproat iainsproat marked this pull request as draft February 6, 2025 11:06
@iainsproat iainsproat marked this pull request as ready for review February 6, 2025 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants