Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a GitHub Action to check for performance regressions within the incoming changes from PRs, along with two test cases based on historical regressions #6078

Merged
merged 15 commits into from
Apr 16, 2024

Conversation

Anirban166
Copy link
Member

@Anirban166 Anirban166 commented Apr 11, 2024

This adds a GHA that will get triggered by every commit in a pull request to the master branch, creating and updating one comment on the PR thread.

As discussed here, it aims to aid in detecting performance degradation for the code changes introduced in PRs using test cases based on historical regressions that we would be maintaining in inst/atime/tests.R (added the file for starters, containing the two cases that I used for the examples I mentioned there; more test cases are to be added in the future)

Should probably be skipped for certain PRs, like for changes not based on code (such as documentation or comments) or minor things that are known to not have an impact on performance.

Note: I couldn't come up with a good name for my workflow file so I just named it autocomment (hopefully not confusing since at present there isn't another GHA workflow being used in data.table that makes comments!), but please feel free to suggest a better name or change it later.

Closes #6065
Closes #4687

Copy link

github-actions bot commented Apr 11, 2024

Comparison Plot

Generated via commit a65b088

Download link for the artifact containing the test results: ↓ atime-results.zip

Time taken to finish the standard R installation steps: 11 minutes and 34 seconds

Time taken to run atime::atime_pkg on the tests: 3 minutes and 11 seconds

@tdhock
Copy link
Member

tdhock commented Apr 11, 2024

thanks! can you please change the name of the file from autocomment to performance-tests?

@Anirban166
Copy link
Member Author

thanks! can you please change the name of the file from autocomment to performance-tests?

Sure!

inst/atime/tests.R Outdated Show resolved Hide resolved
@MichaelChirico
Copy link
Member

cases based on historical regressions that we would be maintaining in inst/atime/tests.R

Is there a particular reason we need to ship this with the package? My instinct would be to put it in e.g. the .ci folder.

I only ask because we already generate a CRAN NOTE based on the size of the package, a large part of which is coming from the huge text files in inst/tests. As the set of performance regression tests grows over time, it will only add more to this burden, but I don't see the benefit of every user having these tests available (whereas we do benefit from every user being able to report test.data.table() results).

@Anirban166
Copy link
Member Author

@jangorecki Hello Jan!

I was wondering if you know of the commit/source that introduced the regression discussed in #4311 (since you mentioned in a comment there that the regression was already in 1.12.8)

@MichaelChirico
Copy link
Member

Should probably be skipped for certain PRs, like for changes not based on code (such as documentation or comments) or minor things that are known to not have an impact on performance.

Yes, I'd be fine only acting on PRs that touch files in R/ or src/. That means a path filter.

@Anirban166
Copy link
Member Author

Yes, I'd be fine only acting on PRs that touch files in R/ or src/. That means a path filter.

Right on!
I'll update my workflow with one shortly. (I've used it before in other projects, here for e.g.)

@Anirban166
Copy link
Member Author

Is there a particular reason we need to ship this with the package? My instinct would be to put it in e.g. the .ci folder.

I read the manual and under 'Details' for the atime_pkg function (on page 5), it says:

There should be a file named pkg.path/inst/atime/tests.R which defines test.list, a list with names
corresponding to different tests.

So I presume yes, unfortunately. But I believe @tdhock can change this path to be customizable and then we can redirect?

As the set of performance regression tests grows over time, it will only add more to this burden, but I don't see the benefit of every user having these tests available (whereas we do benefit from every user being able to report test.data.table() results).

I completely agree.

@tdhock
Copy link
Member

tdhock commented Apr 12, 2024

Is there a particular reason we need to ship this with the package? My instinct would be to put it in e.g. the .ci folder.

this is possible, see tdhock/atime#41 for an implementation, would that be OK?

inst/atime/tests.R Outdated Show resolved Hide resolved
inst/atime/tests.R Outdated Show resolved Hide resolved
Copy link
Member

@tdhock tdhock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better comments, thanks! please revise again though.

@MichaelChirico
Copy link
Member

this is possible, see tdhock/atime#41 for an implementation, would that be OK?

perfect, no need to address anything in this PR then

@MichaelChirico MichaelChirico mentioned this pull request Apr 12, 2024
… yesterday (tick removals), added a link to the related atime vignette, tried to write in as much detail as I understand and included optional parameters for test.list
…rrent commits to be working (tested locally, but just to ensure..)
# A list of performance tests.
#
# Each entry in this list corresponds to a performance test and contains a sublist with three mandatory arguments:
# - N: A numeric sequence of data sizes to vary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

off-topic / thinking aloud: @tdhock have you considered extending {atime} for timing surfaces in 2 dimensions? i'm wondering if we'd benefit from (nrow, ncol) 2D benchmarking in some case. Now we are restricted to marginal approach (nrow, then ncol), but the interaction effects can sometimes be interesting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atime is inherently 1 dimensional (N varies), but you can make nrow and ncol vary with N

inst/atime/tests.R Outdated Show resolved Hide resolved
inst/atime/tests.R Outdated Show resolved Hide resolved
inst/atime/tests.R Outdated Show resolved Hide resolved
inst/atime/tests.R Outdated Show resolved Hide resolved
Anirban166 and others added 2 commits April 12, 2024 22:29
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
@MichaelChirico
Copy link
Member

I am happy with the current state, I'll let Toby give final approval. Thanks again!

inst/atime/tests.R Outdated Show resolved Hide resolved
inst/atime/tests.R Outdated Show resolved Hide resolved
@tdhock
Copy link
Member

tdhock commented Apr 16, 2024

thanks!

@tdhock tdhock merged commit 8476b5b into master Apr 16, 2024
3 checks passed
@tdhock
Copy link
Member

tdhock commented Apr 17, 2024

@Anirban166 can you please make a follow-up PR which adds another test, and moves tests under .ci? (I should have asked you to do that before merging, but forgot)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A GitHub Action to perform automated performance regression testing on pull requests Continuous Benchmarking
3 participants