Increasing test coverage? #10
@danielinteractive, you are welcome! Very nice to know that you find the package useful! I am not sure what the actual coverage of the existing tests is, but 20% seems lower than I would expect. The package has quite a few test cases using 'testthat', but I have been leaving those tests out of the CRAN submissions because they are mainly visual tests that 'vdiffr' very easily reports as failed even when visual review shows no changes. For example, running and saving test results under the now pending 'ggplot2' 3.4.0-rc makes some 20 test cases fail when switching to 'ggplot2' 3.3.6, and vice versa, without any detectable visual change in the output. So I think that, in addition to adding new test cases, a solution to this problem is needed; in other words, most of the saved test results in the current tests are specific to 'ggplot2' versions.

The latest version of 'ggpp', available here on GitHub, includes 206 test cases using 'testthat' and 'vdiffr'. It is under development, and some newer code is still missing tests. I have for some time been planning to submit 'ggpp' and 'ggpmisc' for peer review at rOpenSci, but have lacked the time to get these packages ready for such a submission. So pull requests for improved test coverage are very welcome, as is an exchange of ideas about possible updates to the package code itself, if needed for validation/compliance.

I have found that getting good test coverage for graphical output is tricky, because updates to 'ggplot2' and 'vdiffr' can change the rendered plot in ways that do not affect how the plot looks. For some functions, like the position functions, it would most likely be better to create test cases that do not render a plot and instead test the actual returned numerical values. One final thought: the code in 'ggpp' was originally in 'ggpmisc', and some of it may be indirectly covered by test cases in 'ggpmisc'.

Hoping to hear from you soon. (I may not answer straight away, as I will be travelling for the next three weeks, but I will have internet access every day.)
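A minimal sketch of the non-visual approach described above, assuming 'ggpp's position_nudge_keep() as the position function under test; the expected coordinates are illustrative, not taken from the actual test suite:

```r
library(ggplot2)
library(ggpp)
library(testthat)

# Build the plot without rendering it and test the numbers the position
# function produces, instead of comparing an SVG snapshot with 'vdiffr'.
test_that("position_nudge_keep() shifts x by the requested amount", {
  p <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) +
    geom_point(position = position_nudge_keep(x = 0.5))
  ld <- ggplot_build(p)$data[[1]]
  expect_equal(as.numeric(ld$x), c(1.5, 2.5, 3.5))
})
```

Such tests keep passing across 'ggplot2' releases as long as the computed coordinates themselves do not change.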
Thanks @aphalo for the fast reply! Great. So our baseline number here comes from out-of-the-box 'covr'.
@danielinteractive o.k., yes, I see 23%. Several of the functions with 0% coverage have documentation examples that could be used as tests. However, I have been thinking about how to make the tests more robust after recently reading a blog post by the current maintainer of 'ggplot2' at https://www.tidyverse.org/blog/2022/09/playing-on-the-same-team-as-your-dependecy/#testing-testing.
Interesting, thanks for sharing. So maybe using something along those lines could help here.
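One pattern along those lines, sketched here under the assumption that the saved SVGs were generated with a specific 'ggplot2' release (the version number is illustrative): skip the visual expectation whenever the installed 'ggplot2' differs, so only genuine regressions on the pinned version fail.

```r
library(ggplot2)
library(testthat)

test_that("plot renders as snapshotted", {
  # Skip rather than fail when 'ggplot2' differs from the snapshot baseline.
  skip_if_not(packageVersion("ggplot2") == "3.4.0",
              message = "snapshots were generated under 'ggplot2' 3.4.0")
  p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
  vdiffr::expect_doppelganger("basic-points", p)
})
```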
I explored the tests in the 'ggplot2' 3.4.0-rc branch a bit. I expect to be busy with other tasks during the next 20 days, with little time available for 'ggpp' before 20 November. I will review any pull request as soon as possible, but "as soon as possible" may mean a few days' delay during this period. I just noticed that 'ggplot2' 3.4.0 is now in the main branch, about to be submitted to CRAN.
@danielinteractive @sob2021 In branch …
@danielinteractive @sob2021 I have added unit tests that should provide nearly 100% coverage for … Do you know of a way to get around this problem? Or another way of measuring coverage that is more reliable and at the same time acceptable for your aims?
Cool, thanks @aphalo! Unfortunately I am not (yet) an expert on coverage measurement.
@danielinteractive I have not tried that yet. 'ggplot2' is tricky to work with, as even breakpoints set in the debugger are in most cases ignored, requiring the use of … In the 'covr' repo on GitHub and in its vignette there are some discussions about different difficulties, including dealing with …
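For reference, a sketch of the plain local 'covr' workflow being discussed (all three functions are part of 'covr's documented interface):

```r
library(covr)

cov <- package_coverage(".")  # run the package's tests and record execution
report(cov)                   # interactive HTML report with per-line detail
zero_coverage(cov)            # data frame of source lines never executed
```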
@danielinteractive I was able to set up the action easily, but some of the 'vdiffr' tests produce plots that seem to differ between Ubuntu at GitHub and Windows 10 locally. So coverage is not measured. As part of this process I added the test cases to the package build, but for the same reason this broke the CHECK action on GitHub. I had to push these changes to GitHub to see whether they worked. The commits are now reverted, and I have pushed back the version that does not include tests in the package build.
Hi Pedro @aphalo, let us try to have @cicdguy make a PR to introduce … Regarding the failing tests: unfortunately r-lib/vdiffr#125 has not been started yet; that would be the final solution, because then you could store snapshots for different OSes. (Maybe you would like to comment there to support the proposal.)
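Until that proposal lands, one stopgap, sketched here on the assumption that the snapshots were drawn on Windows (as described earlier in the thread), is to skip the visual expectations on other platforms while still running all non-visual tests:

```r
library(testthat)

# Placed at the top of a file of 'vdiffr' tests: skips the rest of the file
# on platforms other than the one the snapshots were generated on.
skip_if_not(Sys.info()[["sysname"]] == "Windows",
            message = "vdiffr snapshots were generated on Windows 10")
```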
@danielinteractive The coverage workflow is now working, and the package build now includes all tests, so the CHECK workflow also runs them. BFalquet's pull request #23 still needs improvements. alexjoerich's pull request #24 looks good to me, although testing with different summary functions will not increase the coverage much. It is still marked as a draft, but I reviewed it and gave some suggestions. It should now be possible to see what contribution these tests make to coverage. I am not sure how you are organizing this: are you supervising these people's work/training? Would you like to have additional rights to the repo?
Thanks @aphalo, that is great! Yeah, we coordinate this here in our team. We will continue pushing on the mentioned PRs and will ping you for re-review once they are ready. I don't think more rights are needed at this point.
@danielinteractive Hi Daniel. I just watched "Statistical Software Engineering 101: Unit Testing for R Developers - The Basics". Very useful and clear! Although the unit tests in my packages have had relatively low coverage, they have indeed been very useful. 'ggpp' had especially low coverage because the geoms were earlier part of 'ggpmisc' and were tested jointly with the stats in the unit tests that are now separated out in 'ggpmisc'; I have been lazy and run these tests locally, indirectly, with 'revdep' checks.

The usefulness of tests has been most noticeable in 'photobiology', a package much larger and more complex than 'ggpp', defining an S3 class hierarchy and numerous methods and operators for the classes. It currently has nearly 5000 unit tests but still only 51% coverage; this is not as bad as it looks, as the most crucial things are tested thoroughly.

One thing that you do not mention in the video is that, in my experience, unit tests included in the package, as they are in 'photobiology' and now also in 'ggpp', ensure that changes in the dependencies are detected early enough, even by the maintainers of the dependencies before the updates to their packages are accepted into CRAN. The other thing is that visual tests can be a pain, because of false positives that need to be checked visually one by one. If one has hundreds of them, as in 'ggpmisc', it can take quite some time to check them. Some updates to 'ggplot2' or 'vdiffr' have made all tests fail, forcing me to walk one by one through the unit tests even when nothing was actually broken (things like very small, even visually undetectable, changes in the thickness of lines, or changes to the default justification of the plot title). So I think that for non-visual tests the cost, once they are written, is close to zero, while visual tests do add a maintenance cost that needs to be counterbalanced by a greater benefit. Nowadays we have the tools to avoid the use of visual tests, as I have learnt from your group's contributions to 'ggpp'.

I appreciate these contributions very much. Please feel free to add Dinakar and BFalquet as contributors if you think their contributions are significant enough for this.
@danielinteractive @cicdguy One problem that is keeping the "apparent" coverage low is that the compute_panel and compute_group functions are defined separately and then assigned when the ggproto object is built. If the anonymous functions are assigned directly, 'covr' sees them, but if the functions are instead named and assigned by name, 'covr' does not see them. I edited this in a few cases and will edit the other stats and geoms. However, in a few cases the same functions are used in more than one stat, so defining them inline as anonymous functions would mean duplicating quite a few lines of code. Do you have any idea how to get around this?
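A minimal sketch of the two assignment styles being described, using a hypothetical stat rather than actual 'ggpp' code; the first function body sits inside the ggproto() call, where 'covr' attributes it to the package source, while the second, per the observation above, executes at plot time without being counted:

```r
library(ggplot2)

# Style A: anonymous function assigned inline -- counted by 'covr'.
StatCentroidA <- ggproto("StatCentroidA", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data.frame(x = mean(data$x), y = mean(data$y))
  }
)

# Style B: a named function, reusable by several stats, assigned by name --
# avoids duplicated code, but was reported as uncovered in this thread.
compute_centroid <- function(data, scales) {
  data.frame(x = mean(data$x), y = mean(data$y))
}
StatCentroidB <- ggproto("StatCentroidB", Stat,
  required_aes = c("x", "y"),
  compute_group = compute_centroid
)
```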
Thanks @aphalo for your feedback on the unit testing video - actually we are about to record a second "advanced" one (together with @yonicd), so I will try to include your suggested topics in there. About the function renaming - yeah, I am not sure if there is a general workaround for this; maybe it would be a good question for the 'covr' developers.
@danielinteractive I thought about testing those functions on their own, but then I would need to fake the arguments that the function will receive as input and then test the returned grob. I will just duplicate the code, adding comments, to get the needed coverage, and in parallel discuss the issue with the authors of 'covr'. This will allow us to see where more tests are still needed. Much of the low coverage is due to this, as most of the needed tests are already in place but are not seen when the coverage percentage is computed.
@danielinteractive Now the overall coverage is about 75%. I am almost sure that all tests are now seen by 'covr', so in some functions there is still code that is not being tested.
That number sounds very nice already, thanks Pedro for your help with this! We are currently finishing up a few items we had on the radar for tests, and will then check back with our validation experts whether we pass the requirements already, or whether we need to add more tests.
@danielinteractive Now the overall coverage is about 88%. I was wrong; there was still code not being seen. I also added tests for some internal functions, as I promised to do some time ago. We are lacking tests for one statistic with 0% coverage; otherwise all files report > 75% coverage. An issue was raised yesterday about a bug in a boundary case in the density stats with too few observations, but I have fixed it.
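A hedged sketch of what a regression test for such a boundary case could look like; stat_dens2d_filter() is one of 'ggpp's density-based stats, but which stat the fix actually touched is an assumption here:

```r
library(ggplot2)
library(ggpp)
library(testthat)

test_that("density stat copes with too few observations", {
  p <- ggplot(data.frame(x = 1, y = 1), aes(x, y)) +
    stat_dens2d_filter()
  # Building the plot exercises the stat without rendering anything.
  expect_no_error(ggplot_build(p))
})
```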
Awesome! So we should be there very soon now.
Hi @aphalo, I collaborate with @danielinteractive on a few repos and wanted to suggest a package called {covrpage}. You may find it helpful for tracking and communicating the testing you are developing. It can be used within the regular CI/CD framework to generate the report as part of the regular commit/PR process. From running it on the master branch, I am seeing a few tests that are not passing and two files that don't have any tests running on them. Happy to help set up the automated report for you if you find it useful. Yoni

(A collapsed covrpage report followed: coverage summary, unit-test summary, and session info.)
Hi @yonicd. Overall, the main issue with test coverage in 'ggpp' is that, because of how 'ggplot2' works, with a lot of code in functions assigned as list members, 'covr' only "sees" a fraction of all the code in the package. So the test-coverage percentage is computed neither from all the code in the package nor from all the code that the tests actually exercise. I did not set up the covr action myself; Danikar, a member of Daniel's team, did. I would greatly appreciate your help with setting up the 'covrpage' action, as I am quite busy at the moment. Pedro.
Hi @yonicd Looking at 'covrpage', you seem to still be using Travis. I am trying to use only GitHub Actions for CI. I guess the coverage table can also be generated locally... Is this true?
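It can be, at least as I understand 'covrpage': the report is an ordinary function call that rewrites tests/README.md, so CI is optional. A sketch, with the install source assumed to be @yonicd's GitHub repo:

```r
# remotes::install_github("yonicd/covrpage")
covrpage::covrpage()  # regenerates tests/README.md with coverage and test tables
```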
@yonicd @danielinteractive In the case of the 'vdiffr' tests, for the same reason that they are disabled on CRAN, it would be unwise and confusing to have them in a vignette. In a README they would need a very clear warning. These tests break much more often than "normal" tests, and most failures are not because of bugs but because of irrelevant, usually invisible, changes in the rendered SVG output caused by intentional updates in dependencies, including even system fonts. The test failures you found seem to be an example of this: all tests pass cleanly locally and also on GitHub CI, but not under the setup you used. I think a precondition for publishing test results, as opposed to using them internally, is that false positives are a rarity rather than the most frequent cause of test failures. I am already uneasy about a test coverage badge that is based on a coverage computation that ignores all anonymous functions saved as members of lists, which are at the core of plot layer functions.
@yonicd @danielinteractive I merged the pull request from @alexjoerich. The test coverage action that Danikar set up is no longer working, possibly because of the recent commits at https://github.com/insightsengineering/ …
'ggpp' 0.5.2 is on CRAN.
@danielinteractive I updated …
@danielinteractive In a separate branch I am implementing …
Thanks @aphalo for letting me know - great to see that you are adding further nice features to the package. And since you add the unit tests and coverage stays very good, all fine from our perspective :)
@danielinteractive 'ggpp' 0.5.5 was submitted to CRAN minutes ago. The main change is compatibility with the 'ggplot2' development version (the future 3.5.x). At the moment the test action at GitHub is failing. The problem is that there are small differences in the graphical output generated under different versions of R, so the vdiffr tests fail under either R-devel or R-release. These tests are in any case skipped on CRAN, so the status at CRAN is OK.
Thanks @aphalo, in those cases I usually regenerate the snapshots so that GitHub Actions can pass again.
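A sketch of that regeneration workflow: 'vdiffr' stores its SVG figures through 'testthat's snapshot mechanism, so after a dependency update the changed snapshots can be reviewed or accepted locally and then committed so the workflow passes again.

```r
testthat::snapshot_review()  # step through changed snapshots visually, one by one
testthat::snapshot_accept()  # or accept all pending snapshot changes at once
```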
@danielinteractive, do you know if it is possible to set up tests so that different snapshots are used for R-devel, R-release, and R-old-release? The current snapshots are for R-devel, which seems to me the most likely to break in a way that would need action on my part to fix. An alternative would be to use only the results from tests under R-devel to build the badge displayed in the README...
I don't think that is possible. But you could double-check with the vdiffr GitHub folks.
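For plain 'testthat' snapshot files (as opposed to 'vdiffr' figures), something close to this does exist via the variant argument; a sketch, where save_png() is a hypothetical helper that writes the plot to a file and returns its path:

```r
library(testthat)

variant <- paste0("R-", getRversion()[, 1:2])  # e.g. "R-4.3"
expect_snapshot_file(save_png(p), "plot.png", variant = variant)
```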
@danielinteractive I am planning to submit 'ggpp' 0.5.7 to CRAN within a couple of days. Coverage is now back to about 93%.
Hi @aphalo, first of all thanks for the nice package - really useful extensions here!
One question - we are currently looking into using this package (actually only a few of its functions) for cases where we need to "validate" the packages we use. Without going into details, the hard requirement there is to have sufficient unit test coverage of the package.
So currently we see around 20% test coverage in ggpp. But we would need at least 80% overall, and each function tested with at least one test, to get through our requirements. I was wondering - would you be open to us contributing tests via PRs? That seems the most straightforward way.
Cheers
Daniel
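For reference, a sketch of how a baseline figure like the ~20% above can be measured with out-of-the-box 'covr' (the path is illustrative):

```r
library(covr)

cov <- package_coverage("path/to/ggpp")  # run the tests and record execution
percent_coverage(cov)                    # the overall percentage
coverage_to_list(cov)                    # per-file breakdown of the same data
```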