Rbind allow binding of different class attributes #5446

Merged
merged 39 commits into from
Jul 24, 2024

Conversation

ben-schwen
Member

@ben-schwen ben-schwen commented Aug 25, 2022

Closes #5309 (shadowing current approach for int64 and factor)
Closes #3911 (automatically allows for mixing Date and IDate and adds ignore.attr argument)
Closes #4934 (only carries the AsIs class to the result if AsIs comes first in the binding, to stay consistent with do.call(rbind, list) in this case)
Closes #5391
Closes #5542
Towards #5486 (also needs #5569)

  • Mix/fill dates (Date and IDate) with atomic columns
  • Mix/fill POSIXct with atomic columns
  • Mix/fill ITime
  • Mix/fill AsIs
  • Add ignore.attr argument to rbindlist and rbind to manually deactivate the check for equal classes of binding columns
  • Tests
  • News
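
A rough sketch of how the ignore.attr item above might be used, assuming a data.table build that includes this PR. The table and column names are made up for illustration, and the example only shows the explicit opt-out, not any automatic class mixing.

```r
library(data.table)

# Illustrative inputs (hypothetical names): column `d` is a Date in one
# table and an IDate in the other, so the class attributes differ.
a = data.table(d = as.Date("2024-01-01"))
b = data.table(d = as.IDate("2024-01-02"))

# ignore.attr=TRUE asks rbindlist()/rbind() to skip the check that the
# binding columns carry equal class attributes.
via_rbindlist = rbindlist(list(a, b), ignore.attr = TRUE)
via_rbind     = rbind(a, b, ignore.attr = TRUE)

print(via_rbindlist)
print(class(via_rbindlist$d))
```

Because the check is skipped rather than resolved, it is worth inspecting the class of the resulting column before relying on it.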

@codecov

codecov bot commented Aug 25, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.53%. Comparing base (898dce3) to head (7353bc6).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5446   +/-   ##
=======================================
  Coverage   97.53%   97.53%           
=======================================
  Files          80       80           
  Lines       14915    14926   +11     
=======================================
+ Hits        14547    14558   +11     
  Misses        368      368           


R/merge.R Outdated
@@ -97,7 +97,7 @@ merge.data.table = function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FAL
  # Perhaps not very commonly used, so not a huge deal that the join is redone here.
  missingyidx = y[!x, which=TRUE, on=by, allow.cartesian=allow.cartesian]
  if (length(missingyidx)) {
-   dt = rbind(dt, y[missingyidx], use.names=FALSE, fill=TRUE)
+   dt = rbind(dt, y[missingyidx], use.names=FALSE, fill=TRUE, ignore.attr=TRUE)
Member

Do we have a unit test that covers this? It looks like a breaking change, so it should be introduced softly: keep a backward-compatible default for 1-2 versions before switching to the new behavior.

Member Author

We have one now, since #5857. I'm indifferent as to whether we add the change to merge or not. It makes the code much shorter and hides less of what we want to achieve, but since many people rely on merge we should be careful about changes.

@jangorecki
Member

This PR resolves one of the issues on the 1.15.0 milestone. But it is much bigger than just fixing a regression: it adds a new feature. I would therefore prefer to shift it to 1.15.99, and for 1.15.0 push only regression fixes.

@MichaelChirico MichaelChirico added this to the 1.15.0 milestone Dec 26, 2023
@MichaelChirico
Member

Agree with Jan, it would be nice to merge a minimal PR that fixes the regression and send that out as 1.15.0. I glanced at the diff here and didn't see an easy way to separate the regression fix from the new functionality. Is that possible, @ben-schwen? Or should we work on that as a standalone PR?

@ben-schwen
Member Author

ben-schwen commented Dec 26, 2023

#5309 has two issues in it:

  1. rbindlist(..., fill=TRUE, use.names=FALSE). This did not work previously and threw a warning changing the value of use.names to TRUE.
Warning message:
In rbindlist(l, use.names, fill, idcol) :
  use.names= cannot be FALSE when fill is TRUE. Setting use.names=TRUE.

This error of #5309 is already fixed by #5468

  2. merge used rbind(dt, yy, use.names=FALSE, fill=FALSE) and manually filled yy before binding.
    There is a working version of merge without rbindlist(fill=TRUE, use.names=FALSE) in #5263 ("rbindlist support fill=TRUE with use.names=FALSE and use it in merge.R ToDo of #678"), but I would even roll back merge to its pre-#5263 version (since it was only a minor change).

Will add tests of #5263 and file regression PR.
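
For reference, a minimal sketch of the first case, assuming a data.table version in which rbindlist accepts fill=TRUE together with use.names=FALSE as described above. The tables x and y are hypothetical; on older versions the same call would instead emit the warning quoted above and fall back to use.names=TRUE.

```r
library(data.table)

# Hypothetical inputs: `x` has two columns, `y` has one with a different name.
x = data.table(a = 1:2, b = 3:4)
y = data.table(z = 5L)

# Bind by position (use.names=FALSE) and pad the column that `y` lacks
# (fill=TRUE) instead of matching columns by name.
res = rbindlist(list(x, y), use.names = FALSE, fill = TRUE)
print(res)
```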

@MichaelChirico
Member

All LGTM, with minor requests for further improvement. Defer to @jangorecki about merge(). I think it's fine though: AIUI the main concern was about including too much in a patch release, and now that we have a "normal" release that issue is resolved.

@jangorecki
Member

Probably a full outer join does an rbind.

@MichaelChirico MichaelChirico merged commit 4fd75e2 into master Jul 24, 2024
5 checks passed
@MichaelChirico MichaelChirico deleted the rbind_class_att branch July 24, 2024 17:30
3 participants