Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated error message in rbindlist to display the class attributes of mismatching columns #4822

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

logworthy
Copy link

Thought it would be helpful to provide a little more context to this error message.

Old: Class attribute on column 1 of item 2 does not match with column 1 of item 1.
New: Class attribute on column 1 (POSIXct, POSIXt) of item 2 does not match with column 1 (Date) of item 1.

I've added some tests and attempted to update the translations.

… mismatching columns

- added 2 tests
- added joinCharVec utility function to join each element of a character vector into a single C string
- updated translations for the new error message
- added *.sublime-project and *.sublime-workspace to .gitignore
src/rbindlist.c Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 26, 2020

Codecov Report

Merging #4822 (f8150fe) into master (d405d72) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4822   +/-   ##
=======================================
  Coverage   99.47%   99.47%           
=======================================
  Files          73       73           
  Lines       14589    14611   +22     
=======================================
+ Hits        14512    14534   +22     
  Misses         77       77           
Impacted Files Coverage Δ
src/rbindlist.c 100.00% <100.00%> (ø)
src/utils.c 98.48% <100.00%> (+0.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d405d72...f8150fe. Read the comment docs.

src/rbindlist.c Outdated Show resolved Hide resolved
src/utils.c Outdated Show resolved Hide resolved
Copy link
Member

@jangorecki jangorecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for PR. Some minor feedback left, haven't really looked at the logic.

src/rbindlist.c Outdated Show resolved Hide resolved
src/utils.c Outdated Show resolved Hide resolved
src/utils.c Outdated Show resolved Hide resolved
- renamed joinCharVec to concatCharVec
- fixed spacing
- added # nocov
- fixed calls to 'free'
- moved variable declarations to point of use
src/utils.c Outdated Show resolved Hide resolved
@logworthy logworthy marked this pull request as draft November 26, 2020 11:15
@logworthy
Copy link
Author

@jangorecki, @MichaelChirico thanks for the quick feedback! Please let me know if there's anything else. I've converted back to draft since I still need to update the translation files - will do those next if everything else is looking good.

@MichaelChirico
Copy link
Member

don't bother with the translations part, we handle that in one go before release.

src/rbindlist.c Outdated Show resolved Hide resolved
@logworthy logworthy marked this pull request as ready for review November 26, 2020 23:52
src/rbindlist.c Outdated
if (!R_compute_identical(thisClass, firstClass, 0)) {
#define BUFSIZE 1024 // 'error' function length limit in R's errors.c is 8192
char thisClassBuf[BUFSIZE];
char* thisClassStr = concatCharVec(thisClass, ", ");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can concatCharVec() has an argument length? So that we can avoid this assigning to thisClassStr then to thisClassBuf code, which may be more concise, IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a length argument, but on its own I don't think that wouldn't make the code more concise.

The real issue here is the dynamic allocation of the array within concatCharVec(). I think that memory needs to be freed, but the array also needs to be passed to the error() function, and I'm not sure if there is a way to free it after error() has been called.

Hence we're copying the values to the intermediate statically allocated arrays before the dynamic arrays are freed, and then passing those static arrays to error().

I'm open to feedback if there is a better way to handle this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think return a SEXP object should solve your concerns as SEXP object could be freed by R.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, that makes things much simpler! Thanks :)

src/rbindlist.c Show resolved Hide resolved
@logworthy
Copy link
Author

I think this needs to be UNPROTECT(1)

What needs to be UNPROTECT(1)?

}

SEXP concatenatedCharVec = PROTECT(mkChar(concatenated));
free(concatenated);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I think the comment got misplaced from mobile.

This free applies to the SEXP right? if so then it needs to be UNPROTECTed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay. No, it's actually being called on a char* which is dynamically allocated and used only within concatCharVec().

Copy link
Member

@mattdowle mattdowle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea and improvement.
There is already concat in fmelt.c. Can that be enhanced and reused for this usage? The main enhancement needed being able to be called twice from one error/warning message. It constrains the maximum length of the string which is needed, and this PR doesn't do currently. When its idx is NULL, rather than error as it does now, it could concat all the items in order as needed in this PR.
Or, if it were me, I would have just included the first item of the class vector in the message; no concat would be needed. In this example, POSIXct != Date is already clear and helpful to the user. They don't need to know that POSIXct inherits from POSIXt too. I can only think of esoteric constructed examples where the message might state that myClass != myClass (and thus be confusing to the user) because myClass somehow didn't inherit the same way.
If you decide to include all the classes in the message and reuse fmelt.c:concat, its static could be removed and its signature added to data.table.h so it could be used here.
Also please remove the translation changes from PR as Michael said it will be done as part of release procedures in one shot. Or, since it has already been translated, maybe @MichaelChirico is ok to leave the translation in this PR this time? Up to him on that.

@mattdowle mattdowle added the WIP label May 6, 2021
@MichaelChirico
Copy link
Member

MichaelChirico commented May 6, 2021

yes, please revert the translations since unfortunately they're incorrect 😄 -- we'll let a native speaker handle updating them (and other new/changed messages) in one go before release.

Thanks! I've added a small section on Translation to the Contributing wiki: https://github.com/Rdatatable/data.table/wiki/Contributing

@MichaelChirico
Copy link
Member

Hi @logworthy would you like to resume this PR? Please take note of related PRs like #5874 -- is the functionality here still needed given recent enhancements?

cc @ben-schwen again.

@ben-schwen
Copy link
Member

Definitely an improvement, but would go with Matt's comment and only return the first item of class vector, which makes it much simpler.

It won't interfere with my recent PRs since they try to solve the problem and not just improve the error message.

@MichaelChirico
Copy link
Member

Thanks... marking as draft until #5570 is merged, unless someone wants to rebase this PR relative to that one first.

@MichaelChirico MichaelChirico marked this pull request as draft January 7, 2024 02:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants