-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-32558 Parquet plugin not utilizing Decimal encoding #19141
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32558 Jirabot Action Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, with a couple of very minor comments.
plugins/parquet/parquetembed.cpp
Outdated
} | ||
|
||
/** | ||
* @brief Calls processDecimal since unsigned decimals still have a max of 64 digits with 32 digits of precision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trivial: Function does not seem to call processDecimal().
* @brief Calls processDecimal since unsigned decimals still have a max of 64 digits with 32 digits of precision. | ||
* | ||
* @param value Data to be written to the Parquet file. | ||
* @param digits Number of bytes holding the digits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: An argument named 'digits' makes me think "number of base-10 digits in the value." Perhaps I am the only one that thinks like that but if not: Renaming that argument for clarity may be in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous comment replicated to processDecimal() as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your comment as I expected the same thing. The digits name is defined in the IFieldProcessor Interface. Would this require renaming all instances of digits in the code where IFieldProcessor is implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, never mind then. If this was purely local to your code then the rename may be in order. If it is part of the de facto standard in the rest of the code then it's better that it stays this way. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@ghalliday Please take a look. |
}], layout); | ||
|
||
overwriteOption := TRUE; | ||
ParquetIO.Write(decimal_data, '/home/jack/dev/debug/etc/HPCCSystems/mydropzone/regress/parquet/decimal.parquet', overwriteOption); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not pass the smoketest because it is a reference to a file path that does not exist. This probably needs to be hthor only and use a local relative filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed test file tags to exclude it from running on Thor and Roxie.
The Arrow library does not support creating arrow::fs::Filesystems with relative paths. It would be possible to add support if the plugin was able to get the path context from somewhere else and add it to the relative path, but for now I have set the path where the test files are created to '/etc/HPCCSystems/mydropzone/regress/decimal.parquet'. Is this appropriate or would a more general location like '/tmp/decimal.parquet' be better?
@@ -37,9 +37,9 @@ decimal_data := DATASET([{(DECIMAL) '0.12345678901234567890123456789012', | |||
}], layout); | |||
|
|||
overwriteOption := TRUE; | |||
ParquetIO.Write(decimal_data, '/home/jack/dev/debug/etc/HPCCSystems/mydropzone/regress/parquet/decimal.parquet', overwriteOption); | |||
ParquetIO.Write(decimal_data, '/etc/HPCCSystems/mydropzone/regress/decimal.parquet', overwriteOption); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better, but I don't think this is the correct path though. I think it should be
/var/lib/HPCCSystems/mydropzone/regress/...
Also, that assumes that the system has been installed in the root directory - that isn't the case for many developers. @jakesmith can you think of a clean way of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackdelv - you could use Std.File.GetDefaultDropZone (or potentially Std.File.GetLandingZones if you needed >1 and/or more detail)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to use Std.File.GetDefaultDropZone to prefix the file name with a valid dropzone directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please squash and I will merge
Signed-off-by: Jack Del Vecchio <jack.delvecchio@lexisnexisrisk.com>
6d88452
to
0d13c87
Compare
@ghalliday Squashed. |
Type of change:
Checklist:
Smoketest:
Testing: