-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-3697] Implement BITCOUNT scalar function #3927
Conversation
site/_docs/reference.md
Outdated
@@ -2752,6 +2752,8 @@ In the following: | |||
| * | ATANH(numeric) | Returns the inverse hyperbolic tangent of *numeric* | |||
| f | BITAND_AGG(value) | Equivalent to `BIT_AND(value)` | |||
| f | BITOR_AGG(value) | Equivalent to `BIT_OR(value)` | |||
| * | BITCOUNT(integer) | Returns the bitwise COUNT of non-null *integer* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the test case maybe should include a negative number and zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In MYSQL:
select bit_count(3.5); return 1
select bit_count(3); return 2
select bit_count(5.5); return 2
So support float number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In calcite these are decimal numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the decimal representation of a number is not specified, how can these functions be implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the MySQL implementation to support decimal values. It will only count bits set in the integer portion of a decimal value. This matches MySQL behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have to say this in the documentation. It's a very bizarre choice, frankly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation has been updated. If it makes more sense, both BITCOUNT()
and BIT_COUNT()
can use the MySQL behaviour and become identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@normanj-bitquill Are you sure Spark also has a bigcount function? I didn't find the corresponding record in the documentation
https://spark.apache.org/docs/3.4.0/sql-ref-functions.html#content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a BIT_COUNT
function in Spark. I'll enable it for the Spark library.
There was discussion in this ticket
https://issues.apache.org/jira/browse/CALCITE-3732
that all libraries would get the BITCOUNT
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BIT_COUNT
is now enabled for Apache Spark.
@@ -15523,6 +15523,58 @@ void checkBitAnd(SqlOperatorFixture f0, FunctionAlias functionAlias) { | |||
f0.forEachLibrary(list(functionAlias.libraries), consumer); | |||
} | |||
|
|||
@Test public void testBitCountFunc() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add javadoc. explain that one tests BITCOUNT and the other tests BIT_COUNT.
are these methods identical? do you intend them to stay identical? If so, share code, don't copy-paste. We have done that for other operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not identical. BIT_COUNT()
supports decimal values. I viewed this as a unique feature in MySQL. If it makes more sense, I can use the MySQL behaviour for both BITCOUNT()
and BIT_COUNT()
.
site/_docs/reference.md
Outdated
@@ -2752,6 +2752,8 @@ In the following: | |||
| * | ATANH(numeric) | Returns the inverse hyperbolic tangent of *numeric* | |||
| f | BITAND_AGG(value) | Equivalent to `BIT_AND(value)` | |||
| f | BITOR_AGG(value) | Equivalent to `BIT_OR(value)` | |||
| * | BITCOUNT(integer) | Returns the bitwise COUNT of non-null *integer* | |||
| m | BIT_COUNT(numeric) | Returns the bitwise COUNT of the integer portion of non-null *numeric* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these functions apply to binary, varbinary , decimal, float? if so, say so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the documentation to mention that it also supports binary and numeric (in the case of BIT_COUNT()
).
false); | ||
f.checkScalar("bit_count(8)", "1", "BIGINT NOT NULL"); | ||
f.checkScalar("bit_count(CAST(x'ad' AS BINARY(1)))", "5", "BIGINT NOT NULL"); | ||
f.checkScalar("bit_count(5.23)", "2", "BIGINT NOT NULL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it ignores bits in the fractional part? that's rather subtle. worth calling out in the test, and also in the bitCount(BigDecimal)
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it ignores the fractional part. I have updated the documentation as well as the Javadoc for bitCount(BigDecimal)
.
f.checkNull("bitcount(cast(NULL as BINARY))"); | ||
} | ||
|
||
@Test public void testBitCountMySQLFunc() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the results when applied to negative numbers? need tests for that.
do you assume two's complement? do you assume that a 32 bit integer becomes a 32 bit negative integer? or does it become a 64 bit negative integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input number is converted to a 64-bit integer using two's complement. It will count the bits set in this, so bitcount(-1)
will return 64
.
Added a test.
/** Helper function for implementing <code>BITCOUNT</code>. Counts the number | ||
* of bits set in the integer portion of a decimal value. */ | ||
public static long bitCount(BigDecimal b) { | ||
return Long.bitCount(b.longValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also rounds down the decimal number, which is not obvious.
Please also note that a decimal may not fit into a long since we merged #3589
So I am not sure this implementation is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I think this behavior is very unexpected, and I think decimals should not be supported at all.
Moreover, this is not the number of 1 bits in the actual internal representation, but in the conversion of the decimal to a long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some more testing with MySQL.
- bitcount(18446744073709551615) = 64
- bitcount(18446744073709551616 or greater) = 63
- bitcount(-9223372036854775808 or less) = 1
I have updating the implementation of bitcount(BigDecimal)
to match this behaviour. This is only enabled in the MySQL library.
It should match MySQL behaviour at the moment, but since this is not really documented
https://dev.mysql.com/doc/refman/8.4/en/bit-functions.html#function_bit-count
perhaps this is not worth keeping and could change at any time.
In general is it better to match MySQL behaviour (for a MySQL specific function) or is it better to take the simpler approach?
f.checkScalar(functionName + "(8)", "1", "BIGINT NOT NULL"); | ||
f.checkScalar(functionName + "(CAST(x'ad' AS BINARY(1)))", "5", "BIGINT NOT NULL"); | ||
f.checkScalar(functionName + "(-1)", "64", "BIGINT NOT NULL"); | ||
f.checkNull(functionName + "(cast(NULL as TINYINT))"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about a test with a null without a cast?
this trips some function implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for NULL
without a cast.
From my pov this can be merged, but let's see if the other reviewers have other comments. |
return Long.bitCount(b); | ||
} | ||
|
||
private static final BigDecimal BITCOUNT_MAX = new BigDecimal("18446744073709551615"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is 2^64 - 1
. I found it through testing in MySQL. I added a comment to say that it is 2^64 - 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use
Math.pow(2,64) -1
generate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BigDecimal has s shiftleft method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BigInteger has a shiftleft, BigDecimal does not. I have updated the constants to use BigDecimal power, subtract and negate.
site/_docs/reference.md
Outdated
@@ -2752,6 +2752,11 @@ In the following: | |||
| * | ATANH(numeric) | Returns the inverse hyperbolic tangent of *numeric* | |||
| f | BITAND_AGG(value) | Equivalent to `BIT_AND(value)` | |||
| f | BITOR_AGG(value) | Equivalent to `BIT_OR(value)` | |||
| * | BITCOUNT(integer) | Returns the bitwise COUNT of *integer* or NULL if *integer* is NULL | |||
| * | BITCOUNT(binary) | Returns the bitwise COUNT of *binary* of NULL if *binary* is NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge these two functions into one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged.
6930fa9
to
c9a6de7
Compare
Squashed the commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* All libraries now have a BITCOUNT() function that supports integer and binary values * The MySQL and BigQuery libraries also hav BIT_COUNT() which is an alias for BITCOUNT() * The MySQL version of BIT_COUNT() also supports decimal values (only looks at the integer portion) * BITCOUNT() counts all of the bits set in an integer or bytestring * Return NULL if the argument is NULL
85dd533
to
d3b5a06
Compare
Quality Gate passedIssues Measures |
@mihaibudiu Is there anything further needed on this PR? |
This is based on a previous PR that appears dormant.
#1746