Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-3697] Implement BITCOUNT scalar function #3927

Merged
merged 1 commit into from
Sep 9, 2024

Conversation

normanj-bitquill
Copy link
Contributor

  • All libraries now have a BITCOUNT() function
  • The MySQL library also has BIT_COUNT() which is an alias for BITCOUNT()
  • BITCOUNT() counts all of the bits set in an integer or bytestring
  • Return NULL if the argument is NULL

This is based on a previous PR that appears dormant.
#1746

core/src/test/resources/sql/functions.iq Show resolved Hide resolved
@@ -2752,6 +2752,8 @@ In the following:
| * | ATANH(numeric) | Returns the inverse hyperbolic tangent of *numeric*
| f | BITAND_AGG(value) | Equivalent to `BIT_AND(value)`
| f | BITOR_AGG(value) | Equivalent to `BIT_OR(value)`
| * | BITCOUNT(integer) | Returns the bitwise COUNT of non-null *integer*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the test case maybe should include a negative number and zero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MYSQL:

select bit_count(3.5);  return 1
select bit_count(3); return 2
select bit_count(5.5); return 2

So support float number?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In calcite these are decimal numbers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the decimal representation of a number is not specified, how can these functions be implemented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the MySQL implementation to support decimal values. It will only count bits set in the integer portion of a decimal value. This matches MySQL behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have to say this in the documentation. It's a very bizarre choice, frankly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation has been updated. If it makes more sense, both BITCOUNT() and BIT_COUNT() can use the MySQL behaviour and become identical.

Copy link
Member

@caicancai caicancai Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanj-bitquill Are you sure Spark also has a bigcount function? I didn't find the corresponding record in the documentation
https://spark.apache.org/docs/3.4.0/sql-ref-functions.html#content

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a BIT_COUNT function in Spark. I'll enable it for the Spark library.

There was discussion in this ticket
https://issues.apache.org/jira/browse/CALCITE-3732

that all libraries would get the BITCOUNT function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BIT_COUNT is now enabled for Apache Spark.

@@ -15523,6 +15523,58 @@ void checkBitAnd(SqlOperatorFixture f0, FunctionAlias functionAlias) {
f0.forEachLibrary(list(functionAlias.libraries), consumer);
}

@Test public void testBitCountFunc() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add javadoc. explain that one tests BITCOUNT and the other tests BIT_COUNT.

are these methods identical? do you intend them to stay identical? If so, share code, don't copy-paste. We have done that for other operators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not identical. BIT_COUNT() supports decimal values. I viewed this as a unique feature in MySQL. If it makes more sense, I can use the MySQL behaviour for both BITCOUNT() and BIT_COUNT().

@@ -2752,6 +2752,8 @@ In the following:
| * | ATANH(numeric) | Returns the inverse hyperbolic tangent of *numeric*
| f | BITAND_AGG(value) | Equivalent to `BIT_AND(value)`
| f | BITOR_AGG(value) | Equivalent to `BIT_OR(value)`
| * | BITCOUNT(integer) | Returns the bitwise COUNT of non-null *integer*
| m | BIT_COUNT(numeric) | Returns the bitwise COUNT of the integer portion of non-null *numeric*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these functions apply to binary, varbinary , decimal, float? if so, say so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the documentation to mention that it also supports binary and numeric (in the case of BIT_COUNT()).

false);
f.checkScalar("bit_count(8)", "1", "BIGINT NOT NULL");
f.checkScalar("bit_count(CAST(x'ad' AS BINARY(1)))", "5", "BIGINT NOT NULL");
f.checkScalar("bit_count(5.23)", "2", "BIGINT NOT NULL");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it ignores bits in the fractional part? that's rather subtle. worth calling out in the test, and also in the bitCount(BigDecimal) function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it ignores the fractional part. I have updated the documentation as well as the Javadoc for bitCount(BigDecimal).

f.checkNull("bitcount(cast(NULL as BINARY))");
}

@Test public void testBitCountMySQLFunc() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the results when applied to negative numbers? need tests for that.

do you assume two's complement? do you assume that a 32 bit integer becomes a 32 bit negative integer? or does it become a 64 bit negative integer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input number is converted to a 64-bit integer using two's complement. It will count the bits set in this, so bitcount(-1) will return 64.

Added a test.

/** Helper function for implementing <code>BITCOUNT</code>. Counts the number
* of bits set in the integer portion of a decimal value. */
public static long bitCount(BigDecimal b) {
return Long.bitCount(b.longValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also rounds down the decimal number, which is not obvious.
Please also note that a decimal may not fit into a long since we merged #3589
So I am not sure this implementation is correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think this behavior is very unexpected, and I think decimals should not be supported at all.
Moreover, this is not the number of 1 bits in the actual internal representation, but in the conversion of the decimal to a long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some more testing with MySQL.

  • bitcount(18446744073709551615) = 64
  • bitcount(18446744073709551616 or greater) = 63
  • bitcount(-9223372036854775808 or less) = 1

I have updating the implementation of bitcount(BigDecimal) to match this behaviour. This is only enabled in the MySQL library.

It should match MySQL behaviour at the moment, but since this is not really documented
https://dev.mysql.com/doc/refman/8.4/en/bit-functions.html#function_bit-count
perhaps this is not worth keeping and could change at any time.

In general is it better to match MySQL behaviour (for a MySQL specific function) or is it better to take the simpler approach?

f.checkScalar(functionName + "(8)", "1", "BIGINT NOT NULL");
f.checkScalar(functionName + "(CAST(x'ad' AS BINARY(1)))", "5", "BIGINT NOT NULL");
f.checkScalar(functionName + "(-1)", "64", "BIGINT NOT NULL");
f.checkNull(functionName + "(cast(NULL as TINYINT))");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about a test with a null without a cast?
this trips some function implementations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for NULL without a cast.

@mihaibudiu
Copy link
Contributor

From my pov this can be merged, but let's see if the other reviewers have other comments.

return Long.bitCount(b);
}

private static final BigDecimal BITCOUNT_MAX = new BigDecimal("18446744073709551615");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is 2^64 - 1. I found it through testing in MySQL. I added a comment to say that it is 2^64 - 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use

Math.pow(2,64) -1

generate it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BigDecimal has s shiftleft method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BigInteger has a shiftleft, BigDecimal does not. I have updated the constants to use BigDecimal power, subtract and negate.

@@ -2752,6 +2752,11 @@ In the following:
| * | ATANH(numeric) | Returns the inverse hyperbolic tangent of *numeric*
| f | BITAND_AGG(value) | Equivalent to `BIT_AND(value)`
| f | BITOR_AGG(value) | Equivalent to `BIT_OR(value)`
| * | BITCOUNT(integer) | Returns the bitwise COUNT of *integer* or NULL if *integer* is NULL
| * | BITCOUNT(binary) | Returns the bitwise COUNT of *binary* of NULL if *binary* is NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge these two functions into one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Aug 27, 2024
@normanj-bitquill
Copy link
Contributor Author

Squashed the commits.

Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* All libraries now have a BITCOUNT() function that supports integer and binary values
* The MySQL and BigQuery libraries also hav BIT_COUNT() which is an alias for BITCOUNT()
* The MySQL version of BIT_COUNT() also supports decimal values (only looks at the integer portion)
* BITCOUNT() counts all of the bits set in an integer or bytestring
* Return NULL if the argument is NULL
Copy link

@normanj-bitquill
Copy link
Contributor Author

@mihaibudiu Is there anything further needed on this PR?

@mihaibudiu mihaibudiu merged commit 67405c3 into apache:main Sep 9, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants