
Add missing numeric precisions in Hive type system #501
Merged: 14 commits, Jul 25, 2024
@@ -1,5 +1,5 @@
 /**
- * Copyright 2017-2023 LinkedIn Corporation. All rights reserved.
+ * Copyright 2017-2024 LinkedIn Corporation. All rights reserved.
  * Licensed under the BSD-2 Clause license.
  * See LICENSE in the project root for license information.
  */
@@ -12,7 +12,7 @@
import org.apache.calcite.sql.type.SqlTypeUtil;


-// Copied from Hive source code
+// Precision and scale values copied from Hive source code
public class HiveTypeSystem extends RelDataTypeSystemImpl {
// TODO: This should come from type system; Currently there is no definition
// in type system for this.
@@ -29,6 +29,10 @@ public class HiveTypeSystem extends RelDataTypeSystemImpl {
private static final int DEFAULT_CHAR_PRECISION = 255;
private static final int MAX_BINARY_PRECISION = Integer.MAX_VALUE;
private static final int MAX_TIMESTAMP_PRECISION = 9;
private static final int DEFAULT_TINYINT_PRECISION = 3;
private static final int DEFAULT_SMALLINT_PRECISION = 5;
private static final int DEFAULT_INTEGER_PRECISION = 10;
private static final int DEFAULT_BIGINT_PRECISION = 19;
Contributor (comment on lines +32 to +35):
could you link the reference for these values?

Contributor Author:
I updated the javadoc to specify where the values are coming from. This entire class was copied from Hive source code but has since deviated; we have a backlog item mentioned above to eventually rename it to CoralTypeSystem.
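As an aside, the four new precision values line up with the decimal digit counts of the maximum values of the corresponding Java integral types. A quick standalone check (the class name `PrecisionCheck` is mine, not from the PR):

```java
// Sanity check (not part of the PR): each default precision above equals the
// number of decimal digits in the matching Java integral type's maximum value.
public class PrecisionCheck {
    public static void main(String[] args) {
        System.out.println(String.valueOf(Byte.MAX_VALUE).length());    // TINYINT:  127 has 3 digits
        System.out.println(String.valueOf(Short.MAX_VALUE).length());   // SMALLINT: 32767 has 5 digits
        System.out.println(String.valueOf(Integer.MAX_VALUE).length()); // INTEGER:  2147483647 has 10 digits
        System.out.println(String.valueOf(Long.MAX_VALUE).length());    // BIGINT:   9223372036854775807 has 19 digits
    }
}
```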


@Override
public int getMaxScale(SqlTypeName typeName) {
@@ -84,6 +88,14 @@ public int getDefaultPrecision(SqlTypeName typeName) {
case INTERVAL_MINUTE_SECOND:
case INTERVAL_SECOND:
return SqlTypeName.DEFAULT_INTERVAL_START_PRECISION;
case TINYINT:
return DEFAULT_TINYINT_PRECISION;
case SMALLINT:
return DEFAULT_SMALLINT_PRECISION;
case INTEGER:
return DEFAULT_INTEGER_PRECISION;
case BIGINT:
return DEFAULT_BIGINT_PRECISION;
default:
return -1;
}
@@ -1,5 +1,5 @@
 /**
- * Copyright 2017-2023 LinkedIn Corporation. All rights reserved.
+ * Copyright 2017-2024 LinkedIn Corporation. All rights reserved.
  * Licensed under the BSD-2 Clause license.
  * See LICENSE in the project root for license information.
  */
@@ -622,6 +622,44 @@ public void testNameSakeColumnNamesShouldGetUniqueIdentifiers() {
assertEquals(generated, expected);
}

@Test
public void testUnionIntAndBigInt() {
final String sql = "SELECT a FROM test.tableOne UNION ALL SELECT bigint_col FROM test.tableInt";
RelNode rel = converter.convertSql(sql);
assertEquals(rel.getRowType().getFieldCount(), 1);
// Should be BIGINT since it is a less restrictive type than INT
assertEquals(rel.getRowType().getFieldList().get(0).getType().getSqlTypeName(), SqlTypeName.BIGINT);
}

@Test
public void testUnionIntAndSmallInt() {
final String sql = "SELECT smallint_col FROM test.tableInt UNION ALL SELECT a FROM test.tableOne";
RelNode rel = converter.convertSql(sql);
assertEquals(rel.getRowType().getFieldCount(), 1);
// Should be INT since it is a less restrictive type than SMALLINT
assertEquals(rel.getRowType().getFieldList().get(0).getType().getSqlTypeName(), SqlTypeName.INTEGER);
}

@Test
public void testUnionIntAndTinyInt() {
final String sql = "SELECT tinyint_col FROM test.tableInt UNION ALL SELECT a FROM test.tableOne";
RelNode rel = converter.convertSql(sql);
assertEquals(rel.getRowType().getFieldCount(), 1);
// Should be INT since it is a less restrictive type than TINYINT
assertEquals(rel.getRowType().getFieldList().get(0).getType().getSqlTypeName(), SqlTypeName.INTEGER);
}

@Test
public void testIntCastToBigIntDuringComparison() {
// We're testing that a comparison between INT and BIGINT sees a cast on the more restrictive type to the
// less restrictive type and not the other way around. In other words, the INT is cast to BIGINT.
final String sql = "SELECT CASE WHEN int_col = bigint_col THEN 'abc' ELSE 'def' END FROM test.tableInt";
String expected = "LogicalProject(EXPR$0=[CASE(=(CAST($2):BIGINT, $3), 'abc', 'def')])\n"
+ " LogicalTableScan(table=[[hive, test, tableint]])\n";

assertEquals(relToString(sql), expected);
}

private String relToString(String sql) {
return RelOptUtil.toString(converter.convertSql(sql));
}
@@ -1,5 +1,5 @@
 /**
- * Copyright 2017-2023 LinkedIn Corporation. All rights reserved.
+ * Copyright 2017-2024 LinkedIn Corporation. All rights reserved.
  * Licensed under the BSD-2 Clause license.
  * See LICENSE in the project root for license information.
  */
@@ -82,6 +82,8 @@ public static TestHive setupDefaultHive(HiveConf conf) throws IOException {
driver.run("CREATE DATABASE IF NOT EXISTS test");
driver.run("CREATE TABLE IF NOT EXISTS test.tableOne(a int, b varchar(30), c double, d timestamp)");
driver.run("CREATE TABLE IF NOT EXISTS test.tableTwo(x int, y double)");
driver.run(
"CREATE TABLE IF NOT EXISTS test.tableInt(tinyint_col tinyint, smallint_col smallint, int_col int, bigint_col bigint)");

driver.run("CREATE DATABASE IF NOT EXISTS fuzzy_union");

@@ -11,15 +11,13 @@
 import java.util.Set;

 import org.apache.calcite.rel.type.RelDataType;
-import org.apache.calcite.sql.SqlBasicTypeNameSpec;
 import org.apache.calcite.sql.SqlCall;
-import org.apache.calcite.sql.SqlDataTypeSpec;
 import org.apache.calcite.sql.SqlNode;
 import org.apache.calcite.sql.SqlOperator;
-import org.apache.calcite.sql.SqlTypeNameSpec;
 import org.apache.calcite.sql.fun.SqlStdOperatorTable;
 import org.apache.calcite.sql.type.BasicSqlType;
 import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.sql.type.SqlTypeUtil;

 import com.linkedin.coral.common.HiveTypeSystem;
 import com.linkedin.coral.common.calcite.CalciteUtil;
@@ -122,12 +120,6 @@ protected SqlCall transform(SqlCall sqlCall) {
}

   private SqlCall castOperand(SqlNode operand, RelDataType relDataType) {
-    return SqlStdOperatorTable.CAST.createCall(ZERO, operand, getSqlDataTypeSpecForCasting(relDataType));
-  }
-
-  private SqlDataTypeSpec getSqlDataTypeSpecForCasting(RelDataType relDataType) {
Contributor:
could you add an example of Coral translation for before & after this code change? Since there's no difference in unit tests, does that mean this is a bug or an unnecessary transformation?

Contributor Author:
> could you add an example of Coral translation for before & after this code change

After adding the precisions, the first case in testSubstrWithTimestampOperator in HiveToTrinoConverterTest started to fail because we were creating a faulty type spec that fails an assertion in Calcite's derive-type function for SqlBasicTypeNameSpec. getSqlDataTypeSpecForCasting is therefore a bug; if you look at SqlTypeUtil.convertTypeToSpec, there are more involved steps that use the precision/scale values.

-    final SqlTypeNameSpec typeNameSpec = new SqlBasicTypeNameSpec(relDataType.getSqlTypeName(),
-        relDataType.getPrecision(), relDataType.getScale(), null, ZERO);
-    return new SqlDataTypeSpec(typeNameSpec, ZERO);
+    return SqlStdOperatorTable.CAST.createCall(ZERO, operand, SqlTypeUtil.convertTypeToSpec(relDataType));
}
}
@@ -994,4 +994,19 @@ public void testSqlSelectAliasAppenderTransformerWithoutTableAliasPrefix() {
+ "WHERE \"tablea\".\"a\" > 5";
assertEquals(expandedSql, expected);
}

@Test
public void testIntCastToBigIntDuringComparison() {
// We're testing that a comparison between INT and BIGINT sees a cast on the more restrictive type to the
// less restrictive type and not the other way around. In other words, the INT is cast to BIGINT.

Contributor:
Why is this required? What happens if you query Trino directly with a comparison operation without the CAST?

Contributor Author:
It's not explicitly required (tested in Trino to confirm). However, during validation Calcite will implicitly coerce comparisons between two types by casting the more restrictive type to the less restrictive one. This PR just makes sure we give Calcite the right precision values so it can correctly determine which type is more or less restrictive.
RelNode relNode = TestUtils.getHiveToRelConverter().convertSql(
"SELECT CASE WHEN a_integer = a_bigint THEN 'abc' ELSE 'def' END FROM test.table_from_utc_timestamp");
RelToTrinoConverter relToTrinoConverter = TestUtils.getRelToTrinoConverter();
String expandedSql = relToTrinoConverter.convert(relNode);

String expected =
"SELECT CASE WHEN CAST(\"table_from_utc_timestamp\".\"a_integer\" AS BIGINT) = \"table_from_utc_timestamp\".\"a_bigint\" THEN 'abc' ELSE 'def' END\n"
+ "FROM \"test\".\"table_from_utc_timestamp\" AS \"table_from_utc_timestamp\"";
assertEquals(expandedSql, expected);
}
}
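The INT-to-BIGINT direction these tests assert mirrors Java's own binary numeric promotion, where the narrower operand of a comparison is widened rather than the wider one truncated. A standalone illustration (my sketch, not code from the PR):

```java
public class PromotionDemo {
    public static void main(String[] args) {
        int i = 2147483647;   // Integer.MAX_VALUE, 10 digits
        long l = 2147483647L; // same value as a long (the BIGINT analogue)
        // Java widens the int operand to long before comparing (JLS 5.6.2),
        // just as Calcite casts the more restrictive INT to BIGINT rather
        // than casting the BIGINT down to INT.
        System.out.println(i == l);        // true
        System.out.println((long) i + 1L); // 2147483648: widening avoids int overflow
    }
}
```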