Avoid handling JSON_ARRAY as multi value JSON during transformation #14738

Merged (4 commits) on Jan 2, 2025


shounakmk219 (Collaborator) commented Jan 2, 2025

The transform pipeline fails with an ArrayIndexOutOfBoundsException when it encounters a JSON column whose value is an empty JSON array, because JSON values are not standardised (empty array -> null): https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java#L97

Previously, an empty array for the JSON data type was extracted as a string. It is now extracted as Object[] due to the change at https://github.com/apache/pinot/pull/14547/files#diff-7ac5349f9d75e27a62a063dbf81db3ed30c8de052b4ffa7719187e4babaa60baR66, which leads to isMultiValue returning true for an empty JSON array.

convertMultiValue returns Object[], while convertSingleValue returns a string:
https://github.com/apache/pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/BaseRecordExtractor.java#L39

  public Object convert(Object value) {
    Object convertedValue;
    if (isMultiValue(value)) {
      convertedValue = convertMultiValue(value);
    } else if (isMap(value)) {
      convertedValue = convertMap(value);
    } else if (isRecord(value)) {
      convertedValue = convertRecord(value);
    } else {
      convertedValue = convertSingleValue(value);
    }
    return convertedValue;
  }

This PR updates the transform logic to handle an empty array for the JSON data type.
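
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of it. It is not the Pinot code and not the change made in this PR: the class `JsonArraySketch` and the helpers `elementTypeOf`, `toJsonColumnValue`, and `toJson` are hypothetical names used only for illustration. It assumes that some downstream step inspects the first element of a multi-value array, which is one way an empty `Object[]` can produce an ArrayIndexOutOfBoundsException, and it shows one possible guard: treating an array destined for a JSON column as a single JSON string.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from Pinot.
class JsonArraySketch {
  // Same idea as isMultiValue in the snippet above: arrays and collections count as multi-value.
  static boolean isMultiValue(Object value) {
    return value instanceof Object[] || value instanceof Collection;
  }

  // Assumed downstream step: picks an element type from the first element.
  // Throws ArrayIndexOutOfBoundsException when the array is empty.
  static String elementTypeOf(Object[] values) {
    return values[0].getClass().getSimpleName();
  }

  // Possible guard for a JSON column: serialize the whole array into one JSON string
  // instead of treating it as a multi-value field. Non-array values pass through unchanged.
  static Object toJsonColumnValue(Object value) {
    if (value instanceof Object[]) {
      return Arrays.stream((Object[]) value)
          .map(JsonArraySketch::toJson)
          .collect(Collectors.joining(",", "[", "]"));
    }
    return value;
  }

  // Very small JSON encoder covering null, numbers, booleans, and strings only.
  static String toJson(Object o) {
    if (o == null) {
      return "null";
    }
    if (o instanceof Number || o instanceof Boolean) {
      return o.toString();
    }
    return "\"" + o.toString().replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  public static void main(String[] args) {
    Object[] empty = new Object[0];
    System.out.println(isMultiValue(empty));      // an empty array is still classified as multi-value
    System.out.println(toJsonColumnValue(empty)); // serialized as a single JSON string
  }
}
```

In this sketch the JSON column keeps the empty array as the string `[]` rather than passing an empty `Object[]` to code that expects at least one element.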

xiangfu0 (Contributor) commented Jan 2, 2025

Please fix the tests.

shounakmk219 (Collaborator, Author) commented

OK, looks like I got a few things wrong about the extractor; fixing it.

codecov-commenter commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 63.90%. Comparing base (59551e4) to head (5bed8c2).
Report is 1525 commits behind head on master.

Files with missing lines: ...t/local/recordtransformer/DataTypeTransformer.java
Patch %: 33.33%
Lines: 1 Missing and 1 partial
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14738      +/-   ##
============================================
+ Coverage     61.75%   63.90%   +2.15%     
- Complexity      207     1607    +1400     
============================================
  Files          2436     2703     +267     
  Lines        133233   150731   +17498     
  Branches      20636    23290    +2654     
============================================
+ Hits          82274    96329   +14055     
- Misses        44911    47183    +2272     
- Partials       6048     7219    +1171     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.80% <33.33%> (+2.09%) ⬆️
java-21 63.78% <33.33%> (+2.15%) ⬆️
skip-bytebuffers-false 63.83% <33.33%> (+2.08%) ⬆️
skip-bytebuffers-true 63.75% <33.33%> (+36.02%) ⬆️
temurin 63.90% <33.33%> (+2.15%) ⬆️
unittests 63.90% <33.33%> (+2.15%) ⬆️
unittests1 56.27% <33.33%> (+9.38%) ⬆️
unittests2 34.24% <33.33%> (+6.51%) ⬆️

@shounakmk219 shounakmk219 requested a review from xiangfu0 January 2, 2025 07:11
@xiangfu0 xiangfu0 merged commit 97cbbfc into apache:master Jan 2, 2025
21 checks passed