Adopt JSONUtils.concatenateJsonStrings for concatenating JSON strings #11549

Merged
merged 19 commits into from
Oct 15, 2024

@ttnghia ttnghia commented Sep 30, 2024

This adopts the newly implemented JSONUtils.concatenateJsonStrings from spark-rapids-jni to concatenate JSON strings into a single string, which is then read with cudf's JSON reader.

Depends on:

This also closes #10922.
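
As a rough illustration of the idea described above, the following pure-Python sketch joins rows into one buffer using a delimiter that appears in no row, so that rows containing line separators still map to exactly one record each. It is not the spark-rapids-jni implementation: the helper names `concatenate_json_strings` and `parse_rows`, the delimiter search, and the `{}` substitution for null or empty rows are all assumptions made for illustration, and `json.loads` merely stands in for cudf's JSON reader.

```python
import json
from typing import Any, List, Optional, Tuple


def concatenate_json_strings(rows: List[Optional[str]]) -> Tuple[str, str, List[bool]]:
    """Hypothetical helper: join rows into one buffer with a safe delimiter."""
    # Rows that are null or whitespace-only carry no JSON; remember them.
    skipped = [r is None or r.strip(" \r\n\t") == "" for r in rows]
    # Substitute an empty object so every row still occupies one record.
    cleaned = ["{}" if skip else r for r, skip in zip(rows, skipped)]
    # Pick the first ASCII character that occurs in none of the rows
    # (assumption: a single-character delimiter is sufficient).
    used = set("".join(cleaned))
    delimiter = next((chr(c) for c in range(1, 128) if chr(c) not in used), None)
    if delimiter is None:
        raise ValueError("no unused ASCII character available as delimiter")
    return delimiter.join(cleaned), delimiter, skipped


def parse_rows(rows: List[Optional[str]]) -> List[Any]:
    """Parse each row from the concatenated buffer; bad rows become None."""
    buffer, delimiter, skipped = concatenate_json_strings(rows)
    results: List[Any] = []
    for text, skip in zip(buffer.split(delimiter), skipped):
        if skip:
            results.append(None)
            continue
        try:
            results.append(json.loads(text))
        except json.JSONDecodeError:
            results.append(None)
    return results


if __name__ == "__main__":
    sample = [' {"key":\n"ab"} ', None, "", "invalid", "null"]
    print(parse_rows(sample))
```

Because the delimiter is chosen from characters absent from the input, a newline inside a row cannot be mistaken for a record boundary, which is the failure mode the linked issue describes.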

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
# Conflicts:
#	sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonReadCommon.scala
#	sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonToStructs.scala
# Conflicts:
#	sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonReadCommon.scala
@ttnghia ttnghia added the performance A performance related task/issue label Sep 30, 2024
@ttnghia ttnghia self-assigned this Sep 30, 2024

@allow_non_gpu(*non_utc_allow)
def test_from_json_input_wrapped_in_whitespaces():
    json_string_gen = StringGen(r'[ \r\n\t]{0,5}({"key":( |\r|\n|\t|)"[A-z]{0,5}"}|null|invalid|)[ \r\n\t]{0,5}')

This generates strings that are one of:

  • '{"key":( |\r|\n|\t|)"[A-z]{0,5}"}'
  • 'null'
  • 'invalid'
  • Empty string

And each of these strings is surrounded by whitespace chars [ \r\n\t]{0,5}.
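
To make the four cases above concrete, here is a small sketch that checks candidate strings against the same pattern with Python's standard `re` module. It does not use or model `StringGen` itself; the `classify` helper is hypothetical and exists only for illustration. One detail worth noting is that the range `[A-z]` spans ASCII 65 to 122, so besides letters it also covers the six characters between `Z` and `a` (including `_` and the backslash).

```python
import re
from typing import Optional

# Same pattern as passed to StringGen in the test above.
PATTERN = re.compile(
    r'[ \r\n\t]{0,5}({"key":( |\r|\n|\t|)"[A-z]{0,5}"}|null|invalid|)[ \r\n\t]{0,5}'
)


def classify(text: str) -> Optional[str]:
    """Hypothetical helper: name the alternative a string falls under."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None  # not producible from this pattern
    core = match.group(1)  # the part between the surrounding whitespace
    if core == "":
        return "empty"
    if core in ("null", "invalid"):
        return core
    return "object"


if __name__ == "__main__":
    for candidate in [' {"key":\n"abc"}\t', "null", " invalid\r\n", "", '{"key":"abcdef"}']:
        print(repr(candidate), "->", classify(candidate))
```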

@revans2 revans2 linked an issue Oct 7, 2024 that may be closed by this pull request
@ttnghia ttnghia changed the base branch from branch-24.10 to branch-24.12 October 8, 2024 18:31
ttnghia commented Oct 11, 2024

build

ttnghia commented Oct 14, 2024

build

@ttnghia ttnghia requested a review from revans2 October 14, 2024 20:22
ttnghia commented Oct 15, 2024

build

@ttnghia ttnghia merged commit 0510a78 into NVIDIA:branch-24.12 Oct 15, 2024
44 of 45 checks passed
@ttnghia ttnghia deleted the new_concat branch October 15, 2024 19:24
Development

Successfully merging this pull request may close these issues.

from_json cannot support line separator in the input string.
2 participants