Prototype expansion of SQL transforms for single-node execution #59

pabloem · 2022-12-08T07:29:27Z

One of the main targets for the Ray Beam Runner is to support SQL (and streaming SQL).

Beam's SQL support is implemented in Java. There are two parts to executing SQL transforms in Beam:

Expansion: Beam expands multi-language transforms through an ExpansionService interface (see the sample gRPC implementation; to be honest, it seems far too complicated).

My idea:

Implement a class "RayJavaExpansionService" that receives an expansion request. The request can be relatively simple, but it must contain:
- The schema of the input PCollection (see: what are schemas)
- The identifier of the transform to apply (these identifiers are provided by SchemaTransformProvider implementations; see a few examples)
  - Note: I will implement a Sql one: SqlSchemaTransformProvider with id "beam:schematransform:org.apache.beam:sql:v1" this week.
- Parameters for the transform (in this case, just the SQL statement)

The RayJavaExpansionService should then return the schema of the resulting PCollection, as well as the expanded graph of operations in protobuf format (the proto format).
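A minimal sketch of the request/response shape described above, assuming a registry of providers keyed by transform identifier (loosely modeled on how each SchemaTransformProvider exposes an identifier). Everything here is hypothetical: `Field` is a simplified stand-in for a Beam Schema, the expanded graph is left as opaque serialized bytes rather than a real pipeline proto, and no Beam classes are used.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a Beam Schema: ordered (name, type) pairs.
record Field(String name, String type) {}

// What the caller would send: input schema, transform identifier, and transform parameters
// (for SQL, just the statement).
record ExpansionRequest(List<Field> inputSchema, String transformId, Map<String, String> params) {}

// What the service would return: the output schema plus the expanded graph,
// kept here as opaque serialized protobuf bytes.
record ExpansionResponse(List<Field> outputSchema, byte[] expandedGraph) {}

// Hypothetical provider interface; not Beam's actual SchemaTransformProvider API.
interface TransformProvider {
  String identifier();

  ExpansionResponse expand(ExpansionRequest request);
}

class RayJavaExpansionService {
  // Providers indexed by their identifier, e.g. "beam:schematransform:org.apache.beam:sql:v1".
  private final Map<String, TransformProvider> providers = new HashMap<>();

  void register(TransformProvider provider) {
    providers.put(provider.identifier(), provider);
  }

  // Look up the provider for the requested identifier and delegate expansion to it.
  ExpansionResponse expand(ExpansionRequest request) {
    TransformProvider provider = providers.get(request.transformId());
    if (provider == null) {
      throw new IllegalArgumentException("Unknown transform: " + request.transformId());
    }
    return provider.expand(request);
  }
}
```

With this split, adding SQL support would mean registering one provider under the SQL identifier; a parameter key such as `query` for the SQL statement is an assumption, not something fixed by this issue.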

Java dependencies:
- "org.apache.beam:beam-sdks-java-core"
- "org.apache.beam:beam-sdks-java-extensions-sql"

Expansion alone is not enough to execute SQL, but it is the first step. The next step is to recognize Java stages and execute them in a Java process rather than a Python process (basically, a Java implementation of this code, where we return some kind of JavaWorkerHandler).
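A rough sketch of the stage-routing idea, assuming the runner records which environment IDs were produced by the Java expansion service and that each fused stage carries its environment ID. `Stage`, `StageRouter`, and both handler classes are hypothetical stand-ins (not Beam's Python worker handlers); real handlers would start and talk to an SDK harness process.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified view of a fused stage: a name plus the environment it runs in.
record Stage(String name, String environmentId) {}

// Hypothetical worker handlers; here they only report which language they would run.
interface WorkerHandler {
  String language();
}

final class PythonWorkerHandler implements WorkerHandler {
  public String language() { return "python"; }
}

final class JavaWorkerHandler implements WorkerHandler {
  public String language() { return "java"; }
}

final class StageRouter {
  // Assumption: environment IDs that came from the Java expansion service.
  private final Set<String> javaEnvironmentIds;

  StageRouter(Set<String> javaEnvironmentIds) {
    this.javaEnvironmentIds = Set.copyOf(javaEnvironmentIds);
  }

  // Java stages go to a Java worker; everything else stays on the Python path.
  WorkerHandler handlerFor(Stage stage) {
    return javaEnvironmentIds.contains(stage.environmentId())
        ? new JavaWorkerHandler()
        : new PythonWorkerHandler();
  }

  // Summarize the routing decision for each stage, in order.
  List<String> plan(List<Stage> stages) {
    return stages.stream()
        .map(s -> s.name() + " -> " + handlerFor(s).language())
        .collect(Collectors.toList());
  }
}
```

For example, a pipeline with a Python read followed by an expanded SQL stage would be planned as `read -> python` and `sql -> java`.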


pabloem · 2022-12-08T07:32:22Z

Ray Java resources:

fyi @iasoon @valiantljk this issue is more complex than the other stuff you've tried, but it should help move one of our big features forward. Are either of you interested? :)

wilsonwang371 · 2022-12-14T06:06:18Z

I don't fully understand this issue. Since you mentioned that these SQL transforms are done in Java, does this mean that we are adding Java support to our Beam runner?

pabloem · 2022-12-14T21:03:36Z

yes, we would have to add support for expanding java PTransforms. I think we can limit the scope of this quite a bit while still delivering SQL execution.

wilsonwang371 · 2022-12-14T21:08:56Z

> yes, we would have to add support for expanding java PTransforms. I think we can limit the scope of this quite a bit while still delivering SQL execution.

This sounds cool if we are also targeting Java. I may ask a colleague to take a look, in case he is interested in joining us.

wilsonwang371 · 2022-12-14T22:36:02Z

@Evan2022TT
