Are Apache Sedona geometry functions compatible with Spark Connect? #1764
Comments
Thank you for your interest in Apache Sedona! We appreciate you opening your first issue. Contributions like yours help make Apache Sedona better.
@barrieca Sedona should be able to work with spark-connect and we have tests for it.
@jiayuasu, do you happen to see anything obviously incorrect about the way we are starting the connect server or with how we are connecting to it via Python? Our hunch is that perhaps the jars are not correctly being loaded when the server starts. Additionally, would you mind pointing us to the tests?
You need to add additional configuration options when starting the Spark Connect server to load Sedona's Spark SQL extension (`org.apache.sedona.sql.SedonaSqlExtensions`, set through `spark.sql.extensions`). This will make ST_ functions available in Spark Connect sessions.
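As a rough sketch of the kind of startup configuration described above, the helper below assembles a `start-connect-server.sh` command line that registers Sedona's SQL extension and Kryo registrator. The original command from this comment is not preserved here, so the package coordinates (including the `geotools-wrapper` version), the `build_connect_server_command` name, and the `/opt/spark` path are assumptions based on the versions listed in this issue, not the maintainer's exact command.

```python
import shlex
from typing import List

# Assumed versions, taken from the "Settings" section of this issue.
SPARK_VERSION = "3.5.0"
SCALA_VERSION = "2.12"
SEDONA_VERSION = "1.7.0"
# Assumption: the geotools-wrapper release that pairs with Sedona 1.7.0.
GEOTOOLS_WRAPPER_VERSION = "1.7.0-28.5"


def build_connect_server_command(spark_home: str) -> List[str]:
    """Build an argument list for starting Spark Connect with Sedona loaded."""
    spark_minor = ".".join(SPARK_VERSION.split(".")[:2])  # e.g. "3.5"

    # Jars the server needs: Spark Connect itself plus the Sedona artifacts.
    packages = [
        f"org.apache.spark:spark-connect_{SCALA_VERSION}:{SPARK_VERSION}",
        f"org.apache.sedona:sedona-spark-shaded-{spark_minor}_{SCALA_VERSION}:{SEDONA_VERSION}",
        f"org.datasyslab:geotools-wrapper:{GEOTOOLS_WRAPPER_VERSION}",
    ]

    # Server-side settings that register Sedona's ST_ functions and serializers.
    conf = {
        "spark.sql.extensions": "org.apache.sedona.sql.SedonaSqlExtensions",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
    }

    command = [
        f"{spark_home}/sbin/start-connect-server.sh",
        "--packages",
        ",".join(packages),
    ]
    for key, value in conf.items():
        command += ["--conf", f"{key}={value}"]
    return command


if __name__ == "__main__":
    # Hypothetical Spark installation path; adjust to your environment.
    print(shlex.join(build_connect_server_command("/opt/spark")))
```

Because these options are applied when the server starts, they have to be passed to the server process; setting them only on the Python client does not load the extension into an already running server.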
Thanks, @Kontinuation! That worked! As an aside, is there documentation for this somewhere that we missed?
@barrieca I don't think this is documented as spark-connect is a pretty new feature. Maybe you can help us improve the documentation here: https://sedona.apache.org/latest/setup/cluster/
Hello.
We are trying to use geometry data with Apache Spark Connect and Apache Sedona. We are able to convert binary geometry data to Sedona geometry types using `ST_GeomFromWKB` on a local Apache Sedona instance, but when we attempt this via a remote Spark Connect server, the `ST_GeomFromWKB` function cannot be found (see the error below). Are Sedona operations compatible with a Spark Connect server?
Actual behavior
Running this code produces the above error at `df.show()`. When we use Sedona Spark with our Spark Connect server without geospatial data (i.e., we don't use `.withColumn("geom", f.expr("ST_GeomFromWKB(geom)"))`), there is no error; the data is loaded and made available with the `geom` column in its original binary form.
Note: We are using the PostGIS demo database found here.
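One way to tell a missing server-side extension apart from a data problem is to run a trivial ST_ query that touches no tables. The sketch below is only an illustration: `sedona_functions_available` is a hypothetical helper name, and the `sc://localhost:15002` address in the usage comment is an assumed default rather than the endpoint used in this report.

```python
def sedona_functions_available(spark) -> bool:
    """Return True if a trivial Sedona ST_ query runs on the given session."""
    try:
        # ST_Point and ST_AsText are Sedona SQL functions; no table is needed.
        rows = spark.sql("SELECT ST_AsText(ST_Point(1.0, 2.0)) AS wkt").collect()
    except Exception as exc:
        # An unregistered ST_ function surfaces as an error when the query runs.
        print(f"Sedona functions are not available: {exc}")
        return False
    return len(rows) == 1


# Usage against a Spark Connect server (assumed address, requires pyspark):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
#   print(sedona_functions_available(spark))
```

If this returns False on the remote session but True on a local Sedona session, the server was most likely started without Sedona's SQL extension.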
Steps to reproduce the problem
2. Run the Python code above.
Settings
Sedona version = 1.7.0
Apache Spark version = 3.5.0
Scala version = 2.12
Python version = 3.8