Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sedona parameterized queries #1781

Open
MrPowers opened this issue Jan 30, 2025 · 1 comment
Open

Sedona parameterized queries #1781

MrPowers opened this issue Jan 30, 2025 · 1 comment

Comments

@MrPowers
Copy link
Contributor

Parameterized queries were recently added to Spark and allow for some really clean syntax when shifting from the Python API to the SQL API. We should consider adding this interface to Sedona.

Status quo

Create a geometry object for Minnesota:

minnesota = 'POLYGON((-96.4517 43.5008,-91.2195 43.5017,-91.3101 43.8226,…))’

Create a DataFrame:

df = sedona.read.format("shapefile").option("charset", "UTF-8").load(user_uri + extract_dir)

Create a temporary view:

df.createOrReplaceTempView('gauges')

Run a query with a f-string:

query = sedona.sql(f'''
select geometry, Status, Waterbody, Observed 
from gauges
where st_intersects(geometry, ST_GeomFromWKT('{minnesota}'))
''')

How to make this more concise with a parameterized query

This could be nicer with a parameterized query:

sedona.sql("""
select geometry, Status, Waterbody, Observed 
from {df}
where st_intersects(geometry, ST_GeomFromWKT('{minnesota}'))
""", df=df, minnesota=minnesota)

The parameterized query saves you from creating the temporary view and using f-strings.

It would be great if the parameters were also sanitized to prevent SQL injection vulnerabilities.

Another parameterized query example

This can be even cooler. Assume Minnesota is a geometry object as follows:

minnesota_wkt = 'POLYGON((-96.4517 43.5008,-91.2195 43.5017,-91.3101 43.8226,…))’
minnesota = ST_GeomFromWKT(minnesota_wkt)

Then, we can pass the geometry object directly in the parameterized query:

sedona.sql("""
select geometry, Status, Waterbody, Observed 
from {df}
where st_intersects(geometry, minnesota)
""", df=df, minnesota=minnesota)
@james-willis
Copy link
Contributor

nitpicking your example, it should be:

from shapely import from_wkt
minnesota = from_wkt(minnesota_wkt)
...

Some context:

  1. spark functions dont work outside of sql/df methods. Sedona's functions are the same
  2. the compat layer for geom literals in python is shapely. In java its JTS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants