Patch Exploding Components ETL changes into production #1420

Closed
wants to merge 80 commits into from
80 commits
5c93c47
Give this a compose file for DX
frankhereford Aug 12, 2024
e96feee
use compose
frankhereford Aug 12, 2024
e1f21c7
add sql endpoint support
frankhereford Aug 13, 2024
817e3c5
include env variables and access to the 'localhost' hostname
frankhereford Aug 13, 2024
8921508
get data back from sql endpoint
frankhereford Aug 13, 2024
4674056
throw exception if no interval designation exists
frankhereford Aug 15, 2024
7b79590
give -d a useful default arg
frankhereford Aug 15, 2024
061914f
break up these comments to be easier to read
frankhereford Aug 15, 2024
7cc4951
we're going to integrate the data in here
frankhereford Aug 15, 2024
38c3853
get the data into the function for integration
frankhereford Aug 15, 2024
9154b7c
ignore my dev json files
frankhereford Aug 22, 2024
ec43e76
import timedelta to help provide default arg for -d
frankhereford Aug 22, 2024
871a825
add a debug json output function
frankhereford Aug 22, 2024
4d904ca
integrate exploded data into the all features formation function
frankhereford Aug 22, 2024
13a4a92
manipulate graphql-engine query output to be more usable
frankhereford Aug 22, 2024
78c4590
remove debugging limit from SQL
frankhereford Aug 26, 2024
cded021
use the data re-shaping function
frankhereford Aug 26, 2024
210c3ca
remove dev comments
frankhereford Aug 26, 2024
def437d
more comments clean up
frankhereford Aug 26, 2024
e5fc569
align the SQL query with the graphql one
frankhereford Aug 26, 2024
a34e29f
worth remembering 🎗️
frankhereford Aug 26, 2024
f059f7f
transform data into the form that agol expects
frankhereford Aug 26, 2024
71fd830
no need for this to be network=host
frankhereford Aug 27, 2024
86d4982
add lookup for 4th layer
frankhereford Aug 27, 2024
b413908
include new layer
frankhereford Aug 27, 2024
7c74ee0
improve exception comment
frankhereford Aug 27, 2024
c8d69ba
don't want to keep that one
frankhereford Aug 27, 2024
a28f237
ordering
frankhereford Aug 27, 2024
14b6d45
sentence case
frankhereford Aug 27, 2024
4489151
better comment
frankhereford Aug 27, 2024
5adb740
remove debug print statements
frankhereford Aug 27, 2024
9d1b9e5
remove this whitespace
frankhereford Aug 27, 2024
a1734fd
move query into settings with its buddies
frankhereford Aug 27, 2024
24161a0
goodbye useful debug function
frankhereford Aug 27, 2024
ceff836
sort out my entrypoint and CMD
frankhereford Aug 27, 2024
ba27f6d
include the exploded stuff in this mode as well
frankhereford Aug 27, 2024
4f0f66b
append not extend
frankhereford Aug 27, 2024
dfa6b5b
fix single point creation
frankhereford Aug 27, 2024
6b12cb2
casting hackery
frankhereford Aug 27, 2024
7ae8754
clean up debug stuff
frankhereford Aug 27, 2024
5b0de5b
Create view for exploded geometry
frankhereford Sep 6, 2024
7aaf9bb
remove query from settings file
frankhereford Sep 6, 2024
5ab42d2
remove helper function
frankhereford Sep 6, 2024
76180c2
these are gone
frankhereford Sep 6, 2024
06d6987
🤖 Export view for explode-multipoint-for-agol
Sep 9, 2024
8e10c92
zap this out of here
frankhereford Sep 9, 2024
9f199da
expose tables via graphql endpoint
frankhereford Sep 9, 2024
a9ac861
add edge between components and exploded components
frankhereford Sep 9, 2024
c0c7c4b
add exploded gql query
frankhereford Sep 10, 2024
ddb37d1
remove this unneeded logic now
frankhereford Sep 10, 2024
85bb524
add solo point support
frankhereford Sep 10, 2024
8ad4361
accumulate exploded multipoint records
frankhereford Sep 10, 2024
5531387
and it works!
frankhereford Sep 10, 2024
7d50aae
a little naming change
frankhereford Sep 10, 2024
591b7d9
Improve comment
frankhereford Sep 10, 2024
2b53ea0
EOF linefeeds
frankhereford Sep 10, 2024
f1d200f
EOF linefeed
frankhereford Sep 10, 2024
0a45e64
linting
frankhereford Sep 11, 2024
fc33ea4
title specificity
frankhereford Sep 11, 2024
b446005
using established abbreviation
frankhereford Sep 11, 2024
b6c8592
work on enumeration of resources
frankhereford Sep 11, 2024
81a70e2
Retitle README
frankhereford Sep 11, 2024
babd797
revise execution instructions
frankhereford Sep 11, 2024
ae1d63d
Merge branch 'main' into explode-multipoint-for-agol
frankhereford Sep 11, 2024
a271b59
Merge remote-tracking branch 'origin/md-18877-update-proj-dev-status'…
frankhereford Sep 11, 2024
9b74407
fix double eof linefeed
frankhereford Sep 11, 2024
b2ce403
update view creation and query on view
frankhereford Sep 11, 2024
0364db2
remove removed fields
frankhereford Sep 11, 2024
d0fd1d6
🤖 Export view for explode-multipoint-for-agol
Sep 11, 2024
e568559
improve comment, H/T to Mike for the suggestion
frankhereford Sep 11, 2024
2be8ac5
Remove my debugging statements that had snuck in as comments.
frankhereford Sep 11, 2024
671679c
Just in case we build an image locally; don't want to include any jso…
frankhereford Sep 12, 2024
5b74d4a
correct the default of the HASURA endpoint without network=host
frankhereford Sep 12, 2024
4ffd297
strip down this view to just what we need
frankhereford Sep 12, 2024
e223da3
🤖 Export view for explode-multipoint-for-agol
Sep 12, 2024
a61579f
remove confusing comment
frankhereford Sep 12, 2024
de42ca4
Remove relationship now that the second table is trimmed down
frankhereford Sep 13, 2024
79eba0b
Merge pull request #1413 from cityofaustin/update-readme
frankhereford Sep 13, 2024
0e3b059
Remove the `represented` that i had in here twice. H/T Chia for catch…
frankhereford Sep 13, 2024
f247735
Merge pull request #1397 from cityofaustin/explode-multipoint-for-agol
frankhereford Sep 13, 2024
29 changes: 29 additions & 0 deletions moped-database/metadata/tables.yaml
@@ -301,6 +301,35 @@
- phase_name_simple
filter: {}
allow_aggregations: true
- table:
name: exploded_component_arcgis_online_view
schema: public
select_permissions:
- role: moped-admin
permission:
columns:
- project_component_id
- exploded_geometry
- project_updated_at
filter: {}
comment: ""
- role: moped-editor
permission:
columns:
- project_component_id
- exploded_geometry
- project_updated_at
filter: {}
allow_aggregations: true
comment: ""
- role: moped-viewer
permission:
columns:
- project_component_id
- exploded_geometry
- project_updated_at
filter: {}
comment: ""
- table:
name: feature_drawn_lines
schema: public
Expand Down
@@ -0,0 +1 @@
DROP VIEW IF EXISTS exploded_component_arcgis_online_view;
@@ -0,0 +1,14 @@
CREATE VIEW exploded_component_arcgis_online_view AS
SELECT
component_arcgis_online_view.project_id,
component_arcgis_online_view.project_component_id,
ST_GEOMETRYTYPE(dump.geom) AS geometry_type,
dump.path[1] AS point_index, -- ordinal value of the point in the MultiPoint geometry
component_arcgis_online_view.geometry AS original_geometry,
ST_ASGEOJSON(dump.geom) AS exploded_geometry, -- noqa: RF04
component_arcgis_online_view.project_updated_at
FROM
component_arcgis_online_view,
LATERAL ST_DUMP(ST_GEOMFROMGEOJSON(component_arcgis_online_view.geometry)) AS dump -- noqa: RF04
WHERE
ST_GEOMETRYTYPE(ST_GEOMFROMGEOJSON(component_arcgis_online_view.geometry)) = 'ST_MultiPoint';
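
For readers less familiar with `ST_DUMP`, the row-per-point behavior of this view can be approximated in plain Python. This is only an illustrative sketch, not code from the PR: the function name `explode_multipoint` is hypothetical, and it assumes the geometry arrives as GeoJSON text. As in the view, geometries that are not MultiPoints yield no rows, and the index is 1-based like `dump.path[1]`.

```python
import json


def explode_multipoint(geojson_text):
    """Return one (point_index, point_geojson) pair per point in a MultiPoint.

    Hypothetical helper mirroring the view: non-MultiPoint geometries
    produce no rows, and point_index starts at 1.
    """
    geometry = json.loads(geojson_text)
    if geometry["type"] != "MultiPoint":
        return []
    return [
        (index, json.dumps({"type": "Point", "coordinates": coords}))
        for index, coords in enumerate(geometry["coordinates"], start=1)
    ]


rows = explode_multipoint(
    '{"type": "MultiPoint", "coordinates": [[-97.74, 30.27], [-97.73, 30.28]]}'
)
for point_index, point_geojson in rows:
    print(point_index, point_geojson)
```

Each exploded point is serialized back to a GeoJSON string here, which matches `ST_ASGEOJSON` returning text and is why the ETL calls `json.loads` on the exploded geometry.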
13 changes: 13 additions & 0 deletions moped-database/views/exploded_component_arcgis_online_view.sql
@@ -0,0 +1,13 @@
-- Most recent migration: moped-database/migrations/1725649291445_add_exploded_moped_geometry_view_for_agol/up.sql

CREATE OR REPLACE VIEW exploded_component_arcgis_online_view AS SELECT
component_arcgis_online_view.project_id,
component_arcgis_online_view.project_component_id,
st_geometrytype(dump.geom) AS geometry_type,
dump.path[1] AS point_index,
component_arcgis_online_view.geometry AS original_geometry,
st_asgeojson(dump.geom) AS exploded_geometry,
component_arcgis_online_view.project_updated_at
FROM component_arcgis_online_view,
LATERAL st_dump(st_geomfromgeojson(component_arcgis_online_view.geometry)) dump (path, geom)
WHERE st_geometrytype(st_geomfromgeojson(component_arcgis_online_view.geometry)) = 'ST_MultiPoint'::text;
1 change: 1 addition & 0 deletions moped-etl/arcgis/.dockerignore
@@ -1,3 +1,4 @@
env_file
.git%
__pycache__
*.json
1 change: 1 addition & 0 deletions moped-etl/arcgis/.gitignore
@@ -0,0 +1 @@
*json
47 changes: 21 additions & 26 deletions moped-etl/arcgis/README.md
@@ -1,38 +1,33 @@
# ArcGIS ETLs
# Moped → ArcGIS Online ETL

Scripts which integrate Moped data with Esri ArcGIS
A Python script that pushes Moped data to Esri's ArcGIS Online (AGOL) platform.

## Publish components to ArcGIS Online (AGOL)
## Publish components to AGOL

The script `components_to_agol.py` is used to publish component record data to ArcGIS Online (AGOL). It replaces all records in the AGOL feature services with the latest component data in Moped.
The script `components_to_agol.py` is used to publish component record data to AGOL. It has two primary modes of operation:

The data is sourced from a view, `component_arcgis_online_view` which defines all columns which are available to be processed.
- Full refresh: This mode will delete all existing records in the AGOL feature layer and replace them with the current data from the Moped database.
- Incremental refresh: This mode will only update records that have been modified since a given timestamp.

The AGOL layers can be found here:
The script is responsible for maintaining four layers in the [Moped Project Components](https://austin.maps.arcgis.com/home/item.html?id=1c084c8756a84e6db7e2796c98c850a2) feature service on AGOL:

- [Project component points](https://austin.maps.arcgis.com/home/item.html?id=997555f6e0904aa88eafe73f19ee65c0)
- [Project component lines](https://austin.maps.arcgis.com/home/item.html?id=e8f03d2cec154cacae539b630bcaa70b)
- [Moped Points](https://austin.maps.arcgis.com/home/item.html?id=1c084c8756a84e6db7e2796c98c850a2&sublayer=0): Components best represented as points, utilizing MultiPoint geometries
- [Moped Lines](https://austin.maps.arcgis.com/home/item.html?id=1c084c8756a84e6db7e2796c98c850a2&sublayer=1): Components best represented as lines, using Line geometries
- [MOPED CombinedGeometries](https://austin.maps.arcgis.com/home/item.html?id=1c084c8756a84e6db7e2796c98c850a2&sublayer=2) (SIC): All components, where points are transformed into a line ringing the location
- [Moped Feature Points](https://austin.maps.arcgis.com/home/item.html?id=1c084c8756a84e6db7e2796c98c850a2&sublayer=3): Components best represented as points, but where MultiPoints are exploded into individual points. Note, the same component can be represented as multiple features, one for each point in the MultiPoint.

### Get it running
The data for the first three layers listed above is sourced from a view, `component_arcgis_online_view`, which defines all columns available for processing.

1. Configure an `env_file` according to the `env_template` example. You can find the AGOL Scripts Publisher username and password in the API Secrets vault in the team password store.

2. Create and activate a Python environment that meets the requirements in `requirements.txt`. Alternatively, you can use the provided Dockerfile.

3. Run the script
The fourth layer is sourced from a derivative view, `exploded_component_arcgis_online_view`, which takes the previous view and explodes MultiPoint geometries into individual points.

If you want to fully replace the dataset:
## Running the Script

```shell
$ python components_to_agol.py -f
```
Or, if you want to replace only data updated since a timestamp with time zone offset:
```shell
$ python components_to_agol.py -d <timestamptz>
```
1. Configure an `env_file` according to the `env_template` example. You can find the AGOL Scripts Publisher username and password in the API Secrets vault in the team password store.

or, to mount your local copy to a Docker container
1. `docker compose build` to build the container.

```shell
docker run -it --rm --network host --env-file env_file -v ${PWD}:/app atddocker/atd-moped-etl-arcgis:production python components_to_agol.py
```
1. Run the script via one or more of the following:
- `docker compose run arcgis -d` to start the script with the default interval of changes over the last week.
- `docker compose run arcgis -f` to start the script with a full refresh.
- `docker compose run arcgis -d <timestamptz>` to start the script with a refresh since the given timestamp.
- `docker compose run --entrypoint /bin/bash arcgis` to start a shell inside the container.
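
The three `-d` / `-f` invocations above map onto `argparse` behavior roughly as sketched below. This is a minimal sketch based on the argument definitions in this PR; the helper names `build_parser` and `describe_mode` are hypothetical and exist only for illustration. The key detail is `nargs="?"` combined with `const`: a bare `-d` picks up the seven-days-ago default, while omitting `-d` entirely leaves the value as `None`.

```python
import argparse
from datetime import datetime, timedelta, timezone


def build_parser():
    """Build a parser with the same -d / -f semantics as the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d",
        "--date",
        type=str,
        nargs="?",  # the value after -d is optional
        const=(datetime.now(timezone.utc) - timedelta(days=7)).isoformat(),
        default=None,  # used when -d is absent altogether
    )
    parser.add_argument("-f", "--full", action="store_true")
    return parser


def describe_mode(argv):
    """Hypothetical helper: report which refresh mode the arguments select."""
    args = build_parser().parse_args(argv)
    if args.date and args.full:
        raise ValueError("use either -d or -f, not both")
    if not args.date and not args.full:
        raise ValueError("one of -d or -f is required")
    return "full" if args.full else f"incremental since {args.date}"


print(describe_mode(["-f"]))
print(describe_mode(["-d"]))
print(describe_mode(["-d", "2024-06-28T00:06:16.360805+00:00"]))
```

Note that `const` is evaluated once when the parser is built, so the default interval is relative to the script's start time.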
79 changes: 64 additions & 15 deletions moped-etl/arcgis/components_to_agol.py
@@ -1,13 +1,15 @@
#!/usr/bin/env python
"""Copies all Moped component records to ArcGIS Online (AGOL)"""
# docker run -it --rm --network host --env-file env_file -v ${PWD}:/app moped-agol /bin/bash
# docker compose run arcgis;
import argparse
import logging
from datetime import datetime, timezone
import json
from datetime import datetime, timezone, timedelta

from process.logging import get_logger
from settings import (
COMPONENTS_QUERY_BY_LAST_UPDATE_DATE,
EXPLODED_COMPONENTS_QUERY_BY_LAST_UPDATE_DATE,
UPLOAD_CHUNK_SIZE,
)
from utils import (
@@ -42,7 +44,7 @@ def make_esri_feature(*, esri_geometry_key, geometry, attributes):
See: https://developers.arcgis.com/documentation/common-data-types/feature-object.htm

Args:
esri_geometry_key (str): `paths` or `points`: see the `get_esri_geometry_key` docstring
esri_geometry_key (str): `paths`, `points`, or `point`: see the `get_esri_geometry_key` docstring
geometry (dict): A geojson geometry object, such as one returned from our
component view in Moped
attribute (dict): Any additional properties to be included as feature attributes
@@ -59,27 +61,41 @@ def make_esri_feature(*, esri_geometry_key, geometry, attributes):
"spatialReference": {"wkid": 4326},
},
}
feature["geometry"][esri_geometry_key] = geometry["coordinates"]
if (esri_geometry_key == "points") or (esri_geometry_key == "paths"):
feature["geometry"][esri_geometry_key] = geometry["coordinates"]
elif esri_geometry_key == "point":
geometry = json.loads(geometry)
feature["geometry"]["y"] = geometry["coordinates"][1]
feature["geometry"]["x"] = geometry["coordinates"][0]
else:
feature["geometry"]["y"] = geometry["coordinates"][1]
feature["geometry"]["x"] = geometry["coordinates"][0]

return feature


def make_all_features(data):
def make_all_features(data, exploded_geometry):
"""Take a list of component feature records and create Esri feature objects for lines, points, and combined layers in AGOL.

Args:
data (dict): a list of component feature records
exploded_geometry (list): a list of exploded geometry records from the exploded_component_arcgis_online_view. These are created
by taking the MultiPoint geometry from the component_arcgis_online_view and "exploding" it into individual points.

Returns:
dict: An object with lists of Esri feature objects for lines, points, combined, and exploded layers
"""
all_features = {"lines": [], "points": [], "combined": []}

all_features = {"lines": [], "points": [], "combined": [], "exploded": []}

logger.info("Building Esri feature objects...")
for component in data:
# extract geometry and line geometry from component data
# for line features, the line geometry is redundant/identical to geometry
# for point features, it is the buffered ring around the point as defined
# in the Moped component view

# Extract geometry and line geometry from component data.
# For line features, the line geometry is redundant/identical to geometry.
# For point features, it is the buffered ring around the point as defined
# in the Moped component view.

geometry = component.pop("geometry")
line_geometry = component.pop("line_geometry")

@@ -115,6 +131,27 @@ def make_all_features(data):
attributes=component,
)
all_features["combined"].append(line_feature)

project_component_id = feature["attributes"]["project_component_id"]
# Filter exploded_geometry to only include dicts with matching project_component_id
matching_exploded_geometry_records = [
feature
for feature in exploded_geometry
if feature.get("project_component_id") == project_component_id
]
for record in matching_exploded_geometry_records:
geometry = record.pop("geometry")
esri_geometry_key = "point"

feature = make_esri_feature(
esri_geometry_key="point",
geometry=geometry,
attributes=component,
)

feature["attributes"]["source_geometry_type"] = "point"
all_features["exploded"].append(feature)

else:
all_features["lines"].append(feature)
all_features["combined"].append(feature)
@@ -140,10 +177,15 @@ def main(args):
variables=variables,
)["component_arcgis_online_view"]

all_features = make_all_features(data)
exploded_data = make_hasura_request(
query=EXPLODED_COMPONENTS_QUERY_BY_LAST_UPDATE_DATE,
variables=variables,
)["exploded_component_arcgis_online_view"]

all_features = make_all_features(data, exploded_data)

if args.full:
for feature_type in ["points", "lines", "combined"]:
for feature_type in ["points", "lines", "combined", "exploded"]:
logger.info(f"Processing {feature_type} features...")
features = all_features[feature_type]

@@ -166,7 +208,7 @@ def main(args):
project_ids_for_feature_delete = list(set(project_ids))

# Delete outdated feature from AGOL and add updated features
for feature_type in ["points", "lines", "combined"]:
for feature_type in ["points", "lines", "combined", "exploded"]:
logger.info(f"Processing {feature_type} features...")
logger.info(
f"Deleting all {len(all_features[feature_type])} existing features in {feature_type} layer for updated projects in chunks of {UPLOAD_CHUNK_SIZE}..."
@@ -195,15 +237,17 @@ def main(args):
"-d",
"--date",
type=str,
nargs="?",
const=(datetime.now(timezone.utc) - timedelta(days=7)).isoformat(),
default=None,
help=f"ISO date string with TZ offset (ex. 2024-06-28T00:06:16.360805+00:00) of latest updated_at value to find project records to update.",
help="ISO date string with TZ offset (ex. 2024-06-28T00:06:16.360805+00:00) of latest updated_at value to find project records to update. Defaults to 7 days ago if -d is used without a value.",
)

parser.add_argument(
"-f",
"--full",
action="store_true",
help=f"Delete and replace all project components.",
help="Delete and replace all project components.",
)

args = parser.parse_args()
@@ -214,6 +258,11 @@ def main(args):
"Please provide either the -d flag with ISO date string with TZ offset or the -f flag and not both."
)

if not args.date and not args.full:
raise Exception(
"Please provide either the -d flag with optional ISO date string with TZ offset or the -f flag."
)

if args.full:
logger.info(f"Starting sync. Replacing all projects' components data...")
else:
Expand Down
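
The exploded-point handling in `make_all_features` filters the full `exploded_geometry` list once per component. One possible alternative, sketched here under assumptions and not part of this PR, is to index the exploded records by `project_component_id` once and then look each component up directly. The helper names `group_points_by_component` and `make_point_geometry` are hypothetical; the record shape (`project_component_id` plus a `geometry` GeoJSON string) follows the exploded GraphQL query, and the `x` / `y` layout follows the `point` branch of `make_esri_feature`.

```python
import json
from collections import defaultdict


def group_points_by_component(exploded_records):
    """Index exploded GeoJSON strings by project_component_id (hypothetical helper)."""
    grouped = defaultdict(list)
    for record in exploded_records:
        grouped[record["project_component_id"]].append(record["geometry"])
    return grouped


def make_point_geometry(geojson_text):
    """Convert a GeoJSON Point string into an Esri-style point geometry dict."""
    x, y = json.loads(geojson_text)["coordinates"][:2]
    return {"x": x, "y": y, "spatialReference": {"wkid": 4326}}


# Made-up sample records for illustration only.
records = [
    {"project_component_id": 1, "geometry": '{"type": "Point", "coordinates": [-97.74, 30.27]}'},
    {"project_component_id": 1, "geometry": '{"type": "Point", "coordinates": [-97.73, 30.28]}'},
    {"project_component_id": 2, "geometry": '{"type": "Point", "coordinates": [-97.70, 30.30]}'},
]
grouped = group_points_by_component(records)
component_1_points = [make_point_geometry(text) for text in grouped[1]]
print(component_1_points)
```

Because this version reads the records without calling `pop`, the input list is left unmodified, and each Esri geometry is a fresh dict rather than a shared one.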
10 changes: 10 additions & 0 deletions moped-etl/arcgis/docker-compose.yaml
@@ -0,0 +1,10 @@
services:
arcgis:
build:
context: .
volumes:
- .:/app
entrypoint: python /app/components_to_agol.py
command: -d
env_file:
- env_file
2 changes: 1 addition & 1 deletion moped-etl/arcgis/env_template
@@ -1,4 +1,4 @@
AGOL_USERNAME=
AGOL_PASSWORD=
HASURA_ENDPOINT=http://localhost:8080/v1/graphql
HASURA_ENDPOINT=http://host.docker.internal:8080/v1/graphql
HASURA_ADMIN_SECRET=hasurapassword
12 changes: 11 additions & 1 deletion moped-etl/arcgis/settings.py
@@ -1,6 +1,6 @@
UPLOAD_CHUNK_SIZE = 100

LAYER_IDS = {"points": 0, "lines": 1, "combined": 2}
LAYER_IDS = {"points": 0, "lines": 1, "combined": 2, "exploded": 3}

COMPONENTS_QUERY_BY_LAST_UPDATE_DATE = """
query GetProjectsComponents($where: component_arcgis_online_view_bool_exp!) {
@@ -84,3 +84,13 @@
}
}
"""

# line_geometry
EXPLODED_COMPONENTS_QUERY_BY_LAST_UPDATE_DATE = """
query GetExplodedProjectsComponents($where: exploded_component_arcgis_online_view_bool_exp!) {
exploded_component_arcgis_online_view(where: $where) {
geometry: exploded_geometry
project_component_id
}
}
"""