Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bc2pg - improve type mapping #204

Open
smnorris opened this issue Dec 4, 2024 · 4 comments
Open

bc2pg - improve type mapping #204

smnorris opened this issue Dec 4, 2024 · 4 comments
Milestone

Comments

@smnorris
Copy link
Owner

smnorris commented Dec 4, 2024

As per #178, bc2pg creates numeric postgres columns for BCDC NUMBER columns.

Should bcdata :

  • automatically inspect the data to determine if it is integer and adjust the type accordingly?
  • accept an optional field map argument for overriding the BCDC types, maybe something like {"field_name1": "numeric", "field_name2": "integer"}?
  • parse the data_precision part of the schema?

As data are read by geopandas when writing to postgres:

@smnorris
Copy link
Owner Author

smnorris commented Dec 4, 2024

Distinct types of some frequently accessed datasets:

# get schemas
declare -a sources=(
"whse_fish.fiss_fish_obsrvtn_pnt_sp"  
"whse_fish.fiss_obstacles_pnt_sp" 
"whse_fish.pscis_assessment_svw" 
"whse_fish.pscis_design_proposal_svw" 
"whse_fish.pscis_habitat_confirmation_svw" 
"whse_fish.pscis_remediation_svw" 
"whse_admin_boundaries.adm_indian_reserves_bands_sp" 
"whse_admin_boundaries.adm_nr_districts_spg" 
"whse_basemapping.bcgs_20k_grid" 
"whse_basemapping.dbm_mof_50k_grid" 
"whse_basemapping.nts_250k_grid" 
"whse_basemapping.trim_cultural_lines"  
"whse_basemapping.trim_cultural_points" 
"whse_basemapping.trim_ebm_airfields" 
"whse_basemapping.trim_ebm_ocean" 
"whse_basemapping.utmg_utm_zones_sp" 
"whse_legal_admin_boundaries.abms_municipalities_sp" 
"whse_legal_admin_boundaries.abms_regional_districts_sp" 
"whse_admin_boundaries.clab_indian_reserves" 
"whse_admin_boundaries.clab_national_parks" 
"whse_basemapping.gba_local_reg_greenspaces_sp" 
"whse_basemapping.gba_railway_structure_lines_sp" 
"whse_basemapping.gba_railway_tracks_sp" 
"whse_basemapping.gba_transmission_lines_sp" 
"whse_basemapping.gns_geographical_names_sp" 
"whse_environmental_monitoring.envcan_hydrometric_stn_sp" 
"whse_fish.fiss_stream_sample_sites_sp" 
"whse_forest_tenure.ften_range_poly_svw" 
"whse_imagery_and_base_maps.mot_road_structure_sp" 
"whse_legal_admin_boundaries.abms_municipalities_sp" 
"whse_mineral_tenure.og_petrlm_dev_rds_pre06_pub_sp" 
"whse_tantalis.ta_conservancy_areas_svw" 
"whse_tantalis.ta_park_ecores_pa_svw" 
"whse_forest_tenure.ften_road_section_lines_svw" 
"whse_mineral_tenure.og_road_segment_permit_sp" 
)

for source in "${sources[@]}"
do 
	bcdata info $source | jq '.schema' > $source.json
done

# list unique types
$ cat *.json | jq -r '.[].data_type' | sort | uniq

BLOB
DATE
NUMBER
SDO_GEOMETRY
VARCHAR2

@smnorris
Copy link
Owner Author

smnorris commented Dec 4, 2024

This should enable a bc2parquet command, which should make bc2pg mostly redundant - various application databases can load the data directly from cached parquet files rather than requesting from WFS. CrunchyData/pg_parquet@e3b4476

@smnorris
Copy link
Owner Author

smnorris commented Dec 4, 2024

This replaces issue #178 because while BCDC provides data_precision for NUMBERs, it does not provide scale:
https://www.oracletutorial.com/oracle-basics/oracle-number-data-type.

@smnorris
Copy link
Owner Author

just using geopandas to_postgis and then adding the comments seems simplest, removing most of the existing bc2pg / bcdata.database code.

@smnorris smnorris changed the title improve type mapping bc2pg - improve type mapping Dec 11, 2024
@smnorris smnorris added this to the 1.0 milestone Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant