Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle CSV datasets & better handle updates to existing MCF #12

Merged
merged 12 commits into from
Feb 12, 2024

Conversation

davemfish
Copy link
Collaborator

@davemfish davemfish commented Feb 8, 2024

This PR adds support for .csv files, by treating them like other vector datasets that GDAL supports.

A couple other things here:

  • Debugged some issues with overwriting existing MCF data when a MetadataControl is initialized. The goal is to preserve any metadata that might already exist, except for properties that are intrinsic to the dataset (i.e. CRS, bounding box, field names & datatypes).
  • changed some function keyword arg names to match the names of the MCF properties they represent. This makes it easy to use a pattern like mc_a.set_license(**mc_b.get_license())

self.mcf['metadata']['datestamp'] = datetime.utcnow(
).strftime('%Y-%m-%d')
# fill all values that can be derived from the dataset
self._set_spatial_info()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making sure to call this whether or not an MCF already exists, in case properties of the dataset have changed.

@@ -418,9 +432,12 @@ def set_band_description(self, band_number, name=None, title=None, abstract=None
title (str): title for the raster band
abstract (str): description of the raster band
units (str): unit of measurement for the band's pixel values
type (str): of the band's values, either 'integer' or 'number'
Copy link
Collaborator Author

@davemfish davemfish Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like using type as the argname, since it's a builtin function name, but that is the name of the MCF property, and making the setter's argnames match makes it convenient to do things like

mc_a.set_band_description(1, **mc_b.get_band_description(1)).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this note, is it possible to automatically populate the band's min/max in the dimensions section? Dimensions correspond to this feature and seem to be required in the MCF, but not for ISO. I know there is a way to get this information in GDAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@empavia do you mean the min/max pixel values, or the number of rows and columns?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phargogh the min/max pixels

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phargogh actually I believe I am referring to the min/max of the band measurement. For example, the min/max elevation in meters for a DEM. An example is found in this document on Oak: "Z:\global-dataset-cache\awc-isric-soilgrids\awc.tif.ovr.aux.xml"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up @empavia , I created a separate issue to address it. Please see #15

@davemfish davemfish marked this pull request as ready for review February 9, 2024 18:57
@davemfish davemfish self-assigned this Feb 9, 2024
from geometamaker import MetadataControl

title = 'Title'
keyword = 'foo'
band_name = 'The Band'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎸

Copy link
Member

@phargogh phargogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I appreciate that we can just throw a CSV at GDAL and then use the same API.

Have you tested what happens if we pass GDAL a malformed CSV, like one with too many records on a line or something like that?

@davemfish
Copy link
Collaborator Author

Nice! I appreciate that we can just throw a CSV at GDAL and then use the same API.

Have you tested what happens if we pass GDAL a malformed CSV, like one with too many records on a line or something like that?

Good question! It seems like GDAL uses the header row as the source of truth for number of fields, and then it ignores extra records in other lines. That seems totally fine. I don't feel like we need a test to ensure this.

@davemfish davemfish merged commit 59f0782 into natcap:main Feb 12, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants