Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major refactor to use dataclasses that loosely follow DataPackage - Resource models #24

Merged
merged 15 commits into from
Jul 25, 2024

Conversation

davemfish
Copy link
Collaborator

@davemfish davemfish commented Jul 22, 2024

This PR moves us away from Metadata Control Files and pygeometa and towards defining our own data models to represent metadata for vectors, rasters, tables, and compressed files.

The parent metadata model is models.Resource. It loosely follows https://datapackage.org/standard/data-resource/, but also includes some other properties that we liked from MCF.

Defining our own models means we can include just we want, in a structure that makes sense to us. For example,

  • Contact info section is simpler (see models.ContactSchema)
  • keywords section is simpler: it's just a list of keywords. We lose the ability to define multiple keyword lists that reference different dictionaries, use different languages, etc. If those things are useful, we can build them back in.
  • vectors & tables can have different schema from rasters, for describing fields vs. bands

Is there anything missing from the new models?

@phargogh I'm using a dataclass feature (keyword-only args) only available since Python 3.10. It doesn't look like there's an easy workaround, so before trying to make this compatible with early versions, I'm submitting as-is and we can discuss if we think it's critical to support earlier.

@davemfish davemfish self-assigned this Jul 24, 2024
Copy link

@empavia empavia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer to @phargogh on the code formatting, etc. To me, it seems all of the fields we care about are here and the file formats are represented.

@davemfish davemfish marked this pull request as ready for review July 25, 2024 13:54
Copy link
Member

@phargogh phargogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great refactor! I like the use of dataclasses, and the class hierarchy makes sense to me. I left a couple of questions in the PR that I don't think should hold up merging, they're just asking about two edge cases that could easily be added whenever it makes sense to do so.

On the topic of python 3.10+, I think that restriction is just fine to keep. The use of kw-only arguments is more self-documenting and so feels very in line with the spirit of the package.

Anyways, I think I'll merge!

class ArchiveResource(Resource):
"""Class for metadata for an archive resource."""

compression: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hypothetically, suppose we had an uncompressed tarfile (so the extension is .tar, not .tar.gz). Are we able to handle this case with ArchiveResource?

Comment on lines +536 to +540
class RasterResource(Resource):
"""Class for metadata for a raster resource."""

schema: RasterSchema
spatial: SpatialSchema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inheritance hierarchy for rasters and vectors makes sense to me given the database programming model of a vector. Raster Attribute Tables might be another interesting sort of challenge, though ... should we handle RATs in some way? Not necessarily advocating for handling a RAT in a different way than the normal table would be handled, just wanted to ask!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants