build robot templates for ENVO subset membership #285

Closed
turbomam opened this issue Jan 15, 2025 · 1 comment · Fixed by #288

@turbomam
Member

turbomam commented Jan 15, 2025

This process will require some input about which EnvO classes go into which subsets. We could capture that by:

  • adding a see also, or some similar relationship, in EnvO between the subset property (like http://purl.obolibrary.org/obo/ENVO_03605013 in https://github.com/EnvironmentOntology/envo/blob/4fd549cce4284265e06d5064f23915de35ad5bfb/src/envo/envo-edit.owl) and either the IRI for the NMDC enumeration (EnvBroadScaleSoilEnum -> nmdc_sub_schema:EnvBroadScaleSoilEnum -> https://example.com/nmdc_sub_schema/EnvBroadScaleSoilEnum, which is not web-resolvable!) or the NMDC enumeration page, like https://microbiomedata.github.io/submission-schema/EnvBroadScaleSoilEnum/
  • adding a see also, or some similar annotation, in submission-schema between the enumeration and the CURIE for the EnvO subset property... but @sierra-moxon, could that be overwritten by the enumeration generation process?
    • for reference, the current ENV Triad PV (enumeration) generation step in the Makefile:

      ########### ENV Triad PV generation ##########
      # Specify that 'ingest-triad' is a phony target, meaning it doesn't correspond to an actual file
      .PHONY: ingest-triad temp_target
      WATCHED_DIR := notebooks/environmental_context_value_sets
      # Match only .tsv files with the specific naming convention
      WATCHED_FILES := $(shell find $(WATCHED_DIR) -type f \( \
          -name 'post_google_sheets_*_local_scale.tsv' -o \
          -name 'post_google_sheets_*_broad_scale.tsv' -o \
          -name 'post_google_sheets_*_medium.tsv' \))
      # Define an intermediate target to prevent re-triggering
      .INTERMEDIATE: temp_target
      # Main target to run ingestion commands, depending on the intermediate target
      ingest-triad: temp_target
      # Intermediate target that dynamically processes matching TSV files
      temp_target: $(WATCHED_FILES) src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml
      	@echo "Processing all matching TSV files in $(WATCHED_DIR)..."
      	@for tsv_file in $(WATCHED_FILES); do \
      	    echo "Processing $$tsv_file..."; \
      	    $(RUN) inject-env-triad-terms -f $$tsv_file \
      	        -i src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml \
      	        -o src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml || exit 1; \
      	done
      	@touch temp_target
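
Whichever way the mapping is recorded, the class-to-subset assignments would eventually need to become a ROBOT template. Below is a minimal sketch, assuming membership is asserted with the oboInOwl:inSubset annotation (the usual OBO convention) and that the assignments arrive as (class CURIE, subset CURIE) pairs. The helper names and the example IDs are hypothetical, not an existing NMDC or EnvO tool or a curated assignment.

```python
import csv
import io
import sys
from collections import defaultdict
from typing import Iterable, Tuple

OBO_PURL = "http://purl.obolibrary.org/obo/"


def obo_curie_to_iri(curie: str) -> str:
    """Expand an OBO-style CURIE, e.g. 'ENVO:03605013', to its PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL}{prefix}_{local_id}"


def render_subset_template(assignments: Iterable[Tuple[str, str]]) -> str:
    """Return a ROBOT template (TSV text) asserting oboInOwl:inSubset per class.

    Hypothetical helper: `assignments` holds (class CURIE, subset CURIE) pairs.
    """
    subsets_by_class = defaultdict(set)
    for class_curie, subset_curie in assignments:
        subsets_by_class[obo_curie_to_iri(class_curie)].add(obo_curie_to_iri(subset_curie))

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "in subset"])                    # human-readable header row
    writer.writerow(["ID", "AI oboInOwl:inSubset SPLIT=|"])  # ROBOT template row: IRI-valued annotation
    for class_iri in sorted(subsets_by_class):
        # One row per class; multiple subsets are joined with the SPLIT character.
        writer.writerow([class_iri, "|".join(sorted(subsets_by_class[class_iri]))])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical assignment, for illustration only.
    sys.stdout.write(render_subset_template([("ENVO:00000446", "ENVO:03605013")]))
```

Writing full PURLs rather than ENVO CURIEs avoids depending on ROBOT's prefix map; the TSV could then be turned into OWL with robot template and merged into EnvO.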

Partly related: we don't want to insert axioms into EnvO asserting that some subject that isn't defined in EnvO belongs to an EnvO subset. So I think we should have access to the latest stable EnvO, presumably as a sem-sql file. For my development, I will build an EnvO sem-sql file locally and place it in my oaklib cache, ~/.data/oaklib/
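
One way that guard could look, sketched under the assumption that the local EnvO file follows the SemanticSQL (sem-sql) layout, in which a statements table stores subject, predicate and object as CURIEs. Candidates are kept only if the database types them as owl:Class. The helper name, the envo.db filename and the query are assumptions, not an oaklib API.

```python
import sqlite3
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed cache location and filename for a locally built EnvO sem-sql database.
ENVO_DB = Path.home() / ".data" / "oaklib" / "envo.db"

# Assumes the SemanticSQL `statements` table with CURIE-valued columns.
DEFINED_CLASS_SQL = (
    "SELECT 1 FROM statements "
    "WHERE subject = ? AND predicate = 'rdf:type' AND object = 'owl:Class' "
    "LIMIT 1"
)


def split_by_envo_definition(
    conn: sqlite3.Connection, curies: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition CURIEs into (typed as owl:Class in the database, not found)."""
    defined: List[str] = []
    undefined: List[str] = []
    for curie in curies:
        found = conn.execute(DEFINED_CLASS_SQL, (curie,)).fetchone() is not None
        (defined if found else undefined).append(curie)
    return defined, undefined


if __name__ == "__main__" and ENVO_DB.exists():
    conn = sqlite3.connect(ENVO_DB)
    try:
        ok, rejected = split_by_envo_definition(conn, ["ENVO:00000446"])
        print("defined:", ok, "rejected:", rejected)
    finally:
        conn.close()
```

If the database also contains terms imported into EnvO from other ontologies, those would pass this check too; requiring the ENVO: prefix as well would tighten it.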

Presumably this will also require ROBOT to be on the PATH.
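
A small sketch of failing early when ROBOT is missing, assuming the templates are converted with robot template. The function names are hypothetical; --input, --template and --output are regular robot template options, with --input letting ROBOT resolve labels against an existing ontology.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def find_robot() -> str:
    """Return the path to the `robot` executable, or raise if it is not on PATH."""
    robot = shutil.which("robot")
    if robot is None:
        raise FileNotFoundError("ROBOT executable 'robot' not found on PATH")
    return robot


def template_command(
    robot: str, template: Path, output: Path, input_ontology: Optional[Path] = None
) -> List[str]:
    """Build a `robot template` invocation as an argument list."""
    cmd = [robot, "template"]
    if input_ontology is not None:
        cmd += ["--input", str(input_ontology)]
    cmd += ["--template", str(template), "--output", str(output)]
    return cmd


def run_template(template: Path, output: Path, input_ontology: Optional[Path] = None) -> None:
    """Run ROBOT, raising CalledProcessError if it exits non-zero."""
    subprocess.run(template_command(find_robot(), template, output, input_ontology), check=True)
```

Keeping the command builder separate from the PATH lookup makes it testable on machines without ROBOT installed.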
