Merge pull request #30 from CMIP-REF/dataset-cli
Showing 24 changed files with 596 additions and 287 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Renames `ref ingest` to `ref datasets ingest` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Adds `ref datasets list` command to list ingested datasets |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
# Ingest Datasets | ||
|
||
This guide will walk you through the process of ingesting local datasets into the REF. | ||
Ingesting datasets is the first step in the REF workflow. | ||
|
||
The REF supports the following dataset formats: | ||
|
||
* CMIP6 | ||
|
||
Downloading the input data is out of scope for this guide, | ||
but we recommend using [esgpull](https://esgf.github.io/esgf-download/) to download CMIP6 data. | ||
If you have access to a high-performance computing (HPC) system, | ||
you may have a local archive of CMIP6 data already available. | ||
|
||
|
||
## What is Ingestion? | ||
|
||
When processing metrics, the REF needs to know the location of the datasets and various metadata. | ||
Ingestion is the process of extracting metadata from datasets and storing it in a local database. | ||
This makes it easier to query and filter datasets for further processing. | ||
|
||
The REF extracts metadata for each dataset, and for each file if a dataset contains multiple files. | ||
The resulting collection of metadata, also known as a data catalog, is stored in a local SQLite database. | ||
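The idea of a queryable catalog can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table layout, file paths, and function names below are hypothetical and chosen only for illustration; they are not the REF's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per file, keyed by the dataset it belongs to.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path TEXT PRIMARY KEY,
    instance_id TEXT NOT NULL,
    source_id TEXT NOT NULL,
    experiment_id TEXT NOT NULL,
    variable_id TEXT NOT NULL
)
"""


def build_catalog(rows):
    """Create an in-memory catalog and load one tuple of metadata per file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def datasets_for_variable(conn, variable_id):
    """Return the distinct dataset identifiers that provide a variable."""
    cursor = conn.execute(
        "SELECT DISTINCT instance_id FROM files "
        "WHERE variable_id = ? ORDER BY instance_id",
        (variable_id,),
    )
    return [row[0] for row in cursor]


# Made-up file paths; two files belong to the same dataset.
prefix = "CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon"
rows = [
    ("/data/tas_a.nc", f"{prefix}.tas.gn", "ACCESS-ESM1-5", "ssp126", "tas"),
    ("/data/tas_b.nc", f"{prefix}.tas.gn", "ACCESS-ESM1-5", "ssp126", "tas"),
    ("/data/rlut_a.nc", f"{prefix}.rlut.gn", "ACCESS-ESM1-5", "ssp126", "rlut"),
]
catalog = build_catalog(rows)
print(datasets_for_variable(catalog, "tas"))
```

Storing one row per file while querying distinct dataset identifiers mirrors the dataset/file distinction described above: a dataset split across several files is still reported once.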
|
||
## Ingesting Datasets | ||
|
||
To ingest datasets, use the `ref datasets ingest` command. | ||
This command takes the path to a directory containing datasets as an argument, | ||
along with the type of dataset being ingested via `--source-type` (only `cmip6` is currently supported). | ||
|
||
The command walks through the provided directory looking for `*.nc` files to ingest. | ||
Each file will be loaded and its metadata extracted. | ||
|
||
```bash | ||
>>> ref --log-level INFO datasets ingest --source-type cmip6 /path/to/cmip6 | ||
2024-12-05 12:00:05.979 | INFO | ref.database:__init__:77 - Connecting to database at sqlite:///.ref/db/ref.db | ||
2024-12-05 12:00:05.987 | INFO | alembic.runtime.migration:__init__:215 - Context impl SQLiteImpl. | ||
2024-12-05 12:00:05.987 | INFO | alembic.runtime.migration:__init__:218 - Will assume non-transactional DDL. | ||
2024-12-05 12:00:05.989 | INFO | alembic.runtime.migration:run_migrations:623 - Running upgrade -> ea2aa1134cb3, dataset-rework | ||
2024-12-05 12:00:05.995 | INFO | ref.cli.datasets:ingest:115 - ingesting /path/to/cmip6 | ||
2024-12-05 12:00:06.401 | INFO | ref.cli.datasets:ingest:127 - Found 9 files for 5 datasets | ||
|
||
activity_id  institution_id  source_id      experiment_id  member_id  table_id  variable_id  grid_label  version | ||
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   Amon      rlut         gn          v20210318 | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   Amon      rlut         gn          v20210318 | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   Amon      rsdt         gn          v20210318 | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   Amon      rsdt         gn          v20210318 | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   Amon      rsut         gn          v20210318 | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   Amon      rsut         gn          v20210318 | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   Amon      tas          gn          v20210318 | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   Amon      tas          gn          v20210318 | ||
ScenarioMIP  CSIRO           ACCESS-ESM1-5  ssp126         r1i1p1f1   fx        areacella    gn          v20210318 | ||
|
||
2024-12-05 12:00:06.409 | INFO | ref.cli.datasets:ingest:131 - Processing dataset CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.rlut.gn | ||
2024-12-05 12:00:06.431 | INFO | ref.cli.datasets:ingest:131 - Processing dataset CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.rsdt.gn | ||
2024-12-05 12:00:06.441 | INFO | ref.cli.datasets:ingest:131 - Processing dataset CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.rsut.gn | ||
2024-12-05 12:00:06.449 | INFO | ref.cli.datasets:ingest:131 - Processing dataset CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.tas.gn | ||
2024-12-05 12:00:06.459 | INFO | ref.cli.datasets:ingest:131 - Processing dataset CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.fx.areacella.gn | ||
``` | ||
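The "Found 9 files for 5 datasets" step in the log above can be approximated with a short sketch. It assumes the files sit in the standard CMIP6 directory layout (`activity_id/institution_id/.../version/file.nc`) and reads the facets from the path alone, whereas the REF loads each file to extract its metadata. The function name `find_datasets` is hypothetical.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Facets in the order they appear in a CMIP6 directory path (assumed layout).
FACETS = (
    "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def find_datasets(root):
    """Group every *.nc file under root by an identifier built from its path."""
    datasets = defaultdict(list)
    for path in sorted(Path(root).rglob("*.nc")):
        parts = path.relative_to(root).parent.parts
        if len(parts) < len(FACETS):
            continue  # not deep enough to follow the assumed layout
        facets = dict(zip(FACETS, parts[-len(FACETS):]))
        # The identifier omits the version, matching the log output above.
        instance_id = ".".join(["CMIP6", *(facets[name] for name in FACETS[:-1])])
        datasets[instance_id].append(path)
    return dict(datasets)


# Build a throwaway directory tree with empty files to exercise the walk.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp, "CMIP6/ScenarioMIP/CSIRO/ACCESS-ESM1-5/ssp126/r1i1p1f1/Amon")
    for variable, names in {"tas": ["a.nc", "b.nc"], "rlut": ["c.nc"]}.items():
        folder = base / variable / "gn" / "v20210318"
        folder.mkdir(parents=True)
        for name in names:
            (folder / name).touch()
    found = find_datasets(tmp)
    n_files = sum(len(files) for files in found.values())
    print(f"Found {n_files} files for {len(found)} datasets")
```

Grouping by identifier is what makes the file count and the dataset count differ: several files that share every facet collapse into a single dataset.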
|
||
|
||
### Querying ingested datasets | ||
|
||
You can query the ingested datasets using the `ref datasets list` command. | ||
This will display a list of datasets and their associated metadata. | ||
The `--column` flag allows you to specify which columns to display (defaults to all columns). | ||
See `ref datasets list-columns` for a list of available columns. | ||
|
||
```bash | ||
>>> ref datasets list --column instance_id --column variable_id | ||
|
||
instance_id                                                            variable_id | ||
───────────────────────────────────────────────────────────────────────────────────── | ||
CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.rlut.gn     rlut | ||
CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.rsdt.gn     rsdt | ||
CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.rsut.gn     rsut | ||
CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.tas.gn      tas | ||
CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.fx.areacella.gn  areacella | ||
``` |
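The effect of repeating `--column` can be pictured as projecting each catalog record onto the requested keys. The sketch below is an illustration only: `select_columns` is a hypothetical helper, not part of the REF, and the records are plain dictionaries rather than database rows.

```python
def select_columns(records, columns=None):
    """Project each record onto the requested columns, defaulting to all of them."""
    if not records:
        return []
    available = list(records[0])
    wanted = list(columns) if columns else available
    unknown = [name for name in wanted if name not in available]
    if unknown:
        raise KeyError(f"Unknown columns: {unknown}; available: {available}")
    return [{name: record[name] for name in wanted} for record in records]


records = [
    {
        "instance_id": "CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.Amon.tas.gn",
        "variable_id": "tas",
        "table_id": "Amon",
    },
    {
        "instance_id": "CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp126.r1i1p1f1.fx.areacella.gn",
        "variable_id": "areacella",
        "table_id": "fx",
    },
]

for row in select_columns(records, ["instance_id", "variable_id"]):
    print(row["instance_id"], row["variable_id"])
```

Rejecting unknown column names up front, rather than silently dropping them, is one reasonable design choice for a command that also offers a way to list the available columns.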
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
import inspect | ||
import logging | ||
|
||
from loguru import logger | ||
|
||
|
||
class _InterceptHandler(logging.Handler): | ||
    """Forward standard-library log records to Loguru.""" | ||
|
||
    def emit(self, record: logging.LogRecord) -> None: | ||
        # Get corresponding Loguru level if it exists. | ||
        level: str | int | ||
        try: | ||
            level = logger.level(record.levelname).name | ||
        except ValueError:  # pragma: no cover | ||
            level = record.levelno | ||
|
||
        # Find the caller from which the logged message originated. | ||
        frame, depth = inspect.currentframe(), 0 | ||
        while frame and (depth == 0 or frame.f_code.co_filename == logging.__file__): | ||
            frame = frame.f_back | ||
            depth += 1 | ||
|
||
        logger.opt(depth=depth, exception=record.exc_info).log(level, record.getMessage()) | ||
|
||
|
||
def capture_logging() -> None: | ||
    """ | ||
    Capture logging from the standard library and redirect it to Loguru. | ||
    Note that this replaces the handlers on the root logger, so any other handlers attached to it will be removed. | ||
    """ | ||
    # logger.debug("Capturing logging from the standard library") | ||
    logging.basicConfig(handlers=[_InterceptHandler()], level=0, force=True) | ||
|
||
|
||
__all__ = ["capture_logging", "logger"] |
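The note in the `capture_logging` docstring about handlers being removed follows from `force=True`. The standard-library-only sketch below demonstrates that behaviour; `ListHandler` is a hypothetical stand-in for the Loguru-backed handler, so the example runs without Loguru installed.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in sink that stores formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


root = logging.getLogger()
old_handler = ListHandler()
root.addHandler(old_handler)

# force=True removes and closes every handler already attached to the root
# logger before installing the new ones; level=0 (NOTSET) lets all records through.
new_handler = ListHandler()
logging.basicConfig(handlers=[new_handler], level=0, force=True)

# A library logger with no handlers of its own propagates to the root logger.
logging.getLogger("alembic.runtime.migration").info("Context impl %s.", "SQLiteImpl")

print("old handler saw:", old_handler.messages)
print("new handler saw:", new_handler.messages)
```

Because library loggers propagate to the root logger by default, installing a single handler there is enough to route records from third-party packages through one place.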