Intake Take2 #153

Closed
dougiesquire opened this issue Feb 27, 2024 · 9 comments · Fixed by #266
Labels
enhancement New feature or request

Comments

@dougiesquire
Collaborator

Intake Take2 is currently under development. It is a complete rewrite of Intake that aims "to be largely backward compatible with pre-V2 Intake sources and catalogs." However, Intake-ESM, which the access-nri-intake-catalog is built on, is a somewhat unusual application of Intake. At this point, it's not clear that there will be backwards compatibility for our application. This said, some of the new features promised by Intake Take2 may allow for a newer and better Intake-ESM, but this will obviously be a lot of work.

Strategy for now is to pin to Intake v1 and keep an eye on Intake Take2 progression/developments.

@rbeucher added the enhancement label on Sep 30, 2024
@rbeucher
Member

@charles-turner-1 I think it would be good to start looking into this.

@charles-turner-1
Collaborator

Agreed. There's some discussion regarding v2 on intake-esm here, but it looks like the plan was just to pin the version for the time being.

I'll do a bit of poking into this and see what we might break if we upgrade intake to v2. At the very least we'll get an improved understanding of how hard the upgrade might be.

@charles-turner-1
Collaborator

It looks at first glance like we might be able to upgrade from intake 0.7 => intake 2.0.7 without issue.

❯ pip list | rg 'intake'
access_nri_intake         0.1.4+47.g303f786.dirty            /Users/u1166368/catalog/access-nri-intake-catalog
intake                    2.0.7
intake_dataframe_catalog  0.2.4+0.g725e9f3.dirty             /Users/u1166368/catalog/intake-dataframe-catalog
intake-esm                2024.2.6.post16+g6ba67e1.d20241021 /Users/u1166368/catalog/intake-esm

❯ python
Python 3.12.7 | packaged by conda-forge | (main, Oct  4 2024, 15:57:01) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

N.B. I'm on Python 3.12.7 on my local machine - I haven't tested 3.9..3.11 yet.
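
For anyone repeating this on another Python version, the `pip list | rg 'intake'` check above can also be done from inside the interpreter. This is a minimal sketch using only the standard library; the function name `installed_versions` is made up for illustration and is not part of any of the packages listed.

```python
from importlib import metadata


def installed_versions(pattern: str = "intake") -> dict:
    """Map each installed distribution whose name contains `pattern` to its version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")  # can be None for a broken install
        if name and pattern in name.lower():
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Print one line per matching package, roughly like the pip output above.
    for name, version in sorted(installed_versions().items()):
        print(f"{name:<28}{version}")
```

Running it in each environment under test gives a quick record of which intake, intake-esm and intake-dataframe-catalog versions the results below were actually produced with.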

  1. Upgrading from intake 0.7 => intake 2.0.7 doesn't break any tests on https://github.com/intake/intake-esm:
    Results (141.04s (0:02:21)):
       142 passed
         1 failed
           - tests/test_core.py:556 test_to_dask_opendap
         2 xfailed
    where the failure is due to a cloud resource being unavailable. This test is currently also
    broken on main.
  2. No tests are broken by the update in access-nri-intake-catalog:
    Results (20.93s):
        160 passed
  3. No tests are broken in intake-dataframe-catalog by the update:
    Results (18.43s):
      85 passed

I'm not convinced that these tests will be comprehensive - I still want to go through and double check that we can build the catalog as before, but it's definitely a very good sign.

@charles-turner-1
Collaborator

charles-turner-1 commented Oct 22, 2024

Further investigation shows that upgrading intake to 2.0.7 is unlikely to break anything - see #187.

I'm planning to use tox to run our test suite against intake 0.7 & 2.0.7, as well as the versions of intake-esm with and without coordinate variable discovery. Once we've got that all working and properly validated we can start thinking about upgrading to take2 (if we decide to).
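
As a rough picture of what that test matrix looks like, the sketch below enumerates the four combinations in plain Python. It is not a tox configuration: the environment names and the `release` / `coordvars` labels for the two intake-esm variants are placeholders I've invented here, and the pip pins are illustrative only.

```python
from itertools import product

# The two intake versions discussed in this issue.
INTAKE_VERSIONS = ("0.7", "2.0.7")
# Hypothetical labels: intake-esm without / with coordinate variable discovery.
INTAKE_ESM_VARIANTS = ("release", "coordvars")


def build_matrix(intake_versions=INTAKE_VERSIONS, esm_variants=INTAKE_ESM_VARIANTS):
    """Return one entry per (intake version, intake-esm variant) combination."""
    matrix = []
    for intake_version, esm_variant in product(intake_versions, esm_variants):
        matrix.append(
            {
                # Placeholder environment name, e.g. "intake207-esm_coordvars".
                "env": f"intake{intake_version.replace('.', '')}-esm_{esm_variant}",
                "intake_pin": f"intake=={intake_version}",
                "intake_esm": esm_variant,
            }
        )
    return matrix


if __name__ == "__main__":
    for entry in build_matrix():
        print(entry["env"], entry["intake_pin"], entry["intake_esm"])
```

Each entry would correspond to one test environment in which the full test suite is run.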

@marc-white
Collaborator

marc-white commented Oct 22, 2024

@charles-turner-1 it's worth finding a way to test the catalog build process as well - the tests have pretty good coverage, but making sure everything locks together well is another matter. Then testing that we can actually read back what's been built.

@charles-turner-1
Collaborator

How long does a full catalogue build take? I haven't actually run a full build myself - is it going to be prohibitively expensive to implement a full e2e integration test?

@marc-white
Collaborator

The build_all.sh script suggests a full catalogue build is quite expensive. However, I can imagine a scenario where we only build a small subset of the full catalogue just to make sure things work right.

Ultimately, you can't really test the full catalogue build without just doing the full catalogue build, but in the case of a major upgrade like going to Intake Take2, I think it's worth the extra step before plunging into a 3 hr, 48 cpu job.

@dougiesquire
Collaborator Author

> Further investigation shows that upgrading intake to 2.0.7 is unlikely to break anything - see #187.

That's great! Then there's the question of whether/how to use any of the cool new functionality in v2.

> How long does a full catalogue build take?

It took about 1.5 hours (on 48 cores) before I went on leave...

@charles-turner-1
Collaborator

Yeah, I agree - running a full build is also more of a moving target than building a subset. Maybe we should think about a smoke test where we build & run some queries against a fixed & hopefully representative subset of the full catalogue then.
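
One possible shape for such a smoke test, as a sketch only: the builder is passed in as a callable and the catalogue is modelled as a plain list of dicts, so nothing here uses the real access-nri-intake-catalog or intake API. `smoke_test`, the query tuples and the example rows are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, str]
# (label, column values that must all match, minimum number of matching rows)
Query = Tuple[str, Dict[str, str], int]


def smoke_test(build: Callable[[], List[Row]], queries: Iterable[Query]) -> List[str]:
    """Build the subset once, run each query, and return a list of failure messages."""
    rows = build()
    failures = []
    if not rows:
        failures.append("build produced no entries")
    for label, wanted, min_hits in queries:
        # Count rows where every requested column has the requested value.
        hits = sum(all(row.get(k) == v for k, v in wanted.items()) for row in rows)
        if hits < min_hits:
            failures.append(f"{label}: expected at least {min_hits} match(es), found {hits}")
    return failures


if __name__ == "__main__":
    # Stand-in for building a fixed subset; the rows are made up.
    def fake_build() -> List[Row]:
        return [
            {"model": "model-a", "realm": "ocean"},
            {"model": "model-a", "realm": "seaIce"},
        ]

    print(smoke_test(fake_build, [("ocean data present", {"realm": "ocean"}, 1)]))
```

An empty list means the subset built and every query found what it expected; in a real test the `build` callable would wrap whatever builds the fixed subset, and the queries would go through the catalogue's own search interface.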

4 participants