Use fsspec for all non-local urls #1906

Merged: 12 commits from native_fsspec into asdf-format:main on Mar 14, 2025

Conversation

@braingram (Contributor) commented on Mar 10, 2025:

Description

This PR uses fsspec for all non-local urls (including http). As fsspec is an optional dependency this PR:

  • adds a deprecation warning for http urls stating that fsspec will be required in a future version, then falls back to the existing http handling code (a rough sketch follows this list)
  • adds a new optional dependency group http to pyproject.toml that installs fsspec with http support (and adds this dependency group to all)
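As a rough illustration of the fallback described above (this is not the PR's actual code; the function and helper names are hypothetical), the http open path could look something like:

# hypothetical sketch of the deprecation/fallback path; names are illustrative
import warnings

try:
    import fsspec
except ImportError:
    fsspec = None

def _open_http(url):
    if fsspec is None:
        warnings.warn(
            "Opening http urls without fsspec is deprecated; a future version "
            "of asdf will require fsspec (install it with asdf[http])",
            DeprecationWarning,
        )
        return _legacy_http_open(url)  # hypothetical stand-in for the existing handler
    # fsspec.open returns an OpenFile; calling .open() yields a file-like object
    return fsspec.open(url, "rb").open()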

The above change allows native handling of s3 urls (instead of requiring the url to first be opened with s3fs), which lets the following work:

asdf.open("s3://bucket/file.asdf")
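For contrast, the previous workaround looked roughly like the following sketch (bucket and file names are placeholders; the url is routed through s3fs by hand before handing the file object to asdf):

# previous approach (sketch): open the url with s3fs first
import asdf
import s3fs

fs = s3fs.S3FileSystem()
with fs.open("s3://bucket/file.asdf", "rb") as f:
    af = asdf.open(f)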

Adding a test for this behavior was complicated by:

  • the lack of a suitable small public asdf file on s3 to use for testing (so we need to mock s3)
  • the mock s3 (moto) is picky about its dependencies, often conflicting with boto (the aws client); adding these as test dependencies for all runs seemed likely to make the test runs brittle
  • the s3 mocking needs to happen before any fsspec import. This could be done as part of the unit tests, but that seemed overly complicated (as most tests don't require the mock s3). Given that the overall utility of the s3 tests is minimal (if we assume fsspec is doing what it advertises by abstracting out the backend, testing http should be sufficient), I chose to isolate the s3 tests (more on that below).

The s3 tests were added to integration_tests/s3 to:

  • keep them out of the packaged wheel
  • keep them out of normal pytest runs (so downstream package maintainers don't have to run these tests as part of their packaging workflows)
  • allow the s3 mocking to occur for the entire session (without interfering with other tests); a minimal sketch of such a session-scoped mock follows this list
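A minimal sketch of what that session-scoped mocking could look like (assuming moto's mock_aws and boto3; this is not the PR's exact conftest and the bucket name is illustrative):

# integration_tests/s3/conftest.py (sketch): start the s3 mock for the whole
# session, before anything imports fsspec/s3fs, and pre-populate a bucket
import pytest

@pytest.fixture(scope="session", autouse=True)
def mocked_s3():
    import boto3
    from moto import mock_aws  # moto >= 5

    with mock_aws():
        client = boto3.client("s3", region_name="us-east-1")
        client.create_bucket(Bucket="test-bucket")
        yield client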

Since the compatibility_tests are sort of integration_tests, I moved them into the same subdirectory (so they're now at integration_tests/compatibility).

I'll call out a couple other changes inline.

Tasks

  • run pre-commit on your machine
  • run pytest on your machine
  • Does this PR add new features and / or change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • update relevant docstrings and / or docs/ page
    • for any new features, add unit tests
news fragment change types...
  • changes/<PR#>.feature.rst: new feature
  • changes/<PR#>.bugfix.rst: bug fix
  • changes/<PR#>.doc.rst: documentation change
  • changes/<PR#>.removal.rst: deprecation or removal of public API
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change

@braingram force-pushed the native_fsspec branch 10 times, most recently from 68be063 to 0dc4de3 (March 12, 2025 18:04). One of the commit messages:
On this sad day in history our hero was
defeated by the great moto.

The combination of moto mocking s3,
boto to talk to the mocked s3 and fsspec
being used for http tests makes the s3 tests
a rather large wrench to throw into the unit tests.

The issues are:

- s3 tests must be set up before http tests so that
  the mock s3 configuration is set up before fsspec
  is imported. This complicates otherwise simple
  http tests.
- moto, boto, and s3fs often result in conflicting
  dependencies (looking at you aiobotocore) which
  in the past created impossible environments and
  currently results in installing and uninstalling
  certain versions to find one that works.

Since we already have "compatibility_tests" I figured
we could make "integration_tests" which now include:

- compatibility tests (same as before)
- s3 tests (to run s3 tests in isolation for 1 environment)

If we assume fsspec is doing what it says on the tin
then testing http should be enough. I am in favor of
doing some sort of s3 test since that's the main target
fsspec backend (and not hadoop, arrow, etc). We're not
testing the other backends so I'm ok with just having
minimal s3 tests and relying on fsspec to test that it
supports the backends it claims.
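For illustration, a minimal s3 round-trip test in that isolated environment might look like the sketch below (assuming the session fixture sketched earlier in the description; bucket and key names are placeholders, not the PR's actual test):

# integration_tests/s3/test_s3.py (sketch): write an ASDF file into the mocked
# bucket with boto3, then read it back through the s3:// url
import io

import asdf
import numpy as np

def test_open_s3(mocked_s3):
    buff = io.BytesIO()
    asdf.AsdfFile({"data": np.arange(10)}).write_to(buff)
    mocked_s3.put_object(Bucket="test-bucket", Key="test.asdf", Body=buff.getvalue())

    with asdf.open("s3://test-bucket/test.asdf") as af:
        assert np.array_equal(af["data"], np.arange(10))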
@braingram changed the title from "TST: use fsspec for all non-local urls" to "Use fsspec for all non-local urls" on Mar 13, 2025
@@ -402,7 +403,8 @@ def close(self):
        """
        if self._close:
            self._fd.close()
-       self._fix_permissions()
+       if hasattr(self, "_fix_permissions"):
@braingram (author) commented on this change:

This is a minor bugfix that I found when allowing fsspec to natively handle http urls. The _fix_permissions method is only present in the RealFile subclass.
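Schematically, the situation being guarded against is something like the sketch below (not asdf's actual code; the base-class name is assumed here, while RealFile is the subclass named above):

# close() lives on the base class, but only RealFile defines _fix_permissions,
# so other file types (e.g. http-backed files opened via fsspec) would raise
# AttributeError without the hasattr guard
class GenericFile:
    def close(self):
        if self._close:
            self._fd.close()
        if hasattr(self, "_fix_permissions"):
            self._fix_permissions()

class RealFile(GenericFile):
    def _fix_permissions(self):
        ...  # only real on-disk files need permission fix-ups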

"asdf[lz4]",
"asdf[http]",
]
lz4 = [
@braingram (author) commented:

Although not required for this PR, since I was changing optional-dependencies anyway I figured I'd separate out the lz4 one so that downstream users (like roman_datamodels) can use asdf[lz4].

"psutil",
"pytest>=8",
"pytest-remotedata",
]
test = [
@braingram (author) commented:

I snuck this in because I often type pip install asdf[test] and then it complains that tests was expected. I figured we could just make both work.

A project member replied:

I do this too 😅

@braingram marked this pull request as ready for review on March 13, 2025 14:31
@braingram requested a review from a team as a code owner on March 13, 2025 14:31
@zacharyburnett (Member) left a review:

LGTM, good isolation solution

@braingram (author) commented:

Testing on the roman research nexus (which has some ASDF files on s3) with this PR and roman_datamodels (post spacetelescope/roman_datamodels#453) I was able to run:

m = rdm.open("s3://...")

successfully.

@braingram merged commit d70ac0f into asdf-format:main on Mar 14, 2025
49 of 50 checks passed
@braingram deleted the native_fsspec branch on March 14, 2025 13:59