Consider using a standardized API for file access #135
I'd really like to adopt a well-known, well-defined API for file management, but I'm not an expert in S3 or other file-related HTTP APIs. Does anybody here have experience? A first look at the S3 REST API makes me feel that it is a bit too complex for our "simple" implementations. I'm not sure yet whether that API could be stripped down to a minimal subset, which I'd say is mandatory to keep the openEO API simple in this regard. If it needs to be fully implemented, I could only see it being added as an extension. Are there other file APIs we could adopt? I found Azure and Google, of course.
Ideally this would be back-end heterogeneity that we would abstract away in the openEO API.
@edzer That is what we are trying at the moment with our file API, but it is a bit limited and proprietary. If there is a standard with a good ecosystem, it would be a good idea to adopt it. I'm not sure whether existing cloud services such as S3, Azure, and GCS could serve that purpose, as their APIs usually contain service-specific things. So we are basically looking for an existing standard that has already done the abstraction. If there is none, we'll probably continue with what we have at the moment.
GDAL supports the /vsi prefixes (/vsizip/, /vsis3/, /vsigs/, etc.; see the GDAL virtual file systems documentation), which abstract over many cases operationally, i.e. it is a working implementation. It does mean that a script needs to be adapted when porting from AWS to GCS.
But that might be OK (and could even be automated).
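A minimal sketch of how that adaptation could be automated, assuming the script only differs in the leading GDAL prefix (`/vsis3/` for AWS S3, `/vsigs/` for Google Cloud Storage). The helper name `port_vsi_path` and the provider keys are hypothetical; bucket names, credentials, and chained prefixes such as `/vsizip//vsis3/...` are not handled here.

```python
# Hypothetical helper, not part of GDAL: swap the leading /vsi prefix of a path.
VSI_PREFIXES = {
    "s3": "/vsis3/",  # GDAL prefix for AWS S3
    "gs": "/vsigs/",  # GDAL prefix for Google Cloud Storage
}


def port_vsi_path(path: str, source: str, target: str) -> str:
    """Rewrite a GDAL virtual path from one storage provider to another.

    Paths that do not start with the source prefix are returned unchanged.
    """
    src = VSI_PREFIXES[source]
    dst = VSI_PREFIXES[target]
    if not path.startswith(src):
        return path
    return dst + path[len(src):]


if __name__ == "__main__":
    print(port_vsi_path("/vsis3/my-bucket/scene.tif", "s3", "gs"))
```

This only covers the path syntax; whether the same bucket and object names exist on both sides is something the back-end would still need to arrange.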
@edzer As discussed, that could be useful for back-end implementations, but I don't see a direct benefit for the API specification. I'm more looking for something like a simple and "modern" WebDAV.
Maybe remoteStorage is what we are looking for: https://remotestorage.io/
Sounds great, but I'm wondering how we can integrate that, given that we would need to merge the openEO and remoteStorage authentication procedures somehow. Another interesting repo to look at is https://github.com/scality/cloudserver
I also like remoteStorage a lot, but it has a long way to go before it replaces the S3 API as the go-to REST interface. The industry seems to have settled on S3's interface for object storage: in addition to scality/cloudserver, many other solutions use the same API or provide an S3-compatible proxy to GCS and others, e.g.
Conclusion from 3rd year planning:
If S3 is not manageable for back-ends to implement, we'll fall back to what we have at the moment.
For Sinergise, S3 (or a subset thereof) would be the preferred interface for file access and management. Here is a Swagger 2.0 spec generated using https://github.com/APIs-guru/aws2openapi (looks quite current): https://github.com/APIs-guru/openapi-directory/blob/master/APIs/amazonaws.com/s3/2006-03-01/swagger.yaml
Thanks @mkadunc, appreciate the links! The swagger file looks quite complicated (the file is 8000 lines; the openEO API is not even half as long). Also, the generated version seems to have some issues regarding compatibility with OpenAPI.
I suggest we focus mostly on the Object operations and leave management of buckets up to the back-end (it seems that's how we started anyway). From the Bucket operations we'll probably only need GET (list objects). I suggest we keep the openEO-mandated subset of supported API calls as small as possible, i.e. only the minimum required for basic functioning of the openEO web editor.
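To make the proposed subset concrete, here is a rough in-memory sketch of the four operations it would probably come down to: the S3 operations PutObject, GetObject, DeleteObject, and the bucket-level GET that lists objects. The class name `MinimalObjectStore` and its method names are hypothetical, and which calls openEO would actually mandate is still an open question in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class MinimalObjectStore:
    """In-memory stand-in for one workspace, i.e. a single back-end managed bucket."""

    objects: dict[str, bytes] = field(default_factory=dict)

    def put_object(self, key: str, body: bytes) -> None:
        # Corresponds to S3 PutObject: create or overwrite the object at `key`.
        self.objects[key] = body

    def get_object(self, key: str) -> bytes:
        # Corresponds to S3 GetObject: raises KeyError if the object is missing.
        return self.objects[key]

    def delete_object(self, key: str) -> None:
        # Corresponds to S3 DeleteObject: deleting a missing key is not an error.
        self.objects.pop(key, None)

    def list_objects(self, prefix: str = "") -> list[str]:
        # Corresponds to the bucket-level GET: keys filtered by prefix, sorted by key.
        return sorted(k for k in self.objects if k.startswith(prefix))


if __name__ == "__main__":
    store = MinimalObjectStore()
    store.put_object("data/scene.tif", b"...")
    print(store.list_objects("data/"))
```

Bucket creation, ACLs, versioning, and multipart uploads are deliberately left out, in line with leaving bucket management to the back-end.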
Makes sense. We still need to figure out the minimum set of endpoints that has to be implemented. What I don't like at all about S3 is that it mandates a different authentication procedure (HMAC?) than the one we currently use, which is the same reason we rejected remoteStorage.io. Also, the endpoints use XML, which we have tried to avoid mixing with JSON at all costs. So I have more concerns about implementing it after having had a (quick) look at it.
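To illustrate the XML/JSON mismatch, here is a small sketch that converts an S3 `ListBucketResult` XML response into a JSON file list. The sample response and the JSON field names (`files`, `path`, `size`, `modified`) are assumptions chosen for illustration, and the function name `list_result_to_json` is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Illustrative sample response, trimmed to the fields used below.
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>workspace</Name>
  <Contents>
    <Key>data/scene.tif</Key>
    <LastModified>2019-01-01T00:00:00.000Z</LastModified>
    <Size>1024</Size>
  </Contents>
</ListBucketResult>"""


def list_result_to_json(xml_text: str) -> str:
    """Convert an S3 ListBucketResult document into a JSON file listing."""
    root = ET.fromstring(xml_text)
    files = [
        {
            "path": item.findtext("s3:Key", namespaces=S3_NS),
            "size": int(item.findtext("s3:Size", namespaces=S3_NS)),
            "modified": item.findtext("s3:LastModified", namespaces=S3_NS),
        }
        for item in root.findall("s3:Contents", S3_NS)
    ]
    return json.dumps({"files": files})


if __name__ == "__main__":
    print(list_result_to_json(SAMPLE))
```

A translation layer like this would let clients stay JSON-only, but it means the back-end is no longer exposing a plain S3 endpoint, which removes much of the benefit of adopting S3 in the first place.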
No updates yet according to the dev telco today.
@jdries Any news on this? I'll move to "future" until there are new insights posted here.
We have currently defined our own API for sharing files with openEO.
The S3 API is also a well-known HTTP-based file API (object storage).
I'm not an expert, so this is really more like a question to investigate if this would be usable.
If S3 covers all of our requirements, using it would simplify our own API as well as back-end implementations, since it is very widely adopted and supported by existing software.