[backend] Pipeline run's input artifacts in S3 are accessible but output artifacts are not accessible (403 Error) #9975

Closed
guntiseiduks opened this issue Sep 12, 2023 · 4 comments
Labels: area/backend, kind/bug, lifecycle/stale

guntiseiduks commented Sep 12, 2023

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    Kubeflow Pipelines as part of a full Kubeflow deployment.
    Kubeflow is deployed on top of an AWS EKS cluster.

  • KFP version: v2

  • KFP SDK version: Not applicable. The issue was identified in the UI; the cause may be in the backend.

Steps to reproduce

  1. In the Runs section of the Kubeflow Pipelines UI, open a completed run of the "[Tutorial] Data passing in python components" pipeline and click any of its completed steps.
  2. Click the S3 URL of an input artifact (s3://kf-artifacts-store-..../). This works: the artifact contents are shown.
  3. Click the S3 URL of the output artifact "main-logs". This returns HTTP 403 Forbidden.

Expected result

The contents of both input and output artifacts are visible in the preview and when clicking the S3 URL.

Materials and Reference

Additional observations:

  • So far the issue has been observed only for S3 objects with the .log extension.
  • S3 objects with the .tgz extension open without a problem.

ghaering commented Oct 5, 2023

Adding to what @guntiseiduks found in our environment, I dug into this a bit more.

According to the ml-pipeline-ui logs:

ml-pipeline-ui-788f6d84bb-k8ngv ml-pipeline-ui Getting logs for pod:pipeline-88j97-2110958434 from kf-artifact-store-20230810042827133000000001/artifacts/pipeline-88j97/pipeline-88j97-2110958434/main.log.

the UI looks for the logs here:

kf-artifact-store-20230810042827133000000001/artifacts/pipeline-88j97/pipeline-88j97-2110958434/main.log

But they are actually located here:

aws s3 ls s3://kf-artifact-store-20230810042827133000000001/artifacts/admins/pipeline-88j97/2023/10/03/pipeline-88j97-2110958434/main.log

Any ideas where this mismatch comes from?
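To make the mismatch concrete, here is a small sketch that compares the two object keys from this comment (bucket name stripped) segment by segment. The constant names and the `extra_segments` helper are hypothetical, introduced only for illustration. It shows that the real key carries extra segments (the namespace `admins` and the date `2023/10/03`) that the UI does not include when it builds the path.

```python
# Object keys quoted in the comment above, with the bucket name stripped.
EXPECTED_KEY = "artifacts/pipeline-88j97/pipeline-88j97-2110958434/main.log"  # where the UI looks
ACTUAL_KEY = ("artifacts/admins/pipeline-88j97/2023/10/03/"
              "pipeline-88j97-2110958434/main.log")  # where the object actually is


def extra_segments(expected_key: str, actual_key: str) -> list[str]:
    """Hypothetical helper: return the segments of actual_key missing from expected_key, in order."""
    expected = set(expected_key.split("/"))
    return [seg for seg in actual_key.split("/") if seg not in expected]


if __name__ == "__main__":
    print(extra_segments(EXPECTED_KEY, ACTUAL_KEY))  # ['admins', '2023', '10', '03']
```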

@ghaering

I am an engineer on the platform @guntiseiduks was working on. There are two separate things going on here.
The fact that we can access .tgz artifacts but not main.log is caused by the AWS WAF deployed in our environment, so that part is not a bug in Kubeflow Pipelines.

The broken preview is caused by the S3 path mismatch described above, which appears to come from the artifactRepository.s3.keyFormat setting.
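A minimal sketch of how a keyFormat template turns into an object key, assuming simple `{{...}}` placeholder substitution. Both templates below are assumptions: `KEY_FORMAT` was chosen only because it reproduces the layout actually observed in S3, and `FLAT_FORMAT` mimics the path the UI logged; neither is taken from this deployment's configuration. `render_key` is a hypothetical stand-in, not Argo's or KFP's code, and the `main.log` file name is appended by hand.

```python
import re
from datetime import datetime, timezone

# ASSUMPTION: a keyFormat that reproduces the layout observed in S3 above. The real value
# is whatever artifactRepository.s3.keyFormat is set to in the deployment.
KEY_FORMAT = ("artifacts/{{workflow.namespace}}/{{workflow.name}}/"
              "{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/"
              "{{workflow.creationTimestamp.d}}/{{pod.name}}")
# ASSUMPTION: the flat layout the UI appears to expect, based on the ml-pipeline-ui log line.
FLAT_FORMAT = "artifacts/{{workflow.name}}/{{pod.name}}"

_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_key(template: str, namespace: str, workflow_name: str,
               created: datetime, pod_name: str) -> str:
    """Hypothetical renderer: substitute {{...}} placeholders; unknown names raise KeyError."""
    values = {
        "workflow.namespace": namespace,
        "workflow.name": workflow_name,
        "workflow.creationTimestamp.Y": created.strftime("%Y"),
        "workflow.creationTimestamp.m": created.strftime("%m"),
        "workflow.creationTimestamp.d": created.strftime("%d"),
        "pod.name": pod_name,
    }
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    created = datetime(2023, 10, 3, tzinfo=timezone.utc)
    args = ("admins", "pipeline-88j97", created, "pipeline-88j97-2110958434")
    # Where the log was written vs. where the UI looks for it:
    print(render_key(KEY_FORMAT, *args) + "/main.log")
    print(render_key(FLAT_FORMAT, *args) + "/main.log")
```

If the controller writes logs with one template while the UI reconstructs the path with another, the UI requests a key that does not exist, which matches the symptom in this issue.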

This seems to be a long-standing issue: #6704 looks related, and ours appears to be a duplicate of #6428.

I'm fine with closing this as a dup of #6428

github-actions bot commented Jan 9, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label Jan 9, 2024

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
