Small fixes + UI jobs refresh + spam submission failure #1013
Closed
…t not redirecting to stripe before submission
…o support_demo_job
supraja-968
changed the title
Support demo job
Small fixes + UI jobs refresh + spam submission failure
Aug 7, 2024
This PR has already been included in the plex migration PR to convexity. The changes are in main and have been deployed to test and prod. Closing this PR.
What type of PR is this?
Description
This PR makes the following changes:
Bug example: the user's compute tally is 450 so far, the tier threshold is 500, and each colabdesign job costs 50 credits. So the user shouldn't be able to submit more than 1 job in this tier without being prompted to subscribe. But the user was still able to submit more than 1 job combinatorially, because we were updating the compute tally after job creation. In this case, with 3 jobs submitted combinatorially, all 3 are submitted and the DB is updated with compute tally = 600, tier = 1. This tier = 1 then triggers the subscription prompt only the next time the user submits 1 or more jobs combinatorially.
Fix: calculate the compute tally before job creation (calculate only, not update; the update still comes after job creation), and when the calculated tally would cross the tier threshold, redirect to the subscribe page without actually submitting these jobs.
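A minimal sketch of the pre-submission check described above, using the numbers from the bug example. The function name, the parameter names, and the exact comparison (a submission is allowed as long as the projected tally does not exceed the threshold) are assumptions for illustration, not the gateway's actual code.

```python
from dataclasses import dataclass


@dataclass
class SubmissionCheck:
    allowed: bool          # False means: redirect to the subscribe page
    projected_tally: int   # tally the user would have if all jobs ran


def check_before_submit(current_tally: int, tier_threshold: int,
                        cost_per_job: int, num_jobs: int) -> SubmissionCheck:
    """Compute the tally a submission would produce, without persisting it.

    Hypothetical helper: the DB update still happens after job creation;
    this only decides whether to create the jobs at all.
    """
    projected = current_tally + cost_per_job * num_jobs
    # Assumption: reaching the threshold exactly is still allowed.
    return SubmissionCheck(allowed=projected <= tier_threshold,
                           projected_tally=projected)


# Numbers from the bug example: tally 450, threshold 500, 50 credits per job.
one_job = check_before_submit(450, 500, 50, 1)
three_jobs = check_before_submit(450, 500, 50, 3)
print(one_job)     # allowed: projected tally is 500
print(three_jobs)  # not allowed: projected tally is 600
```

With this check in front of job creation, the 3-job combinatorial submission is rejected as a whole instead of being created and only flagged on the next submission.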
Bug: jobs weren't updating live on the UI. The user had to refresh every time to see the current state of the experiment.
Fix: a polling mechanism scoped to the jobs accordion, so the whole page doesn't refresh when the jobs update to their current state. (Note to dev: the dot next to the experiment name still lags behind and requires a refresh to catch up. This can be addressed in a following PR.)
Bug: API keys were getting created, but they disappeared after a refresh. So the creation worked, but the fetch did not.
Fix: the fetch was filtering on wallet_address, whereas the column is named user_id (it still holds the wallet address). This got missed in the big DB migration. I have temporarily fixed it by having the fetch look for user_id, instead of migrating the column and renaming it to wallet_address.
Bug: with a combinatorial submission or a spam of resubmissions, some jobs were failing with 'unexpected Ray state running'.
Fix: this was caused by logic carried over from Ray services when we migrated to Ray jobs. The gateway was setting a job to pending and then to running BEFORE submitting the job to Ray's internal queue. This is fixed by no longer setting these states before submission. The status lifecycle now looks like: queued -> processing -> submit to Ray -> set to pending -> start monitoring -> set to running/stopped/failed/succeeded based on the response. With this fix, we start monitoring jobs that are in the running state as well as the pending state. Note: 'pending' is Ray's internal convention for pending jobs. So in a previous PR we introduced another status, 'processing', to differentiate jobs that are waiting on the gateway side to be submitted from jobs that are in the internal Ray queue waiting to be picked up by a worker.
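The lifecycle above can be sketched as a small transition table. This is an illustration only: the enum, the table, and the `advance` helper are hypothetical names, and the set of allowed transitions is my reading of the description rather than the gateway's actual implementation.

```python
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"  # gateway side, not yet submitted to Ray
    PENDING = "pending"        # in Ray's internal queue
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"
    SUCCEEDED = "succeeded"


TERMINAL = {JobStatus.STOPPED, JobStatus.FAILED, JobStatus.SUCCEEDED}

# Assumed transitions: pending is only set after the job is submitted to Ray.
ALLOWED = {
    JobStatus.QUEUED: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.PENDING},
    JobStatus.PENDING: {JobStatus.RUNNING} | TERMINAL,
    JobStatus.RUNNING: set(TERMINAL),
    JobStatus.STOPPED: set(),
    JobStatus.FAILED: set(),
    JobStatus.SUCCEEDED: set(),
}

# Jobs in either of these states are picked up by the monitor.
MONITORED = {JobStatus.PENDING, JobStatus.RUNNING}


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new == current:
        return current  # repeated reports of the same state are a no-op
    if new not in ALLOWED[current]:
        raise ValueError(f"unexpected transition {current.value} -> {new.value}")
    return new


status = JobStatus.QUEUED
for step in (JobStatus.PROCESSING, JobStatus.PENDING,
             JobStatus.RUNNING, JobStatus.SUCCEEDED):
    status = advance(status, step)
print(status.value)
```

Under this table, marking a job as running before it has been submitted (queued -> running) is rejected, which is the ordering mistake the fix removes.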
Bug: PDB files were only being used to display checkpoints, but there was no way to download them.
Fix: the addFilesToDB function was handling only the non-PDB files, because PDB files are categorized separately in the RayJobResponse struct. This is fixed by adding the PDB files to the DB separately, after the rest of the files are added.
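A rough Python sketch of the shape of this fix. The real RayJobResponse fields are not shown in this description, so the `files` and `pdb_files` field names, the record layout, and the list standing in for the DB are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RayJobResponse:
    # Hypothetical field names: name -> storage URI.
    files: Dict[str, str] = field(default_factory=dict)
    pdb_files: Dict[str, str] = field(default_factory=dict)


def add_files_to_db(response: RayJobResponse, db: List[dict]) -> List[dict]:
    """Store regular files first, then the separately categorized PDB files."""
    for name, uri in response.files.items():
        db.append({"name": name, "uri": uri, "kind": "file"})
    # Previously missing step: PDB files live in their own field,
    # so they need their own pass to become downloadable.
    for name, uri in response.pdb_files.items():
        db.append({"name": name, "uri": uri, "kind": "pdb"})
    return db


response = RayJobResponse(
    files={"scores.csv": "example://scores.csv"},
    pdb_files={"design_0.pdb": "example://design_0.pdb"},
)
records = add_files_to_db(response, [])
print([r["name"] for r in records])
```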