Is your feature request related to a problem? Please describe.
Requests take an indeterminate amount of time. They can run very long, particularly in a self-hosted environment.
Multiple simultaneous requests can overwhelm a self-hosted server, particularly in a multi-user environment. Simultaneous requests to an API can also result in hitting token limits.
Describe the solution you'd like
Scheduling should allow requests to be booked to run at a set time. For example, this would let multiple requests run overnight.
Queuing should follow configurable rules, e.g. 1, 2, etc. simultaneous requests, with and/or conditions that include which model the request is for. For example, allow two simultaneous requests on 4o but only one on o1.
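As a rough illustration of the per-model rule, the sketch below caps concurrency with one `asyncio.Semaphore` per model. The `ModelQueue` class and the `MODEL_LIMITS` mapping are assumptions made for this example; only the model names and the 2-versus-1 limits come from the request above.

```python
import asyncio

# Hypothetical rule set: at most two concurrent requests on 4o, one on o1.
MODEL_LIMITS = {"4o": 2, "o1": 1}


class ModelQueue:
    """Caps how many requests run at once for each model."""

    def __init__(self, limits):
        self._limits = dict(limits)
        self._semaphores = {}
        self.running = {model: 0 for model in limits}
        self.peak = {model: 0 for model in limits}  # highest concurrency seen

    async def submit(self, model, job):
        # Create each semaphore lazily, inside the running event loop.
        sem = self._semaphores.get(model)
        if sem is None:
            sem = self._semaphores[model] = asyncio.Semaphore(self._limits[model])
        async with sem:  # waits here while the model is at its limit
            self.running[model] += 1
            self.peak[model] = max(self.peak[model], self.running[model])
            try:
                return await job()
            finally:
                self.running[model] -= 1


async def demo():
    queue = ModelQueue(MODEL_LIMITS)

    async def fake_request():
        # Stand-in for a real model call.
        await asyncio.sleep(0.01)
        return "done"

    jobs = [queue.submit("4o", fake_request) for _ in range(5)]
    jobs += [queue.submit("o1", fake_request) for _ in range(3)]
    results = await asyncio.gather(*jobs)
    return queue.peak, results
```

Running `asyncio.run(demo())` submits eight requests at once, but no more than two run on 4o and one on o1 at any moment; the rest wait their turn. Combined and/or conditions (for example a global cap across all models) would need an extra semaphore or a custom scheduler on top of this.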
Describe alternatives you've considered
Throttling can be controlled on the backend using a proxy, but this seems like an inferior solution.