A hermetic test is a fast way to verify functionality end to end without requiring an integration test environment. In addition to the basic test we have today, we should add the following:
Test framework improvements:
Run a fake k8s API server so we don't need to fake the reconcilers
Verify metrics (not added yet)
Test cases:
Test when model is not found in LLMService
Test when ModelServerPool is not found
Test when no backend pods are available
Test invalid request (e.g., doesn't contain "model")
Test backend server error: the client should receive an error with an appropriate error code
Verify traffic split
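One way the traffic split check could be asserted, as a minimal sketch: count how many requests landed on each target model, then compare the observed shares with the configured weights within a tolerance. The helper name `splitWithinTolerance`, the map-based inputs, and the absolute tolerance are assumptions made for illustration and are not part of the existing test code.

```go
package main

import "math"

// splitWithinTolerance is a hypothetical helper (not from the project).
// It reports whether the observed share of requests per target model is
// within tol (absolute, e.g. 0.05) of the share implied by the configured
// weights. Both maps are keyed by target model name.
func splitWithinTolerance(observed, weights map[string]int, tol float64) bool {
	totalObserved, totalWeight := 0, 0
	for _, n := range observed {
		totalObserved += n
	}
	for _, w := range weights {
		totalWeight += w
	}
	if totalObserved == 0 || totalWeight == 0 {
		return false
	}
	// Traffic sent to a target with no configured weight is a failure.
	for name := range observed {
		if _, ok := weights[name]; !ok {
			return false
		}
	}
	for name, w := range weights {
		want := float64(w) / float64(totalWeight)
		got := float64(observed[name]) / float64(totalObserved)
		if math.Abs(got-want) > tol {
			return false
		}
	}
	return true
}
```

Because the split is presumably randomized, the test would need enough requests for the tolerance to be meaningful; a fixed random seed, if the scheduler allows one, would keep the test deterministic.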
Test algorithms:
A sheddable request succeeds when resources are available and is dropped when resources are constrained
Verify the min KV cache algorithm without LoRA
Verify the LoRA affinity algorithm for the "warm up" case (when no pod has loaded any LoRA yet). This requires sending multiple requests and verifying that later requests are sticky to the same backend pods.
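The stickiness check for the LoRA warm-up case could look roughly like the sketch below: record which pod served each request, then confirm that every later request for an adapter lands on the pod that served that adapter first. The `pick` type and `firstNonStickyPick` function are hypothetical names introduced here, and "same pod as the first request" is an assumed reading of "sticky"; the real algorithm may allow more than one pod per adapter.

```go
package main

// pick records which backend pod served a request for a given LoRA adapter.
// Hypothetical type for illustration; not part of the project.
type pick struct {
	Adapter string
	Pod     string
}

// firstNonStickyPick returns the index of the first request that landed on a
// different pod than the first request for the same adapter, or -1 if every
// adapter stayed on its first pod.
func firstNonStickyPick(picks []pick) int {
	firstPod := make(map[string]string)
	for i, p := range picks {
		pod, seen := firstPod[p.Adapter]
		if !seen {
			// Warm-up: no pod has loaded this adapter yet, so any pod is valid.
			firstPod[p.Adapter] = p.Pod
			continue
		}
		if pod != p.Pod {
			return i
		}
	}
	return -1
}
```

Returning the offending index rather than a boolean lets the test failure message point at the exact request that broke affinity.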