LocalVLLMModel and deployment handler #102
base: main
Conversation
…soft/eureka-ml-insights into mharrison/local-vllm-model
@microsoft-github-policy-service agree company="Microsoft"
init_args["model_config"] = model_config | ||
# Logic above is that certain deployment parameters like ports and num_servers | ||
# can be variable and so we allow them to be overridden by command line args. | ||
except: |
Please specify what exception you are catching here (AttributeError?)
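For instance, a minimal sketch of the change being asked for, keeping the names from the surrounding diff and assuming the lookup inside the try block raises AttributeError when args.model_config is a plain string rather than the name of a predefined ModelConfig:

```python
try:
    # (the resolution of model_config from args.model_config happens above;
    # if args.model_config is a plain string, that lookup raises AttributeError)
    init_args["model_config"] = model_config
    # Deployment parameters like ports and num_servers can still be
    # overridden by command line args.
except AttributeError:
    # Treat args.model_config as a raw model identifier that vLLM can resolve.
    init_args["model_config"] = ModelConfig(
        LocalVLLMModel,
        {"model_name": args.model_config},
    )
```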
init_args["model_config"] = ModelConfig( | ||
LocalVLLMModel, | ||
{ | ||
"model_name": args.model_config, |
Should model_name be passed as a separate command-line argument instead of reusing the existing model_config argument? It looks like this should be a string, not a ModelConfig object.
My thought was: someone passing --local-vllm has the option to provide --model_config as either a ModelConfig or a string that uniquely identifies the model to vLLM, so we can try to catch either. This allows a bit more flexibility, so they don't have to change scripts based on which type they're passing when they call eureka (and having both model_config and model_name as command line args might confuse someone if it's not clear to only pass the latter in conjunction with vllm). I don't really have a strong opinion though; happy to do what you think is best.
(The obvious negative of the way I've done it is that the name "model_config" is no longer accurate.)
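For comparison, the reviewer's alternative might look roughly like this sketch; the --model_name flag and the argparse wiring are hypothetical and not part of this PR, while ModelConfig, LocalVLLMModel, and init_args are taken from the diff context:

```python
import argparse

parser = argparse.ArgumentParser()
# Existing flag: --model_config names a predefined ModelConfig object.
parser.add_argument("--model_config", type=str, default=None)
# Hypothetical new flag: a raw model identifier handed straight to vLLM,
# only meaningful together with --local-vllm.
parser.add_argument("--model_name", type=str, default=None)
parser.add_argument("--local-vllm", action="store_true")
args = parser.parse_args()

if args.local_vllm and args.model_name:
    init_args["model_config"] = ModelConfig(LocalVLLMModel, {"model_name": args.model_name})
```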
@@ -2,9 +2,13 @@
import json
import logging
import random
import re
This import seems to be unused. isort should remove unused imports if you run make format-inplace
"n_output_tokens": raw_output["usage"]["completion_tokens"] | ||
} | ||
|
||
def generate(self, text_prompt, query_images=None, system_message=None): |
Should we have support for previous_messages similar to other models to enable chat?
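As a rough illustration, chat support along the lines of the other models might thread prior turns into the request like this; the method name, signature, and payload shape here are assumptions, not this PR's code:

```python
def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
    # Build an OpenAI-style chat payload, splicing in any prior turns so the
    # server sees the full conversation history.
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    if previous_messages:
        messages.extend(previous_messages)
    messages.append({"role": "user", "content": text_prompt})
    return {"messages": messages}
```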
@dataclass
class LocalVLLMModel(Model, OpenAICommonRequestResponseMixIn):
Can we extend from EndpointModel instead of Model to inherit the retry, exception handling, chat history maintenance, etc.?
I'm not confident about thread safety here. Don't we have the same potential issues with all calls to EndpointModel's generate() sharing the instance variables model_output, is_valid, etc.? But if this is solved in EndpointModel and OpenAICommonRequestResponseMixIn, I do think LocalVLLMModel can basically inherit its methods from these two (with just the minor logic to get a random client).
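If the shared-state question is settled inside EndpointModel, the inheritance discussed above could end up looking roughly like this sketch; the import path, the deployment_handler attribute, and the get_client hook are assumptions:

```python
import random
from dataclasses import dataclass

from eureka_ml_insights.models import EndpointModel, OpenAICommonRequestResponseMixIn  # import path assumed


@dataclass
class LocalVLLMModel(OpenAICommonRequestResponseMixIn, EndpointModel):
    # Retry, exception handling, and chat-history bookkeeping come from
    # EndpointModel; OpenAI-style request/response formatting comes from the mixin.
    model_name: str = None

    def get_client(self):
        # The only local logic: pick one of the deployed vLLM servers at random
        # so requests are spread across them (deployment_handler is hypothetical).
        return random.choice(self.deployment_handler.clients)
```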
This PR introduces the LocalVLLMModel and the LocalVLLMDeploymentHandler. The former is a usual Model (i.e., it implements generate); the latter is a new class that handles aspects of deployment (the deployment itself, health checks on your deployment, and shutdown of servers).

Use LocalVLLMModel either by defining a ModelConfig or by passing info on the command line so that vLLM can recognize your deployment, or even deploy for you. If you have already deployed, pass the "ports" parameter; otherwise the LocalVLLMDeploymentHandler will spin up "num_servers" (default = 1) servers for you, wait for the deployment to finish, and continue with the eval pipeline.
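For illustration, the two modes the description mentions might be configured like this; the config variable names and the example model identifier are made up, while model_name, ports, and num_servers come from the PR:

```python
# Reuse servers you have already deployed by telling the handler their ports.
LOCAL_VLLM_EXISTING_SERVERS = ModelConfig(
    LocalVLLMModel,
    {"model_name": "microsoft/phi-4", "ports": ["8000", "8001"]},
)

# Or let LocalVLLMDeploymentHandler spin up the servers itself.
LOCAL_VLLM_AUTODEPLOY = ModelConfig(
    LocalVLLMModel,
    {"model_name": "microsoft/phi-4", "num_servers": 2},
)
```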