LocalVLLMModel and deployment handler #102

Open · wants to merge 7 commits into main
Conversation

michaelharrisonmai (Collaborator):
This PR introduces the LocalVLLMModel and LocalVLLMDeploymentHandler. The former is a standard Model (i.e., it implements generate); the latter is a new class that handles deployment concerns: the deployment itself, health checks on your deployment, and shutdown of servers.

Use LocalVLLMModel either by defining a ModelConfig or by passing the relevant information on the command line, so that vLLM can recognize your existing deployment or even deploy for you.

If you have already deployed, pass the "ports" parameter; otherwise, the LocalVLLMDeploymentHandler will spin up "num_servers" (default = 1) servers for you, wait for deployment to finish, and then continue with the eval pipeline.
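To make the two modes concrete, here is a rough sketch of what the corresponding ModelConfig definitions might look like. The import paths and the example model id are assumptions; the "model_name", "ports", and "num_servers" keys come from this description and the diff below.

```python
# Sketch only: illustrates the two usage modes described above.
# Import paths are hypothetical; the config keys mirror the PR description.
from eureka_ml_insights.configs import ModelConfig    # hypothetical path
from eureka_ml_insights.models import LocalVLLMModel  # hypothetical path

# Already deployed: point the model at the existing vLLM server ports.
ALREADY_DEPLOYED_CONFIG = ModelConfig(
    LocalVLLMModel,
    {
        "model_name": "meta-llama/Llama-3.1-8B-Instruct",  # any id vLLM recognizes
        "ports": [8000, 8001],
    },
)

# Not yet deployed: the LocalVLLMDeploymentHandler spins up num_servers servers,
# waits for them to become healthy, then the eval pipeline continues.
AUTO_DEPLOY_CONFIG = ModelConfig(
    LocalVLLMModel,
    {
        "model_name": "meta-llama/Llama-3.1-8B-Instruct",
        "num_servers": 2,
    },
)
```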

gugarosa previously approved these changes Mar 5, 2025
michaelharrisonmai (Collaborator, Author):

@microsoft-github-policy-service agree company="Microsoft"

init_args["model_config"] = model_config
# Logic above is that certain deployment parameters like ports and num_servers
# can be variable and so we allow them to be overridden by command line args.
except:
Collaborator:

Please specify what exception you are catching here (AttributeError?)
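If the fallback is indeed reached when no predefined config matches the name, the fix might look like the sketch below. The getattr-based lookup, the helper name, and the import paths are assumptions for illustration, not code from this PR.

```python
# Sketch of the reviewer's suggestion: replace the bare "except:" with the specific
# exception the config lookup can raise. Whether AttributeError is the right choice
# depends on how model_config is actually resolved.
from eureka_ml_insights.configs import ModelConfig, model_configs  # hypothetical
from eureka_ml_insights.models import LocalVLLMModel               # hypothetical


def resolve_model_config(args, init_args):
    try:
        # Interpret args.model_config as the name of a predefined ModelConfig.
        init_args["model_config"] = getattr(model_configs, args.model_config)
    except AttributeError:
        # No config by that name: fall back to treating it as a raw model name
        # that vLLM can recognize, mirroring the fallback in the diff below.
        init_args["model_config"] = ModelConfig(
            LocalVLLMModel,
            {"model_name": args.model_config},
        )
```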

init_args["model_config"] = ModelConfig(
LocalVLLMModel,
{
"model_name": args.model_config,
Collaborator:

Should model_name be passed as a separate command-line argument instead of re-using the existing model_config argument? It looks like this should be a string, not a ModelConfig object.

michaelharrisonmai (Collaborator, Author), Mar 7, 2025:

My thought was: someone passing --local-vllm has the option to provide --model_config either as a ModelConfig or as a string that uniquely identifies the model to vLLM, so we try to accept either. This allows a bit more flexibility, so users don't have to change their scripts based on which type they're passing when they call eureka (and having both model_config and model_name as command-line arguments might confuse someone if it's not clear to pass only the latter in conjunction with vLLM). I don't really have a strong opinion though; happy to do what you think is best.

michaelharrisonmai (Collaborator, Author):

(The obvious downside of the way I've done it is that the name "model_config" is no longer accurate.)
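For comparison, here is a small sketch of the alternative the reviewer raised: a dedicated command-line argument for the raw model name, so --model_config always refers to a ModelConfig. The argparse flags below are hypothetical and not part of this PR.

```python
import argparse

# Sketch only: keeps --model_config reserved for predefined ModelConfig names and
# adds a separate --model_name for the raw vLLM model identifier.
parser = argparse.ArgumentParser()
parser.add_argument("--local-vllm", action="store_true",
                    help="Run against locally deployed vLLM servers.")
parser.add_argument("--model_config", type=str, default=None,
                    help="Name of a predefined ModelConfig.")
parser.add_argument("--model_name", type=str, default=None,
                    help="Raw model id for vLLM when no ModelConfig is defined.")
args = parser.parse_args()
```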

@@ -2,9 +2,13 @@

import json
import logging
import random
import re
Collaborator:

This import seems to be unused. isort should remove unused imports if you run make format-inplace

"n_output_tokens": raw_output["usage"]["completion_tokens"]
}

def generate(self, text_prompt, query_images=None, system_message=None):
Collaborator:

Should we have support for previous_messages similar to other models to enable chat?
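A rough sketch of what previous_messages support could look like, patterned on the signature in the diff below; how the request body is actually assembled belongs to OpenAICommonRequestResponseMixIn in this PR, so the helper here is purely illustrative.

```python
# Sketch only: shows how prior chat turns could be folded into the message list
# sent to the vLLM OpenAI-compatible server. previous_messages is the proposed
# addition; the other parameter names mirror generate() in the diff.
def build_messages(text_prompt, system_message=None, previous_messages=None):
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    if previous_messages:
        # Prior turns, e.g. [{"role": "user", ...}, {"role": "assistant", ...}]
        messages.extend(previous_messages)
    messages.append({"role": "user", "content": text_prompt})
    return messages
```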



@dataclass
class LocalVLLMModel(Model, OpenAICommonRequestResponseMixIn):
Collaborator:

Can we extend from EndpointModel instead of Model to inherit the retry, exception handling, chat history maintenance, etc.?

michaelharrisonmai (Collaborator, Author):

I'm not confident about thread safety here. Don't we have the same potential issues with all calls to EndpointModel's generate() sharing the instance variables model_output, is_valid, etc.? But if this is solved in EndpointModel and OpenAICommonRequestResponseMixIn, I do think LocalVLLMModel can basically inherit its methods from these two (with just the minor logic to get a random client).
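For illustration, a sketch of the shape that inheritance could take if the thread-safety question is resolved. The "clients" field and the get_client hook are hypothetical names, used only to show where the one piece of LocalVLLM-specific logic (picking a random server) would live; the module path is assumed.

```python
import random
from dataclasses import dataclass, field

# Hypothetical import path; EndpointModel and OpenAICommonRequestResponseMixIn
# exist in this repo, but their module layout is assumed here.
from eureka_ml_insights.models import EndpointModel, OpenAICommonRequestResponseMixIn


# Sketch only: swaps Model for EndpointModel as suggested, so retry, exception
# handling, and chat-history logic are inherited from the base classes.
@dataclass
class LocalVLLMModel(EndpointModel, OpenAICommonRequestResponseMixIn):
    clients: list = field(default_factory=list)  # one client per local vLLM server

    def get_client(self):
        # The only deployment-specific piece: spread requests across local servers.
        return random.choice(self.clients)
```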
