Performance Metrics #38
base: main
Conversation
elm/wizard.py (Outdated)
token_budget=None,
new_info_threshold=0.7,
convo=False,
timeit=False):
Please add this to the parameters (input args) docstrings. Please make sure there are line breaks before Parameters and Returns (looks like those line breaks got removed here, not sure why).
Fixed by adding necessary spacing
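The spacing fix above refers to the numpydoc convention that section headers need a preceding blank line. A minimal illustrative sketch (the parameter descriptions are assumptions, not ELM's actual docstrings):

```python
def example(query, token_budget=None, new_info_threshold=0.7, convo=False):
    """One-line summary of the function.

    Note the blank line above the Parameters header below: numpydoc-style
    sections such as Parameters and Returns must be preceded by a blank
    line, or Sphinx will not parse them as sections.

    Parameters
    ----------
    query : str
        Question being asked of the LLM (illustrative description).
    token_budget : int, optional
        Option to override a token-count limit (illustrative).
    new_info_threshold : float, optional
        Fraction of new words required to keep a chunk (illustrative).
    convo : bool, optional
        Flag to maintain a conversation history (illustrative).

    Returns
    -------
    str
        The engineered query (placeholder: returned unchanged here).
    """
    return query
```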
@@ -87,6 +90,9 @@ def engineer_query(self, query, token_budget=None, new_info_threshold=0.7,
         references : list
             The list of references (strs) used in the engineered prompt is
             returned here
+        vector_query_time : float
I know you didn't do this, but can you add a docstring for `used_index` here? Not sure why we don't have one.
Added `used_index` to the engineer_query function.
elm/wizard.py (Outdated)
@@ -184,7 +192,8 @@ def chat(self, query,
             valid ref_col.
         return_chat_obj : bool
             Flag to only return the ChatCompletion from OpenAI API.
-
+        timeit : bool
+            Flag to return the performance metrics on API calls.
         Returns
Same general comments about docstrings here.
Fixed by adding spaces
total_chat_time = start_chat_time - end_time
performance = {
    "total_chat_time": total_chat_time,
    "chat_completion_time": chat_completion_time,
Am I understanding correctly that the chat completion time is the time to get the first response from the LLM? That's great and important, but if the LLM is in "stream" mode, does this actually work, or does the API return something immediately and only start responding in the `for chunk in response` generator loop? If this is the case, maybe we should have the end time be calculated at the first chunk generation in the response object.
Yes, you are correct in all of this. In the chat method, we default `stream` to `True`. I moved the chat completion metrics to after this loop, so the logic is: start timer, start completions, stream completions, stop timer. This change jives with your concern!
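The timing pattern agreed on above can be sketched as follows. This is a minimal illustration, not ELM's actual code: `chunks` stands in for the OpenAI streaming response, and the key point is that the timer stops only after the generator is exhausted, so the measurement covers the full streamed completion rather than the immediate return of the stream object.

```python
import time

def timed_stream(chunks):
    """Measure completion time across a streamed response (sketch)."""
    start = time.perf_counter()            # start timer
    response_text = []
    for chunk in chunks:                   # stream completions
        response_text.append(chunk)
    # stop timer only after the stream is fully consumed
    chat_completion_time = time.perf_counter() - start
    return "".join(response_text), chat_completion_time
```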
elm/wizard.py (Outdated)
@@ -152,10 +160,10 @@ def chat(self, query,
              token_budget=None,
              new_info_threshold=0.7,
              print_references=False,
-             return_chat_obj=False):
+             return_chat_obj=False,
+             timeit=False):
why don't we just always have timeit=True? I think we should just remove this as a kwarg and always return performance metrics. This will break anything that expects a certain number of outputs but the compute is basically free and it's useful information.
Fixed by removing argument
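The resolution above (drop the `timeit` kwarg and always return metrics) could look roughly like this. All names are illustrative stand-ins, not ELM's internals, and the completion call is faked:

```python
import time

def chat(query):
    """Sketch of a chat call that always returns performance metrics.

    There is no timeit flag: callers always receive a (response,
    performance) tuple, as suggested in the review above.
    """
    start_chat_time = time.perf_counter()
    response = f"answer to: {query}"       # stand-in for the API call
    end_time = time.perf_counter()
    performance = {"total_chat_time": end_time - start_chat_time}
    return response, performance
```

As the reviewer notes, this breaks callers that expect a single return value, but the extra compute is essentially free and the information is always useful.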
Hey @jeisenman23, there are still a few outstanding comments on this PR. Can you respond to the comments and make the appropriate edits? I think this is good to go!
elm/web/osti.py (Outdated)
@@ -196,8 +196,8 @@ def _get_first(self):
                        .format(self._response.status_code,
                                self._response.reason))
             raise RuntimeError(msg)
-        first_page = self._response.json()
+        raw_text = self._response.text.encode('utf-8').decode('unicode-escape')
When hitting the OSTI API, GitHub Actions was having trouble interpreting the JSON response because the delimiters were inconsistent -- sometimes '/' and other times '//'. This code solves it by encoding the response into UTF-8, where no matter the delimiter the encoding will be the same. Then we can decode properly.
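To illustrate what the `encode('utf-8').decode('unicode-escape')` round-trip in the diff does (this is a generic example of that codec, not the exact OSTI payload): text that arrives with literal escape sequences still embedded gets its escapes resolved into real characters.

```python
# A string that contains a literal backslash-u escape sequence
# (ten characters: "Temp" + "\" + "u00e9" + "rature"), as can happen
# when a JSON body is escaped inconsistently by the server.
raw = 'Temp\\u00e9rature'

# Encode to UTF-8 bytes, then decode with 'unicode-escape', which
# interprets the embedded \u00e9 sequence as the character 'é'.
text = raw.encode('utf-8').decode('unicode-escape')
print(text)  # Température
```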
This PR adds time-tracking capability within ELM so that we can measure how long database queries take, how long chat completions take, and how long the chat function takes to invoke.