
Performance Metrics #38

Open · jeisenman23 wants to merge 54 commits into main
Conversation

jeisenman23 (Collaborator):

This PR adds time tracking capability within ELM so that we can measure how long queries to the database take, how long chat completions take, and how long the chat function takes to invoke.
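A minimal sketch of the timing pattern this PR introduces (the helper name and usage here are illustrative, not ELM's actual implementation):

```python
import time

def timed(func, *args, **kwargs):
    """Run func and return (result, wall-clock seconds). Hypothetical helper."""
    start = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start

# e.g. result, vector_query_time = timed(wizard.engineer_query, "wind costs")
```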

@grantbuster self-requested a review on November 15, 2024.
elm/wizard.py Outdated
token_budget=None,
new_info_threshold=0.7,
convo=False,
timeit=False):
grantbuster (Member):

Please add this to the parameters (input args) docstrings. Also make sure there are line breaks before Parameters and Returns (looks like those line breaks got removed here; not sure why).
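For reference, a minimal numpydoc-style sketch with the blank lines before the Parameters and Returns headers that this comment is asking for (the descriptions are condensed from the diff; this is not the full ELM docstring):

```python
def chat(self, query, timeit=False):
    """Answer a query with the LLM.

    Parameters
    ----------
    query : str
        Question to ask the LLM.
    timeit : bool
        Flag to return the performance metrics on API calls.

    Returns
    -------
    response : str
        Response from the LLM.
    """
```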

jeisenman23 (Author):

Fixed by adding necessary spacing

jeisenman23 (Author):

ABC

@@ -87,6 +90,9 @@ def engineer_query(self, query, token_budget=None, new_info_threshold=0.7,
references : list
The list of references (strs) used in the engineered prompt is
returned here
vector_query_time : float
grantbuster (Member):

I know you didn't do this, but can you add a docstring for used_index here? Not sure why we don't have one.

jeisenman23 (Author):

Added used_index to engineer_query function

elm/wizard.py Outdated
@@ -184,7 +192,8 @@ def chat(self, query,
valid ref_col.
return_chat_obj : bool
Flag to only return the ChatCompletion from OpenAI API.

timeit : bool
Flag to return the performance metrics on API calls.
Returns
grantbuster (Member):

Same general comments about docstrings here.

jeisenman23 (Author):

Fixed by adding spaces

total_chat_time = start_chat_time - end_time
performance = {
"total_chat_time": total_chat_time,
"chat_completion_time": chat_completion_time,
grantbuster (Member):

Am I understanding correctly that the chat completion time is the time to get the first response from the LLM? That's great and important, but if the LLM is in "stream" mode, does this actually work? Or does the API return something immediately and only start responding in the `for chunk in response` generator loop? If that's the case, maybe we should calculate the end time at the first chunk generation in the response object.

jeisenman23 (Author):

Yes, you are correct in all of this. In the chat method, we default stream to True. I moved the chat completion metrics to after this loop, so the logic is: start timer, start completions, stream completions, stop timer. This change jibes with your concern!
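A sketch of that order of operations using the v1 openai client (the model name and message are placeholders, and the client assumes an OPENAI_API_KEY in the environment; only the timer placement mirrors the change described above):

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start_chat_time = time.time()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)

# With stream=True, create() returns almost immediately; the tokens arrive
# while iterating, so the timer must stop after the loop, not before it.
chunks = []
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks.append(chunk.choices[0].delta.content)
chat_completion_time = time.time() - start_chat_time
```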

elm/wizard.py Outdated
@@ -152,10 +160,10 @@ def chat(self, query,
token_budget=None,
new_info_threshold=0.7,
print_references=False,
return_chat_obj=False):
return_chat_obj=False,
timeit=False):
grantbuster (Member):

Why don't we just always have timeit=True? I think we should just remove this as a kwarg and always return performance metrics. This will break anything that expects a certain number of outputs, but the compute is basically free and it's useful information.
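A toy sketch of the resulting shape (the stand-in class and its internals are hypothetical; the point is that the performance dict becomes an unconditional extra output):

```python
import time

class Wizard:
    """Toy stand-in for the wizard class; only the return shape matters here."""

    def _ask(self, query):
        return f"answer to {query!r}", ["ref1"]

    # timeit kwarg removed: metrics are always the last element of the output
    def chat(self, query, token_budget=None, new_info_threshold=0.7,
             print_references=False, return_chat_obj=False):
        start = time.time()
        answer, references = self._ask(query)
        performance = {"total_chat_time": time.time() - start}
        return answer, references, performance

answer, refs, perf = Wizard().chat("How do wind turbines work?")
```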

jeisenman23 (Author):

Fixed by removing argument

grantbuster (Member) commented Dec 17, 2024:

Hey @jeisenman23, there are still a few outstanding comments on this PR. Can you respond to the comments and make the appropriate edits?

I think this is good to go!


elm/web/osti.py Outdated
@@ -196,8 +196,8 @@ def _get_first(self):
.format(self._response.status_code,
self._response.reason))
raise RuntimeError(msg)
first_page = self._response.json()

raw_text = self._response.text.encode('utf-8').decode('unicode-escape')
jeisenman23 (Author):

When hitting the OSTI API, GitHub Actions was having trouble interpreting the JSON response because the delimiters were inconsistent: sometimes '/' and others '//'. This code solves it by encoding the response as UTF-8, so that no matter the delimiter, the encoding is the same; then we can decode properly.
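A self-contained illustration of why the encode/decode round-trip helps, assuming the underlying problem was doubly-escaped unicode sequences in the response body (the payload here is fabricated for the demo):

```python
import json

# Doubly-escaped payload: the body contains "\\u00e9" instead of "\u00e9".
body = '{"title": "Caf\\\\u00e9"}'

# Parsing directly leaves the escape sequence as literal text.
print(json.loads(body)["title"])  # Caf\u00e9

# encode/decode collapses the doubled backslash so json can decode it.
raw_text = body.encode('utf-8').decode('unicode-escape')
print(json.loads(raw_text)["title"])  # Café
```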

