Try to use this tool instead of our self-written solution.
Create a separate implementation of validation and prompt testing with DeepEval.
Describe any difficulties or bugs discovered on the way.
Create an article or blog post describing the pros and cons of using this tool for prompt testing, validation, and results visualisation compared to our current solution.
Which other problems could this tool solve for us in the future?
DeepEval Documentation Review and Metric Selection
I am reviewing the DeepEval documentation in depth, focusing on the specifications of its evaluation metrics, to determine which metric best suits our test case. I have also implemented an initial version of testing with DeepEval, as shown in the attached screenshot, and successfully ran the first tests on the dataset. Next, I plan to keep building my understanding of the tool and refining the testing process.
I have successfully updated the CSV file testing functionality, expanded the application's overall functionality, and implemented real-time manual testing of LLM responses using DeepEval metrics.
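For readers following along, CSV-driven testing with a DeepEval metric could be sketched roughly as below. This is a minimal illustration, not the implementation from this issue: the column names (`input`, `actual_output`, `expected_output`), the `PromptCase` helper, the sample row, and the choice of `AnswerRelevancyMetric` are all assumptions. The DeepEval import is deferred so the CSV loading can be tried without the package installed; scoring itself requires an LLM judge to be configured (by default an OpenAI API key).

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PromptCase:
    """One CSV row (hypothetical schema): prompt, model answer, reference answer."""
    input: str
    actual_output: str
    expected_output: str


def load_cases(csv_text: str) -> list[PromptCase]:
    """Parse CSV text with the assumed columns input, actual_output, expected_output."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        PromptCase(
            input=row["input"].strip(),
            actual_output=row["actual_output"].strip(),
            expected_output=row["expected_output"].strip(),
        )
        for row in reader
    ]


def score_with_deepeval(cases: list[PromptCase], threshold: float = 0.7):
    """Score each case with one DeepEval metric and return (score, reason) pairs.

    deepeval is imported lazily; calling this function needs the package
    installed and an LLM judge configured.
    """
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    metric = AnswerRelevancyMetric(threshold=threshold)
    results = []
    for case in cases:
        test_case = LLMTestCase(
            input=case.input,
            actual_output=case.actual_output,
            expected_output=case.expected_output,
        )
        metric.measure(test_case)  # populates metric.score and metric.reason
        results.append((metric.score, metric.reason))
    return results


# Illustrative data only, not taken from the project's dataset.
SAMPLE_CSV = """input,actual_output,expected_output
What is the capital of France?,Paris is the capital of France.,Paris
"""
```

Calling `load_cases(SAMPLE_CSV)` yields one `PromptCase`; passing that list to `score_with_deepeval` would then produce one score and explanation per row. Swapping in a different metric only changes the two lines that import and construct it.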
https://docs.confident-ai.com/
https://docs.confident-ai.com/docs/confident-ai-introduction