Using Human vs AI in Dataset Experiments for RAG evaluation #4789

Answered by axiomofjoy
omrihar asked this question in Q&A
Hey @omrihar, this is a great question.

I don't really understand the differences between the Evals in the Evals section and the Evals in the experiment section

As you noted, the llm_classify API is designed for use with dataframes in a notebook environment, and results in annotations on spans. In contrast, the evaluators in the experiment section are intended for running across a dataset, and result in annotations on dataset examples.

Are these two interfaces compatible with each other?

The interfaces are not compatible with each other. This is an area in which we'd like to improve.

In which context should I use which?

In your case, it sounds like you want to use the experiments AP…
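
For illustration, a minimal sketch of a custom evaluator for the experiments API. The function name, score logic, and dataset are hypothetical; a real run also assumes a running Phoenix instance with an uploaded dataset, so the `run_experiment` call is shown only as commented-out usage:

```python
# Hypothetical evaluator: a plain function that scores a task output
# against the expected answer (a simple stand-in for an LLM judge).
# Phoenix experiment evaluators can be plain Python functions that
# accept named arguments such as `output` and `expected`.
def contains_answer(output: str, expected: dict) -> float:
    # Return 1.0 if the expected answer string appears in the output,
    # 0.0 otherwise; scores land as annotations on dataset examples.
    return 1.0 if expected["answer"].lower() in output.lower() else 0.0

# Usage sketch (assumes a Phoenix server, a dataset, and a task function):
# from phoenix.experiments import run_experiment
# experiment = run_experiment(
#     dataset,
#     task=my_task,               # hypothetical task under evaluation
#     evaluators=[contains_answer],
# )
```

Because the evaluator is just a function, it can be unit-tested locally before being attached to an experiment.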

Answer selected by omrihar