[Feature Branch][DeepSparse Evaluation API] Update lm-eval, perplexity, additional datasets #1580
Conversation
@@ -79,24 +81,66 @@ def if_generative_language_model(pipeline: Pipeline) -> bool:
         return False


-def args_to_dict(args: Tuple[Any, ...]) -> Dict[str, Any]:
+def parse_kwarg_tuples(kwargs: tuple) -> Dict:
This is a 1:1 copy of the same function from SparseML. As proposed by @rahul-tuli, in the future let's move it into nm-utils so it can be shared between DeepSparse and SparseML.
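For context, a rough sketch of what such a helper does (an assumption based on its usage here, not the exact SparseML/DeepSparse implementation): it turns trailing CLI tokens collected as a flat tuple, e.g. `("--batch_size", "8")`, into a kwargs dict with best-effort type coercion.

```python
# Rough sketch only; the real parse_kwarg_tuples in SparseML may differ in
# details such as '--key=value' handling or type coercion rules.
from typing import Any, Dict, Tuple


def parse_kwarg_tuples(kwargs: Tuple[str, ...]) -> Dict[str, Any]:
    if len(kwargs) % 2 != 0:
        raise ValueError(f"expected '--key value' pairs, got: {kwargs}")
    parsed: Dict[str, Any] = {}
    for key, value in zip(kwargs[::2], kwargs[1::2]):
        if not key.startswith("--"):
            raise ValueError(f"expected a '--key' flag, got: {key}")
        parsed[key.lstrip("-")] = _coerce(value)
    return parsed


def _coerce(value: str) -> Any:
    # best-effort conversion: int, then float, then fall back to the raw string
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value
```

For example, `parse_kwarg_tuples(("--batch_size", "8", "--limit", "0.5"))` would return `{"batch_size": 8, "limit": 0.5}`.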
src/deepsparse/evaluation/utils.py
-LM_EVALUATION_HARNESS = "lm-evaluation-harness"
 _LOGGER = logging.getLogger(__name__)
+LM_EVALUATION_HARNESS = "lm-eval-harness"
Considering `lm-eval-harness` is still the name of the repo, I would propose to keep it as `lm-eval-harness`. I think it would also be fine if you'd like to alias `lm_eval`, since that is the name of their CLI command.

TYPOS FIXED: Considering `lm-evaluation-harness` is still the name of the repo, I would propose to keep it as `lm-evaluation-harness`. I think it would also be fine if you'd like to alias `lm_eval`, since that is the name of their CLI command.
Yes, this is something that has been introduced in this PR, right? We are indeed committing to the name `lm-eval-harness`.
Sorry, there are multiple typos - I meant to keep `lm-evaluation-harness` and contest the change to `lm-eval-harness`.
@mgoin note: this comes from the docs: https://neuralmagic.github.io/docs-v2/get-started/deploy (bottom of the page). I am happy to change this to whatever product sees fit.
Let's get this in in its current state; I can always change this detail later if needed.
@@ -24,8 +24,7 @@ def try_import_lm_evaluation_harness(raise_error=False):
         if raise_error:
             raise ImportError(
                 "Unable to import lm_eval. "
-                "To install run 'pip install "
-                "git+https://github.com/EleutherAI/lm-evaluation-harness@b018a7d51'"
+                "To install run 'pip install lm-eval==0.4.0'"
When or how will this error during normal use if `raise_error=False` by default? Once the eval actually begins?
I see. Good point. Yes, I will change the default behavior of this function and set raise_error to True. This is the intended behavior when the actual eval is being run: at runtime, when the user intends to use lm-eval, the module will try to do a hot import of lm-eval, and if it fails to find the dependency installed, it will raise the error. However, when testing, I do not want to raise errors, but rather use the boolean output of this function to skip the tests that require lm-eval to be installed.
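For illustration, a minimal sketch of the behavior described above (an assumed shape, not the exact DeepSparse code): raise by default so a missing dependency surfaces when the eval actually runs, and let tests pass raise_error=False and use the returned boolean to skip.

```python
# Minimal sketch of the hot-import helper and its test-side usage described
# above; the exact DeepSparse implementation and test layout may differ.
import importlib

import pytest


def try_import_lm_evaluation_harness(raise_error: bool = True) -> bool:
    try:
        importlib.import_module("lm_eval")
        return True
    except ImportError as import_error:
        if raise_error:
            raise ImportError(
                "Unable to import lm_eval. "
                "To install run 'pip install lm-eval==0.4.0'"
            ) from import_error
        return False


# In tests, skip instead of failing when lm-eval is not installed
# (hypothetical test, shown only to illustrate the boolean usage).
@pytest.mark.skipif(
    not try_import_lm_evaluation_harness(raise_error=False),
    reason="lm-eval==0.4.0 is required for these tests",
)
def test_lm_eval_integration():
    ...
```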
* initial commit
* Update src/deepsparse/evaluation/integrations/__init__.py
* design ready, time to define additional features
* split prep_for_generation operator
* fix logits
* update non-kv cache pipeline and tests
* add tests to address edge cases
* add condition to check of kv_cache full during prompt inference, add test to cover this case, revert debugging changes
* fix typing
* remove commented code
* remove irrelevant condition
* perplexity for non-kv cache pipelines works!
* logic is working
* ready for review
* [DeepSparse Evaluation API] Perplexity eval support for `openai_humaneval`, `c4`, `wikitext2` (#1586)
* fix tests 2
* initial commit
* add return to a function
* make script more robust

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
This PR updates the version of lm-eval from `0.3` to `0.4`.

Supported and tested datasets to evaluate on: `gsm8k`, `hellaswag`, `arc_challenge`.

Example usage
Example using CLI (when lm-eval is not installed):
Example using CLI:
Example using the `evaluate` function:
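A hypothetical sketch of such a call follows; the import path, argument names, and model path are assumptions rather than the confirmed API.

```python
# Hypothetical sketch; import path, argument names, and the model path are
# assumptions rather than the confirmed deepsparse API.
from deepsparse.evaluation import evaluate

results = evaluate(
    model="path/to/onnx/deployment",  # placeholder model path or SparseZoo stub
    datasets="hellaswag",             # one of the tested datasets (gsm8k, hellaswag, arc_challenge)
    integration="lm-eval-harness",    # route evaluation through the lm-eval integration
)
print(results)
```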
Example running unit tests (requires `lm-eval==0.4` to be installed):