Release Unitxt 1.17.0 - New LLM as Judges! · IBM/unitxt

Important Changes

This release covers the following topics:

Criteria-based LLM as Judges - An improved class of LLM judges with customizable judging criteria (read more)
Unitxt Assistant - A text-based assistant with Unitxt expertise to help developers (read more)
New benchmarks: Tables, Vision - Benchmarks for table understanding and image understanding, compiled by the community and collaborators (read more)
Support for all major inference providers - Inference for evaluation or for LLM judges can be channeled to any inference provider, such as Azure, AWS, and watsonx (read more)
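
To make the first item more concrete, here is a minimal sketch of the general idea behind criteria-based judging: a criterion with named options is rendered into a prompt, a judge model picks one option, and the option maps to a numeric score. This does not use the Unitxt API. `Criterion`, `build_prompt`, `judge`, and the `fake_llm` stub are hypothetical names introduced for illustration, and a real setup would send the prompt to an inference provider instead of the stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Criterion:
    """Hypothetical judging criterion: a question plus the allowed options."""
    name: str
    description: str
    option_scores: Dict[str, float]  # option label -> numeric score


def build_prompt(criterion: Criterion, response: str) -> str:
    """Render the criterion and the response under test into a judge prompt."""
    options = ", ".join(criterion.option_scores)
    return (
        f"Criterion: {criterion.description}\n"
        f"Response: {response}\n"
        f"Answer with exactly one of: {options}"
    )


def judge(criterion: Criterion, response: str, llm: Callable[[str], str]) -> dict:
    """Ask the judge model for an option and map it to the criterion's score."""
    selected = llm(build_prompt(criterion, response)).strip()
    if selected not in criterion.option_scores:
        raise ValueError(f"Judge returned an unknown option: {selected!r}")
    return {
        "criterion": criterion.name,
        "selected_option": selected,
        "score": criterion.option_scores[selected],
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): 'Yes' for short responses."""
    response_line = next(
        line for line in prompt.splitlines() if line.startswith("Response: ")
    )
    return "Yes" if len(response_line) <= 60 else "No"


conciseness = Criterion(
    name="conciseness",
    description="Is the response concise?",
    option_scores={"Yes": 1.0, "No": 0.0},
)

result = judge(conciseness, "Paris is the capital of France.", fake_llm)
print(result)
```

Swapping `fake_llm` for a function that calls a hosted model is the only change needed to try the same criterion against a real judge; the criterion definition itself stays independent of the provider.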

Detailed Changes

Fix typing notation for python 3.8 by @elronbandel in #1453
Instance_metric and apply_metric keep only one instance at a time in mem, at the expense of repeated passes over input stream (2 times for instance_metric, #metrics for apply_metric) by @dafnapension in #1448
simplify class parameter listing on web page by @dafnapension in #1454
Bring code coverage tests back to life by @elronbandel in #1455
Fix coverage tests by @elronbandel in #1456
make demos_pool a local var rather than a separate stream by @dafnapension in #1436
Adding upper case and last non empty line processor by @antonpibm in #1458
performance by bluebench by @dafnapension in #1457
Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
remove redundant lines from performance.yml by @dafnapension in #1462
Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
Add SocialIQA dataset by @elronbandel in #1468
Add parallelization to RITS inference by @arielge in #1441
Fix the type handling for tasks to support string types by @elronbandel in #1470
Update version to 1.16.1 by @elronbandel in #1472
extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
Add GPQA dataset by @elronbandel in #1474
Add simple QA dataset by @elronbandel in #1475
Add LongBench V2 dataset by @elronbandel in #1476
Adding typed recipe test by @antonpibm in #1473
Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
Add MapReduceMetric a new base class to integrate all metrics into by @elronbandel in #1459
Add multi document support and FRAMES benchmark by @elronbandel in #1477
Update version to 1.16.2 by @elronbandel in #1483
Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
Benjams/fix bioasq card by @BenjSz in #1486
add separator to csv loader by @BenjSz in #1488
Fix bug in metrics loading in tasks by @elronbandel in #1487
Update version to 1.16.3 by @elronbandel in #1489
Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
Update version to 1.16.4 by @elronbandel in #1491
Simplify artifact link [Non Backward Compatible!] by @elronbandel in #1494
Added NER example by @yoavkatz in #1492
Add example for evaluating tables as images using Unitxt APIs by @elronbandel in #1495
Mm updates by @alfassy in #1465
Fix wrong saving of artifact initial dict by @elronbandel in #1499
Accelerate and improve RAG Metrics by @elronbandel in #1497
Make clinc preparation faster by @elronbandel in #1501
Fix templates lists in vision cards by @elronbandel in #1500
Add vision benchmark example by @elronbandel in #1502
Update vis bench by @elronbandel in #1505
Add Balance operator by @elronbandel in #1507
Fix for demos_pool with images. by @elronbandel in #1509
Remove new balance operator and use existing implementation by @elronbandel in #1510
Fixes and adjustment in rag metrics and related inference engines by @lilacheden in #1466
Tables bench by @ShirApp in #1506
Keep metadata over main unitxt stages by @eladven in #1512
Fix: Improved handling of place_correct_choice_position for flexibl… by @eliyahabba in #1511
Fixes in LLMJudge by @lilacheden in #1498
Verify metrics prediction_type without loading metric by @elronbandel in #1519
Add Unitxt Assistant beta by @elronbandel in #1513
Ensure fusion do not call streams before use by @elronbandel in #1518
Minor llm as judge fix/changes by @martinscooper in #1467
Fix: Selected option for supporting negative indexes in place_correct… by @eliyahabba in #1522
Refactor rag metrics and judges by @lilacheden in #1515
Add Llama 3.1 on Vertex AI to CrossProviderInferenceEngine by @yifanmai in #1525
fix external_rag example by @lilacheden in #1526
Add search to assistant for much faster response by @elronbandel in #1524
fixed division by 0 in compare performance results by @dafnapension in #1523
Add two criteria based direct llm judges by @lilacheden in #1527
Update version to 1.17.0 by @elronbandel in #1535
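
The memory trade-off described in #1448 can be illustrated in plain Python. This is a sketch under assumptions, not the Unitxt implementation: `make_stream`, `score_instance`, and `two_pass_metric` are hypothetical names. The idea shown is that re-creating the input stream and reading it twice lets a metric hold one instance at a time: the first pass accumulates the global score, and the second pass attaches it to each instance.

```python
from typing import Callable, Dict, Iterator

Instance = Dict[str, object]


def make_stream() -> Iterator[Instance]:
    """Hypothetical re-creatable input stream; each call starts from the top."""
    data = [("a", "a"), ("b", "c"), ("d", "d"), ("e", "e")]
    for prediction, reference in data:
        yield {"prediction": prediction, "reference": reference}


def score_instance(instance: Instance) -> float:
    """Exact-match score for a single instance."""
    return 1.0 if instance["prediction"] == instance["reference"] else 0.0


def two_pass_metric(
    stream_factory: Callable[[], Iterator[Instance]],
) -> Iterator[Instance]:
    # Pass 1: accumulate only running totals, never the instances themselves.
    total, count = 0.0, 0
    for instance in stream_factory():
        total += score_instance(instance)
        count += 1
    global_score = total / count if count else 0.0

    # Pass 2: re-read the stream and attach both scores to each instance.
    for instance in stream_factory():
        yield {
            **instance,
            "score": {
                "instance": score_instance(instance),
                "global": global_score,
            },
        }


# list() materializes the output only so the demo can index into it.
results = list(two_pass_metric(make_stream))
print(results[0]["score"])
```

The cost is visible in the sketch: `score_instance` runs twice per instance and the stream is opened twice, which is the price paid for constant memory.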

New Contributors

@eliyahabba made their first contribution in #1464

Full Changelog: 1.16.0...1.17.0
