Unitxt 1.17.0 - New LLM as Judges!
Important Changes
This release covers the following topics:
- Criteria-based LLM as Judges - An improved class of LLM judges with customizable judging criteria (read more)
- Unitxt Assistant - A text-based assistant with expertise in Unitxt to help developers (read more)
- New benchmarks: Tables, Vision - Benchmarks for table understanding and image understanding, compiled by the community and collaborators (read more)
- Support for all major inference providers - Inference for evaluation or for LLM as judges can be channeled to any inference provider, such as Azure, AWS, and watsonx (read more)
Detailed Changes
- Fix typing notation for python 3.8 by @elronbandel in #1453
- Instance_metric and apply_metric keep only one instance at a time in memory, at the expense of repeated passes over the input stream (2 passes for instance_metric, one pass per metric for apply_metric) by @dafnapension in #1448
- simplify class parameter listing on web page by @dafnapension in #1454
- Bring code coverage tests back to life by @elronbandel in #1455
- Fix coverage tests by @elronbandel in #1456
- make demos_pool a local var rather than a separate stream by @dafnapension in #1436
- Adding upper case and last non empty line processor by @antonpibm in #1458
- performance by bluebench by @dafnapension in #1457
- Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
- remove redundant lines from performance.yml by @dafnapension in #1462
- Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
- Add SocialIQA dataset by @elronbandel in #1468
- Add parallelization to RITS inference by @arielge in #1441
- Fix the type handling for tasks to support string types by @elronbandel in #1470
- Update version to 1.16.1 by @elronbandel in #1472
- extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
- Add GPQA dataset by @elronbandel in #1474
- Add simple QA dataset by @elronbandel in #1475
- Add LongBench V2 dataset by @elronbandel in #1476
- Adding typed recipe test by @antonpibm in #1473
- Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
- Add MapReduceMetric a new base class to integrate all metrics into by @elronbandel in #1459
- Add multi document support and FRAMES benchmark by @elronbandel in #1477
- Update version to 1.16.2 by @elronbandel in #1483
- Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
- Benjams/fix bioasq card by @BenjSz in #1486
- add separator to csv loader by @BenjSz in #1488
- Fix bug in metrics loading in tasks by @elronbandel in #1487
- Update version to 1.16.3 by @elronbandel in #1489
- Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
- Update version to 1.16.4 by @elronbandel in #1491
- Simplify artifact link [Non Backward Compatible!] by @elronbandel in #1494
- Added NER example by @yoavkatz in #1492
- Add example for evaluating tables as images using Unitxt APIs by @elronbandel in #1495
- Mm updates by @alfassy in #1465
- Fix wrong saving of artifact initial dict by @elronbandel in #1499
- Accelerate and improve RAG Metrics by @elronbandel in #1497
- Make clinc preparation faster by @elronbandel in #1501
- Fix templates lists in vision cards by @elronbandel in #1500
- Add vision benchmark example by @elronbandel in #1502
- Update vis bench by @elronbandel in #1505
- Add Balance operator by @elronbandel in #1507
- Fix for demos_pool with images. by @elronbandel in #1509
- Remove new balance operator and use existing implementation by @elronbandel in #1510
- Fixes and adjustment in rag metrics and related inference engines by @lilacheden in #1466
- Tables bench by @ShirApp in #1506
- Keep metadata over main unitxt stages by @eladven in #1512
- Fix: Improved handling of place_correct_choice_position for flexibl… by @eliyahabba in #1511
- Fixes in LLMJudge by @lilacheden in #1498
- Verify metrics prediction_type without loading metric by @elronbandel in #1519
- Add Unitxt Assistant beta by @elronbandel in #1513
- Ensure fusion do not call streams before use by @elronbandel in #1518
- Minor llm as judge fix/changes by @martinscooper in #1467
- Fix: Selected option for supporting negative indexes in place_correct… by @eliyahabba in #1522
- Refactor rag metrics and judges by @lilacheden in #1515
- Add Llama 3.1 on Vertex AI to CrossProviderInferenceEngine by @yifanmai in #1525
- fix external_rag example by @lilacheden in #1526
- Add search to assistant for much faster response by @elronbandel in #1524
- fixed division by 0 in compare performance results by @dafnapension in #1523
- Add two criteria based direct llm judges by @lilacheden in #1527
- Update version to 1.17.0 by @elronbandel in #1535
New Contributors
- @eliyahabba made their first contribution in #1464
Full Changelog: 1.16.0...1.17.0