Release v4.6.0
Announcing release v4.6.0
Here are the main changes:
Changes
1. 14B Multi-Dataset Competition
Activation block: 4_252_646
Datasets used during evaluation:
- HuggingFaceFW/fineweb-edu-score-2 (95%)
- bigcode/the-stack-dedup (5%)
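The notes do not say how the 95% / 5% split is applied during evaluation. As a rough illustration only, the sketch below assumes the percentages divide a fixed budget of evaluation pages between the two datasets. The function name `split_pages` and the idea of a page budget are hypothetical, not taken from the validator code.

```python
# Percentages copied from the release notes.
DATASET_PERCENT = {
    "HuggingFaceFW/fineweb-edu-score-2": 95,
    "bigcode/the-stack-dedup": 5,
}


def split_pages(total_pages: int, percents: dict[str, int]) -> dict[str, int]:
    """Split a page budget across datasets in proportion to integer percentages.

    Hypothetical helper: the real validator may apply the percentages
    differently (for example, as weights on per-dataset scores).
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    counts = {name: total_pages * pct // 100 for name, pct in percents.items()}
    # Pages lost to rounding down go to the dataset with the largest share.
    remainder = total_pages - sum(counts.values())
    largest = max(percents, key=percents.get)
    counts[largest] += remainder
    return counts
```

With a budget of 200 pages this assigns 190 pages to FineWeb-Edu and 10 to The Stack.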
2. Retiring the 700M competition
Activation block: 4_252_646
We will sunset the 700M competition at the same time that the new 14B multi-dataset competition goes live. The 14B multi-dataset competition will take over the emissions of the 700M competition.
3. New epsilon lower bound and decay interval
Activation block: 4_252_646
We will slightly increase the lower bound for epsilon from 0.0001 to 0.0005 and decrease the decay interval from 10 to 7 days.
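To make the new numbers concrete, here is a minimal sketch of an epsilon schedule. It assumes a linear decay measured in blocks and roughly 7200 blocks per day. The starting value `epsilon_start`, the linear shape, and the function name `epsilon_at` are illustrative assumptions, since the notes only state the lower bound and the decay interval.

```python
BLOCKS_PER_DAY = 7200  # assumption: roughly one block every 12 seconds

EPSILON_LOWER_BOUND = 0.0005  # from the release notes (previously 0.0001)
DECAY_INTERVAL_DAYS = 7       # from the release notes (previously 10)


def epsilon_at(
    blocks_elapsed: int,
    epsilon_start: float,
    epsilon_end: float = EPSILON_LOWER_BOUND,
    decay_blocks: int = DECAY_INTERVAL_DAYS * BLOCKS_PER_DAY,
) -> float:
    """Hypothetical linear decay from epsilon_start down to epsilon_end.

    After decay_blocks have elapsed, epsilon stays at the lower bound.
    """
    if blocks_elapsed <= 0:
        return epsilon_start
    if blocks_elapsed >= decay_blocks:
        return epsilon_end
    fraction = blocks_elapsed / decay_blocks
    return epsilon_start + (epsilon_end - epsilon_start) * fraction
```

Under these assumptions, epsilon reaches 0.0005 after 7 days and stays there. The shorter interval means the lower bound is reached sooner than with the previous 10-day setting.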
4. Evaluation data syncing
Effective immediately.
Validation batches will now be synced across all validators. We will also introduce a delay before models are picked up for validation, to prevent exploits that train on the exact upcoming batch.
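One way to picture the two ideas above is the sketch below. It assumes validators derive a common random seed from a shared on-chain value (here a block hash string) and that a model is eligible only if it was uploaded some number of blocks before that seed was fixed. All names (`shared_seed`, `pick_pages`, `is_eligible`) and the seeding scheme are hypothetical and are not taken from the validator code.

```python
import hashlib
import random


def shared_seed(block_hash: str) -> int:
    """Turn a value every validator can read into an integer seed."""
    digest = hashlib.sha256(block_hash.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_pages(block_hash: str, num_pages: int, total_pages: int) -> list[int]:
    """Pick page indices deterministically from the shared seed.

    Every validator calling this with the same block_hash gets the same pages.
    """
    rng = random.Random(shared_seed(block_hash))
    return rng.sample(range(total_pages), num_pages)


def is_eligible(upload_block: int, seed_block: int, delay_blocks: int) -> bool:
    """A model qualifies only if uploaded at least delay_blocks before the seed block.

    This keeps a miner from training on pages whose seed was already known
    at upload time.
    """
    return upload_block + delay_blocks <= seed_block
```

Two validators that read the same block hash would evaluate on identical pages, while a model uploaded after the seed was fixed would wait for a later round.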
5. Deduplicating evaluation data
Effective immediately.
This issue was raised on Discord. When batches are sampled at random offsets, two validation pages can overlap if the difference between their offsets is smaller than the number of samples pulled at each offset. Although this is very rare with large datasets, it has now been fixed for FineWeb-Edu2 only; the fix will be generalized to all loaders in a following release.
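The overlap condition can be written down directly. The sketch below also shows one possible way to avoid it, by drawing offsets from a grid of page-sized slots so that no two pages can share rows. This is an illustration under that assumption, not necessarily how the FineWeb-Edu2 loader was fixed; the function names are hypothetical.

```python
import random
from typing import Optional


def pages_overlap(offset_a: int, offset_b: int, samples_per_page: int) -> bool:
    """Two pages share rows when their offsets differ by less than one page."""
    return abs(offset_a - offset_b) < samples_per_page


def sample_disjoint_offsets(
    num_rows: int,
    samples_per_page: int,
    num_pages: int,
    seed: Optional[int] = None,
) -> list[int]:
    """Sample page offsets that are guaranteed not to overlap.

    Offsets are restricted to multiples of samples_per_page, and each slot
    is drawn at most once.
    """
    num_slots = num_rows // samples_per_page
    if num_pages > num_slots:
        raise ValueError("not enough rows for that many disjoint pages")
    rng = random.Random(seed)
    slots = rng.sample(range(num_slots), num_pages)
    return [slot * samples_per_page for slot in slots]
```

For example, with 100 samples per page, offsets 0 and 50 overlap while offsets 0 and 100 do not.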
6. Setting repo visibility at model upload
Effective immediately.
Added the --update_repo_visibility argument to upload_mode.py to enable changing the Hugging Face repo visibility at model upload rather than setting it manually.
7. New emission distribution
3B → 29%
14B → 57%
14B Multi-Dataset → 14%
NOTES TO VALIDATORS
- The newly added code dataset, The Stack V1-dedup, requires a Hugging Face access token. You can learn how to obtain one in our validator documentation here.
- Please also make sure to rerun pip install so that your dependencies are up to date.
python -m pip install -e