Update docstring for the_cauldron_dataset to match the default subset value as 'orcvqa'
pytorch · Jan 3, 2025 · 48ec8b7
1 parent e979109
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/torchtune/datasets/multimodal/_the_cauldron.py b/torchtune/datasets/multimodal/_the_cauldron.py
@@ -181,7 +181,7 @@ def __call__(self, sample: Mapping[str, Any]) -> Mapping[str, Any]:
             transforms on the keys. It should consist of at minimum two components: text tokenization (called
             on the "messages" field) and image transform (called on the "images" field). The keys returned by
             the model transform should be aligned with the expected inputs into the model.
-        subset (str): name of the subset of the dataset to load. See the `dataset card
+        subset (str): name of the subset of the dataset to load. Default is `orcvqa`, see the `dataset card
             <https://huggingface.co/datasets/HuggingFaceM4/the_cauldron>`_ for options.
         source (str): path to dataset repository on Hugging Face. For local datasets,
             define source as the data file type (e.g. "json", "csv", "text") and pass