Commit
CCBY and embedded audio support
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
kylesayrs committed Jan 28, 2025
1 parent 6710715 commit bc8b44e
Showing 3 changed files with 51 additions and 3 deletions.
34 changes: 32 additions & 2 deletions examples/multimodal_audio/README.md
@@ -1,11 +1,13 @@
# Quantizing Multimodal Audio Models #

<audio controls>
<source src="https://datasets-server.huggingface.co/cached-assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/test/0/audio/audio.wav?Expires=1738010344&Signature=V6eMq7mQo1~wrkdswghsWaf9aklEQwoqw8FwJUiHAL75K7BcarTepBYcQkFIRi6usgU5J0TlX~wBwIlobAE7GzEXTUI7j5KA1MbFTiLo-nIYiq-WpA70EHW3mGy5HyCm01wKD49ngQDOgHX0-NrvTuXJCkTBhfYBwbQ5QsM8Wv3sbgEyadE~RMEGJLTfQL5fzQp3l1FWMdGuBJHDqSZa1SzTbOJYfmNQjGlfgWpm8Fhf5KWDl1NQSgWaiWRC0evbxt~C9Z8sEYwIEma7tTJafWqc2T9Awn8RdMqNKXnqSZ-mQBBxWVAV9cJbGKsj5JXJJwMPl23AUpzfSale71602g__&Key-Pair-Id=K3EI6M078Z3AC3">
<source src="assets/audio.mp4">
Your browser does not support the audio element.
</audio>
<em>Audio provided by Daniel Galvez et al. under a Creative Commons Attribution license</em>

```
<|startoftranscript|> <|en|>
...
@@ -59,4 +61,32 @@ Because the architectures of vision-language models are often more complex
For a guide on adding smoothquant mappings for your dataset, see the [SmoothQuant Guide](/src/llmcompressor/modifiers/smoothquant/README.md).

## Adding Your Own Data Collator ##
Most examples utilize a generic `data_collator` which correctly correlates data for most multimodal datasets. If you find that your model needs custom data collation (as is the case with [pixtral](/examples/multimodal_vision/pixtral_example.py)), you can modify this function to reflect these model-specific requirements.
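A generic collator of this kind can be sketched as follows. This is an illustrative example, not code taken from this repository: the function name and the assumption of one calibration sample per batch are assumptions made for the sketch.

```python
import torch

def data_collator(batch):
    # Illustrative collator for single-sample calibration batches:
    # convert each preprocessed feature (e.g. input_features, input_ids)
    # into a tensor. Assumes exactly one example per batch.
    assert len(batch) == 1, "expected one calibration sample per batch"
    return {key: torch.tensor(value) for key, value in batch[0].items()}
```

A model-specific collator would replace the uniform `torch.tensor` conversion with whatever per-feature handling the model's processor requires.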

## Sample Audio Provided Under a Creative Commons Attribution License ##
https://creativecommons.org/licenses/by/4.0/legalcode
```
@article{DBLP:journals/corr/abs-2111-09344,
author = {Daniel Galvez and
Greg Diamos and
Juan Ciro and
Juan Felipe Cer{\'{o}}n and
Keith Achorn and
Anjali Gopi and
David Kanter and
Maximilian Lam and
Mark Mazumder and
Vijay Janapa Reddi},
title = {The People's Speech: {A} Large-Scale Diverse English Speech Recognition
Dataset for Commercial Usage},
journal = {CoRR},
volume = {abs/2111.09344},
year = {2021},
url = {https://arxiv.org/abs/2111.09344},
eprinttype = {arXiv},
eprint = {2111.09344},
timestamp = {Mon, 22 Nov 2021 16:44:07 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2111-09344.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
Binary file added examples/multimodal_audio/assets/audio.mp4
Binary file not shown.
20 changes: 19 additions & 1 deletion examples/multimodal_vision/README.md
@@ -61,4 +61,22 @@ Because the architectures of vision-language models are often more complex
For a guide on adding smoothquant mappings for your dataset, see the [SmoothQuant Guide](/src/llmcompressor/modifiers/smoothquant/README.md).

## Adding Your Own Data Collator ##
Most examples utilize a generic `data_collator` which correctly correlates data for most multimodal datasets. If you find that your model needs custom data collation (as is the case with [pixtral](/examples/multimodal_vision/pixtral_example.py)), you can modify this function to reflect these model-specific requirements.
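A model-specific variant in the spirit of the pixtral example linked above can cast each feature to the dtype the model expects. The feature names and dtypes below are assumptions for illustration, not taken from the pixtral example itself:

```python
import torch

def custom_data_collator(batch):
    # Hypothetical model-specific collator: cast token ids and the
    # attention mask to int64 and image features to float32, rather
    # than relying on default dtype inference. Assumes one example
    # per batch with exactly these feature names.
    assert len(batch) == 1, "expected one calibration sample per batch"
    sample = batch[0]
    return {
        "input_ids": torch.tensor(sample["input_ids"], dtype=torch.long),
        "attention_mask": torch.tensor(sample["attention_mask"], dtype=torch.long),
        "pixel_values": torch.tensor(sample["pixel_values"], dtype=torch.float32),
    }
```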

## Sample Image Provided Under a Creative Commons Attribution License ##
https://creativecommons.org/licenses/by/4.0/legalcode
```
@article{cocodataset,
author = {Tsung{-}Yi Lin and
Michael Maire and
Serge J. Belongie and
Lubomir D. Bourdev and
Ross B. Girshick and
James Hays and
Pietro Perona and
Deva Ramanan and
Piotr Doll{\'{a}}r and
C. Lawrence Zitnick},
title = {Microsoft {COCO:} Common Objects in Context},
journal = {CoRR},
volume = {abs/1405.0312},
year = {2014},
url = {http://arxiv.org/abs/1405.0312},
archivePrefix = {arXiv},
eprint = {1405.0312},
timestamp = {Mon, 13 Aug 2018 16:48:13 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/LinMBHPRDZ14},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
