Commit
CCBY and embedded audio support
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
kylesayrs committed Jan 28, 2025
1 parent 6710715 commit bc8b44e
Showing 3 changed files with 51 additions and 3 deletions.
34 changes: 32 additions & 2 deletions examples/multimodal_audio/README.md
@@ -1,11 +1,13 @@
# Quantizing Multimodal Audio Models #

<audio controls>
<source src="https://datasets-server.huggingface.co/cached-assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/test/0/audio/audio.wav?Expires=1738010344&Signature=V6eMq7mQo1~wrkdswghsWaf9aklEQwoqw8FwJUiHAL75K7BcarTepBYcQkFIRi6usgU5J0TlX~wBwIlobAE7GzEXTUI7j5KA1MbFTiLo-nIYiq-WpA70EHW3mGy5HyCm01wKD49ngQDOgHX0-NrvTuXJCkTBhfYBwbQ5QsM8Wv3sbgEyadE~RMEGJLTfQL5fzQp3l1FWMdGuBJHDqSZa1SzTbOJYfmNQjGlfgWpm8Fhf5KWDl1NQSgWaiWRC0evbxt~C9Z8sEYwIEma7tTJafWqc2T9Awn8RdMqNKXnqSZ-mQBBxWVAV9cJbGKsj5JXJJwMPl23AUpzfSale71602g__&Key-Pair-Id=K3EI6M078Z3AC3">
<source src="assets/audio.mp4">
Your browser does not support the audio element.
</audio>
<em>Audio provided by Daniel Galvez et al. under a Creative Commons Attribution license</em>

```
<|startoftranscript|> <|en|>
...
@@ -59,4 +61,32 @@ Because the architectures of vision-language models are often more complex
For a guide on adding smoothquant mappings for your dataset, see the [SmoothQuant Guide](/src/llmcompressor/modifiers/smoothquant/README.md).

## Adding Your Own Data Collator ##
Most examples utilize a generic `data_collator` which correctly correlates data for most multimodal datasets. If you find that your model needs custom data collation (as is the case with [pixtral](/examples/multimodal_vision/pixtral_example.py)), you can modify this function to reflect these model-specific requirements.
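A generic collator of this kind can be sketched as follows. This is an illustrative example, not code taken from this repository: the function name and the assumption of one calibration sample per batch are assumptions made for the sketch.

```python
import torch

def data_collator(batch):
    # Illustrative collator for single-sample calibration batches:
    # convert each preprocessed feature (e.g. input_features, input_ids)
    # into a tensor. Assumes exactly one example per batch.
    assert len(batch) == 1, "expected one calibration sample per batch"
    return {key: torch.tensor(value) for key, value in batch[0].items()}
```

A model-specific collator would replace the uniform `torch.tensor` conversion with whatever per-feature handling the model's processor requires.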

## Sample Audio Provided Under a Creative Commons Attribution License ##
https://creativecommons.org/licenses/by/4.0/legalcode
```
@article{DBLP:journals/corr/abs-2111-09344,
author = {Daniel Galvez and
Greg Diamos and
Juan Ciro and
Juan Felipe Cer{\'{o}}n and
Keith Achorn and
Anjali Gopi and
David Kanter and
Maximilian Lam and
Mark Mazumder and
Vijay Janapa Reddi},
title = {The People's Speech: {A} Large-Scale Diverse English Speech Recognition
Dataset for Commercial Usage},
journal = {CoRR},
volume = {abs/2111.09344},
year = {2021},
url = {https://arxiv.org/abs/2111.09344},
eprinttype = {arXiv},
eprint = {2111.09344},
timestamp = {Mon, 22 Nov 2021 16:44:07 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2111-09344.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
Binary file added examples/multimodal_audio/assets/audio.mp4
Binary file not shown.
20 changes: 19 additions & 1 deletion examples/multimodal_vision/README.md
@@ -61,4 +61,22 @@ Because the architectures of vision-language models are often more complex
For a guide on adding smoothquant mappings for your dataset, see the [SmoothQuant Guide](/src/llmcompressor/modifiers/smoothquant/README.md).

## Adding Your Own Data Collator ##
Most examples utilize a generic `data_collator` which correctly correlates data for most multimodal datasets. If you find that your model needs custom data collation (as is the case with [pixtral](/examples/multimodal_vision/pixtral_example.py)), you can modify this function to reflect these model-specific requirements.
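A model-specific variant in the spirit of the pixtral example linked above can cast each feature to the dtype the model expects. The feature names and dtypes below are assumptions for illustration, not taken from the pixtral example itself:

```python
import torch

def custom_data_collator(batch):
    # Hypothetical model-specific collator: cast token ids and the
    # attention mask to int64 and image features to float32, rather
    # than relying on default dtype inference. Assumes one example
    # per batch with exactly these feature names.
    assert len(batch) == 1, "expected one calibration sample per batch"
    sample = batch[0]
    return {
        "input_ids": torch.tensor(sample["input_ids"], dtype=torch.long),
        "attention_mask": torch.tensor(sample["attention_mask"], dtype=torch.long),
        "pixel_values": torch.tensor(sample["pixel_values"], dtype=torch.float32),
    }
```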

## Sample Image Provided Under a Creative Commons Attribution License ##
https://creativecommons.org/licenses/by/4.0/legalcode
```
@article{cocodataset,
author = {Tsung{-}Yi Lin and
Michael Maire and
Serge J. Belongie and
Lubomir D. Bourdev and
Ross B. Girshick and
James Hays and
Pietro Perona and
Deva Ramanan and
Piotr Doll{\'{a}}r and
C. Lawrence Zitnick},
title = {Microsoft {COCO:} Common Objects in Context},
journal = {CoRR},
volume = {abs/1405.0312},
year = {2014},
url = {http://arxiv.org/abs/1405.0312},
archivePrefix = {arXiv},
eprint = {1405.0312},
timestamp = {Mon, 13 Aug 2018 16:48:13 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/LinMBHPRDZ14},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
