Merge pull request #131 from hearbenchmark/update-org
Update GH Org Name
jorshi authored Jun 7, 2022
2 parents 444c645 + 8f5991d commit 279fba1
Showing 8 changed files with 29 additions and 28 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -124,7 +124,7 @@ RUN echo 20211120
RUN APT_INSTALL="apt-get install -y --no-install-recommends" && \
PIP_INSTALL="python3 -m pip --no-cache-dir install" && \
GIT_CLONE="git clone --depth 10" && \
- $GIT_CLONE https://github.com/neuralaudio/hear-preprocess.git
+ $GIT_CLONE https://github.com/hearbenchmark/hear-preprocess.git
RUN cd hear-preprocess && \
python3 -m pip --no-cache-dir install -e ".[dev]"

27 changes: 14 additions & 13 deletions README.md
@@ -1,14 +1,14 @@
![HEAR2021](https://neuralaudio.ai/assets/img/hear-header-sponsor.jpg)
# hear-preprocess

- Dataset preprocessing code for the HEAR 2021 NeurIPS competition.
+ Dataset preprocessing code for the HEAR Benchmark and for all the tasks used during
+ the 2021 HEAR NeurIPS challenge. To find out more about HEAR please visit https://hearbenchmark.com.

- Unless you are a HEAR organizer or want to contribute a task,
+ Unless you need to pre-process HEAR benchmark tasks yourself or want to contribute a task,
you won't need this repo. Use
- [hear-eval-kit](https://github.com/neuralaudio/hear-eval-kit/) to
+ [hear-eval-kit](https://github.com/hearbenchmark/hear-eval-kit/) to
evaluate your embedding models on these tasks.

- Pre-processed datasets (at 48000Hz) for all HEAR 2021 tasks are available on
+ Pre-processed datasets (at 48000Hz) for all HEAR Benchmark tasks are available on
[zenodo](https://doi.org/10.5281/zenodo.5802571). Other sampling rates
(16000, 22050, 32000, 44100) are available for download (requester pays) from Google Storage
[gs://hear2021-archive/tasks/](https://console.cloud.google.com/storage/browser/hear2021-archive/tasks)
@@ -18,7 +18,7 @@ This preprocessing is slow and disk-intensive but safe and careful.
## Cloud Usage

See [hear-eval's
- README.spotty](https://github.com/neuralaudio/hear-eval-kit/blob/main/README.spotty.md)
+ README.spotty](https://github.com/hearbenchmark/hear-eval-kit/blob/main/README.spotty.md)
for information on how to use spotty.

## Installation
@@ -34,7 +34,7 @@ because pip3 installs are very finicky, but it might work.

Clone repo:
```
- git clone https://github.com/neuralaudio/hear-preprocess
+ git clone https://github.com/hearbenchmark/hear-preprocess
cd hear-preprocess
```

@@ -55,14 +55,15 @@ python3 -m pytest

### Preprocessing

- You probably don't need to do this unless you are implementing the
- HEAR challenge.
+ You probably don't need to do this unless you can't use the [available pre-processed
+ datasets](https://hearbenchmark.com/hear-tasks.html#downloading) and need to preprocess
+ the data yourself.

If you want to run preprocessing yourself:
* You will need `ffmpeg>=4.2` installed (possibly from conda-forge).
* You will need `soxr` support, which might require package
libsox-fmt-ffmpeg or [installing from
- source](https://github.com/neuralaudio/hear-eval-kit/issues/156#issuecomment-893151305).
+ source](https://github.com/hearbenchmark/hear-eval-kit/issues/156#issuecomment-893151305).

These Luigi pipelines are used to preprocess the evaluation tasks
into a common format for downstream evaluation.
@@ -173,7 +174,7 @@ speech_commands-v0.0.2/03-ExtractMetadata/labelcount_valid.json

The small flag runs the preprocessing pipeline on a small version
of each dataset stored at [Downsampled HEAR Open
- Tasks](https://github.com/neuralaudio/hear2021-open-tasks-downsampled). This
+ Tasks](https://github.com/hearbenchmark/hear2021-open-tasks-downsampled). This
is used for development and continuous integration tests for the
pipeline.

@@ -190,7 +191,7 @@ small version of the dataset for development.

If the open tasks have changed enough to break the downstream CI,
(for example in the heareval repo), the [Preprocessed Downsampled HEAR Open
- Tasks](https://github.com/neuralaudio/hear2021-open-tasks-downsampled/tree/main/preprocessed)
+ Tasks](https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/tree/main/preprocessed)
should be updated. An obvious example of a breaking change is a modification of the task configuration.

The version should be bumped up in `hearpreprocess/__init__.py` and the pipeline should
@@ -199,7 +200,7 @@ be run for the open tasks with `--mode small` flag
Thereafter, the following commands can be used to copy the tarred files produced by running the pipeline for the open tasks into the repo (clone the repo first):

```
- git clone git@github.com:neuralaudio/hear2021-open-tasks-downsampled.git
+ git clone git@github.com:hearbenchmark/hear2021-open-tasks-downsampled.git
cp hear-LATEST-speech_commands-v0.0.2-small-44100.tar.gz ./hear2021-open-tasks-downsampled/preprocessed/
cp hear-LATEST-nsynth_pitch-v2.2.3-small-44100.tar.gz ./hear2021-open-tasks-downsampled/preprocessed/
cp hear-LATEST-dcase2016_task2-hear2021-small-44100.tar.gz ./hear2021-open-tasks-downsampled/preprocessed/
4 changes: 2 additions & 2 deletions hearpreprocess/dcase2016_task2.py
@@ -63,13 +63,13 @@
{
"split": "train",
"name": "dev",
- "url": "https://github.com/neuralaudio/hear2021-open-tasks-downsampled/raw/main/dcase2016_task2_train_dev-small.zip", # noqa: E501
+ "url": "https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/raw/main/dcase2016_task2_train_dev-small.zip", # noqa: E501
"md5": "aa9b43c40e9d496163caab83becf972e",
},
{
"split": "train",
"name": "eval",
- "url": "https://github.com/neuralaudio/hear2021-open-tasks-downsampled/raw/main/dcase2016_task2_test_public-small.zip", # noqa: E501
+ "url": "https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/raw/main/dcase2016_task2_test_public-small.zip", # noqa: E501
"md5": "14539d85dec03cb7ac75eb62dd1dd21e",
},
],
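
Note: each `download_urls` entry in these task configs pairs an archive URL with an `md5` checksum, so a changed URL can be checked by confirming that the downloaded archive still hashes to the recorded value. The following is a minimal sketch of such a check using only the standard library. The helper names `md5_of_stream` and `matches_entry` are hypothetical and are not taken from hearpreprocess, whose actual download and verification code is not shown in this diff.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    # Read fixed-size chunks so large archives are never held in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_entry(path: Path, entry: dict) -> bool:
    """Check a downloaded file against the "md5" field of a download_urls entry.

    `entry` is assumed to look like the dicts in the diff:
    {"split": ..., "url": ..., "md5": ...}.
    """
    with open(path, "rb") as f:
        return md5_of_stream(f) == entry["md5"].lower()
```

Calling `matches_entry(Path("archive.zip"), entry)` on a file fetched from the new `hearbenchmark` URL would return `True` only if its contents match the checksum already recorded in the config.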
6 changes: 3 additions & 3 deletions hearpreprocess/nsynth_pitch.py
@@ -72,17 +72,17 @@
"download_urls": [
{
"split": "train",
- "url": "https://github.com/neuralaudio/hear2021-open-tasks-downsampled/raw/main/nsynth-train-small.zip", # noqa: E501
+ "url": "https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/raw/main/nsynth-train-small.zip", # noqa: E501
"md5": "c17070e4798655d8bea1231506479ba8",
},
{
"split": "valid",
- "url": "https://github.com/neuralaudio/hear2021-open-tasks-downsampled/raw/main/nsynth-valid-small.zip", # noqa: E501
+ "url": "https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/raw/main/nsynth-valid-small.zip", # noqa: E501
"md5": "e36722262497977f6b945bb06ab0969d",
},
{
"split": "test",
- "url": "https://github.com/neuralaudio/hear2021-open-tasks-downsampled/raw/main/nsynth-test-small.zip", # noqa: E501
+ "url": "https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/raw/main/nsynth-test-small.zip", # noqa: E501
"md5": "9a98e869ed4add8ba9ebb0d7c22becca",
},
],
4 changes: 2 additions & 2 deletions hearpreprocess/secrettasks/hearsecrettasks/coughvid.py
@@ -32,7 +32,7 @@
"sample_duration": 10.19,
# Original Dataset Paper - https://www.nature.com/articles/s41597-021-00937-4.pdf
# TODO: Implement AUC or other measure?
- # https://github.com/neuralaudio/hear2021-secret-tasks/issues/16
+ # https://github.com/hearbenchmark/hear2021-secret-tasks/issues/16
"evaluation": ["top1_acc", "mAP", "d_prime", "aucroc"],
"download_urls": [
# test and valid split will be sampled from the train set only
@@ -82,7 +82,7 @@ def get_requires_metadata(self, split: str) -> pd.DataFrame:
label_map = (
pd.read_csv(split_path.joinpath("metadata_compiled.csv"))
# Filter out the data points with null status
- # TODO: https://github.com/neuralaudio/hear2021-secret-tasks/issues/17
+ # TODO: https://github.com/hearbenchmark/hear2021-secret-tasks/issues/17
# Select entries with cough detected probability greater than 0.8
.loc[lambda df: df["cough_detected"] > 0.8]
# Select entries with self reported status available
4 changes: 2 additions & 2 deletions hearpreprocess/speech_commands.py
@@ -66,12 +66,12 @@
"download_urls": [
{
"split": "train",
- "url": "https://github.com/neuralaudio/hear2021-open-tasks-downsampled/raw/main/speech_commands_v0.02-small.zip", # noqa: E501
+ "url": "https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/raw/main/speech_commands_v0.02-small.zip", # noqa: E501
"md5": "455123a88b8410d1f955c77ad331524f",
},
{
"split": "test",
- "url": "https://github.com/neuralaudio/hear2021-open-tasks-downsampled/raw/main/speech_commands_test_set_v0.02-small.zip", # noqa: E501
+ "url": "https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/raw/main/speech_commands_test_set_v0.02-small.zip", # noqa: E501
"md5": "26d08374a7abd13ca2f4a4b8424f41d0",
},
],
2 changes: 1 addition & 1 deletion hearpreprocess/spoken_digit.py
@@ -29,7 +29,7 @@
"download_urls": [
{
"split": "train",
- "url": "https://github.com/neuralaudio/hear2021-open-tasks-downsampled/raw/main/spoken_digit-small.zip", # noqa: E501
+ "url": "https://github.com/hearbenchmark/hear2021-open-tasks-downsampled/raw/main/spoken_digit-small.zip", # noqa: E501
"md5": "69d50c15805ea11beb12d9a4db1d4c2a",
}
],
8 changes: 4 additions & 4 deletions setup.py
@@ -14,14 +14,14 @@
description="Holistic Evaluation of Audio Representations (HEAR) 2021 -- Preprocessing Pipeline",
author="",
author_email="",
- url="https://github.com/neuralaudio/hear-preprocess",
- download_url="https://github.com/neuralaudio/hear-preprocess",
+ url="https://github.com/hearbenchmark/hear-preprocess",
+ download_url="https://github.com/hearbenchmark/hear-preprocess",
license="Apache-2.0",
long_description=long_description,
long_description_content_type="text/markdown",
project_urls={
- "Bug Tracker": "https://github.com/neuralaudio/hear-preprocess/issues",
- "Source Code": "https://github.com/neuralaudio/hear-preprocess",
+ "Bug Tracker": "https://github.com/hearbenchmark/hear-preprocess/issues",
+ "Source Code": "https://github.com/hearbenchmark/hear-preprocess",
},
packages=find_packages(exclude=("tests",)),
python_requires=">=3.7",
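
Note: every change in this commit swaps the GitHub organization `neuralaudio` for `hearbenchmark` in a repository URL, in both the HTTPS and SSH forms. The following is a minimal sketch of that substitution as a text transform. The function name `update_org` and the regular expression are illustrative assumptions, not the method actually used to produce this commit. The pattern is anchored on `github.com` so that other hosts, such as the `neuralaudio.ai` image URL that the README keeps, are left untouched.

```python
import re

OLD_ORG = "neuralaudio"
NEW_ORG = "hearbenchmark"

# Matches the org segment after "github.com/" (HTTPS) or "github.com:" (SSH),
# and only when it is followed by "/" so longer org names are not affected.
_ORG_PATTERN = re.compile(r"(github\.com[/:])" + re.escape(OLD_ORG) + r"(?=/)")


def update_org(text: str) -> str:
    """Return `text` with GitHub references to OLD_ORG pointed at NEW_ORG."""
    return _ORG_PATTERN.sub(lambda m: m.group(1) + NEW_ORG, text)
```

Applying `update_org` to the contents of each changed file and reviewing the resulting diff by hand would be one way to reproduce a change like this one.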