You can use HawkEars to create your own classifier, for example to identify birds in your region. The steps are as follows:
- Select classes to include in your model
- For each class:
  - Collect recordings
  - Create a database of all spectrograms from recordings
  - Remove any duplicate recordings
  - Select spectrograms to use in training
- Train the model
The first step is to identify the primary classes, such as a list of regional bird species. At least two secondary classes, Noise and Other, should be added to this list, where Noise contains wind sounds and rain, and Other contains sounds such as human voices, farm animals, traffic sounds, etc. For a bird classifier, you might also want classes for regionally common amphibian or mammal species, so that their sounds are not misidentified as birds.
During training, the data augmentation process checks for a class called Noise and treats it specially, so it is important to use that name. The name of the Other class is not important though.
Here are some public sources of recordings that can be used to populate your training dataset:
Source | Contents | Comments |
---|---|---|
Xeno-Canto | birds, amphibians and more | public API supports downloads |
iNaturalist | birds, amphibians and more | public API supports downloads |
Macaulay Library | birds, amphibians and more | no public API for downloads |
Google Audioset | 632 classes, including cars, trains, voices, dogs, etc. | dataset can be downloaded |
ESC-50 | 50 classes, including wind, rain, dogs, cows, etc. | dataset can be downloaded |
UrbanSound8K | 10 classes including dogs, sirens, idling engines, etc. | dataset can be downloaded |
The HawkEars/tools directory contains scripts xeno.py and inat.py for downloading from Xeno-Canto and iNaturalist respectively. To see a list of arguments for a script, run it with the -h argument, like this:
python xeno.py -h
To download up to 500 American Robin recordings from Xeno-Canto to directory ~/data/AMRO, type the following in the tools directory:
python xeno.py --name "American Robin" --dir ~/data/AMRO --max 500
When using inat.py to download from iNaturalist, be sure to use scientific names, not common names. For example, type "dryophytes versicolor" instead of "Gray Treefrog". The iNaturalist API does not do exact matching, so a search for "Gray Treefrog" will include results for "Cope’s Gray Treefrog", but specifying "dryophytes versicolor" will return only recordings that match the requested species.
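For example, the following command is one plausible way to download up to 500 Gray Treefrog recordings to ~/data/FROG-GRAY; it assumes inat.py accepts the same --name, --dir and --max arguments as xeno.py (run python inat.py -h to confirm the exact argument names):
python inat.py --name "dryophytes versicolor" --dir ~/data/FROG-GRAY --max 500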
For Xeno-Canto, you can use the recording number in the URL to see the corresponding observation with comments and other details. For example, https://xeno-canto.org/123456 shows recording number 123456. For iNaturalist, recording numbers do not match observation numbers or URLs, but the inat.py script creates an inat.csv file with the mapping in the output directory.
Once you have created a directory of recordings for a class, you need to extract the spectrograms into a database. This is done with the extract_all.py script, whose important arguments are:
Argument | Default | Description |
---|---|---|
--code | None | Species code (abbreviated name). |
--db | None | Database name. A SQLite database file called "HawkEars/data/<db_name>.db" will be created. |
--dir | None | Directory path containing recordings. |
--name | None | Species name. |
--offset | 1.5 | Spacing in seconds between consecutive 3-second spectrograms within each recording. By default, spectrograms are created at offsets 0, 1.5, 3.0, 4.5, etc. If you specify --offset 3, only half as many spectrograms are created, at offsets 0, 3, 6, etc. |
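For example, assuming the American Robin recordings downloaded earlier are in ~/data/AMRO, a command along these lines (with a database name of your choosing; "amro" here is just an example) extracts their spectrograms:
python extract_all.py --name "American Robin" --code AMRO --db amro --dir ~/data/AMRO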
After running this, you can use DB Browser for SQLite, an open-source application, to open and browse the database file.
It is common to find that the same recording has been uploaded more than once to a site such as Xeno-Canto, iNaturalist or Macaulay Library, or that it has been uploaded to multiple sites. Therefore it is useful to remove duplicate recordings before selecting your training data.
This step and the next one both make use of "embeddings". In this context, an embedding is a relatively small 1-dimensional representation of a 2-dimensional image, generated by a deep learning model. Embeddings encode relevant features while discarding irrelevant ones, such as small horizontal shifts in the case of spectrograms. As a result, the distance between embeddings is a useful measure of image similarity.
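The scripts below handle this internally, but as a rough illustration, the cosine distance between two embedding vectors can be computed like this (the vector length of 512 is just a placeholder):
import numpy as np

def cosine_distance(a, b):
    # cosine distance = 1 - cosine similarity; values near 0 mean the spectrograms are similar
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# hypothetical embeddings of two spectrograms
emb1 = np.random.rand(512)
emb2 = np.random.rand(512)
print(cosine_distance(emb1, emb2))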
The first step is to populate your newly created database with an embedding per spectrogram. The embeddings are used to detect duplicates and to search for spectrograms later. Insert embeddings as follows:
python embed.py --db <database name>
Then you can list duplicates as follows:
python find_dup_recordings.py --db <database name> --name <class name>
To remove the duplicate recordings, re-run the script with the "--mode 1" argument.
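For example, using the hypothetical amro database from the earlier examples:
python find_dup_recordings.py --db amro --name "American Robin" --mode 1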
In this phase you will populate a training database with selected spectrograms. For each class, repeat the following steps until you feel you have enough good spectrograms, or are no longer able to find new ones:
- Find a good example of a sound type for the class
- Search the class database for similar spectrograms
- Review the matches, deleting any bad matches
- Copy the remaining ones to the training database
If you have very little data for a class, you can just plot all the spectrograms in the class database, using tools/plot_from_db.py, rather than using the search method described below.
A program such as Audacity is helpful in this phase. Audacity lets you review a recording’s spectrogram, listen to a selected segment, and apply edits such as low-pass or high-pass filters. For example, you may find a good 3-second example, except that it has a lot of high-frequency noise. In that case you can save the segment in a separate audio file and apply a low-pass filter to remove the noise before searching for similar spectrograms in the class database.
Using your audio program of choice, examine your recordings to find a spectrogram to use as the basis for selecting further training spectrograms of that type. Once you have a good example, use the search.py script to find similar spectrograms in the database created above. The main arguments for search.py are:
Argument | Default | Description |
---|---|---|
--db | None | Name of database to search. |
--dist | .5 | Do not show matches with a cosine distance greater than this. |
--name | None | Name of class to search. |
--num | 200 | Maximum number of spectrograms to plot. |
--inp | None | Path to recording containing selected example segment to search for. |
--offset | 0 | Start time in seconds within the recording of selected example segment to search for. |
--omit | None | A second database name, e.g. "training". Do not plot spectrograms that already exist in this database. |
--out | output | Output directory, which will contain matching spectrograms. |
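For example, the following hypothetical command searches the amro database for spectrograms similar to the 3-second segment starting 12 seconds into a selected recording (the recording path here is just an example), skipping any that are already in the training database:
python search.py --db amro --name "American Robin" --inp ~/data/AMRO/example.mp3 --offset 12 --omit training --out output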
After running search, the output directory will contain images of matching spectrograms, as in this example:
The second image above is named 1~N464010-3.00~0.085.jpeg. The leading 1 indicates that it is the closest match (the image whose name starts with 0 is the spectrogram being searched for). The N464010-3.00 portion indicates that the spectrogram starts 3 seconds into recording N464010, and the final 0.085 is the cosine distance.
Next you should scan the output images, deleting any that clearly contain secondary species, or are problematic in other ways.
Once that is done, you can use the extract_by_image.py script to insert spectrograms corresponding to the remaining images into the training database. Its main arguments are as follows:
Argument | Default | Description |
---|---|---|
--code | None | Species code (abbreviated name). |
--db | training | Name of training database. |
--dir | None | Directory containing recordings. |
--inp | None | Directory containing spectrogram images. |
--name | None | Species name. |
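Continuing the hypothetical example, a command like the following inserts the spectrograms corresponding to the remaining images into the training database:
python extract_by_image.py --name "American Robin" --code AMRO --db training --dir ~/data/AMRO --inp output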
The training code reads from a “pickle” file, rather than reading the database directly. This improves training performance and makes it easy to use a subset of the data or to merge data from different databases. To create a pickle file from your training database:
- Create a classes file.
- Run pickle_db.py.
The classes file is a text file (e.g. HawkEars/data/classes.txt), in which each non-comment line has the format “name,code” for a class. You can also include comment lines, which begin with a # character. For example:
# Amphibians
Northern Leopard Frog,FROG-LEOP
# Birds
American Robin,AMRO
Hermit Thrush,HETH
The main arguments for pickle_db.py are as follows:
Argument | Default | Description |
---|---|---|
--db | training | Name of training database. |
--classes | classes | Name of classes file in data directory. |
--name | ../data/specs.pickle | Path to output file. |
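For example, if your training database and classes file use the default names, a minimal invocation (run from the tools directory, so that the default output path resolves to HawkEars/data/specs.pickle) would be:
python pickle_db.py --db training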
After creating a pickle file, you need to define a training configuration within HawkEars/core/configs.py, as in the following example:
# Simple training configuration
@dataclass
class Train_All(base_config.BaseConfig):
    def __init__(self):
        self.misc.train_pickle = "data/specs.pickle"
        self.train.model_name = "custom_hgnet_2"
        self.train.num_epochs = 10

# map names to configurations
configs = {"base": base_config.BaseConfig,
           "all": Train_All}
This defines the training data (self.misc.train_pickle), model type (self.train.model_name) and number of epochs (self.train.num_epochs), which is all you need to run training. Model types and sizes can be viewed in HawkEars/models. The custom_hgnet_2 configuration is an hgnet_v2 model with about 1.6M parameters, as described in HawkEars/models/hgnet_v2.py. You can use any model name supported by timm, or any custom configuration from the HawkEars/models directory, using the “custom_xxx_y” convention as above.
Finally, you can train a model by typing the following in the root HawkEars folder:
python train.py -c all
Note that “all” is the name we assigned the training configuration in configs.py above. Once training is complete, the output will be in HawkEars/logs/fold-0/version_x. The first training run will output to version_0, the second to version_1, etc. The checkpoints sub-directory will contain the generated models. By default, there will be one checkpoint for each of the last three epochs, since the save_last_n setting defaults to 3.
To view graphical summaries of a training run, type this command:
tensorboard --logdir logs/fold-0/version_0
That will display a localhost URL, which you can open in a browser to display graphs such as this:
Using the configuration above would train a new model from scratch. It is also easy to do transfer learning from a HawkEars model. Here is a configuration that does that, freezing the feature extractor so only the classifier head is trained:
class Transfer_Learning(base_config.BaseConfig):
    def __init__(self):
        self.misc.train_pickle = "data/specs.pickle"
        self.train.load_ckpt_path = "data/ckpt/hgnet1.ckpt"
        self.train.freeze_backbone = True
        self.train.num_epochs = 10
Specifying self.train.load_ckpt_path indicates that training should begin from a pretrained checkpoint (model). Setting freeze_backbone to True makes training run much faster, since only the classifier head is trained. If you have limited training data and domain shift is minimal, this may also give you the best results. You can instead set freeze_backbone to False, in which case the whole model is re-trained, but it is initialized with weights from HawkEars.
Review core/base_config.py to see all the additional settings you can use. For example, you can:
- Specify k-fold cross-validation.
- Change the learning rate and batch size.
- Change the number of checkpoints that are saved.
- Change data augmentation settings.
- Change audio and spectrogram settings.
If you want to try different audio or spectrogram settings after you create your training database, you can change the settings in core/base_config.py and then run tools/reextract.py. It populates a new database with spectrograms generated under the new settings, using the same recordings and offsets as the original database. If you are doing transfer learning from the HawkEars models, however, you should stick with the default audio and spectrogram settings.
To use your trained models, copy the checkpoints to a suitable directory and run inference with analyze.py, which uses the model(s) in the directory specified by the main_ckpt_folder parameter in core/base_config.py. One option is to back up the existing data/ckpt directory and then replace its contents with your newly created checkpoints. Alternatively, you can create a new directory, such as data/my-ckpt, and update the main_ckpt_folder parameter to point to it.
To create an ensemble, simply copy multiple checkpoint files to your checkpoint directory. In that case, analyze.py will use the average score from all models. Make sure that all models were trained on the same class list though.
You may also need to edit data/ignore.txt, which lists classes that are ignored during inference.
In HawkEars/testing, there are three scripts to generate metrics from test data:
Script | For Use When |
---|---|
per_sound_tester.py | Ground truth annotations specify individual sounds |
per_minute_tester.py | Ground truth annotations specify classes per minute |
per_recording_tester.py | Ground truth annotations specify classes per recording |
The most granular data is generated by per_sound_tester.py. It requires an annotation CSV file with four columns:
Column Name | Description |
---|---|
recording | Stem of the recording file name (e.g. XC10400 if it’s XC10400.mp3) |
species | Class code |
start_time | Start of the sound in seconds from the beginning of the recording |
end_time | End of the sound in seconds from the beginning of the recording |
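For example, a minimal annotations file with hypothetical recordings and the class codes from the earlier classes file might look like this:
recording,species,start_time,end_time
XC10400,AMRO,3.5,6.2
XC10400,HETH,10.0,12.8
XC20317,AMRO,0.0,2.4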
To use per_sound_tester.py, create an annotations file, then run analyze.py on your test recordings with --min_score 0, --overlap 0, and --merge 0. Then run per_sound_tester.py, which has the following main arguments:
Argument | Default | Description |
---|---|---|
--annotations | None | Path to annotations CSV file |
--recordings | None | Path to recordings directory |
--input | None | Name of directory containing output labels from inference (assumed to be a sub-directory of recordings directory) |
--output | None | Output directory for reporting |
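For example, assuming hypothetical paths, with the inference labels written by analyze.py to a labels sub-directory of the recordings directory:
python per_sound_tester.py --annotations ~/data/test/annotations.csv --recordings ~/data/test --input labels --output ~/data/test/reports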
The script will then output statistics to the console, and write a large number of reports and images (graphs) to the output directory. Many of the output file names have the suffix “annotated” or “trained”, indicating whether they include only the classes in the annotations, or all classes supported by the model. For example, here is a file called pr_curve_annotated.png from one test: