doc: update information
syq163 committed Dec 15, 2023
1 parent d6ed14f commit d32305f
Showing 4 changed files with 18 additions and 17 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -122,7 +122,7 @@ You may find more information from our [wiki](https://github.com/netease-youdao/

## Training

To be released.
[Voice Cloning with your personal data](https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Cloning-with-your-personal-data) was released on December 13, 2023.


## Roadmap & Future work
2 changes: 1 addition & 1 deletion README.zh.md
@@ -123,7 +123,7 @@ uvicorn openaiapi:app --reload

## Training

To be released

[Voice Cloning with your personal data](https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Cloning-with-your-personal-data) was released on December 13, 2023.

## Roadmap & Future work

13 changes: 7 additions & 6 deletions ROADMAP.md
@@ -10,15 +10,16 @@ The plan is to finish 0.2 to 0.4 in Q4 2023.
## EmotiVoice 0.4

- [ ] Updated model with potentially improved quality.
- [ ] If time allows, release training code to support fine-tuning using your own data.

## EmotiVoice 0.3

- [ ] First version of desktop application.
- [ ] Support longer text.
- [ ] Documentation: wiki page for hardware requirements. [#30](../../issues/30)

## EmotiVoice 0.2
## EmotiVoice 0.3 (2023.12.13)

- [x] Release [The EmotiVoice HTTP API](https://github.com/netease-youdao/EmotiVoice/wiki/HTTP-API) provided by [Zhiyun](https://mp.weixin.qq.com/s/_Fbj4TI4ifC6N7NFOUrqKQ).
- [x] Release [Voice Cloning with your personal data](https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Cloning-with-your-personal-data) along with [DataBaker Recipe](https://github.com/netease-youdao/EmotiVoice/tree/main/data/DataBaker) and [LJSpeech Recipe](https://github.com/netease-youdao/EmotiVoice/tree/main/data/LJspeech).
- [x] Documentation: wiki page for hardware requirements. [#30](../../issues/30)

## EmotiVoice 0.2 (2023.11.17)

- [x] Support mixed Chinese and English input text. [#28](../../issues/28)
- [x] Resolve bugs related to certain modal particles, to make it more robust. [#18](../../issues/18)
18 changes: 9 additions & 9 deletions data/DataBaker/README.md
@@ -40,6 +40,8 @@ mkdir data/DataBaker/raw

### Step1 Preprocess Data

For this recipe, DataBaker already provides phoneme labels, so we simply use that information (see the parsing sketch after the commands below).

```bash
# format data
python data/DataBaker/src/step1_clean_raw_data.py \
@@ -50,16 +52,14 @@ python data/DataBaker/src/step2_get_phoneme.py \
--data_dir data/DataBaker
```
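For orientation, here is a minimal Python sketch of reading such pre-labeled data. It assumes the standard BZNSYP layout, where a tab-separated `ID<TAB>text` line (with `#1`–`#4` prosody marks) is followed by a pinyin line, and a file path under the `data/DataBaker/raw` directory created earlier; `step1_clean_raw_data.py` is the authoritative implementation.

```python
# Sketch only: parse DataBaker (BZNSYP) prosody labels, assuming the standard
# layout of alternating text/pinyin lines. The recipe's own
# data/DataBaker/src/step1_clean_raw_data.py is the authoritative version.
import re
from pathlib import Path

# Assumed location under the raw/ directory created earlier in this recipe.
label_file = Path("data/DataBaker/raw/ProsodyLabeling/000001-010000.txt")
lines = label_file.read_text(encoding="utf-8").splitlines()

samples = []
for text_line, pinyin_line in zip(lines[0::2], lines[1::2]):
    utt_id, marked_text = text_line.split("\t", 1)
    text = re.sub(r"#\d", "", marked_text)  # drop prosody marks #1-#4
    pinyin = pinyin_line.strip().split()    # tone-numbered syllables
    samples.append((utt_id, text, pinyin))

print(samples[0])
```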

If you want to get phonemes from the TTS frontend, then run:
```bash
# get phoneme
python data/DataBaker/src/step2_get_phoneme.py \
--data_dir data/DataBaker \
--generate_phoneme True
```
If you have prepared your own data with only text labels, you can obtain phonemes from the text-to-speech (TTS) frontend, for example: `python data/DataBaker/src/step2_get_phoneme.py --data_dir data/DataBaker --generate_phoneme True`. Note, however, that in this DataBaker recipe you should omit this command.
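To illustrate the idea behind `--generate_phoneme` (grapheme-to-phoneme conversion), here is a rough sketch using the third-party `pypinyin` package. This is not the frontend this repository uses, just an approximation of the concept:

```python
# Sketch only: convert Chinese text to tone-numbered pinyin syllables with
# pypinyin. The repository's own TTS frontend may use different rules.
from pypinyin import lazy_pinyin, Style

text = "向世界问好"  # sample sentence (assumed, not from the corpus)
syllables = lazy_pinyin(text, style=Style.TONE3, neutral_tone_with_five=True)
print(syllables)  # e.g. ['xiang4', 'shi4', 'jie4', 'wen4', 'hao3']
```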



### Step2 Run MFA (Optional, since we already have labeled prosody)

Please be aware that in this DataBaker recipe, **you should skip this step**, since the prosody is already labeled. If you have prepared your own data with only text labels, however, the following commands may help:

```bash
# MFA environment install
conda install -c conda-forge kaldi sox librosa biopython praatio tqdm requests colorama pyyaml pynini openfst baumwelch ngram postgresql -y
@@ -174,7 +174,7 @@ Training tips:
tensorboard --logdir=exp/DataBaker
```
- The model checkpoints are saved at `exp/DataBaker/ckpt`.
- The bert features are extracted in the first epoch and saved in `tmp/` folder, you can change the path in `exp/DataBaker/config/config.py`.
- The BERT features are extracted in the first epoch and saved in the `exp/DataBaker/tmp/` folder; you can change the path in `exp/DataBaker/config/config.py` (see the quick check after this list).
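As a quick sanity check (a sketch, not part of the recipe), you can confirm that checkpoints and cached BERT features are landing in the folders mentioned above:

```python
# Sketch only: list recent entries in the checkpoint and BERT-feature folders.
from pathlib import Path

exp = Path("exp/DataBaker")
for sub in ("ckpt", "tmp"):
    files = sorted((exp / sub).glob("*"))
    print(f"{exp / sub}: {len(files)} entries")
    for f in files[-3:]:  # show the most recent few
        print("  ", f.name)
```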


### Step5 Inference
@@ -187,6 +187,6 @@ python inference_am_vocoder_exp.py \
--checkpoint g_00010000 \
--test_file $TEXT
```
__Please change the speaker name in the `data/inference/text`__
__Please change the speaker names in `data/inference/text`.__

The synthesized speech is under `exp/DataBaker/test_audio`.
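If you prefer to edit the speaker field programmatically, the sketch below assumes (this is not confirmed by the diff) that each line of `data/inference/text` is `|`-separated with the speaker name as its first field; adjust it to the file's actual format:

```python
# Sketch only: rewrite the first '|'-separated field of each line with a
# hypothetical speaker name. The real format of data/inference/text may differ.
from pathlib import Path

path = Path("data/inference/text")
lines = path.read_text(encoding="utf-8").splitlines()
updated = ["|".join(["DataBaker"] + line.split("|")[1:]) for line in lines]
path.write_text("\n".join(updated) + "\n", encoding="utf-8")
```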
