Hello, I'm testing whether a speaker embedding extracted with OpenVoice (se_features.pt) can be used as the speaker_embedding of another TTS model. However, the resulting audio is too noisy.
The SE tensor extracted with OpenVoice has shape (1, 256, 1), but the TTS model requires (1, 512, 1), so I padded it to match. Suspecting the padding was the cause, I also tried resizing the tensor in a different way, but the noise was still there. Is there a way to extract the SE with OpenVoice at size (1, 512, 1)?
Hey @pes0427! 👋 I'm here to assist you with any bugs, questions, or contributions you have. Let's squash those bugs together!
The tensor size of the speaker embeddings extracted with OpenVoice is determined by the architecture of the ref_enc module, which outputs a tensor of size (1, 256, 1) [1]. Unfortunately, there is no direct configuration or parameter in OpenVoice to specify a different tensor size, such as (1, 512, 1) [2].
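To address the size mismatch, you could train a small mapping, such as a single linear layer or a shallow neural network, that projects the 256-dimensional embeddings to 512 dimensions. Note that the mapping has to be learned: padding or an untrained projection produces a tensor of the right shape, but its values do not lie in the embedding space your TTS model was trained on, which is a likely cause of the noisy output. A trained mapping has a better chance of preserving the speaker information while adapting it to the size your TTS model requires.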
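As a rough illustration of the linear-mapping idea, here is a minimal PyTorch sketch. It is not part of OpenVoice: the `SpeakerEmbeddingAdapter` class is a hypothetical name, and the random tensors only stand in for real data. In practice you would need paired embeddings for the same utterances, one from OpenVoice (256-dim) and one from the target TTS model's own speaker encoder (512-dim), which this sketch assumes you can obtain.

```python
import torch
import torch.nn as nn


class SpeakerEmbeddingAdapter(nn.Module):
    """Hypothetical adapter mapping (B, 256, 1) embeddings to (B, 512, 1)."""

    def __init__(self, in_dim: int = 256, out_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, se: torch.Tensor) -> torch.Tensor:
        x = se.squeeze(-1)        # (B, in_dim, 1) -> (B, in_dim)
        y = self.proj(x)          # (B, in_dim)    -> (B, out_dim)
        return y.unsqueeze(-1)    # (B, out_dim)   -> (B, out_dim, 1)


torch.manual_seed(0)
adapter = SpeakerEmbeddingAdapter()

# Stand-ins for paired training data (assumption: real pairs would come from
# running both speaker encoders on the same audio clips).
source_se = torch.randn(8, 256, 1)   # OpenVoice-style embeddings
target_se = torch.randn(8, 512, 1)   # target TTS model's embeddings

optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

initial_loss = loss_fn(adapter(source_se), target_se).item()
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(adapter(source_se), target_se)
    loss.backward()
    optimizer.step()
final_loss = loss_fn(adapter(source_se), target_se).item()

# Apply the adapter to a single embedding shaped like se_features.pt's contents.
se_256 = torch.randn(1, 256, 1)
with torch.no_grad():
    se_512 = adapter(se_256)
print(se_512.shape)  # torch.Size([1, 512, 1])
```

The output only has the right shape until the adapter is trained on real pairs; whether the mapped embedding then yields clean audio depends on how well the two embedding spaces can be aligned.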