jBark Library Documentation

Overview and Introduction

jBark is a Python library that builds on the original Bark text-to-speech project [https://github.com/suno-ai/bark] and adds simple voice conversion features. It provides a single interface for generating speech from text, extracting basic voice characteristics (pitch and tempo), and applying those characteristics to generated audio.

Key features of jBark include:

Text-to-speech generation using the Bark model
Simple voice characteristic extraction
Basic voice conversion using pitch shifting and tempo adjustment
Support for multiple languages
CPU-based computations (no GPU required)
Suppression of common warnings for a cleaner user experience

Whether you're developing a virtual assistant, creating audiobooks, or working on any project that requires flexible and high-quality speech synthesis, jBark provides the tools you need to bring your ideas to life.

Installation Guide

To install jBark, follow these steps:

Ensure you have Python 3.7 or later installed on your system.
Install jBark and its dependencies:
```
pip install jbark numpy torch scipy librosa resampy
```
Note: jBark uses CPU for computations by default. If you want to use GPU acceleration, make sure to install the appropriate CUDA-enabled version of PyTorch.
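
Before running jBark, it can help to confirm which device PyTorch actually detects. The snippet below is a minimal sketch and not part of jBark: `pick_device` is a hypothetical helper name, and it only reports what PyTorch sees. How jBark itself selects a device is not described in this document.

```python
import sys


def pick_device():
    """Return "cuda" if a CUDA-enabled PyTorch build sees a GPU, else "cpu".

    Hypothetical helper for illustration; it is not part of the jBark API.
    """
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # The installation guide asks for Python 3.7 or later.
    print("Python version OK:", sys.version_info >= (3, 7))
    print("PyTorch device:", pick_device())
```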

Usage Instructions

Here's a basic example of how to use jBark:

```python
from jbark import JBark

# Initialize jBark
jbark = JBark()

# Generate audio from text
text = "Hello, this is a test of jBark text-to-speech."
output_path = "output.wav"
audio_array = jbark.generate_audio(text, output_path)

# Extract voice characteristics
sample_audio = "sample_voice.wav"
voice_chars = jbark.simple_voice_clone(sample_audio)

# Generate audio with simple voice conversion
converted_text = "This is speech using simple voice conversion."
converted_output_path = "converted_output.wav"
converted_audio = jbark.generate_with_cloned_voice(converted_text, voice_chars, converted_output_path)

# List supported languages
languages = jbark.list_supported_languages()
print("Supported languages:", languages)
```

This example demonstrates the basic workflow of generating speech, extracting voice characteristics, and applying simple voice conversion to the generated audio.

Configuration and Customization

jBark provides several options for customization:

Warning Suppression: By default, jBark suppresses common warnings. This behavior is handled internally and doesn't require user configuration.

Voice Presets: When generating audio, you can specify a voice preset:

```python
audio_array = jbark.generate_audio(text, history_prompt="v2/en_speaker_6")
```

Voice Conversion Parameters: You can adjust the strength of pitch shifting and tempo adjustment by modifying the simple_voice_conversion method in the JBark class.

API Reference

JBark Class

`__init__(self)`

Initializes the JBark instance, suppresses warnings, and preloads necessary models.

`generate_audio(self, text_prompt: str, output_path: str = None, history_prompt: str = None) -> numpy.ndarray`

Generates audio from the given text prompt.

text_prompt: The text to convert to speech.
output_path: Optional. Path to save the generated audio.
history_prompt: Optional. Voice preset to use.

Returns: Numpy array containing the audio data.
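
If you leave `output_path` unset and want to save the returned samples yourself, something like the following could work. This is a sketch using only the standard library, not jBark's own saving code. `save_wav` is a hypothetical helper, and the 24 kHz sample rate is an assumption based on Bark's usual output rate, so verify it against your installation.

```python
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # assumption: Bark's usual output rate; verify locally


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # keep values in range
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return path


if __name__ == "__main__":
    # Write a short silent clip to a temporary directory as a quick check.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    save_wav([0.0] * SAMPLE_RATE, demo_path)
    print("Wrote", demo_path)
```

A one-dimensional NumPy array can be passed as `samples` directly, since the function only iterates over the values.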

`simple_voice_clone(self, audio_path: str) -> dict`

Extracts basic voice characteristics from an audio sample.

audio_path: Path to the audio sample for voice characteristic extraction.

Returns: Dictionary containing basic voice characteristics (pitch and tempo).
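
To give a feel for what "extracting pitch" can involve, here is a small sketch that estimates a fundamental frequency with plain autocorrelation. It is an illustration under stated assumptions, not jBark's extraction code: the function name, the `"pitch"` and `"tempo"` key names, and the placeholder tempo value are all hypothetical.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = 0, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


# Synthetic 200 Hz tone standing in for a recorded voice sample.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(2000)]

# Hypothetical dictionary layout; the real key names and units may differ.
voice_chars = {"pitch": estimate_pitch_hz(tone, sample_rate), "tempo": 1.0}
print(voice_chars)
```

Real speech is far less regular than a pure tone, so a practical extractor would typically analyse short frames and average over the voiced ones.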

`generate_with_cloned_voice(self, text_prompt: str, voice_characteristics: dict, output_path: str) -> numpy.ndarray`

Generates audio using simple voice conversion based on extracted voice characteristics.

text_prompt: The text to convert to speech.
voice_characteristics: Dictionary containing voice characteristics (pitch and tempo).
output_path: Path to save the generated audio.

Returns: Numpy array containing the audio data.

`simple_voice_conversion(self, audio: numpy.ndarray, voice_characteristics: dict) -> numpy.ndarray`

Applies simple voice conversion to the input audio based on the given voice characteristics.

audio: Input audio array.
voice_characteristics: Dictionary containing voice characteristics (pitch and tempo).

Returns: Converted audio array.
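
The relationship between two sets of voice characteristics and the resulting adjustments can be sketched as follows. This is not the library's implementation: `conversion_parameters`, the `strength` parameter, and the `"pitch"` (Hz) and `"tempo"` key names are assumptions made for illustration.

```python
import math


def conversion_parameters(source, target, strength=1.0):
    """Return (semitone shift, tempo rate) that move `source` toward `target`.

    Both arguments are hypothetical dicts with "pitch" (Hz) and "tempo" keys.
    `strength` scales the change: 0.0 leaves the audio unchanged, 1.0 applies
    the full adjustment.
    """
    # An octave is a doubling of frequency and spans 12 semitones.
    full_shift = 12.0 * math.log2(target["pitch"] / source["pitch"])
    full_rate = target["tempo"] / source["tempo"]
    n_steps = strength * full_shift
    rate = full_rate ** strength  # interpolate the rate on a log scale
    return n_steps, rate


generated = {"pitch": 220.0, "tempo": 100.0}
reference = {"pitch": 440.0, "tempo": 120.0}
print(conversion_parameters(generated, reference, strength=0.5))
```

If you work with librosa, values like these map naturally onto `librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)` and `librosa.effects.time_stretch(y, rate=rate)`.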

`custom_time_stretch(self, audio: numpy.ndarray, rate: float) -> numpy.ndarray`

Custom time stretching function using resampling.

audio: Input audio array.
rate: Time stretch rate.

Returns: Time-stretched audio array.
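
A resampling-based time stretch can be sketched in a few lines. The version below uses plain linear interpolation rather than a dedicated resampling library, so treat it as an illustration of the idea and not as the code jBark runs; the function name is hypothetical.

```python
def time_stretch_by_resampling(samples, rate):
    """Change duration by reading the input at `rate` samples per output sample.

    rate > 1.0 shortens the audio; rate < 1.0 lengthens it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    n_out = max(1, int(round(len(samples) / rate)))
    last = len(samples) - 1
    stretched = []
    for j in range(n_out):
        position = j * rate            # fractional index into the input
        i = int(position)
        frac = position - i
        a = samples[min(i, last)]
        b = samples[min(i + 1, last)]  # clamp at the final sample
        stretched.append(a + (b - a) * frac)
    return stretched


ramp = [float(n) for n in range(100)]
print(len(time_stretch_by_resampling(ramp, 2.0)))   # half as many samples
print(len(time_stretch_by_resampling(ramp, 0.5)))   # twice as many samples
```

Note that stretching by resampling alone also shifts pitch by the same factor when the result is played back at the original sample rate, which is one reason such methods are usually paired with a pitch correction step.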

`list_supported_languages(self) -> dict`

Returns a dictionary of supported languages.

Code Architecture and Design

jBark is designed with modularity and extensibility in mind. The main components are:

JBark Class: The central interface for all functionality.
Bark Model: Handles text-to-speech generation.
Simple Voice Conversion Module: Manages basic voice characteristic extraction and application.

The library follows a facade pattern, where the JBark class provides a simplified interface to the underlying text-to-speech and voice conversion systems.
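
The facade idea described above can be illustrated with stub components. Every class and method name below is hypothetical and chosen for the example; the stubs return placeholder data so the sketch runs without Bark installed.

```python
class TextToSpeech:
    """Stand-in for the text-to-speech backend (hypothetical)."""

    def synthesize(self, text):
        # A real backend would return audio samples; the stub returns silence.
        return [0.0] * len(text)


class VoiceConverter:
    """Stand-in for the simple voice conversion module (hypothetical)."""

    def convert(self, audio, voice_characteristics):
        # A real converter would shift pitch and tempo; the stub copies the input.
        return list(audio)


class SpeechFacade:
    """Single entry point that hides the two subsystems from callers."""

    def __init__(self):
        self._tts = TextToSpeech()
        self._converter = VoiceConverter()

    def generate(self, text, voice_characteristics=None):
        audio = self._tts.synthesize(text)
        if voice_characteristics is not None:
            audio = self._converter.convert(audio, voice_characteristics)
        return audio


facade = SpeechFacade()
print(len(facade.generate("hello", {"pitch": 200.0, "tempo": 1.0})))
```

Callers depend only on the facade, so either subsystem can be swapped out without changing code that uses it.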

Testing and Debugging

jBark comes with two test suites:

Basic Test Suite (test_jbark.py): To run the basic test suite:
```
python test_jbark.py
```
This will launch an interactive menu allowing you to test various features of the jBark library.

Expanded Test Suite (test2.py): To run the expanded test suite:
```
python test2.py
```
This suite provides more comprehensive testing, including variations in audio generation, voice cloning, voice conversion, language support, error handling, and performance testing.

For debugging, you can use Python's built-in pdb module or an IDE like PyCharm or VSCode.
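
Alongside the interactive suites, a small non-interactive smoke test can be handy for quick checks. The following is a sketch and not one of the bundled test files: the class name is hypothetical, and the test is skipped automatically when `jbark` cannot be imported. It relies only on `JBark()` and `generate_audio(text)` as documented above.

```python
import importlib.util
import unittest

# True only when a module named "jbark" can be found on the import path.
JBARK_INSTALLED = importlib.util.find_spec("jbark") is not None


@unittest.skipUnless(JBARK_INSTALLED, "jbark is not installed")
class JBarkSmokeTest(unittest.TestCase):
    """Hypothetical smoke test; it is not part of the jBark repository."""

    def test_generate_audio_returns_samples(self):
        from jbark import JBark

        audio = JBark().generate_audio("Hello from a smoke test.")
        # The API reference says a NumPy array of audio data is returned.
        self.assertGreater(len(audio), 0)
```

Saved as, for example, `test_smoke.py`, it can be run with `python -m unittest test_smoke`. Placing `breakpoint()` inside the test method drops into pdb at that point.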

Common Issues and FAQs

Q: Why is the audio generation slow? A: Audio generation speed depends on your hardware. jBark uses CPU for computations by default. For faster processing, consider using a machine with a more powerful CPU or implementing GPU support.
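
To put a number on generation speed for your own machine, you could time a call and compare it with the duration of the audio produced. The helpers below are a sketch with hypothetical names, and the 24 kHz default is an assumption based on Bark's usual output rate.

```python
import time


def timed(func, *args, **kwargs):
    """Call `func` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(elapsed_seconds, num_samples, sample_rate=24_000):
    """Processing time divided by audio duration.

    Values below 1.0 mean generation ran faster than real time.
    """
    return elapsed_seconds / (num_samples / sample_rate)


# Stand-in workload; with jBark you might time jbark.generate_audio(text).
result, elapsed = timed(sum, range(1_000_000))
print(f"took {elapsed:.4f} s")
```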

Q: How effective is the simple voice conversion? A: The simple voice conversion feature in jBark provides basic pitch and tempo adjustments. While it can alter some voice characteristics, it does not provide the same level of voice cloning quality as more advanced methods. Results may vary depending on the input text and target voice characteristics.

Q: How can I improve the quality of voice conversion? A: Use high-quality audio samples for voice characteristic extraction, ideally with clear speech and minimal background noise. You can also experiment with adjusting the pitch shifting and tempo adjustment parameters in the simple_voice_conversion method for better results.

Q: How do I use different language models? A: jBark supports multiple languages. You can specify the desired language when generating audio by using the appropriate language code in the history prompt. For example:

```python
audio = jbark.generate_audio("Bonjour!", history_prompt="v2/fr_speaker_1")
```

For a list of supported languages and their codes, use the list_supported_languages() method.
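
The preset names used in this document follow a `v2/<language code>_speaker_<number>` shape. A small helper for building and parsing such names might look like this. Both function names are hypothetical, and the 0 to 9 speaker range is an assumption about Bark's bundled presets, so check the presets available in your installation.

```python
import re

# Matches names such as "v2/fr_speaker_1" and captures the language code.
PRESET_PATTERN = re.compile(r"^v2/([a-z]{2})_speaker_(\d)$")


def make_history_prompt(language_code, speaker=0):
    """Build a preset name in the "v2/<lang>_speaker_<n>" form."""
    if not 0 <= speaker <= 9:
        raise ValueError("speaker index is assumed to be in the range 0..9")
    return f"v2/{language_code}_speaker_{speaker}"


def language_of(history_prompt):
    """Return the language code of a preset name, or None if it does not match."""
    match = PRESET_PATTERN.match(history_prompt)
    return match.group(1) if match else None


print(make_history_prompt("fr", 1))
print(language_of("v2/en_speaker_6"))
```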

For more questions and answers, visit our GitHub issues page or join our community forum.
