Anime AI Waifu is an AI-powered voice assistant with a VTuber model that combines the charm of anime characters with cutting-edge technology. The project aims to create an engaging experience where you can interact with a character of your choice in real time, without powerful hardware.
Features
🎤 Voice Interaction: Speak to your AI waifu and get (almost) instant responses.
Whisper - OpenAI's paid speech recognition.
Google SR - a free speech recognition alternative.
Console - if you don't want to use a microphone, just type prompts with your keyboard.
🤖 AI Chatbot Integration: Conversations are powered by an AI chatbot, ensuring engaging and dynamic interactions.
OpenAI's 'gpt-3.5-turbo' or any other available model.
A file with a personality and behaviour description.
Remembers previous messages.
📢 Text-to-Speech: Hear your AI waifu's responses as she speaks back to you, creating an immersive experience.
Google TTS - a free and simple solution.
ElevenLabs - amazing results, tons of voices.
Console - get text responses in your console (but the VTube model will just stay idle).
🌐 Integration with VTube Studio: Seamlessly connect your AI waifu to VTube Studio for an even more lifelike and visually engaging interaction.
Lipsync while talking.
Showcase
*Real-time demonstration without cuts or speed-up. This is the actual response delay.
Installation
To run this project, you need:
Install Python 3.10.5 if you don't already have it installed.
Clone the repository by running git clone https://github.com/JarikDem-Bot/ai-waifu.git
Install the required Python packages by running pip install -r requirements.txt in the project directory.
Create a .env file inside the project directory and enter your API keys.
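A .env file is a plain list of KEY=VALUE pairs, one per line. The variable names below are assumptions for illustration; check main.py for the exact names the project reads:

```env
# Hypothetical example - verify the variable names against main.py
OPENAI_API_KEY=your_openai_key_here
ELEVENLABS_API_KEY=your_elevenlabs_key_here
```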
Select CABLE Output as the microphone. Select Preview microphone audio to hear the waifu's answers.
Select the input and output for Mouth Open. Optionally, you can set "breathing" to get idle movements.
Select your required settings in main.py in waifu.initialize.
Arguments:
user_input_service (str) - the way you interact with the Waifu
"whisper" - OpenAI's Whisper speech-to-text service; paid, requires an OpenAI API key.
"google" - free Google speech-to-text service.
"console" - type your prompt in the console (absolutely free).
None or unspecified - default value is "whisper".
stt_duration (float) - the number of seconds to sample ambient noise for when adjusting the energy threshold before listening. This value should be at least 0.5 to get a representative sample of the ambient noise. Default value is 0.5.
mic_index (int) - index of the device to use for audio input. If None or unspecified, the default microphone is used.
chatbot_service (str) - service that will generate responses
"openai" - OpenAI text generation service; paid, requires an OpenAI API key.
"test" - returns a prewritten message; used as dummy text during development to reduce the time and cost of testing.
None or unspecified - default value is "openai".
chatbot_model (str) - model used for text generation. A list of available models can be found here. Default value is "gpt-3.5-turbo".
chatbot_temperature (float) - controls the creativity of the generated text. Higher values lead to more creative results; lower values lead to less creative, more predictable results. Default value is 0.5.
personality_file (str) - relative path to a txt file with the waifu's description. Default value is "personality.txt".
tts_service (str) - service that "reads" the Waifu's responses
"google" - free Google TTS; the voice sounds quite "robotic".
"elevenlabs" - ElevenLabs tts with good quality; paid, requires ElevenLabs API key.
"console" - output will be printed in console (free).
None or unspecified - default value is "google".
output_device - (int) output device ID or (str) output device name substring. If VB-Cable is used, find the device whose name starts with "CABLE Input (VB-Audio Virtual Cable)" using the sd.query_devices() command.
tts_voice (str) - ElevenLabs voice name. Default value is "Elli".
tts_model (str) - ElevenLabs model. Recommended values are "eleven_monolingual_v1" and "eleven_multilingual_v1". Default value is "eleven_monolingual_v1".
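Putting the arguments above together, a configuration might look like the following sketch. The keyword names are taken from this README but have not been checked against main.py, so treat them as assumptions:

```python
# Hypothetical settings for waifu.initialize, built from the argument
# descriptions above; verify names and defaults against main.py.
settings = {
    "user_input_service": "google",       # free speech-to-text
    "stt_duration": 0.5,                  # ambient-noise sampling time, seconds
    "mic_index": None,                    # None -> default microphone
    "chatbot_service": "openai",
    "chatbot_model": "gpt-3.5-turbo",
    "chatbot_temperature": 0.5,
    "personality_file": "personality.txt",
    "tts_service": "google",
    "output_device": "CABLE Input",       # name substring of the VB-Cable device
    "tts_voice": "Elli",
    "tts_model": "eleven_monolingual_v1",
}

# waifu.initialize(**settings)  # as configured in main.py
```

When output_device is given as a string, it is matched as a substring of the device name; run sd.query_devices() (from the sounddevice package) to list the devices available on your system.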
Run the project by executing python main.py in the project directory.
Depending on the selected input mode, the program may send all recorded audio or other data to third parties such as Google (STT, TTS), OpenAI (STT, text generation), and ElevenLabs (TTS).