GPT-VCC v1.2
Changed the speech recognition API and made recording stop manually instead of automatically (to avoid premature stopping).
Change Log
New Features
- Switched speech recognition API from Google speech recognition to OpenAI's Whisper. This change yields higher-quality transcriptions and lets non-English speakers communicate with the bot. So far I've tested German, Spanish, and Japanese, all of which were properly recognized and transcribed.
- Switched recording method from manual-trigger-automatic-stop to manual-trigger-manual-stop. This should help you speak for longer, think about what you're saying, and communicate more naturally. The trade-off is that you'll have to press the space bar one more time.
- Added phrase commands to the GPTCLI help section (e.g. "please set tokens to ...").
- Changed message cancellation method from pressing 'p' to saying variations of 'cancel message'.
- Updated ElevenLabs TTS to use their new multilingual model. Now you can have the fancy TTS speak to you in languages like Spanish or German.
Bug Fixes
- Fixed bug preventing the setting of tokens to a value that depends on the model being used.
- Fixed bugs associated with making the change to Whisper speech recognition and manual recording.
Controls
Keyboard
- SPACEBAR: This starts and stops a recording. Once you press space a second time, whatever you said will be transcribed and sent to GPT (if it passes filters).
- ESCAPE: This exits without memorizing.
- Q: This quits and has the bot remember details about you and your conversations (data is saved in the text file called memories.txt).
- P: This is a deprecated command to cancel a message. Now just say, "please cancel a message" while recording to cancel.
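
The manual-trigger-manual-stop behaviour of the spacebar can be pictured as a simple two-state toggle. The following is only a minimal sketch of that idea, not the project's actual code: the `RecordingToggle` class, its method names, and the use of byte chunks as stand-in audio are all hypothetical.

```python
class RecordingToggle:
    """Hypothetical helper: tracks manual-start / manual-stop recording state."""

    def __init__(self):
        self.recording = False
        self.frames = []  # audio chunks captured while recording

    def add_frame(self, chunk):
        # Only keep audio that arrives while a recording is in progress.
        if self.recording:
            self.frames.append(chunk)

    def press_space(self):
        """Toggle recording. Returns the captured audio on stop, else None."""
        if not self.recording:
            # First press: start a fresh recording.
            self.recording = True
            self.frames = []
            return None
        # Second press: stop and hand the audio off for transcription.
        self.recording = False
        return b"".join(self.frames)


toggle = RecordingToggle()
toggle.add_frame(b"ignored")   # not recording yet, so this chunk is dropped
toggle.press_space()           # first press starts the recording
toggle.add_frame(b"hello ")
toggle.add_frame(b"world")
audio = toggle.press_space()   # second press stops and returns the audio
```

Because nothing stops the recording except the second key press, pauses while you think do not cut the message short.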
Voice Commands
- Say 'please set tokens to #': When the bot recognizes this phrase, it will try to set the max_tokens of the reply to the value you specified.
- Say 'speak like a robot': This will set all responses from GPT to be spoken with a robotic TTS program that works offline. In CLI mode, enter '!robospeak()' to toggle this mode.
- Say 'stop speaking like a robot': This will revert the bot's TTS to whatever you had before (either Google or ElevenLabs TTS). In CLI mode, enter '!robospeak()' to toggle this mode.
- Say 'please display conversation': This will output your entire conversation to the terminal window.
- Say 'please display memories': This will output all memories saved in long-term storage.
- Say 'please restore memory': This will attempt to repair the working memory of the bot by consolidating a certain number of memories from long-term storage.
- Say 'please set preset to': This will set the preset (a text string given to the AI at the start of every conversation) for the bot. For example, the preset 'speak like a pirate' makes the AI speak like a pirate. You can find example presets here: https://github.com/Adri6336/gpt-voice-conversation-chatbot/wiki/Example-Presets
- Say 'please reset preset': This will delete the preset you made.
- Say 'please set name to': This will set the name of the bot to whatever you specify, so long as it is in accordance with OpenAI's usage policies. After setting the name, the bot will refer to itself by the name you set.
- Say 'please toggle gpt4': This will toggle between the ChatGPT and GPT-4 models. Your choice is preserved the next time you start the bot. In CLI mode, enter '!gpt4()' to toggle the model.
- Say 'please set creativity to': This will set the bot's default randomness to a value you specify between 1 and 15 (used to be 9). In CLI mode, use '!creativity(#)' where the # sign is a value between 0.01 and 1.5.
- Say 'please list commands': This will have the bot list out the available commands for you.
- Say 'please toggle ElevenLabs': This will toggle the bot's use of ElevenLabs TTS on and off. In CLI mode, use '!11ai()' to toggle it.
- Say 'please cancel message': This will cancel the message, preventing it from being sent to GPT.
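
To illustrate how phrases like these might be picked out of a transcript, here is a rough sketch covering three of the commands. It is not the bot's real parser: the `parse_command` function, the command names it returns, the regular expressions, and the divide-by-ten mapping from the spoken creativity scale (1 to 15) to the CLI scale are all assumptions made for the example.

```python
import re


def parse_command(transcript):
    """Return (command, value) for a recognized phrase, else None.

    Hypothetical sketch; the real bot may match phrases differently.
    """
    # Normalize case and drop trailing punctuation a transcriber may add.
    text = transcript.lower().strip().rstrip(".!?")

    match = re.search(r"please set tokens to (\d+)", text)
    if match:
        return ("set_tokens", int(match.group(1)))

    match = re.search(r"please set creativity to (\d+)", text)
    if match:
        level = int(match.group(1))
        if 1 <= level <= 15:
            # Assumed mapping: spoken 1..15 becomes 0.1..1.5 on the CLI scale.
            return ("set_creativity", level / 10)
        return None  # out-of-range values are ignored in this sketch

    # Accept a few variations of 'cancel message'.
    if re.search(r"cancel (?:the |a |this |my )?message", text):
        return ("cancel_message", None)

    return None


tokens_cmd = parse_command("Please set tokens to 200.")
creativity_cmd = parse_command("please set creativity to 7")
cancel_cmd = parse_command("Please cancel a message")
```

A transcript that matches none of the patterns returns `None`, which in this sketch would mean the text is treated as an ordinary message rather than a command.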