Commit

v0.1.1: microphone selection support
lee-b committed May 21, 2023
1 parent 6b5e2e6 commit 30659a1
Showing 5 changed files with 36 additions and 10 deletions.
22 changes: 18 additions & 4 deletions README.md
@@ -14,10 +14,16 @@ You can tweak the assistant name, speech-to-text model, text-to-speech model, pr

 - Install as instructed below
 - Make sure `koboldcpp` (preferably), `koboldai` or `text-generation-webui` are running, with a suitable LLM model loaded, and serving a KoboldAI compatible API at `http://localhost:5001/api/v1/generate` (see Configuration, below, if you need to change this URL).
+- Run one or more of the commands below. If you get any errors about missing libraries, follow the instructions about that under Installation, below.
+
+### `serve`

 - Run `kobold-assistant serve` after installing.
-- Give it a while to start up, especially the first time, as it downloads a few GB of AI models to do the text-to-speech and speech-to-text.
-- If you get any errors about missing libraries, follow the instructions about that under Installation, below.
+- Give it a while (at least a few minutes) to start up, especially the first time that you run it, as it downloads a few GB of AI models to do the text-to-speech and speech-to-text, and does some time-consuming generation work at startup, to save time later. It runs much more comfortably once it gets to the main conversational loop.
+
+### `list-mics`
+
+This lists available microphones that `kobold-assistant` can use, to listen for instructions. See the Configuration and Troubleshooting sections below, for more details on `list-mics` and related settings.

 ## Requirements

@@ -61,6 +67,12 @@ and `new_value` is some new value that you want to use instead.

 **NOTE:** Some values depend on others. For now, you need to copy any dependent variables that come after the variable that you're modifying into your file, so that they use the custom setting. Again, this is hacky, and I'll clean it up soon.

+The most important settings are:
+
+```
+MICROPHONE_DEVICE_INDEX # The device number of the microphone to listen for instructions on. Run `kobold-assistant list-mics` for a list.
+GENERATE_URL # The server KoboldAI API endpoint for generating text from a prompt using a large language model
+```

 ## Building (for developers)

@@ -84,9 +96,11 @@ This is a bug in the TTS library, if you press Ctrl-C while it's download a mode

 ### 'Detected speech-to-text hallucination: ...'

-This happens when the whisper text-to-speech model hallucinates, and kobold-assistant notices. Essentially, it just means that the text-to-speech model misheard you, or only heard noise and made a guess. Check your microphone settings, that the default microphone works, it's not too quiet or too loud, and so on, or just try again: kobold-assistant will recover from this and just go on as if you didn't say anything yet. If this happens every time, your default microphone setup isn't working. I'll add more intelligent microphone selection and settings for the mic choice in future.
+**CHECK the MICROPHONE_DEVICE_INDEX setting. See Running, above.**
+
+This happens when the whisper text-to-speech model hallucinates, and kobold-assistant notices. Essentially, it just means that the text-to-speech model misheard you, or only heard noise and made a guess. Check the MICROPHONE\_DEVICE\_INDEX setting (or it may be listening for audio on a device that's not producing any audio!). Check your microphone settings (such as the microphone volume and noise cancellation options), and generally ensure that your microphone works: that it's not too quiet or too loud, and so on. OR, just try again: kobold-assistant will try to recover from this and just go on as if you didn't say anything yet. If this happens every time, though, you have a configuration issue.

-There may be other hallucinations (random text detected that you didn't actually say) that whisper generates. If you encounter any others, please file a PR or bug report.
+There may be other hallucinations (random text detected that you didn't actually say) that whisper generates, that aren't currently detected. If you encounter any others, please file a PR or bug report. However, sometimes it will just mishear what you say; that much is normal. Try to perfect your microphone settings, and enunciate as clearly as you can.


 ## Bugs and support
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "kobold_assistant"
-version = "0.1.0"
+version = "0.1.1"
 description = ""
 authors = ["Lee Braiden <leebraid@gmail.com>"]
 license = "AGPL v3"
1 change: 1 addition & 0 deletions src/kobold_assistant/.gitignore
@@ -0,0 +1 @@
+custom_settings.py
19 changes: 14 additions & 5 deletions src/kobold_assistant/__main__.py
@@ -38,8 +38,9 @@


 def text_to_phonemes(text: str) -> str:
-    # passthrough, since we get around this by prompting the AI to
-    # spell-out any abbreviations instead.
+    # passthrough, since we (try to) get around this by prompting
+    # the AI to spell-out any abbreviations instead, for now.
+    # (but it doesn't work with the current prompt)
     return text


@@ -53,7 +54,8 @@ def prompt_ai(prompt: str) -> str:
         response_json = f.read().decode('utf-8')
         response = json.loads(response_json)['results'][0]['text']

-    print(f"prompt_ai({prompt!r}) -> {response!r}")
+    print(f"The AI returned {response!r}")
+
     return response
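
For readers following the response handling in this hunk, here is a minimal sketch of how a request to a KoboldAI-compatible generate endpoint might be built and its reply parsed. Only the `['results'][0]['text']` lookup and the default URL come from this commit. The `"prompt"` payload field and the helper names `build_generate_request` and `extract_generated_text` are assumptions for illustration, not the project's actual code.

```python
import json
import urllib.request

# Default endpoint, as set in default_settings.py in this commit.
GENERATE_URL = "http://localhost:5001/api/v1/generate"


def build_generate_request(prompt: str, url: str = GENERATE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the generate endpoint.

    Assumption: the server accepts a JSON body with a "prompt" field.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_generated_text(response_json: str) -> str:
    """Pull the generated text out of a JSON reply, as prompt_ai() does."""
    return json.loads(response_json)["results"][0]["text"]
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen`, which needs a running server and is left out here.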


@@ -230,7 +232,7 @@ def serve():

     # set up microphone and speech recognition
     stt_engine = stt.Recognizer()
-    mic = stt.Microphone()
+    mic = stt.Microphone(device_index=settings.MICROPHONE_DEVICE_INDEX)

     # configure speech recognition
     stt_engine.energy_threshold = settings.STT_ENERGY_THRESHOLD
@@ -267,7 +269,7 @@ def serve():

 def main():
     parser = argparse.ArgumentParser()
-    parser.add_argument('mode', choices=('serve',))
+    parser.add_argument('mode', choices=('serve', 'list-mics',))

     args = parser.parse_args()

@@ -277,3 +279,10 @@ def main():
         except KeyboardInterrupt:
             print("Exiting on user request.")

+    elif args.mode == "list-mics":
+        stt_engine = stt.Recognizer()
+        print(f"Using mic_device_index {settings.MICROPHONE_DEVICE_INDEX}, per settings. These are the available microphone devices:\n")
+        mic_list = stt.Microphone.list_microphone_names()
+        for k, v in enumerate(mic_list):
+            print(f"Device {k}: {v}")
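
The new `list-mics` branch can be sketched as a standalone snippet. This is a hedged illustration, not the project's code: it assumes `stt` is the `speech_recognition` package (the import is not shown in this diff), and the helpers `format_mic_list` and `list_real_mics` are hypothetical names. Marking the configured device with `*` is an addition of this sketch.

```python
from typing import List


def format_mic_list(names: List[str], selected_index: int) -> List[str]:
    """Return one display line per device, marking the configured one with '*'."""
    lines = []
    for index, name in enumerate(names):
        marker = "*" if index == selected_index else " "
        lines.append(f"{marker} Device {index}: {name}")
    return lines


def list_real_mics(selected_index: int = 0) -> None:
    """Print the real device list (needs SpeechRecognition and PyAudio installed)."""
    # Assumption: `stt` in the diff refers to this package; imported lazily
    # so the formatting helper above works without audio libraries.
    import speech_recognition as stt

    for line in format_mic_list(stt.Microphone.list_microphone_names(), selected_index):
        print(line)
```

Keeping the formatting separate from the hardware query makes the output easy to check without a microphone attached.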

2 changes: 2 additions & 0 deletions src/kobold_assistant/default_settings.py
@@ -1,3 +1,5 @@
+MICROPHONE_DEVICE_INDEX = 0
+
 USER_NAME = "User"

 GENERATE_URL = "http://localhost:5001/api/v1/generate"
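
As an illustration of overriding these defaults, a custom settings file might look like the sketch below. It assumes overrides are plain Python assignments in the same style as `default_settings.py`. The device number and the alternate port are made-up values, and where the file must live is covered by the README's Configuration section rather than by this commit.

```python
# Hypothetical overrides; both values are illustrative only.

# Device number as reported by `kobold-assistant list-mics`.
MICROPHONE_DEVICE_INDEX = 2

# KoboldAI-compatible generate endpoint, here on a different port.
GENERATE_URL = "http://localhost:5002/api/v1/generate"
```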