Enhancement: "push to talk" and keyboard shortcuts for easier voice prompting (STT) #4807

Open
1 task done
danielrosehill opened this issue Nov 29, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@danielrosehill

What features would you like to see added?

Hey!

I would really love to begin prompting by speech (i.e., using voice recognition).

If it would be of interest, I'd also like to contribute some documentation on the various STT features, as I couldn't find these parameters covered on the STT page.

image

Specifically: what do the "conversation mode" and "auto transcribe audio" toggles do?

I have a couple of ideas for this which I'm batching under one feature enhancement with the intention of looking into the feasibility of trying to work on these myself:

  • Hotkey support to start and stop voice detection to facilitate (almost) hands-free usage
  • Some implementation of "push to talk" mode ... hold down an icon (e.g. the mic button) until you're ready to send.
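To make the two ideas above concrete, here is a minimal sketch of how a hotkey toggle and a hold-to-talk button could share one piece of state logic. Every name in it (`MicState`, `MicEvent`, `nextMicState`) is hypothetical and not taken from the project's code; it only models the behaviour described in the list, and wiring it to real keyboard or pointer events is left out.

```typescript
// Hypothetical sketch: none of these names come from the project's codebase.
type MicState = "idle" | "recording";

type MicEvent =
  | { kind: "hotkey" }   // toggle shortcut pressed
  | { kind: "press" }    // mic button held down
  | { kind: "release" }; // mic button let go

interface MicResult {
  state: MicState;
  submit: boolean; // true when the captured speech should be sent as a prompt
}

function nextMicState(state: MicState, event: MicEvent): MicResult {
  switch (event.kind) {
    case "hotkey":
      // The hotkey toggles: first press starts listening, second press sends.
      return state === "idle"
        ? { state: "recording", submit: false }
        : { state: "idle", submit: true };
    case "press":
      // Push to talk: holding the button starts (or continues) recording.
      return { state: "recording", submit: false };
    case "release":
      // Letting go sends only if something was actually being recorded.
      return { state: "idle", submit: state === "recording" };
  }
}
```

In this model nothing is sent until the user explicitly releases the button or presses the hotkey again, so pause detection never decides when the prompt ends.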

The second feature is really a workaround for what I find to be the main frustration with STT, one that is especially challenging when prompting: the automatic cutoffs / pause detection. I don't know whether this is baked into the engine or is an adjustable parameter, but it would be really helpful to increase the buffer time to a few seconds so that users have time to think about what they want to instruct.
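As an illustration of what an adjustable buffer would change, here is a rough sketch of pause detection with a configurable silence threshold. It assumes the audio has already been reduced to timestamped frames marked voiced or silent; `SpeechFrame`, `findCutoff`, and `silenceMs` are made-up names, and whether the STT engine in use exposes anything like this is exactly the open question.

```typescript
// Hypothetical sketch: the frame format and the silenceMs parameter are assumptions.
interface SpeechFrame {
  timeMs: number;  // timestamp of the frame
  voiced: boolean; // whether speech was detected in this frame
}

// Returns the timestamp at which recording would be cut off,
// or null if the silence threshold is never reached.
function findCutoff(frames: SpeechFrame[], silenceMs: number): number | null {
  let lastVoiced: number | null = null;
  for (const frame of frames) {
    if (frame.voiced) {
      lastVoiced = frame.timeMs;
    } else if (lastVoiced !== null && frame.timeMs - lastVoiced >= silenceMs) {
      return frame.timeMs;
    }
  }
  return null;
}

// A speaker who pauses for about a second mid-sentence, then continues.
const exampleFrames: SpeechFrame[] = [
  { timeMs: 0, voiced: true },
  { timeMs: 1000, voiced: false },
  { timeMs: 2000, voiced: true },
  { timeMs: 3000, voiced: false },
  { timeMs: 4000, voiced: false },
  { timeMs: 5000, voiced: false },
];

const shortBuffer = findCutoff(exampleFrames, 1000); // 1000: cut off during the pause
const longBuffer = findCutoff(exampleFrames, 3000);  // 5000: the pause is tolerated
```

With the one-second threshold the recording stops in the middle of the thought; with three seconds the speaker gets to finish.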

More details

I think the above pretty much covers it!

I'm possibly in the minority of LLM users who feel this way, but I find voice prompting potentially much more useful than real-time chats with LLMs (i.e., simultaneous STT and TTS). It would be nice to have both, but if I had to choose, voice prompting would speed up my workflow the most!

Which components are impacted by your request?

General, UI

Pictures

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@danielrosehill danielrosehill added the enhancement New feature or request label Nov 29, 2024
@berry-13
Collaborator

@danielrosehill hey, my bad for the late reply, I didn’t fully get the question at first. I hadn’t thought about the "push to talk" thing, but you can already trigger STT with Shift + Alt + L.
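For anyone curious how a shortcut like this is typically matched in a browser, here is a small sketch. It is not the project's actual handler: `KeyLike` and `isSttShortcut` are hypothetical names, and `KeyLike` just mirrors the standard `KeyboardEvent` fields it needs. It checks `code` rather than `key` because on some keyboard layouts holding Alt changes the character that `key` reports.

```typescript
// Hypothetical sketch, not the project's real shortcut handling.
// KeyLike mirrors the relevant fields of the browser's KeyboardEvent.
interface KeyLike {
  shiftKey: boolean;
  altKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  code: string; // physical key, e.g. "KeyL"
}

// True only for Shift + Alt + L with no other modifier held.
function isSttShortcut(event: KeyLike): boolean {
  return (
    event.shiftKey &&
    event.altKey &&
    !event.ctrlKey &&
    !event.metaKey &&
    event.code === "KeyL"
  );
}
```

In a page this could be called from a `keydown` listener, which accepts a real `KeyboardEvent` because it has all of these fields.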
