
Add option to limit the maximum length of each split text segment #128

Open
Swarzox opened this issue Sep 5, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Swarzox

Swarzox commented Sep 5, 2024

Hello,

I am using the wtpsplit library to split text into segments. However, I would like to have the ability to limit the maximum length of each segment to a specific number of characters, such as 250 characters.

Currently, I don't see an option in the split() method to set a maximum length for the segments. It would be very helpful if such a feature could be added to the library.

If adding this feature is not feasible or does not fit the library's design goals, I would appreciate recommendations for alternative ways to split text while keeping each segment within a specified maximum length.

Thank you for your time and consideration.

@markus583
Collaborator

Hi,

Thanks for raising this! This is indeed a feature we have wanted to implement for some time now. We will add it eventually, but it may still take some time.
In the meantime, you can check out how some popular RAG libraries and their text splitters handle such cases, e.g., LangChain or LlamaIndex.
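
Until a built-in option exists, one possible workaround is to post-process the list of segments yourself. The following is only a minimal sketch, not how wtpsplit, LangChain, or LlamaIndex implement it: `enforce_max_length` and `max_chars` are hypothetical names, and the input is assumed to be a plain list of strings such as the one `split()` returns. Segments that already fit are left alone; longer ones are cut at the last space before the limit, with a hard cut only when there is no space at all.

```python
def enforce_max_length(segments, max_chars=250):
    """Return segments in which no item is longer than max_chars characters.

    Hypothetical helper, not part of wtpsplit. `segments` is assumed to be
    a list of strings, e.g. the sentences produced by a splitter.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")

    result = []
    for segment in segments:
        segment = segment.strip()
        while len(segment) > max_chars:
            # Prefer the last space at or before the limit so words stay intact.
            cut = segment.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                # No space in range: fall back to a hard cut at the limit.
                cut = max_chars
            result.append(segment[:cut].rstrip())
            segment = segment[cut:].lstrip()
        if segment:
            result.append(segment)
    return result


# Example input standing in for the output of a sentence splitter.
sentences = ["Short sentence.", "word " * 100]
limited = enforce_max_length(sentences, max_chars=50)
```

Note that this ignores sentence semantics inside an overlong segment; it only guarantees the length limit.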

Btw, out of curiosity, would you mind sharing your use case for this feature?

markus583 added the enhancement label Sep 28, 2024