I am using the wtpsplit library to split text into segments. However, I would like to have the ability to limit the maximum length of each segment to a specific number of characters, such as 250 characters.
Currently, I don't see an option in the split() method to set a maximum length for the segments. It would be very helpful if such a feature could be added to the library.
If adding this feature is not feasible or aligned with the library's design goals, I would greatly appreciate any recommendations or alternative approaches to achieve the desired functionality of splitting text while keeping each segment within a specified maximum length.
Thank you for your time and consideration.
Thanks for raising this! This is indeed a feature we have wanted to implement for some time now. We will add it eventually, but it may still take some time.
In the meantime, you can check out how some popular RAG libraries and their text splitters handle such cases, e.g., LangChain or LlamaIndex.
Btw, out of curiosity, would you mind sharing your use case for this feature?
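Until a built-in option exists, one possible workaround is to post-process the segments the splitter returns. The sketch below is not part of wtpsplit: `enforce_max_length` is a hypothetical helper that takes any list of strings and breaks over-long ones at the last space inside the limit, falling back to a hard cut when a single word is longer than the limit. It assumes a character-based limit and does not merge short segments back together.

```python
from typing import Iterable, List


def enforce_max_length(segments: Iterable[str], max_chars: int = 250) -> List[str]:
    """Break any segment longer than max_chars into smaller pieces.

    Hypothetical helper, not a wtpsplit API. Prefers to cut at the last
    space inside the limit; hard-cuts when no space is available.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")

    result: List[str] = []
    for segment in segments:
        rest = segment.strip()
        while len(rest) > max_chars:
            # Last space at an index <= max_chars, so the piece fits the limit.
            cut = rest.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no usable space: hard cut inside the word
            result.append(rest[:cut].rstrip())
            rest = rest[cut:].lstrip()
        if rest:
            result.append(rest)
    return result


if __name__ == "__main__":
    # The input would normally be the list returned by the splitter.
    demo = ["A short sentence.", "word " * 10]
    for piece in enforce_max_length(demo, max_chars=12):
        print(len(piece), repr(piece))
```

Cutting at whitespace keeps words intact in most cases, but the pieces of a long sentence are no longer sentence-aligned, so this trades segmentation quality for a guaranteed upper bound on length.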