Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task: Migrate ICU4N.Text.OpenStringBuilder as J2N.Text.StringBuilder #117

Open
NightOwl888 opened this issue Nov 4, 2024 · 0 comments
Open
Assignees
Labels
design is:feature pri:high up for grabs This issue is open to be worked on by anyone
Milestone

Comments

@NightOwl888
Copy link
Owner

IMPORTANT: Before getting started on this task, be sure to add using StringBuilder = System.Text.StringBuilder; in each file where StringBuilder is already in use. APIs that use System.Text.StringBuilder must still use System.Text.StringBuilder after this task is complete.

System.Text.StringBuilder has known performance problems that often make it a poor replacement for java.lang.StringBuilder. System.Text.StringBuilder is ideal for building large strings, but when building text and calling methods that require a ReadOnlySpan<char>, System.Text.StringBuilder is wholly inadequate. We need a StringBuilder that maintains a contiguous block of memory to call these methods. And since in Lucene.NET, StringBuilder is often used as a reusable buffer when building analyzer components, we need a public class for users.

We must also have a way to utilize the ArrayPool to make maximum reuse of memory in cases where a StringBuilder is used as a buffer in a disposable type.

What We Have

We already have an implementation to use that has been ported in ICU4N as OpenStringBuilder. This is primarily a port of ValueStringBuilder with the primary difference being that it is a class rather than a ref struct.

The name OpenStringBuilder was chosen because it is similar in concept to Lucene.Net.Analysis.Util.OpenStringBuilder and technically can be a replacement for it. The goal of that class is to provide a StringBuilder that the user can extend by inheriting it.

OpenStringBuilder has some Java-like methods already, such as their style of Replace() that uses indices instead of text to find to make a replacement of and Delete(), which is like Remove() but instead of throwing an exception when the indices are out of range it removes only the part that is in range and ignores the fact that the indices are out of bounds.

We also have some tests that are a combination of Apache Harmony and BCL tests:

https://github.com/NightOwl888/ICU4N/blob/80c0560da2d9c16f84df649def496cd169804031/tests/ICU4N.Tests/Support/Text/OpenStringBuilderTest.cs

And we have an implementation of PooledOpenStringBuilder, which uses ArrayPool to manage memory. This is a subclass of OpenStringBuilder.

What We Need

  • The new classes should be named J2N.Text.StringBuilder and J2N.Text.PooledStringBuilder
  • J2N.Text.StringBuilder should be a drop in replacement for System.Text.StringBuilder. So, it must implement every member that Systm.Text.StringBuilder has. This includes an implementation of GetChunks() even though it will only ever have a single chunk.
  • J2N.Text.StringBuilder must at a minimum implement the existing Java members that ICU4N.Text.OpenStringBuilder currently implements. Ideally, we would implement more of the API, but this can be challenging when there is existing .NET functionality that almost covers what Java does. Functionality that is entirely missing from .NET can be ported, but we should be sure to fix the APIs so they are .NET-like (guard clauses when indices are out of bounds or arguments are null, index and length rather than start and end index, etc.).
  • J2N.Text.StringBuilder must pass all of the BCL tests except for those that relate to how it manages memory internally.
  • J2N.Text.StringBuilder must pass the Apache Harmony tests for members that we bring over from Apache Harmony or the JDK.
  • J2N.Text.StringBuilder must use the same approach used by ICU4N.Text.OpenStringBuilder to manage memory.
  • J2N.Text.StringBuilder must have AsSpan() and AsMemory() overloads so it can be utilized with methods that accept ReadOnlySpan<char> or ReadOnlyMemory<char>.
  • J2N.Text.StringBuffer must implement IAppendable for compatibility with J2N APIs.
  • J2N.Text.StringBuffer must implement ICharSequence for compatibility with J2N APIs even though most future development will rely on conversion to ReadOnlySpan<char>.
  • J2N.Text.StringBuilder should have virtual methods (where sensible) so the user can extend it through inheritance. Test: It should allow extensibility in all cases where Lucene.Net.Analysis.Util.OpenStringBuilder does.
  • J2N.Text.StringBuilder should have protected fields.
  • J2N.Text.StringBuilder must have a subclass named J2N.Text.PooledStringBuilder that provides all of the same functionality, but manages memory with ArrayPool.
  • J2N.Text.StringBuffer should use J2N.Text.StringBuilder as a base class instead of wrapping System.Text.StringBuilder, but this change should wait until J2N 3.0.
  • J2N.Text.StringBuffer must have a fully documented API. For the most part, we can copy the docs from the BCL or Apache Harmony, as appropriate.

The implementations of ICU4N.Text.OpenStringBuilder and ICU4N.Text.PooledOpenStringBuilder can be used as a starting point. However, we would probably be best off organizing the members the same way as System.Text.StringBuilder.

For string comparison operations, we should not implement IComparable<T>, but instead rely on the ability to convert the class to ReadOnlySpan<char>, at least for the time being. Java defaults to ordinal and .NET to culture-sensitive, so it probably makes sense for users to wrap J2N.Text.StringBuilder in a custom comparer if they want to choose one or the other. This is up for debate, though.

We should consider carefully whether to make J2N.Text.StringBuilder implicitly convertible to ReadOnlySpan<char> and ReadOnlyMemory<T> so it can be used without the .AsSpan() and .AsMemory() methods as was the case in Java.

All extension methods in J2N.Text.StringBuilderExtensions and J2N.Character should be considered for addition to the J2N.Text.StringBuilder API if they haven't already been added to ICU4N.Text.OpenStringBuilder.

The tests for System.Text.StringBuilder can be found at:

The tests for java.lang.StringBuilder can be found at:

Side Notes

  • We should look at the string interpolation tests. They may not be relevant for this task, but for the J2N and ICU4N formatters we should look at how to provide direct interpolation support.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design is:feature pri:high up for grabs This issue is open to be worked on by anyone
Projects
None yet
Development

No branches or pull requests

1 participant