
implement Iter::chunk_by #2029

Open · wants to merge 2 commits into main from iter-chunk-by

Conversation

@juliano (Contributor) commented Apr 28, 2025


@peter-jerry-ye-code-review (bot) commented Apr 28, 2025

Potential memory inefficiency with fixed buffer capacity

Category: Correctness
Code Snippet:
buffer = Array::new(capacity=16)
Recommendation:
Consider using a dynamic initial capacity based on the previous buffer size or make it configurable:

let initial_capacity = max(16, buffer.capacity())
buffer = Array::new(capacity=initial_capacity)

Reasoning:
Using a fixed capacity of 16 may be inefficient for iterators with large chunks. If most chunks are significantly larger than 16 elements, this will cause frequent reallocations. Using the previous buffer's capacity as a hint could improve performance.

Missing edge case documentation

Category: Maintainability
Code Snippet:
/// Groups elements of an iterator according to a discriminator function.
Recommendation:
Add documentation about edge cases:

/// Groups consecutive elements of an iterator according to a discriminator function.
/// 
/// # Notes
/// - Empty iterators produce no chunks
/// - Single element iterators produce one chunk
/// - The discriminator function is called exactly once per element

Reasoning:
The current documentation doesn't mention important edge cases that are actually tested. Making these explicit helps users understand the behavior better and maintains the connection between docs and tests.
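A small hedged usage sketch of these edge cases (it assumes the PR exposes chunk_by as a method on Iter, and relies on Iter::empty, Array::iter, and Iter::count from moonbitlang/core):

fn main {
  // empty iterator: no chunks at all
  let empty : Iter[Int] = Iter::empty()
  println(empty.chunk_by(fn(x) { x }).count()) // prints 0
  // single-element iterator: exactly one chunk
  println([42].iter().chunk_by(fn(x) { x }).count()) // prints 1
}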

Potential issue with early return in yield pattern

Category: Correctness
Code Snippet:
if current_key is Some(old_key) {
  yield_((old_key, buffer.iter()))
} else {
  IterContinue
}
Recommendation:
Make the return value handling consistent:

if current_key is Some(old_key) {
  guard yield_((old_key, buffer.iter())) is IterContinue else {
    return IterEnd
  }
}
IterContinue

Reasoning:
The current code doesn't handle the case where the last yield returns IterEnd. While this might be intentional, it's inconsistent with the handling of yields in the main loop. Making it consistent would prevent potential subtle bugs if the iterator protocol changes.
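For reference, a hedged sketch (not the PR's actual code) of the overall chunk_by run function with the final flush guarded the same way as the yields in the main loop, as recommended above. It is specialized to Int keys and values to avoid guessing at the generic signature, and the names discriminator, current_key, and buffer are taken from the snippets quoted in this review:

fn chunk_by_int(xs : Iter[Int], discriminator : (Int) -> Int) -> Iter[(Int, Iter[Int])] {
  Iter::new(fn(yield_) {
    let mut current_key : Int? = None
    let mut buffer = Array::new(capacity=16)
    for value in xs {
      let key = discriminator(value)
      if current_key is Some(old_key) {
        if old_key != key {
          // chunk boundary: emit the finished chunk, then start a fresh buffer
          guard yield_((old_key, buffer.iter())) is IterContinue else { return IterEnd }
          buffer = Array::new(capacity=16)
        }
      }
      current_key = Some(key)
      buffer.push(value)
    }
    // final flush, handled consistently with the yields inside the loop
    if current_key is Some(old_key) {
      guard yield_((old_key, buffer.iter())) is IterContinue else { return IterEnd }
    }
    IterContinue
  })
}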

@coveralls (Collaborator) commented Apr 28, 2025

Pull Request Test Coverage Report for Build 6575

Details

  • 9 of 9 (100.0%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.01%) to 92.411%

Files with Coverage Reduction    New Missed Lines    %
bench/stats.mbt                  1                   76.32%

Totals Coverage Status
  • Change from base Build 6574: +0.01%
  • Covered Lines: 6149
  • Relevant Lines: 6654

💛 - Coveralls

@juliano force-pushed the iter-chunk-by branch 2 times, most recently from 1223960 to 74c2ad5 on April 29, 2025 at 10:33
@peter-jerry-ye (Collaborator) commented

I'm still not quite sure whether the signature should be -> Iter[(K, Iter[V])] or -> Iter[(K, Array[V])]. Maybe we need some advice from the others here.
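For concreteness, a hedged caller-side sketch of the Iter[(K, Iter[V])] shape in use (hypothetical print_groups helper, specialized to Int; with Iter[(K, Array[V])] each chunk would arrive already materialized and the collect() call below would disappear):

fn print_groups(groups : Iter[(Int, Iter[Int])]) -> Unit {
  for pair in groups {
    let (key, chunk) = pair
    // with Iter[V] chunks, the caller decides when (or whether) to materialize
    println("\{key}: \{chunk.collect()}")
  }
}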

@bobzhang (Contributor) commented May 6, 2025

Iter was introduced as a data conversion layer between different collections. It was designed so that the compiler can optimize away all the indirections, making such conversions almost free.

It is less powerful than other iterator abstractions (e.g., lazy lists), where memoization and zip are easy to implement.
So if people want powerful iterators, I think the best practice is to roll a new abstraction instead of building it on top of Iter, since Iter's original scope is quite limited (no memoization intended).
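As a hedged illustration of that conversion-layer role (standard moonbitlang/core calls; the intermediate Iter values are meant to be optimized away rather than materialized as collections of their own):

fn main {
  let xs = [1, 2, 3, 4, 5]
  // Array -> Iter -> transformed Iter -> Array in a single pass
  let squares_of_odds = xs.iter().filter(fn(x) { x % 2 == 1 }).map(fn(x) { x * x }).collect()
  println(squares_of_odds) // [1, 9, 25]
}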

I would be happy to hear your thoughts
