
v2.5.3

@bitfaster bitfaster released this 13 Jan 04:14
798fe76

What's changed

  • Eliminate volatile writes on pure read paths in the ConcurrentLru internal bookkeeping code, improving concurrent read throughput by 175%.
  • Vectorize the hot methods in CmSketch using Neon intrinsics on Arm CPUs. This yields slightly better ConcurrentLfu cache throughput, measured on Apple M-series and Azure Cobalt 100 CPUs.
  • Unroll loops in the hot methods in CmSketch. This results in slightly better ConcurrentLfu throughput on CPUs without vector support (i.e. neither x86 AVX2 nor Arm Neon).
  • On the vectorized code paths (AVX2 and Neon), CmSketch now allocates its internal buffer on the pinned object heap on .NET 6 or newer. This removes the need for the fixed statement, eliminating a very small overhead. Sketch block pointers are then aligned to 64 bytes, guaranteeing that each block always resides within a single CPU cache line. This slightly speeds up the ConcurrentLfu maintenance thread by reducing CPU cache misses.
  • Minor improvements to the AVX2 JITted code via MethodImpl(MethodImplOptions.AggressiveInlining) and removal of local variables, improving performance on .NET 8/9 with dynamic PGO.
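One common way to eliminate volatile writes on a read path is to check a flag with a plain read before writing it. The sketch below is illustrative only (the type and member names are hypothetical, not ConcurrentLru's actual internals): repeated reads of a hot item skip the volatile write, and the cache-line invalidation it causes, entirely.

```csharp
using System.Threading;

// Hypothetical sketch of check-before-write bookkeeping. Names are
// illustrative; this is not the library's actual implementation.
public class LruItem<V>
{
    private bool wasAccessed;
    public V Value;

    public bool WasAccessed
    {
        get => Volatile.Read(ref this.wasAccessed);
        set => Volatile.Write(ref this.wasAccessed, value);
    }

    // Hot path: a plain read is cheap, and the write (with its
    // cross-core cache traffic) only happens on the first hit after
    // the flag was last cleared.
    public void MarkAccessed()
    {
        if (!this.wasAccessed)
        {
            this.wasAccessed = true;
        }
    }
}
```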
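The AVX2/Neon/unrolled split described above follows the standard .NET intrinsics dispatch pattern: the `IsSupported` checks are JIT-time constants, so the untaken branches are eliminated entirely. A minimal sketch, with hypothetical path names standing in for CmSketch's actual methods:

```csharp
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical dispatch sketch: choose the widest instruction set the
// CPU supports, falling back to a scalar path with unrolled loops.
public static class SketchDispatch
{
    public static string SelectPath()
    {
        if (Avx2.IsSupported)
            return "avx2";      // x86: 256-bit vectors
        if (AdvSimd.Arm64.IsSupported)
            return "neon";      // Arm: 128-bit vectors
        return "unrolled";      // scalar fallback, loops unrolled
    }
}
```

Because `Avx2.IsSupported` and `AdvSimd.Arm64.IsSupported` fold to constants at JIT time, the selected path carries no branch cost at runtime.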
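The pinned-object-heap allocation plus 64-byte alignment can be sketched as follows (assuming a `long[]` table of 8-slot, 64-byte blocks; the class and method names are hypothetical). Since `GC.AllocateArray(pinned: true)` places the array on the pinned object heap, its address never moves and no `fixed` statement is needed to hold a stable pointer. Requires compiling with unsafe code enabled.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical sketch of pinned, cache-line-aligned sketch storage.
public static class AlignedSketch
{
    public static unsafe long* Allocate(int blockCount, out long[] table)
    {
        // Over-allocate by one 8-slot block so the base pointer can be
        // rounded up to the next 64-byte boundary; every block then
        // occupies exactly one cache line.
        table = GC.AllocateArray<long>((blockCount + 1) * 8, pinned: true);
        long* p = (long*)Unsafe.AsPointer(ref table[0]);
        return (long*)(((nuint)p + 63) & ~(nuint)63);
    }
}
```

Keeping a reference to `table` alive is still required so the GC does not collect the array; only pinning (not lifetime) comes for free.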

Full changelog: v2.5.2...v2.5.3