Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync::Cache evicts entries when there's still space in other shards #55

Open
LafeWessel opened this issue Oct 16, 2024 · 15 comments
Open
Labels
question Further information is requested

Comments

@LafeWessel
Copy link

I've been experiencing an issue with the sync::Cache where it will evict cache entries when the cache is not yet full, the hot_allocation parameter is set very low (<= 0.05), and the cache_shards are set to any number >1. The issue appears to stem from having multiple cache shards as I cannot reproduce it with only 1 shard.

My workload has a high bias towards recency and thus I have set the hot_allocation parameter very low, usually < 0.05. I have tried multiple ghost_allocation values ([0.01, 0.5]), and that does not appear to cause the issue. The issue does not happen every time when the cache shards are set to > 1 (or default/left unset), but it happens enough that I see significant performance penalties for re-fetching evicted data.

If desirable, I can put together a reproduction of the issue and add it here.

@arthurprs
Copy link
Owner

arthurprs commented Oct 16, 2024

One thing that comes to mind is that the cache will immediately evict anything (unless pinned) larger than the maximum weight of hot section. That will be tiny given that hot_allocation, and it'll also decrease linearly with the number of shards.

In general, I think the cache might behave oddly with a hot_allocation < 0.5, even for use cases where it's recency-biased, so I might have to clamp that value in the future.

May I suggest trying 0.5 in your workload?

@LafeWessel
Copy link
Author

I've tested with a variety of hot_allocation values from 0.01 to 0.99 and determined that any values > 0.10 have pretty severe performance impacts. I can do some further testing if that's desirable. It is also worth noting that I am using the UnitWeighter and, even if I wasn't, all the cache entries are the same size.

My workload is essentially a single pass through a set of data (with some "jitter" back and forth). I'm fairly memory-constrained, so I've been trying to keep the cache as small as possible while also preventing thrashing. From my testing, even small amount of lost cache accuracy (~0.3%) results in a 2x performance degradation. I've attached several plots below that I created from my testing.

Personally, I would prefer that the hot_allocation parameter is not clamped at all. Maybe make a note about the behavior instead. Please let me know if there's other information that I can supply to help.

image-20241007-122111

image-20241007-123458

This plot was created with hot_allocation set to 0.05.

image-20241007-131002

@LafeWessel
Copy link
Author

It's worth noting that I was experiencing this issue using a cache size of 256, not the 32 I mentioned above.

@arthurprs
Copy link
Owner

arthurprs commented Oct 16, 2024

Wow, this is a wild use case, a 0.1% change of hit ratio causes a double digit% performance hit.

Given your description and graph, does a pure LRU work well in your case?

@arthurprs
Copy link
Owner

arthurprs commented Oct 16, 2024

Are these graphs generated with an unsharded/unsync cache? Because the sharded/sync cache has some hardcoded minimum sizes per shard and rounding, so they could skew your results.

@LafeWessel
Copy link
Author

Yes, a pure LRU cache would work well, but I haven't seen a good sync implementation in the Rust ecosystem that is Send + Sync (could well be a skill issue on my part...). I tried the lru crate with a Arc<Mutex<LruCache>> and Arc<RwLock<LruCache>>, but it was extremely slow in comparison to quick_cache. I've used moka as well, but, again, it wasn't as fast as quick_cache. If it's not clear already, the workload I'm running does a lot of number crunching and needs to be extremely performant.

While the workload that I'm sharing does not need to be Send + Sync, I have other workloads that do require the struct holding the cache to be Send + Sync. I could have different versions of that struct to use the unsync and sync caches as necessary but haven't gone down that path yet.

@LafeWessel
Copy link
Author

Would it be difficult to allow different expiration policies for your cache implementation? I'm going to guess so based on poking around the code a bit but am nonetheless curious.

@arthurprs
Copy link
Owner

A fully generic policy would be a large refactor, so I think difficult.

That said, it should be reasonably easy to make it work in a CLOCK mode (aka. 2nd-chance or approx. LRU). It should be roughly equivalent to a 0%/100% hot allocation (I'm not sure which one), but it may need some modifications.

@arthurprs
Copy link
Owner

If you have the availability, could you try to create a reproduction? Evicting an entry when there's still space looks like a bug.

@LafeWessel
Copy link
Author

I tested locally, and this appears to reproduce the behavior. If you set the shards to > 1, then you will see eviction messages even though (by default) the number of samples passed over is the same as the size of the cache. If you set the shards to 1, then you will not see any eviction messages.

use clap::Parser;
use quick_cache::Lifecycle;

#[derive(Clone, Debug)]
struct LoggingLifecycle;

impl<Key, Val> Lifecycle<Key, Val> for LoggingLifecycle
where
    Key: std::fmt::Debug,
{
    type RequestState = ();

    fn begin_request(&self) -> Self::RequestState {}

    fn on_evict(&self, _state: &mut Self::RequestState, key: Key, _val: Val) {
        println!("Evicting item {key:?}")
    }
}

/// CLI for the application
#[derive(Parser)]
struct App {
    /// Number of cache shards to initialize the cache with
    #[arg(short, long, default_value_t = 1)]
    shards: usize,
    /// Hot allocation parameter to use
    #[arg(long, default_value_t = 0.05)]
    hot_allocation: f64,
    /// Size of the cache
    #[arg(short, long, default_value_t = 100)]
    capacity: u64,
    /// Number of passes through the data
    #[arg(long, default_value_t = 10)]
    passes: u32,
    /// Number of samples in a single pass
    #[arg(long, default_value_t = 100)]
    samples: u32,
}

fn main() {
    let args = App::parse();

    let opts = quick_cache::OptionsBuilder::new()
        .weight_capacity(args.capacity)
        .estimated_items_capacity(args.capacity as usize)
        .hot_allocation(args.hot_allocation)
        .shards(args.shards)
        .build()
        .unwrap();

    let cache: quick_cache::sync::Cache<
        u32,
        u32,
        quick_cache::UnitWeighter,
        quick_cache::DefaultHashBuilder,
        LoggingLifecycle,
    > = quick_cache::sync::Cache::with_options(
        opts,
        quick_cache::UnitWeighter,
        quick_cache::DefaultHashBuilder::default(),
        LoggingLifecycle {},
    );

    for pass in 0..args.passes {
        println!("Running pass {pass}/{}", args.passes);
        for sample in 0..args.samples {
            // try to get the item at the sample or insert it with separate calls to mimic the
            // workload as best we can
            match cache.get(&sample) {
                Some(_) => {}
                None => {
                    cache.insert(sample, sample);
                }
            }
        }
    }
}

@LafeWessel
Copy link
Author

This is a pretty close approximate to my workload as I make several passes over a set of data in a (more or less) sequential order. There isn't a way to synchronize the passes to only make a single pass over the data.

@arthurprs
Copy link
Owner

arthurprs commented Oct 16, 2024

I get the confusion now.

When using the sync cache (sharding), you get N shards with approx Capacity / N each. Since shards work in isolation, eviction may kick in one shard even though others may have spare capacity.

This is usually not observable in practice as caches are much larger than the number of shards, but here, they're tiny.

@LafeWessel
Copy link
Author

Ahhh, I see, that would explain it.

@LafeWessel
Copy link
Author

LafeWessel commented Oct 16, 2024

Would it make sense to do something like moka that allows for changing the eviction policy to one of two options? That way the user can make some configuration on their own without making it fully generic.

https://docs.rs/moka/latest/moka/policy/struct.EvictionPolicy.html

@arthurprs
Copy link
Owner

Yes. That's what I intended with the ghost and hot section configurations. These should be enough knobs for most workloads (but maybe it needs a third one for the frequency counter).

@arthurprs arthurprs added the question Further information is requested label Oct 22, 2024
@arthurprs arthurprs changed the title sync::Cache evicts entries when the cache is not yet full with >1 cache shards sync::Cache evicts entries when there's still space in other shards Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants