
Add an Allocator::deallocate_zeroed trait method #137

Open
fitzgen opened this issue Dec 30, 2024 · 3 comments

fitzgen (Member) commented Dec 30, 2024

Summary

Add an Allocator::deallocate_zeroed trait method that is the dual of allocate_zeroed. It would inherit Allocator::deallocate's unsafe contract and additionally require that callers only pass memory that is already zeroed.

trait Allocator {
    // ...

    unsafe fn deallocate_zeroed(&self, ptr: NonNull<u8>, layout: Layout) { ... }
}

Motivation

I am working in a soft real-time embedded system where

  1. I need to allocate large blocks of zeroed memory on the critical path
  2. I cannot use virtual memory or make syscalls

Because of (2) I cannot rely on fresh mmaps to get zero pages, use madvise(DONTNEED), or let the OS zero pages in the background for me.

Instead, I use a custom allocator that keeps track of already-zeroed memory blocks. This makes on-demand zeroing during allocation unnecessary, so we can avoid zeroing memory on the critical path. The zeroing is performed asynchronously, off the critical path, in a cooperatively-yielding loop that zeroes at most N bytes of a memory block per iteration. Additionally, I can sometimes avoid zeroing the full memory block, relying on application logic to determine that certain portions of the (initially zero) memory block were never modified (and therefore remain zeroed). Once the memory block has been fully zeroed, it is returned to the allocator, which inserts it into its already-zeroed freelist.1
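For concreteness, the cooperatively-yielding loop described above can be sketched roughly as follows. CHUNK and yield_now are illustrative stand-ins for the real per-iteration budget and scheduler integration, not part of any proposed API:

```rust
// Hypothetical sketch of off-critical-path cooperative zeroing.
const CHUNK: usize = 4096; // assumed per-iteration zeroing budget, in bytes

fn zero_cooperatively(block: &mut [u8], mut yield_now: impl FnMut()) {
    for chunk in block.chunks_mut(CHUNK) {
        chunk.fill(0); // zero at most CHUNK bytes per iteration
        yield_now();   // cooperatively yield back to the critical path
    }
}
```

Once this loop completes, the block can be pushed onto the allocator's already-zeroed freelist.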

Anyways, this system works if I call my allocator's methods directly, but it does not play nice with the Allocator trait. This prevents abstraction, librarification, and separation of concerns.

Open Questions

  • Should this be a provided method? If so, what should the default be?

    • Should it be effectively a memset(0) followed by self.deallocate?

      core::ptr::write_bytes(ptr.as_ptr(), 0, layout.size());
      self.deallocate(ptr, layout);

      That would always be correct, and is ideal when there is no OS/virtual memory. However, it may be undesirable for large allocations when virtual memory and syscalls are available, and you can just do the equivalent of madvise(DONTNEED). The tricky bit is that (AFAIK) there isn't a mechanism for std to override default provided methods of traits from alloc. I suppose concrete Allocator implementations can always override the method to do madvise(DONTNEED) themselves, but if every implementation is doing the same override under the same conditions, then that seems unfortunate.

    • Alternatively, the method could be fallible, and could simply return an error by default and force callers to figure out what to do in that case. This essentially un-asks the previous question around memset(0) vs madvise(DONTNEED). But it doesn't really seem like all callers can make effective decisions here, what should a collection library do if the method returns an error? Unclear.

    • We could also make it a required method, but that seems like an annoying complication to an otherwise-simple trait, especially for a relatively niche use case.

Alternatives

  • The biggest alternative I can think of is to simply not add this trait method, force people with these use cases to tie their application to a specific allocator, and accept that librarification is unattainable. I don't think this is the end of the world, but I do think it is not ideal.

Footnotes

  1. This system is similar to the concurrent bulk zeroing described in Why Nothing Matters: The Impact of Zeroing by Yang et al., and Allocator::allocate_zeroed's default implementation is what that paper calls "hot-path zeroing". Essentially, the Allocator trait only supports lazy, on-demand, "hot-path" zeroing. Adding this method would extend its support to eager, concurrent, background zeroing.

CAD97 commented Dec 31, 2024

The default implementation would just be to call deallocate, if the memory being zero is a safety prerequisite. Doing a memset would be unnecessary in this case.
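To make that concrete, the forward-to-deallocate default described here might look like the sketch below. SketchAllocator is a stand-in for the unstable Allocator trait, just to show the shape of the provided method:

```rust
use core::alloc::Layout;
use core::ptr::NonNull;

// Stand-in for the unstable `Allocator` trait (illustrative only).
trait SketchAllocator {
    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout);

    /// # Safety
    /// Same contract as `deallocate`, plus: the block must already be zeroed.
    unsafe fn deallocate_zeroed(&self, ptr: NonNull<u8>, layout: Layout) {
        // No memset needed: the caller has already promised the block is
        // zeroed, so the default can simply forward to `deallocate`.
        unsafe { self.deallocate(ptr, layout) }
    }
}
```

A zero-aware allocator would override deallocate_zeroed to put the block on its already-zeroed freelist instead.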

How do you expect collections to make use of deallocate_zeroed? If the memory contained values at some point, it's not zeroed anymore, and it's going to be more efficient to give the block to the allocator to clean up than it is to zero the memory before passing it back. So the extent of data structures that can see benefit from deallocate_zeroed is quite small, likely just those written alongside a project providing its own allocator, thus able to directly use the allocator's specific methods.

Adding deallocate_zeroed would extend its support to eager, concurrent, background zeroing.

No? I think you must have mixed something up somewhere. Background zeroing is already supported just fine, e.g. something like

unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
    self.work_queue.push(async move {
        cooperative_memset(ptr, 0, layout.size()).await;
        unsafe { self.deallocate_zeroed(ptr, layout) };
    });
}

deallocate_zeroed supports the application reporting if an entire allocation is still zeroed. If it's a general purpose collection, this effectively only happens when allocate_zeroed was used and nothing was written to the allocation. This is not a meaningful use case IMO.

fitzgen (Member, Author) commented Jan 21, 2025

Thanks for the response!

The default implementation would just be to call deallocate, if the memory being zero is a safety prerequisite. Doing a memset would be unnecessary in this case.

Thanks, I was indeed thinking backwards in this case (was thinking about adapting a deallocate to deallocate_zeroed instead of the other way around).

How do you expect collections to make use of deallocate_zeroed?

A Vec-like data structure could keep track of a max-len high water mark, for example, and then only zero the data below that line, relying on the fact that the memory above that line was never written to (and originally allocated with allocate_zeroed).
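That high-water-mark idea can be sketched like this. ZeroTrackedBuf and its methods are invented names for illustration; the Vec stands in for a block obtained from allocate_zeroed:

```rust
// Hypothetical sketch: a buffer that starts zeroed and records the largest
// index ever written, so only that prefix needs re-zeroing before the block
// could be handed back via a `deallocate_zeroed`-style method.
struct ZeroTrackedBuf {
    data: Vec<u8>,     // stands in for an `allocate_zeroed` block
    high_water: usize, // one past the largest index ever written
}

impl ZeroTrackedBuf {
    fn new(capacity: usize) -> Self {
        Self { data: vec![0; capacity], high_water: 0 }
    }

    fn write(&mut self, index: usize, byte: u8) {
        self.data[index] = byte;
        self.high_water = self.high_water.max(index + 1);
    }

    // Re-zero only the written prefix and return how many bytes that was;
    // everything above `high_water` is still zero from the allocation.
    fn rezero(&mut self) -> usize {
        self.data[..self.high_water].fill(0);
        std::mem::take(&mut self.high_water)
    }
}
```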

Background zeroing is already supported just fine

Two things:

  1. This example does additional, unnecessary zeroing when the allocation is already known to be zeroed (or mostly zeroed, like in the above example where a collection tracks which parts are guaranteed zeroed or not).

  2. This example is not composable. It works if you are implementing the allocator, its trait implementation, and the background-zeroing runtime all in one place. It doesn't work well when you are trying to reuse existing crates and/or are writing allocator combinators and wrappers.


Anyways, I do appreciate that this is a fairly niche use case, and might not be worth investing in from an ecosystem point of view.

FWIW, I've started a crate that introduces a DeallocateZeroed extension trait and a ZeroAwareAllocator that layers on top of another Allocator implementation and adds bookkeeping for already-zeroed memory blocks, if anyone else needs this functionality: https://github.com/fitzgen/deallocate-zeroed

CAD97 commented Jan 21, 2025

A Vec-like data structure could keep track of a max-len high water mark, for example, and then only zero the data below that line,

This isn't particularly composable either, as now your structure is doing more work to zero the used memory region even for allocators that just treat the entire dealloc as uninitialized. So what you need is something more like realloc, in that you pass both the alloc layout and more info, in this case the subregions which need to be zeroed.


I think your use case would be best served by something like the provide_any mechanism that's being tried in std for extensible errors and async contexts, to be able to access additional extended functionality on the standard allocator abstraction. For global allocation specifically you could solve it by having a wrapped #[global_allocator]-like resource that provides the additional API, but localized allocators can't really have extensions attached like that, unfortunately.
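One way to picture that kind of extension probing, sketched here with a plain Any-based downcast rather than the provide_any machinery itself (which was ultimately removed from std). Every name below is invented for illustration:

```rust
use std::any::Any;

// Invented extension capability: an allocator that can be told a block is
// already zeroed.
trait ZeroAware {
    fn note_zeroed(&self, bytes: usize);
}

// Invented base abstraction that exposes a downcast hook.
trait ExtAllocator {
    fn as_any(&self) -> &dyn Any;
}

struct ZeroPool; // imagine: bookkeeping for already-zeroed blocks
impl ZeroAware for ZeroPool {
    fn note_zeroed(&self, _bytes: usize) { /* record the zeroed block */ }
}
impl ExtAllocator for ZeroPool {
    fn as_any(&self) -> &dyn Any { self }
}

struct PlainAlloc; // an allocator without the extension
impl ExtAllocator for PlainAlloc {
    fn as_any(&self) -> &dyn Any { self }
}

// Callers probe for the extended capability and fall back gracefully.
fn report_zeroed(alloc: &dyn ExtAllocator, bytes: usize) -> bool {
    if let Some(pool) = alloc.as_any().downcast_ref::<ZeroPool>() {
        pool.note_zeroed(bytes);
        true // extension available: allocator learned about zeroed bytes
    } else {
        false // no extension: caller treats the memory as ordinary
    }
}
```

The downcast targets a concrete type here, which is exactly the composability limitation for localized allocators that the comment above points out; a provide_any-style mechanism would let the probe be expressed against the capability rather than a concrete allocator.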
