Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Memory Allocator to support allocator for just i/o(data) memory #927

Closed
leodotcloud opened this issue Sep 1, 2022 · 4 comments
Closed
Labels
feature-request A feature should be added or improved. needs-review This issue or pull request needs review from a core team member.

Comments

@leodotcloud
Copy link

Currently, the SDK provides the ability to specify a custom memory allocator. The same allocator is used for generic data structure allocations as well as for actual data transfers. This feature request is for adding another allocator which can be used for allocating memory for actual data transfers. For example, let's when using S3, memory for the file transfers uses this new allocator.

@graebm
Copy link
Contributor

graebm commented Sep 1, 2022

What problem are you having? Are you trying to reduce memory fragmentation by pooling the large I/O allocations? Are you trying to use different allocation schemes for large vs small allocations? Something else?

In our low-level I/O code, we do pool the 16KB buffers that get passed back and forth between the between the HTTP, TLS, and Socket layers. The regular allocator is used to create these pools, but we re-use the buffers for the lifetime of the connection.

But aws-c-s3 introduces a lot of its own buffering before sending things to the low-level HTTP API, and it isn't using pools.

It seems to me that the obvious solution is for aws-c-s3 to do more pooling.

We'd be very hesitant to add complexity by introducing the concept of a 2nd allocator that users need to pass in. I'm also hesitant to tell every customer that wants good performance that they need to write their own custom allocator. I'd prefer that aws-c-s3 just be smarter about how it's doing allocations (i.e use pools).

@leodotcloud
Copy link
Author

We have our own Custom Memory Manager (MM) specifically optimized for data movements and using it as the default allocator burns up the entire pool due to non-data allocations. Looking at the code, I couldn't see any obvious way to at least identify which of the calls are data related. Do you think at least passing an identifying flag for the allocations is a reasonable approach? That way, we can pass a single allocator but within our MM, we can hand off smaller allocations to system MM.

@graebm
Copy link
Contributor

graebm commented Sep 9, 2022

Refactoring how our allocators work would be a large revamp that we're not able to undertake right now. I can suggest some hacks/patches you could try locally to get the effect you want.

Hack 1 - branch on size

This doesn't require any changes to SDK code. In your custom allocator, just check the size of the allocation, and assume anything small is a datastructure, while anything big is data transfer. (I don't know exact numbers to pick, 1KB seems like a good starting guess)

Hack 2 - thread-local flag for allocation-types

This does require changes to SDK code. It would be a pain to change the ACTUAL allocator API to add an extra argument. So instead, use a thread-local variable to indicate the type of allocation being performed on a particular thread, like:

static AWS_THREAD_LOCAL enum aws_alloctype tl_aws_alloctype;

void aws_set_alloctype(enum aws_alloctype type) {
    tl_aws_alloctype = type;
}

enum aws_alloctype aws_get_alloctype(void) {
    return tl_aws_alloctype;
}

Then, find places that are allocating I/O data and make sure the flag is set before they allocate stuff. memory_pool.c is one such place (see here and here). Just make edits like:

    aws_set_alloctype(AWS_ALLOCTYPE_IODATA); // <-- add this
    void *memory = aws_mem_acquire(alloc, segment_size);
    aws_set_alloctype(AWS_ALLOCTYPE_DEFAULT); // <-- add this

In aws-c-s3, you'll need to discover which places are making big IO allocations and add the types (add breakpoints or print stack traces whenever large allocation is made to find the offending call sites), and make sure the alloctype is set while those calls are being made.

@graebm graebm added the feature-request A feature should be added or improved. label Sep 9, 2022
@jmklix jmklix added the needs-review This issue or pull request needs review from a core team member. label Sep 6, 2023
@jmklix
Copy link
Member

jmklix commented Sep 6, 2023

We appreciate your feedback and suggestion. This is not something that we are planning on currently supporting, please take a look at the workarounds that graebm posted above.

@jmklix jmklix closed this as completed Sep 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. needs-review This issue or pull request needs review from a core team member.
Projects
None yet
Development

No branches or pull requests

3 participants