Significantly reduce per-frame memory allocations from the heap in the Mobile renderer #103794
u.binding = 6;
u.uniform_type = RD::UNIFORM_TYPE_TEXTURE;
Vector<RID> textures;
textures.resize(scene_state.max_lightmaps * 2);
If this function is called a lot, you probably want to use a stack fixed array. Basically a fancy fixed C array with all the usual machinery for push_back etc.
I have a basic one I wrote for 3.x (core/fixed_array.h) but you can equally well modify the LocalVector template to be capable of storing on the stack (let me know if this is of interest, I recently did this for a third party module).
Ah I see it's getting passed to RD::Uniform below and stored there, in which case maybe this allocation is unavoidable. 🙁
I think the better option here is just to track the textures that may have changed and only recreate the Uniforms array when we know there will be a change.
Basically this function recreates the array of Uniforms every frame and then indexes into a hash map to see if we have cached this uniform set or not. Instead, with minimal tracking, we can just check if any Uniform actually changed, then only run this code when it has. Realistically, this won't run most frames.
But that is a riskier change to make, so I'd like to do it in a follow-up PR, as this gives us 99% of the benefit and is totally safe.
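As a rough illustration of the tracking idea described here (not code from this PR or from the renderer), the sketch below marks the state dirty only when a texture slot actually changes and rebuilds the cached uniform set only then. UniformSetTracker, TextureId, the slot count of 4, and the rebuild counter are all hypothetical stand-ins; the real code would call into RenderingDevice where the placeholder is.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a texture RID.
using TextureId = std::uint64_t;

struct UniformSetTracker {
	std::array<TextureId, 4> textures{}; // Currently bound textures, one per slot.
	bool dirty = true; // True when the cached set is out of date.
	std::uint64_t cached_set = 0; // Placeholder handle for the uniform set.
	int rebuilds = 0; // Counts how often the expensive path ran.

	// Records a texture for a slot and marks the set dirty only on a real change.
	void set_texture(std::size_t p_slot, TextureId p_texture) {
		assert(p_slot < textures.size());
		if (textures[p_slot] != p_texture) {
			textures[p_slot] = p_texture;
			dirty = true;
		}
	}

	// Returns the cached handle, rebuilding only if something changed.
	std::uint64_t get_uniform_set() {
		if (dirty) {
			// Placeholder for building the Uniforms array and creating the set.
			rebuilds++;
			cached_set = static_cast<std::uint64_t>(rebuilds);
			dirty = false;
		}
		return cached_set;
	}
};
```

With this shape, a frame in which no slot changes skips both the array construction and the hash map lookup.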
I'm seeing a lot of
Is it possible to just create one and share it?
Yes, thinking about it, in the longterm, I suspect that if such a
This suggests longterm having a file somewhere in e.g. the renderer with a bunch of these
I'm not super familiar with the renderer in 4.x but this looks ok to me from a quick look.
Obviously as you say there are more improvements to come, but perfect is the enemy of good enough as they say, and it's an incremental thing.
I agree. Long term we need to weigh the costs of making more drastic changes against how much benefit we actually get. The renderer is now doing about 20,000 allocations per second with this test scene. That means our upper bound for improvement is half of what I just did. So a very drastic solution may not be warranted. Most of the remaining allocations come from one specific function too (draw_list_begin). Once we address that case it may not be beneficial to fix everything else.
That being said, I agree it's worth investigating how we can avoid using thread_local vectors everywhere. That comes with its own cost.
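For context on the thread_local vector pattern mentioned above, here is a minimal sketch of how such a scratch buffer avoids per-call allocations. It uses std::vector as a stand-in for the engine's own containers, and build_frame_list is a hypothetical function, not one from this PR.

```cpp
#include <cassert>
#include <vector>

// Fills a per-thread scratch buffer and returns a reference to it.
// The buffer is constructed once per thread and reused on every call.
const std::vector<int> &build_frame_list(int p_count) {
	assert(p_count >= 0);
	thread_local std::vector<int> scratch;
	scratch.clear(); // Drops the elements but keeps the allocated capacity.
	for (int i = 0; i < p_count; i++) {
		scratch.push_back(i); // Allocates only while growing past the old capacity.
	}
	return scratch;
}
```

The cost is that each thread keeps its buffer at peak size for as long as the thread lives, and the returned reference is only valid until the next call on the same thread.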
Thanks!
The aim of this PR is to reduce memory allocations, not necessarily to increase performance. But it does have a nice performance increase as well.
In last week's core meeting we agreed that we should work towards reducing per-frame memory allocations as much as possible. Both for performance and stability.
Results using the Legend of the Nuku Warriors demo with tons of omnilights.
Before: ~95,000 allocations per second: 31-32 mspf

After: ~45,000 allocations per second: 29-30 mspf

The most important changes here are:
push_back()
uniform_set_create()
allocates a lot of memory.
There is a lot more that can be done. But this PR contains a very safe, very impactful set of optimizations, so I would prefer to merge this quickly and then move on to the other places.