
High Memory Usage of built_value_generator due to memoization + keeping all SerializerSourceClass objects in memory #1288

Open
knaeckeKami opened this issue Dec 14, 2023 · 6 comments

@knaeckeKami
Contributor

knaeckeKami commented Dec 14, 2023

Hi, it's me again with issues that pop up due to ferry using built_value for serialization of generated graphql code, which can get quite big ;)

built_value_generator keeps a list of SerializerSourceClass objects for each SerializerSourceLibrary.

So all of these SerializerSourceClass objects are kept in memory, and they use memoization to cache e.g. their parsedLibrary and SerializerSourceField fields.

This leads to very high memory usage when there is a very large number of serializers.

For reference: gql-dart/ferry#558 (comment)

This can probably be fixed with small changes to the code, e.g. by copying the currently active SerializerSourceClass into a new object that is not held in a list after code generation is done (e.g. sourceClasses.map((clazz) => clazz.rebuild((b) => b))), so that the objects with memoized fields become eligible for garbage collection once work on that class is finished.
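
For illustration, here is a minimal sketch of how a @memoized getter keeps data alive; the class is made up for the example and is not the real SerializerSourceClass, and the part file would be produced by built_value_generator:

```dart
import 'package:built_value/built_value.dart';

part 'source_class_sketch.g.dart';

// Stand-in for the expensive work (e.g. resolving the parsed library).
Object expensiveParse(String name) => Object();

abstract class SourceClassSketch
    implements Built<SourceClassSketch, SourceClassSketchBuilder> {
  String get name;

  // The generated class stores the result of a @memoized getter in an
  // instance field after the first access, so as long as this instance is
  // reachable (e.g. from the library-wide list of source classes), the
  // cached value is kept alive too.
  @memoized
  Object get parsedLibrary => expensiveParse(name);

  SourceClassSketch._();
  factory SourceClassSketch(
          [void Function(SourceClassSketchBuilder) updates]) =
      _$SourceClassSketch;
}
```

Dropping (or replacing) the references to such instances once code generation for a class is done would let those caches be collected.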

@davidmorgan
Collaborator

Thanks Martin.

Sounds good to me; not sure when I can take a look. I'll be working next week, then on vacation for two weeks, so if it doesn't happen this year it'll be after 8 January.

@davidmorgan
Collaborator

I looked into this a bit, checking how often the various built_value pieces get instantiated; I didn't find anything obviously wrong.

The underlying memory model is owned by the analyzer, so it's not super clear what is expensive and what is cheap.

I tried the flutter_ferry_unions_issue repro. What would really help is a way to quickly increase/decrease the input size, so that I can e.g. keep increasing 10x until something breaks and see whether memory use is linear or worse than linear in the input size. Is there any obvious way to do that please? :)

I think the parsedLibrary is always used to extract only a small amount of data, so if that is indeed the problem, it should be possible to extract that data immediately and discard the parsedLibrary.
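
Roughly what I have in mind, as a sketch with a made-up summary type (this is not the actual SerializerSourceClass API):

```dart
import 'package:analyzer/dart/analysis/results.dart';

/// Hypothetical summary holding only the data code generation needs, so the
/// ParsedLibraryResult itself does not have to stay reachable.
class SourceClassSummary {
  final String className;
  final List<String> fieldNames;

  SourceClassSummary(this.className, this.fieldNames);

  factory SourceClassSummary.fromParsedLibrary(
      ParsedLibraryResult parsedLibrary, String className) {
    final fieldNames = <String>[];
    // ... walk the parsed AST here and collect only what is needed.
    // The returned object keeps no reference to `parsedLibrary`, so the
    // parsed library can be garbage collected as soon as the caller drops it.
    return SourceClassSummary(className, fieldNames);
  }
}
```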

Thanks.

@knaeckeKami
Contributor Author

knaeckeKami commented Dec 18, 2023

Thanks!

I played around with the repro a bit; here is my fork with a copy.py script in lib/graphql. It copies the same query into N files with different query names, which allows testing with different input sizes.

Around N=250, increasing the number of files really starts to take a toll on build time.

N=250:
[INFO] Build:Succeeded after 2m 28s with 2530 outputs (3035 actions)

N=500
[INFO] Build:Succeeded after 15m 1s with 5030 outputs (6035 actions)

I wiped .dart_tool and the generated files from disk between runs to ensure clean runs without caching.

I have not dug into what causes these non-linear build times, however, and have not used any of build_runner's profiling options yet.

My initial suspicion from the user's stack trace was that keeping all the parsedLibrary objects for all SerializerSourceClass objects might cause issues, but I don't have a full repro of that issue yet either.

Here is my fork: https://github.com/knaeckeKami/flutter_ferry_unions_issue

@davidmorgan
Collaborator

Thanks! That should help. I experimented with it a bit, but still didn't get to the answer, and we're out of time for 2023. See you in 2024 ;)

@knaeckeKami
Contributor Author

Sure! Merry Christmas and have a nice vacation ;)

@knaeckeKami
Contributor Author

knaeckeKami commented Jan 14, 2024

I dug a little deeper into this issue: the worse-than-linear time complexity comes from the resolver.libraryFor calls, which get much slower as more generated classes exist.

I suspect this is due to the bigger serializers.dart file, since every Built class references it.
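
For context, a typical serializers.dart looks roughly like this (file and class names are made up); it lists every Built class, so it grows with N and every generated library ends up depending on it:

```dart
import 'package:built_value/serializer.dart';

// Hypothetical generated query files, one per Built class.
import 'query_a.data.gql.dart';
import 'query_b.data.gql.dart';

part 'serializers.g.dart';

// One entry per generated Built class, so this file (and everything that
// references it) grows with the number of generated queries.
@SerializersFor([
  QueryAData,
  QueryBData,
  // ...
])
final Serializers serializers = _$serializers;
```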

Also, source_gen|combining_builder takes up a large portion of the build time and gets much slower with more generated classes, possibly for similar reasons.

Unfortunately, this behavior seems to be inherent to the architecture of built_value and not easy to change.

I could not reproduce the out-of-memory errors that were reported to me, though.
