-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize in-memory representation of string and datetime columns #764
base: master
Are you sure you want to change the base?
Conversation
…erialize string column protobuf msg into packed byte array with offsets.
@jingjingwang: So I've noticed that although I can avoid copying from a protobuf msg by stealing the |
I think saving one direction is definitely better than not saving :) Plus the code looks neat so I don't see any reason not to do it. I also have concern on using a protobuf specialized datatype in |
I now think I should generalize the issue/PR into creating zero-copy (buffer-stealing) versions of all column types, so a column of any type can be serialized/deserialized from a protobuf msg with no buffer copy (by keeping all data in a |
Sounds like it's going to be a big change! I'm not completely sure how the code will look like after the zero-copy change, but if it's doable then looks great. One concern is how do we handle |
@jingjingwang what do you think of this PR after a year? :) How high a priority do you think it should be to avoid storing objects in String and DateTime columns? Not sure what memory savings this would really yield, but it seems like memory usage is currently a big issue for Myria users. |
According to the discussion above, we were planing to add zero-copy to this PR. Could you add this improvement (if you plan to) and also clean the PR up? I can review it after it's up-to-date. |
This will proceed in two stages:
STRING_TYPE
orDATETIME_TYPE
columns into new packedColumn
implementationsColumnBuilder
implementations that build these packedColumn
implementations, possibly adding a new append-onlyColumnBuilder
interface (since the packed string representation can't efficiently replace/swap by row index)Fixes #762