-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable HAMT tree bitwidth #32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really want less than 8 bits per level, let's just remember the bit offset. Take a look at hashBits from https://github.com/ipfs/go-unixfs/blob/25a8acedf8c65251dd15a39f9125dbb8973ab5dc/hamt/util.go
This isn't really a new hash function.
hamt.go
Outdated
|
||
// UseHashFunction allows you to use a custom function | ||
// to hash the keys in a hamt node | ||
func UseHashFunction(hash HashFunction) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to record the chosen hash function in the node. We can probably just use go-multihash (but use only the hash digest as the key).
hamt.go
Outdated
|
||
// UseTreeBitWidth allows you to set the width of the HAMT tree | ||
// in bits (from 1-8) via a customized hash function | ||
func UseTreeBitWidth(bitWidth uint8) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really can't see why one would want to use less than 8 bits on a level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(@whyrusleeping corrected me here; this is a very different use-case from IPFS).
We should probably reuse the existing code we have that does this over in the other HAMT impl: https://github.com/ipfs/go-unixfs/blob/master/hamt/util.go @Stebalien we likely want to use a width of 32 (bitwidth 5), or maybe smaller. The delta per change is quite high with higher widths. In any case, to choose a particular width, we need to run some tests on total footprint of objects created during a realistic workload. |
Interesting. I guess you're dealing with a highly mutable datastructure while IPFS directories tend not to be. |
Yeah, we get hundreds of mutations per block, minimizing the impact of that is pretty high priority. |
We have exactly the same reasons and usage in Peergos, and we went with bit width of 3, and 4 max hash collisions per level (which we also have as configurable) after profiling data usage and churn with different parameters. |
@Stebalien @whyrusleeping I will reimplement with the bitreader but I wonder if it makes sense to either copy to a separate package or to this package — feels weird to bring in something from go-Unixfs here (feels like a higher level of the stack to me) |
I'd copy it ere. Eventually, we need to converge around a single canonical HAMT implementation so go-unixfs will likely end up depending on this package. |
@Stebalien sounds good. |
a177105
to
25acf8e
Compare
@Stebalien this has been redone to use hashBits. Can I get a +1 so I can merge? Results continue to be promising:
|
Though also, I wonder if we should record the fanout in the CBOR itself as is done in go-unixfs. |
Add a variadic options structure to the Node constructors
adds one more option that allows you to configure bitwidth, plus some benchmarks for alternate width breakup
copy over hashbits utility class from go-unixfs for reading different bitwidth lengths
allow use of alternate bitwidths by making use of hashbits reader
25acf8e
to
c51764d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good stuff. I'd say lets merge this now and press on with future benchmarks
Goals
See #31 -- allow people to configure the bitWidth of the HAMT tree
Implementation
Todo
For Discussion
So initial benchmarks are highly promising... I'm curious if anyone else can replicate on their machine:
Not the diff in memory allocations and ns/op
I dunno what this means... but super exciting though it makes me skeptical!
FYI: @anorth @acruikshank