Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable HAMT tree bitwidth #32

Merged
merged 4 commits into from
Sep 12, 2019

Conversation

hannahhoward
Copy link
Contributor

Goals

See #31 -- allow people to configure the bitWidth of the HAMT tree

Implementation

  • Create hashing functions for different bitWidths (both generalized and a few probably over optimized -- see hash.go / hash_test.go)
  • Create variadic options structure when initializing HAMT nodes to support custom hashing functions
  • Add option that supports varying bitWidth via a custom hashing function
  • Add benchmarks that test with a different bitWidth (5 -- or 32 tree width)

Todo

  • Did not know there was a HashMap spec and per @rvagg need to look into that

For Discussion

So initial benchmarks are highly promising... I'm curious if anyone else can replicate on their machine:

goos: darwin
goarch: amd64
pkg: github.com/ipfs/go-hamt-ipld
BenchmarkSerializeNode-4   	    5000	    352626 ns/op	   98588 B/op	    2236 allocs/op
BenchmarkFind/find-10k-4   	    2000	    652365 ns/op	  150762 B/op	    4039 allocs/op
BenchmarkFind/find-100k-4  	    1000	   1602449 ns/op	  362829 B/op	   11049 allocs/op
BenchmarkFind/find-1m-4    	    2000	   1048900 ns/op	  247089 B/op	    6232 allocs/op
BenchmarkFind/find-10k-bitwidth-5-4         	   10000	    185219 ns/op	   41811 B/op	    1152 allocs/op
BenchmarkFind/find-100k-bitwidth-5-4        	    5000	    312501 ns/op	   70285 B/op	    2049 allocs/op
BenchmarkFind/find-1m-bitwidth-5-4          	    5000	    310980 ns/op	   71084 B/op	    2002 allocs/op
PASS
ok  	github.com/ipfs/go-hamt-ipld	157.022s

Not the diff in memory allocations and ns/op

I dunno what this means... but super exciting though it makes me skeptical!

FYI: @anorth @acruikshank

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want less than 8 bits per level, let's just remember the bit offset. Take a look at hashBits from https://github.com/ipfs/go-unixfs/blob/25a8acedf8c65251dd15a39f9125dbb8973ab5dc/hamt/util.go

This isn't really a new hash function.

hamt.go Outdated

// UseHashFunction allows you to use a custom function
// to hash the keys in a hamt node
func UseHashFunction(hash HashFunction) Option {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to record the chosen hash function in the node. We can probably just use go-multihash (but use only the hash digest as the key).

hamt.go Outdated

// UseTreeBitWidth allows you to set the width of the HAMT tree
// in bits (from 1-8) via a customized hash function
func UseTreeBitWidth(bitWidth uint8) Option {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really can't see why one would want to use less than 8 bits on a level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@whyrusleeping corrected me here; this is a very different use-case from IPFS).

@whyrusleeping
Copy link
Member

We should probably reuse the existing code we have that does this over in the other HAMT impl: https://github.com/ipfs/go-unixfs/blob/master/hamt/util.go

@Stebalien we likely want to use a width of 32 (bitwidth 5), or maybe smaller. The delta per change is quite high with higher widths.

In any case, to choose a particular width, we need to run some tests on total footprint of objects created during a realistic workload.

@Stebalien
Copy link
Member

we likely want to use a width of 32 (bitwidth 5), or maybe smaller. The delta per change is quite high with higher widths.

Interesting. I guess you're dealing with a highly mutable datastructure while IPFS directories tend not to be.

@whyrusleeping
Copy link
Member

Yeah, we get hundreds of mutations per block, minimizing the impact of that is pretty high priority.

@ianopolous
Copy link
Contributor

Yeah, we get hundreds of mutations per block, minimizing the impact of that is pretty high priority.

We have exactly the same reasons and usage in Peergos, and we went with bit width of 3, and 4 max hash collisions per level (which we also have as configurable) after profiling data usage and churn with different parameters.

@hannahhoward
Copy link
Contributor Author

@Stebalien @whyrusleeping I will reimplement with the bitreader but I wonder if it makes sense to either copy to a separate package or to this package — feels weird to bring in something from go-Unixfs here (feels like a higher level of the stack to me)

@Stebalien
Copy link
Member

I'd copy it ere. Eventually, we need to converge around a single canonical HAMT implementation so go-unixfs will likely end up depending on this package.

@hannahhoward
Copy link
Contributor Author

@Stebalien sounds good.

@hannahhoward hannahhoward force-pushed the feat/configure-hamt-tree-width-31 branch 2 times, most recently from a177105 to 25acf8e Compare September 10, 2019 03:16
@hannahhoward
Copy link
Contributor Author

@Stebalien this has been redone to use hashBits. Can I get a +1 so I can merge?

Results continue to be promising:

goos: darwin
goarch: amd64
pkg: github.com/ipfs/go-hamt-ipld
BenchmarkSerializeNode-4   	  100000	     21736 ns/op	   19184 B/op	     356 allocs/op
BenchmarkFind/find-10k-4   	   10000	    194995 ns/op	   68187 B/op	    1680 allocs/op
BenchmarkFind/find-100k-4  	    3000	    419470 ns/op	  140259 B/op	    3865 allocs/op
BenchmarkFind/find-1m-4    	    5000	    326649 ns/op	  118301 B/op	    2777 allocs/op
BenchmarkFind/find-10k-bitwidth-5-4         	   20000	     59374 ns/op	   19316 B/op	     488 allocs/op
BenchmarkFind/find-100k-bitwidth-5-4        	   20000	     86398 ns/op	   29796 B/op	     783 allocs/op
BenchmarkFind/find-1m-bitwidth-5-4          	   20000	     89220 ns/op	   31607 B/op	     811 allocs/op
PASS

@hannahhoward
Copy link
Contributor Author

Though also, I wonder if we should record the fanout in the CBOR itself as is done in go-unixfs.

Add a variadic options structure to the Node constructors
adds one more option that allows you to configure bitwidth, plus some benchmarks for alternate width
breakup
copy over hashbits utility class from go-unixfs for reading different bitwidth lengths
allow use of alternate bitwidths by making use of hashbits reader
@hannahhoward hannahhoward force-pushed the feat/configure-hamt-tree-width-31 branch from 25acf8e to c51764d Compare September 11, 2019 20:33
Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good stuff. I'd say lets merge this now and press on with future benchmarks

@hannahhoward hannahhoward merged commit 85b7f1d into master Sep 12, 2019
@whyrusleeping whyrusleeping deleted the feat/configure-hamt-tree-width-31 branch February 10, 2020 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants