doc: add example of SlimBytes; add desc of SlimBytes to README

openacid · Jan 16, 2021 · 1f00019
1 parent 31ec831
commit 1f00019

Showing 5 changed files with 210 additions and 1 deletion.
diff --git a/.github/settings.yml b/.github/settings.yml
@@ -3,6 +3,8 @@ _extends: gh-config
 
 repository:
   name: slimarray
-  description: SlimArray compresses uint32 into several bits, by using a polynomial to describe overall trend of an array.
+  description: |
+    SlimArray compresses uint32 into several bits, by using a polynomial to describe the overall trend of an array.
+    SlimBytes uses SlimArray to index a record array, to reduce memory overhead.
   homepage: https://openacid.github.io/
   topics: go, golang, memory, compacted, compress, array, space
diff --git a/README.md b/README.md
@@ -18,6 +18,11 @@ With a SlimArray with a million sorted number in range `[0, 1000*1000]`,
 - reading a `uint32` with `Get()` takes **7 ns**.
 - batch reading with `Slice()` takes **3.8 ns**/elt.
 
+SlimBytes is an array of var-length records (a record is a `[]byte`), which is indexed by a SlimArray.
+Thus the memory overhead of storing the `offset` and `length` of each record is very low, e.g., about **8 bits/record**,
+compared to a typical implementation that uses an offset of type int (`32 to 64 bits/record`).
+A `Get()` takes **15 ns**.
+
 中文介绍: [https://blog.openacid.com/algo/slimarray/](https://blog.openacid.com/algo/slimarray/)
 
 <!-- START doctoc generated TOC please keep comment here to allow auto update -->
@@ -30,6 +35,7 @@ With a SlimArray with a million sorted number in range `[0, 1000*1000]`,
 - [Install](#install)
 - [Synopsis](#synopsis)
   - [Build a SlimArray](#build-a-slimarray)
+  - [Build a SlimBytes](#build-a-slimbytes)
 - [How it works](#how-it-works)
     - [The General Idea](#the-general-idea)
     - [What It Is And What It Is Not](#what-it-is-and-what-it-is-not-1)
@@ -149,6 +155,52 @@ func ExampleSlimArray() {
 }
 ```
 
+
+## Build a SlimBytes
+
+```go
+package slimarray_test
+
+import (
+	"fmt"
+
+	"github.com/openacid/slimarray"
+)
+
+func ExampleSlimBytes() {
+
+	records := [][]byte{
+		[]byte("SlimBytes"),
+		[]byte("is"),
+		[]byte("an"),
+		[]byte("array"),
+		[]byte("of"),
+		[]byte("var-length"),
+		[]byte("records(a"),
+		[]byte("record"),
+		[]byte("is"),
+		[]byte("a"),
+		[]byte("[]byte"),
+		[]byte("which"),
+		[]byte("is"),
+		[]byte("indexed"),
+		[]byte("by"),
+		[]byte("SlimArray"),
+	}
+
+	a, err := slimarray.NewBytes(records)
+	_ = err
+
+	for i := 0; i < 16; i++ {
+		fmt.Print(string(a.Get(int32(i))), " ")
+	}
+	fmt.Println()
+
+	// Output:
+	// SlimBytes is an array of var-length records(a record is a []byte which is indexed by SlimArray
+}
+```
+
 # How it works
 
 Package slimarray uses polynomial to compress and store an array of uint32. A

diff --git a/docs/README.md.j2 b/docs/README.md.j2
@@ -10,6 +10,11 @@ With a SlimArray with a million sorted number in range `[0, 1000*1000]`,
 - reading a `uint32` with `Get()` takes **7 ns**.
 - batch reading with `Slice()` takes **3.8 ns**/elt.
 
+SlimBytes is an array of var-length records (a record is a `[]byte`), which is indexed by a SlimArray.
+Thus the memory overhead of storing the `offset` and `length` of each record is very low, e.g., about **8 bits/record**,
+compared to a typical implementation that uses an offset of type int (`32 to 64 bits/record`).
+A `Get()` takes **15 ns**.
+
 中文介绍: [https://blog.openacid.com/algo/slimarray/](https://blog.openacid.com/algo/slimarray/)
 
 <!-- START doctoc generated TOC please keep comment here to allow auto update -->
@@ -78,6 +83,13 @@ go get github.com/openacid/slimarray
 {% include 'example_slimarray_test.go' %}
 ```
 
+
+## Build a SlimBytes
+
+```go
+{% include 'example_slimbytes_test.go' %}
+```
+
 # How it works
 
 {% include 'docs/slimarray-package.md' %}
diff --git a/docs/slimarray.md b/docs/slimarray.md
@@ -147,6 +147,13 @@ SlimArray compact `Seg` into a dense format:
 
 ## Usage
 
+```go
+var (
+	BytesTooLarge = errors.New("total bytes exceeds max value of uint32")
+	TooManyRows   = errors.New("row count exceeds max value of int32")
+)
+```
+
 ```go
 var File_slimarray_proto protoreflect.FileDescriptor
 ```
@@ -218,6 +225,16 @@ Get returns the uncompressed uint32 value. A Get() costs about 7 ns
 
 Since 0.1.1
 
+#### func (*SlimArray) Get2
+
+```go
+func (sm *SlimArray) Get2(i int32) (uint32, uint32)
+```
+Get2 returns the two uncompressed uint32 values at i and i + 1. A Get2() costs about
+15 ns.
+
+Since 0.1.4
+
 #### func (*SlimArray) GetBitmap
 
 ```go
@@ -317,3 +334,89 @@ Since 0.1.1
 ```go
 func (x *SlimArray) String() string
 ```
+
+#### type SlimBytes
+
+```go
+type SlimBytes struct {
+
+	// Positions is the array of start positions of all records.
+	// For n records it holds n + 1 entries.
+	// The last one equals len(Records).
+	Positions *SlimArray `protobuf:"bytes,21,opt,name=Positions,proto3" json:"Positions,omitempty"`
+	// Records is the byte slice of all records packed together.
+	Records []byte `protobuf:"bytes,22,opt,name=Records,proto3" json:"Records,omitempty"`
+}
+```
+
+SlimBytes is an array of var-length []byte records.
+
+Internally it uses a SlimArray to store record positions. Thus the memory
+overhead is about 8 bits/record.
+
+Since 0.1.4
+
+#### func  NewBytes
+
+```go
+func NewBytes(records [][]byte) (*SlimBytes, error)
+```
+NewBytes creates a SlimBytes, which is an array of byte slices, from a series of
+records.
+
+Since 0.1.14
+
+#### func (*SlimBytes) Descriptor
+
+```go
+func (*SlimBytes) Descriptor() ([]byte, []int)
+```
+Deprecated: Use SlimBytes.ProtoReflect.Descriptor instead.
+
+#### func (*SlimBytes) Get
+
+```go
+func (b *SlimBytes) Get(i int32) []byte
+```
+Get returns the i-th record.
+
+
+A Get costs about 17 ns.
+
+Since 0.1.14
+
+#### func (*SlimBytes) GetPositions
+
+```go
+func (x *SlimBytes) GetPositions() *SlimArray
+```
+
+#### func (*SlimBytes) GetRecords
+
+```go
+func (x *SlimBytes) GetRecords() []byte
+```
+
+#### func (*SlimBytes) ProtoMessage
+
+```go
+func (*SlimBytes) ProtoMessage()
+```
+
+#### func (*SlimBytes) ProtoReflect
+
+```go
+func (x *SlimBytes) ProtoReflect() protoreflect.Message
+```
+
+#### func (*SlimBytes) Reset
+
+```go
+func (x *SlimBytes) Reset()
+```
+
+#### func (*SlimBytes) String
+
+```go
+func (x *SlimBytes) String() string
+```
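
The `Positions`/`Records` layout documented above can be sketched in plain Go. This is only an illustration of the layout, not the library's implementation: `packedBytes`, `newPackedBytes`, and `get` are hypothetical names, and the positions are kept in an uncompressed `[]uint32` where SlimBytes uses a SlimArray. It shows why n records need n + 1 positions and why reading one record needs the two adjacent positions at i and i + 1.

```go
package main

import "fmt"

// packedBytes is a hypothetical, uncompressed stand-in for SlimBytes:
// positions is a plain []uint32 rather than a SlimArray.
type packedBytes struct {
	positions []uint32 // positions[i] is the start of record i; the last entry equals len(records)
	records   []byte   // all records packed back to back
}

// newPackedBytes packs the records and notes where each one starts.
func newPackedBytes(recs [][]byte) *packedBytes {
	p := &packedBytes{positions: make([]uint32, 0, len(recs)+1)}
	for _, r := range recs {
		p.positions = append(p.positions, uint32(len(p.records)))
		p.records = append(p.records, r...)
	}
	// The extra (n+1)-th entry marks the end of the last record.
	p.positions = append(p.positions, uint32(len(p.records)))
	return p
}

// get returns the i-th record: the bytes between two adjacent positions.
func (p *packedBytes) get(i int32) []byte {
	return p.records[p.positions[i]:p.positions[i+1]]
}

func main() {
	p := newPackedBytes([][]byte{
		[]byte("SlimBytes"), []byte("is"), []byte("an"), []byte("array"),
	})
	fmt.Println(p.positions)      // [0 9 11 13 18]
	fmt.Println(string(p.get(3))) // array
}
```

Because the start positions never decrease, they form the kind of sorted sequence that the README describes SlimArray as compressing well, which is presumably where the roughly 8 bits/record figure comes from.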
diff --git a/example_slimbytes_test.go b/example_slimbytes_test.go
@@ -0,0 +1,40 @@
+package slimarray_test
+
+import (
+	"fmt"
+
+	"github.com/openacid/slimarray"
+)
+
+func ExampleSlimBytes() {
+
+	records := [][]byte{
+		[]byte("SlimBytes"),
+		[]byte("is"),
+		[]byte("an"),
+		[]byte("array"),
+		[]byte("of"),
+		[]byte("var-length"),
+		[]byte("records(a"),
+		[]byte("record"),
+		[]byte("is"),
+		[]byte("a"),
+		[]byte("[]byte"),
+		[]byte("which"),
+		[]byte("is"),
+		[]byte("indexed"),
+		[]byte("by"),
+		[]byte("SlimArray"),
+	}
+
+	a, err := slimarray.NewBytes(records)
+	_ = err
+
+	for i := 0; i < 16; i++ {
+		fmt.Print(string(a.Get(int32(i))), " ")
+	}
+	fmt.Println()
+
+	// Output:
+	// SlimBytes is an array of var-length records(a record is a []byte which is indexed by SlimArray
+}