
summer

High-performance utility for generating checksums in parallel.

Installation

  • Latest Release

  • Manual

    go install github.com/utilyre/summer/cmd/summer@latest

Usage

For starters, you can run the generate command on one or more files:

$ summer generate foo bar
764efa883dda1e11db47671c4a3bbd9e  foo
081ecc5e6dd6ba0d150fc4bc0e62ec50  bar

Add the -r flag to generate checksums for directories recursively:

$ summer generate -r bar nested
081ecc5e6dd6ba0d150fc4bc0e62ec50  bar
168065a0236e2e64c9c6cdd086c55f63  nested/baz

To utilize more of your CPU's cores, pass the --open-file-jobs=n and --digest-jobs=m flags, where n and m are the number of parallel jobs used for each stage, respectively.
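
For example, the following runs a recursive scan with four open-file jobs and eight digest jobs (the values here are purely illustrative; tune them for your machine):

$ summer generate -r --open-file-jobs=4 --digest-jobs=8 nested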

Run summer help generate to learn more about different flags.

API

You can also call this utility's API directly from your own application. Here's an example:

package main

import (
	"context"
	"log"

	"github.com/utilyre/summer/pkg/summer"
)

func main() {
	s, err := summer.New()
	if err != nil {
		log.Fatal(err)
	}

	checksums, err := s.Sum(context.TODO(), "foo", "bar")
	if err != nil {
		log.Fatal(err)
	}

	for cs := range checksums {
		if cs.Err != nil {
			log.Println(cs.Err)
			continue
		}

		// use cs
	}
}

For more information, visit the Documentation.

How It Works

This CLI tool calculates checksums by processing files through a multi-stage pipeline, so that different stages run concurrently rather than one after another. The process consists of the following stages:

  1. Walk: The tool recursively scans the given files and directories, identifying all regular files. Each discovered file name is then passed to the next stage.

  2. Open File: The file names are used to open the corresponding files, producing data streams for their contents. These streams are then sent to the next stage.

  3. Digest: Each data stream is processed to compute its hash.

Finally, all the calculated hashes are aggregated and streamed as checksums.

The pipeline package handles data flow and parallel execution, ensuring efficient processing. The Data Flow Diagram (DFD) below visually represents the process:

[Data Flow Diagram]

By leveraging this pipeline-based approach, the tool efficiently computes checksums for large sets of files with minimal overhead.
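
To make the architecture concrete, here is a minimal, self-contained sketch of such a three-stage pipeline in Go. It is not summer's actual implementation: the type names and worker counts are made up for illustration, and MD5 is assumed as the hash function based on the 32-character digests shown in the Usage section.

// A simplified sketch of the walk -> open -> digest pipeline described
// above. All names and worker counts are illustrative; this is not the
// actual implementation from the pipeline package.
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"sync"
)

// opened pairs a file name with a stream of its contents.
type opened struct {
	name string
	r    io.ReadCloser
}

// result pairs a file name with its computed hash.
type result struct {
	name string
	sum  []byte
}

// walk emits the names of all regular files under the given roots.
func walk(roots ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, root := range roots {
			filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
				if err != nil {
					log.Println(err)
					return nil // skip unreadable entries
				}
				if d.Type().IsRegular() {
					out <- path
				}
				return nil
			})
		}
	}()
	return out
}

// openFiles opens each named file and passes its data stream along.
func openFiles(names <-chan string, out chan<- opened, wg *sync.WaitGroup) {
	defer wg.Done()
	for name := range names {
		f, err := os.Open(name)
		if err != nil {
			log.Println(err)
			continue
		}
		out <- opened{name: name, r: f}
	}
}

// digest drains data streams and hashes each one (MD5 assumed here).
func digest(files <-chan opened, out chan<- result, wg *sync.WaitGroup) {
	defer wg.Done()
	for file := range files {
		h := md5.New()
		if _, err := io.Copy(h, file.r); err != nil {
			log.Println(err)
		}
		file.r.Close()
		out <- result{name: file.name, sum: h.Sum(nil)}
	}
}

func main() {
	names := walk(".")

	// Fan each stage out across goroutines, analogous to the
	// --open-file-jobs and --digest-jobs flags.
	files := make(chan opened)
	var openWG sync.WaitGroup
	for i := 0; i < 2; i++ {
		openWG.Add(1)
		go openFiles(names, files, &openWG)
	}
	go func() { openWG.Wait(); close(files) }()

	results := make(chan result)
	var digestWG sync.WaitGroup
	for i := 0; i < 4; i++ {
		digestWG.Add(1)
		go digest(files, results, &digestWG)
	}
	go func() { digestWG.Wait(); close(results) }()

	// Aggregation: print each checksum as it becomes available.
	for r := range results {
		fmt.Printf("%x  %s\n", r.sum, r.name)
	}
}

Unlike this sketch, the real pipeline also threads a context.Context through for cancellation, as the API example above shows.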

License

This project is licensed under the Apache License 2.0. You are free to use, modify, and distribute it under the terms of this license.