
summer

High-performance utility for generating checksums in parallel.

Installation

  • Latest Release

  • Manual

    go install github.com/utilyre/summer/cmd/summer@latest

Usage

For starters, you can run the generate command on one or more files:

$ summer generate foo bar
764efa883dda1e11db47671c4a3bbd9e  foo
081ecc5e6dd6ba0d150fc4bc0e62ec50  bar

Add the -r flag to generate checksums for directories recursively:

$ summer generate -r bar nested
081ecc5e6dd6ba0d150fc4bc0e62ec50  bar
168065a0236e2e64c9c6cdd086c55f63  nested/baz

To utilize more of your CPU's cores, pass the --open-file-jobs=n and --digest-jobs=m flags, where n and m are the number of parallel jobs used for each stage, respectively.
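
For example, the following runs a recursive scan with four open-file jobs and eight digest jobs (the values here are purely illustrative; tune them for your machine):

$ summer generate -r --open-file-jobs=4 --digest-jobs=8 nested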

Run summer help generate to learn more about different flags.

API

You can also call this utility's API directly from your own application. Here's an example:

package main

import (
	"context"
	"log"

	"github.com/utilyre/summer/pkg/summer"
)

func main() {
	s, err := summer.New()
	if err != nil {
		log.Fatal(err)
	}

	checksums, err := s.Sum(context.TODO(), "foo", "bar")
	if err != nil {
		log.Fatal(err)
	}

	for cs := range checksums {
		if cs.Err != nil {
			log.Println(cs.Err)
			continue
		}

		// use cs
	}
}

For more information, visit the Documentation.

How It Works

This CLI tool calculates checksums by processing files through a multi-stage pipeline, so that different stages run concurrently rather than one after another. The process consists of the following stages:

  1. Walk: The tool recursively scans the given files and directories, identifying all regular files. Each discovered file name is then passed to the next stage.

  2. Open File: The file names are used to open the corresponding files, producing data streams for their contents. These streams are then sent to the next stage.

  3. Digest: Each data stream is processed to compute its hash.

Finally, all the calculated hashes are aggregated and streamed as checksums.

The pipeline package handles data flow and parallel execution, ensuring efficient processing. The Data Flow Diagram (DFD) below visually represents the process:

[Data Flow Diagram]

By leveraging this pipeline-based approach, the tool efficiently computes checksums for large sets of files with minimal overhead.
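
To make the architecture concrete, here is a minimal, self-contained sketch of such a three-stage pipeline in Go. It is not summer's actual implementation: the type names and worker counts are made up for illustration, and MD5 is assumed as the hash function based on the 32-character digests shown in the Usage section.

// A simplified sketch of the walk -> open -> digest pipeline described
// above. All names and worker counts are illustrative; this is not the
// actual implementation from the pipeline package.
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"sync"
)

// opened pairs a file name with a stream of its contents.
type opened struct {
	name string
	r    io.ReadCloser
}

// result pairs a file name with its computed hash.
type result struct {
	name string
	sum  []byte
}

// walk emits the names of all regular files under the given roots.
func walk(roots ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, root := range roots {
			filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
				if err != nil {
					log.Println(err)
					return nil // skip unreadable entries
				}
				if d.Type().IsRegular() {
					out <- path
				}
				return nil
			})
		}
	}()
	return out
}

// openFiles opens each named file and passes its data stream along.
func openFiles(names <-chan string, out chan<- opened, wg *sync.WaitGroup) {
	defer wg.Done()
	for name := range names {
		f, err := os.Open(name)
		if err != nil {
			log.Println(err)
			continue
		}
		out <- opened{name: name, r: f}
	}
}

// digest drains data streams and hashes each one (MD5 assumed here).
func digest(files <-chan opened, out chan<- result, wg *sync.WaitGroup) {
	defer wg.Done()
	for file := range files {
		h := md5.New()
		if _, err := io.Copy(h, file.r); err != nil {
			log.Println(err)
		}
		file.r.Close()
		out <- result{name: file.name, sum: h.Sum(nil)}
	}
}

func main() {
	names := walk(".")

	// Fan each stage out across goroutines, analogous to the
	// --open-file-jobs and --digest-jobs flags.
	files := make(chan opened)
	var openWG sync.WaitGroup
	for i := 0; i < 2; i++ {
		openWG.Add(1)
		go openFiles(names, files, &openWG)
	}
	go func() { openWG.Wait(); close(files) }()

	results := make(chan result)
	var digestWG sync.WaitGroup
	for i := 0; i < 4; i++ {
		digestWG.Add(1)
		go digest(files, results, &digestWG)
	}
	go func() { digestWG.Wait(); close(results) }()

	// Aggregation: print each checksum as it becomes available.
	for r := range results {
		fmt.Printf("%x  %s\n", r.sum, r.name)
	}
}

Unlike this sketch, the real pipeline also threads a context.Context through for cancellation, as the API example above shows.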

License

This project is licensed under the Apache License 2.0. You are free to use, modify, and distribute it under the terms of this license.