archive pseudo-vcs driver: indexing code in archives (e.g. zip, tar) without extracting files #484

Open
wants to merge 6 commits into
base: main

@muravjov muravjov commented May 21, 2024

What kind of change does this PR introduce? (check at least one)

  • Bugfix
  • Feature
  • Code style update
  • Refactor
  • Build-related changes
  • Other, please describe:

The PR fulfills these requirements:

  • All tests are passing?
  • New/updated tests are included?
  • If any static assets have been updated, has ui/bindata.go been regenerated?
  • Are there doc blocks for functions that I updated/created?

If adding a new feature, the PR's description includes:

  • A convincing reason for adding this feature (to avoid wasting your time, it's best to open a suggestion issue first and wait for approval before working on it)

Description:

This PR adds a new driver, archive, which indexes source code inside archives (e.g. zip, tar; any format supported by https://github.com/mholt/archiver) without extracting the files. While indexing, files are walked using the archive API; while searching, results are checked and snippets are generated from files extracted on the fly.

A config example:

{
  "dbpath" : "db",
  "vcs-config" : {
    "git": {
      "ref" : "main"
    }
  },
  "repos" : {
    "video" : {
      "url" : "/Volumes/1tb-ext4/twitch/video.zip",
      "vcs" : "archive",
      "vcs-config" : {
        "ignored-files" : [".git"]
      },
      "url-pattern" : {
        "base-url" : "file:///Volumes/1tb-ext4/src/twitch/{path}"
      }
    }
  }
}
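As a rough illustration of how a config like the one above could be read, the sketch below decodes it with encoding/json and collects the repos that use the archive driver. The struct types and the archiveRepos function are hypothetical and mirror only the fields shown in the example; they are not Hound's actual config types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// repoConfig mirrors only the per-repo fields used in the example (hypothetical type).
type repoConfig struct {
	URL       string          `json:"url"`
	VCS       string          `json:"vcs"`
	VCSConfig json.RawMessage `json:"vcs-config"` // decoded per driver below
}

// archiveOptions holds the per-repo options shown for the archive driver.
type archiveOptions struct {
	IgnoredFiles []string `json:"ignored-files"`
}

// config mirrors the top-level keys needed here; other keys are ignored.
type config struct {
	DBPath string                `json:"dbpath"`
	Repos  map[string]repoConfig `json:"repos"`
}

// archiveRepos returns the options of every repo whose vcs is "archive".
func archiveRepos(raw []byte) (map[string]archiveOptions, error) {
	var cfg config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	out := make(map[string]archiveOptions)
	for name, repo := range cfg.Repos {
		if repo.VCS != "archive" {
			continue
		}
		var opts archiveOptions
		if len(repo.VCSConfig) > 0 {
			if err := json.Unmarshal(repo.VCSConfig, &opts); err != nil {
				return nil, fmt.Errorf("repo %q: %w", name, err)
			}
		}
		out[name] = opts
	}
	return out, nil
}

func main() {
	raw := []byte(`{
	  "dbpath": "db",
	  "repos": {
	    "video": {
	      "url": "/Volumes/1tb-ext4/twitch/video.zip",
	      "vcs": "archive",
	      "vcs-config": {"ignored-files": [".git"]}
	    }
	  }
	}`)
	repos, err := archiveRepos(raw)
	if err != nil {
		panic(err)
	}
	for name, opts := range repos {
		fmt.Println(name, opts.IgnoredFiles)
	}
}
```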

Some metrics:

  • 160 zip files totaling 126 GB produced 3 GB of indexes
  • a search request takes about 13 seconds to execute

muravjov commented Jun 1, 2024

@salemhilal
would you mind reviewing this PR?
