Multi repo support #70

Open
a-bendoraitis opened this issue Oct 18, 2021 · 3 comments

@a-bendoraitis

Right now the rootPath configuration expects a single path where the git repository is located. I have multiple git repositories and it would be very useful to gather data from all of them.

Maybe I overlooked some of the configuration options and it is already possible? I can also contribute a pull request that turns the rootPath gulpfile option into an array, but that's a breaking change, so I might just fork it for myself.

@smontanari
Owner

smontanari commented Oct 21, 2021

I can try to interpret your request in two ways:

  1. You're after some form of parallelisation, i.e. you want to run the same analysis in parallel on multiple repos to speed up your investigative process.
  2. You're after analyses that perform data mining over commits from multiple code bases.

Option 1 is just a matter of parallel execution of commands that can be achieved in a variety of ways without having to necessarily change code-forensics.
Option 2 is a completely different beast. Supporting analyses across multiple repositories is an ambitious goal, but unfortunately it might require more than just accepting an array of root paths. The idea would be to apply the same algorithms of the existing analyses to commit data gathered from multiple projects, to infer potential issues caused by hidden couplings between the corresponding code bases. The problem though is that all the current analyses are based on the data collected by running git log commands, and, as far as I know, you cannot run git log across multiple repos at once. So the work would require collecting and somehow merging commit information from different repositories. Something to think about, but not straightforward to implement.
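To make the "collecting and somehow merging" step concrete, here is a minimal sketch that is not part of code-forensics. It assumes each repository is a local clone, runs `git log` once per clone, prefixes every file path with the repository's directory name, and merges the commits by author date. The names `readLog`, `parseLog`, `mergeLogs`, `collect` and the `@@` header marker are hypothetical, chosen only for illustration.

```javascript
const { execFileSync } = require('child_process');
const path = require('path');

// Marker prefixed to each commit header line so it cannot be mistaken for a file path.
const HEADER = '@@';

// Run `git log` inside one repository and return its raw output.
function readLog(repoPath) {
  return execFileSync(
    'git',
    ['log', `--pretty=format:${HEADER}%H%x09%aI`, '--name-only'],
    { cwd: repoPath, encoding: 'utf8', maxBuffer: 64 * 1024 * 1024 }
  );
}

// Turn raw log output into commit objects, prefixing file paths with the repo name.
function parseLog(repoName, rawLog) {
  const commits = [];
  for (const line of rawLog.split('\n')) {
    if (line.startsWith(HEADER)) {
      const [hash, date] = line.slice(HEADER.length).split('\t');
      commits.push({ repo: repoName, hash, date, files: [] });
    } else if (line.trim() !== '' && commits.length > 0) {
      commits[commits.length - 1].files.push(`${repoName}/${line}`);
    }
  }
  return commits;
}

// Merge the commits of several repositories into one list, oldest first.
function mergeLogs(commitLists) {
  return commitLists.flat().sort((a, b) => Date.parse(a.date) - Date.parse(b.date));
}

// Hypothetical entry point: repoPaths is an array of local clone directories.
function collect(repoPaths) {
  return mergeLogs(repoPaths.map((p) => parseLog(path.basename(p), readLog(p))));
}
```

The merged list is only raw material: nothing in this thread suggests the existing analyses could consume it as-is, and ordering by timestamp is at best a rough proxy for changes in different repositories being related.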

@a-bendoraitis
Author

I'm looking at the second way. For my use case, I don't really care about hidden couplings; I just want to run the analysis on different repos and present all the data in a single report. For example, in hot spots all repos could have their own blob, side by side. That's why I'm thinking about a rootPaths array, I don't need anything too complicated.

I tried merging my repos into one, preserving all of the logs, but I couldn't get it to work.
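The side-by-side idea above could be sketched as a post-processing step over per-repository results, without touching the analyses themselves. This is only an illustration under a loud assumption: that each repository's hot-spot result is available as a tree of `{ name, children }` nodes whose leaves carry a numeric `weight`. That shape, and the names `combineReports` and `totalWeight`, are hypothetical and not taken from code-forensics.

```javascript
// Assumed (hypothetical) report shape: a tree of { name, children } nodes,
// with a numeric `weight` on each leaf.

// Place each repository's tree under a common root, one child per repository.
function combineReports(reportsByRepo) {
  return {
    name: 'all-repositories',
    children: Object.entries(reportsByRepo).map(([repoName, tree]) => ({
      ...tree,
      name: repoName, // label the subtree with the repository name
    })),
  };
}

// Sum the leaf weights below a node, e.g. to compare repositories.
function totalWeight(node) {
  if (!node.children) return node.weight || 0;
  return node.children.reduce((sum, child) => sum + totalWeight(child), 0);
}
```

Each repository keeps its own subtree, so no cross-repository coupling is inferred; the combined tree only changes how separately computed results are presented.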

@smontanari
Owner

smontanari commented Oct 26, 2021

> present all the data in single report

That is not possible for the same reasons I described above, i.e. the reports are pretty much a data mining exercise over the information contained in git log outputs, and such outputs contain data that is relative to one repository only.

This sort of feature has been on my mind for some time, because I do understand its potential benefit, especially as we move towards more distributed codebases. However, it'd require my full attention to assess its feasibility and necessary code changes, and unfortunately I don't have much time now.

I'm not going to close this issue for the moment, but only so I can see it here as a reminder of a desirable feature.
