Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

project.py: Add new 'joblib' dependency to parallelize update #713

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

RobertGalatNordic
Copy link

@RobertGalatNordic RobertGalatNordic commented Jun 11, 2024

The projects from manifests were updated one by one. The parallel approach has potential to fully utilize CPU and network connection to speed up the update.

Needs more testing, I checked only few common cases and it seems fine.

Performance tests:
west init -m https://github.com/nrfconnect/sdk-nrf --mr main
time west update

real 14m35.399s
user 6m2.368s
sys 1m34.562s

system and network load during update:
image

Clean up and replace west for version from this PR
rm -rf <.west and all downloaded repositories>
pip3 uninstall west
cd <west_source_dir>
pip3 install -e .
cd ..

repeat the test
west init -m https://github.com/nrfconnect/sdk-nrf --mr main
time west update

real 6m34.408s
user 9m25.977s
sys 2m7.339s

system and network load during parallel update:
image

As we can see, the result is substantially faster. The same operation finishes in less than half of the original time (from 14.5 minutest down to 6.5 minutes).

Copy link

@ktaborowski ktaborowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@RobertGalatNordic RobertGalatNordic force-pushed the make_west_update_parallel branch from 3fa370f to 6c1fb2e Compare June 11, 2024 08:32
@RobertGalatNordic RobertGalatNordic changed the title project.py: use asyncio to paralelize west update project.py: paralelize west update Jun 11, 2024
@RobertGalatNordic
Copy link
Author

Another tip for faster fetch would be to use ssh instead of http.
By adding the following config to git it will automatically translate the http to ssh, for github repos.
git config --global url.ssh://git@github.com/.insteadOf https://github.com/

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 11, 2024

There are a few test failures in https://github.com/zephyrproject-rtos/west/actions/runs/9462265663/job/26086666688?pr=713, please take a look at "Running the tests" in the README.rst

Ignore the macOS failures, unrelated. Fixed in #712.

But more importantly, unless I'm confused (please correct me) I believe there was a previous attempt to do this and it had to be reverted because of terminal issues on Windows, see 2nd and abandoned attempt in #551. Were you aware of it? This new #713 attempt looks surprisingly much shorter: sounds too good to be true.

@M1cha can you take a look at this #713?

@RobertGalatNordic RobertGalatNordic force-pushed the make_west_update_parallel branch from 6c1fb2e to da14c09 Compare June 12, 2024 08:24
@RobertGalatNordic
Copy link
Author

I did have a minor copy-paste bug, The tests ware very helpful to catch this.
I rerun the tests locally, and got a green mark. So I hope this time it will be OK.

I looked into the previous attempt, and my approach seems simpler because I used a library that has been designed to be a drop-in replacement to for loops.

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into the previous attempt, and my approach seems simpler because I used a library that has been designed to be a drop-in replacement to for loops.

Ha! That explains. Re-using a higher-level library: great idea. This makes the PR very short indeed, nice job.

So this is actually the most important change in this PR: the addition of a new dependency! This must always be performed very carefully because it carries maintenance and security risks https://cacm.acm.org/practice/surviving-software-dependencies/

This new dependency will also break all existing users, so its name must be at the very least visible in the commit title (and PR title). Something like: "Add new 'joblib' dependency to parallelize update". (while at it don't forget to fix the spelling of "paralelize")

I had a quick look at joblib and it seems like a serious and actively maintained library; nice pick!

Nevertheless, joblib cannot be immune to weird terminal issues. Does it have workarounds for all Windows quirks? There is only one way to tell: testing. So please state clearly in the commit message all the operating systems and terminals you've tested this with. This must include Powershell and CMD.EXE which is apparently still very important:

Don't forget to make sure colors are still working:

Last but not least, you must provide some -j 1 option. Not just for optimizing and fine-tuning the number of threads but more importantly for the ability to turn that feature completely OFF for two important reasons:

  1. Debugging git configuration and network issues.
  2. As a workaround in case someone has weird terminal or other issues.

I have debugged countless build issues and whether it's make, ninja, west or anything concurrent, the very first step is always -j 1 to stop relevant errors from being drowned by concurrent threads. Even when debugging race conditions or other concurrency issues, -j 1 is still a critical datapoint and reference for comparison purposes.

For those reasons, 1. requires unbuffered git output, see --help string in #551. Can joblib do that? If not then you need to add a "joblib-free" code path. Automatically falling back on this joblib-free code path when the dependency is missing would be the icing on the cake (but not required IMHO)

@marc-hb marc-hb added the performance How long things take label Jun 12, 2024
@marc-hb
Copy link
Collaborator

marc-hb commented Jun 12, 2024

I did have a minor copy-paste bug, The tests ware very helpful to catch this.

Yes, tests save lives. Speaking of which: make sure all the existing tests stick to -j 1 by default. Again, the last thing we want is concurrency when debugging some totally unrelated issue. You should also add at least one test using a parallel update for minimal test coverage.

(pytest should run tests in parallel but that's a completely different story)

For those reasons, 1. requires unbuffered git output, see --help string in #551.

If joblib provides terminal output ONLY after a job is done, then this means west update will now be surprisingly quiet for a possibly long time. This could misled users into thinking they have a network issue or some proxy misconfiguration when git is actually running fine. So now I'm actually wondering whether -j 1 should maybe stay the default, among other benefits it would avoid disorienting old users. It would also make the new dependency "more optional". Either way, when all output is buffered then print a message like "parallel update started, be patient..."

Another tip for faster fetch ...

BTW: performance How long things take

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 12, 2024

Now that memories are coming back I feel like I just started repeating the previous discussions on this topic. To avoid such a repeat, please review all past discussions in #533, #551 and the ones before that. It's a big long sorry but it will be shorter than repeating them.

@RobertGalatNordic RobertGalatNordic changed the title project.py: paralelize west update project.py: Add new 'joblib' dependency to parallelize update Jun 14, 2024
@RobertGalatNordic RobertGalatNordic force-pushed the make_west_update_parallel branch from da14c09 to 7f70536 Compare June 18, 2024 08:58
@RobertGalatNordic
Copy link
Author

RobertGalatNordic commented Jun 18, 2024

rebased on main,
added detection if joblib is available (if not use old single core approach),
added -j [JOBS] parameter to west update ( -j will use all cores, same as -j -1)
when joblib is not installed or -j 1 has been passed or -j has not been passed at all, the old single thread approach with for loop will be used.
Updated tests to add parametrization to test cases

  • -j 1 - same as without -j
  • -j - use all threads
  • -j -1 - use all threads
  • -j 2 - use two cores

@RobertGalatNordic
Copy link
Author

I did not test those changes on Windows, as I do not work on that platform. I think that if there is an issue there, we could remove this feature for that platform, or maybe someone more experienced on Windows could help us. But as I mentioned previously, I used off the shelf library to do the heavy lifting, and it is possible that we could delegate any new issues to joblib project

@marc-hb marc-hb dismissed their stale review June 18, 2024 17:13

stale review

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I didn't review in detail yet but this is looking better and better.

All tests are failing, hopefully only codestyle issues. Can you please fix? Do you have any issue running tox on your system? I added some tox tips in the README.md a while ago.

I did not test those changes on Windows, as I do not work on that platform.

I'm sorry but you must find a Windows system or "Windows friend" to smoke test this PR with at least -j1 and not, in both CMD.EXE and Powershell. The previous attempt went through a couple rounds of reverts and we really don't want to do that again. Maybe ask on Discord? There is a #west channel there; also try all the CI and infrastructure channels. People there may not use Windows but they may desire this enough to go and find a "Windows friend" of theirs :-)

I think that if there is an issue there, we could remove this feature for that platform,

Yes: for instance we could default to -j1 on Windows, at least initially! This would make it so much easier to find "Windows friends" willing to test it: now they don't have to fetch a PR to test. This is exactly what I strongly advocated in the previous attempt but I was unfortunately overruled.

src/west/app/project.py Outdated Show resolved Hide resolved
@RobertGalatNordic RobertGalatNordic force-pushed the make_west_update_parallel branch from 7f70536 to 9d8c8af Compare June 19, 2024 12:05
@RobertGalatNordic
Copy link
Author

I tested it on Windows in Power shell.

When using west update or west update -j 1 the results are as expected, nice clean output.

On the other hand, when more threads are used, the output gets a little messy.
It seems like the multiple threads push to stdout at the same time, and some of the lines are placed in the middle of other lines.

I can see how this could be fixed, but it will require a lot more refactoring in the self.update(project) which does all the job the output is generated and pushed to stdout with print statements, or it calls the git commands without catching its output, We would have to collect all this output and return it from the self.update(project) function.

So just to summarize, the default behavior is the same as before this PR, so you have to explicitly state that you want to use multiple threads for update. On Windows the output with multiple threads is messy, but when single thread is used there is no change compared to version from main.

On Linux the output looks a little better, as the lines are not interrupted, but you see lines from multiple processes one under another.

IMO this is not a big deal, to make it right we would have to collect all the output, and print it only after the job is completed, this would cause some idle time for the user (there is a package to provide progress bar for joblib so this could solve our issue of idle output for user https://pypi.org/project/joblib-progress/).

Furthermore, there is always option to not use this, and I imagine that this feature will be used in automation scripts, where we do not care about the output, just on the time.

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 19, 2024

I believe there was a previous attempt to do this and it had to be reverted because of terminal issues on Windows, see 2nd and abandoned attempt in #551. Were you aware of it? This new #713 attempt looks surprisingly much shorter: sounds too good to be true.

I looked into the previous attempt, and my approach seems simpler because I used a library that has been designed to be a drop-in replacement to for loops.

Ha! That explains. Re-using a higher-level library: great idea. This makes the PR very short indeed, nice job.
[...]
Either way, when all output is buffered then please print a message like "parallel update started, be patient..."

On the other hand, when more threads are used, the output gets a little messy.
It seems like the multiple threads push to stdout at the same time, and some of the lines are placed in the middle of other lines.

I can see how this could be fixed, but it will require a lot more refactoring in the self.update(project) which does all the job the output is generated and pushed to stdout with print statements, or it calls the git commands without catching its output, We would have to collect all this output and return it from the self.update(project) function.

Wait, so you're saying joblib/loky have no ability to capture terminal/standard outputs?

I skimmed their APIs and I found joblib/loky#306 which all seem to confirm they indeed don't care about standard outputs. Their focus is parallel computation, which I bet uses standard outputs rarely - only when something goes wrong.

This is a MAJOR disappointment :-(

All this time I was confused and assumed joblib/loky offered this ability to capture standard outputs. I thought this was why the joblib solution was much shorter. Now I understand it is shorter because it does not try to capture outputs and just let them pass through.

I wouldn't mind a first shot that does not capture outputs (others may disagree) but there would need to be very clear path towards implementing this in the future and I see none with joblib right now; it looks like a dead-end.

Now that memories are coming back I feel like I just started repeating the previous discussions on this topic. To avoid such a repeat, please review all past discussions in #533, #551 and the ones before that. It's a big long sorry but it will be shorter than repeating them.

As you can see the title of the aborted fix #533 was project: overwrite stdout/stderr objects as well, that was a pretty strong clue. I also mentioned multiple times that the previous attempt was reverted because of terminal issues.

Furthermore, there is always option to not use this, and I imagine that this feature will be used in automation scripts, where we do not care about the output, just on the time.

Failures happen in automation, then output is important too. In case of failure, outputs can be even more important than in interactive use because there is often no simple access to the automation environment and no or little "trial and error" possibility to test again. Changing verbosity levels and other parameters is also more difficult at best and impossible at worst.

PS: I also discovered that the main reason loky was created was to avoid bugs in Python multiprocessing. But these bugs have apparently been fixed in 3.7 and west requires 3.8 minimum

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 19, 2024

Failures happen in automation, then output is important too.

Moreover: compilers, linkers etc. are used to have their outputs interleaved with concurrent processes, it's been like this for them for decades. So they expect concurrency and generally make a good effort to push standalone "chunks" of information. At the receiving end, make and ninja try to somehow respect that.

On the other hand, I really doubt git output comes prepared for such concurrency and "naive" parallelization like the one in this PR does not either; your test in Powershell proves that.


Trying to summarize:

  1. I would approve a PR that defaults to -j 1 and prints interleaved garbage terminal output for -j > 1 but only if it does not add any new, significant dependency (like joblib) that does not care about outputs and just lets them through. Maybe Python's built-in multiprocessing could be enough, I don't know.
    Also, there should be a very clear warning in the --help that -j > 1 can produce garbage.

  2. Based on past discussions (links already shared above), I suspect @mbolivar-ampere may disagree with 1. He should be back from vacation soon. August 2024 EDIT: he doesn't have time to maintain west any more.

  3. Without waiting, please extract from this PR and submit separately the creation of the project_update() and project_update_some() functions. Leave aside the actual -j bits for now. This will lay the ground for ANY parallelization attempt with pretty much any library and it can't hurt so I think we can merge that preliminary bit yesterday. Let's get those mundane bits merged and out of the way.

  4. Preferably in a separate commit, please also submit your test changes like this:

pytest.mark.parametrize("options", [""]) #, "-j 1", "-j 2", "-j -1"])`
def test_update_some_with_imports(options, repos_tmpdir):

Again, this will help anyone experimenting with any sort of -j prototype. We don't want to lose this no matter what happens next.

@@ -416,14 +416,14 @@ def test_grep(west_init_tmpdir):

assert re.search('west-commands', cmd('grep -- -- -commands'))


def test_update_projects(west_init_tmpdir):
@pytest.mark.parametrize("options", ["", "-j 1", "-j 2", "-j"])
Copy link

@pdunaj pdunaj Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is -j indented? should not be -j -1?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree seems so from here: parser.add_argument('-j', '--jobs', nargs='?', const=-1, that -1 should be passed

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above - joblib looks nice but it does not seem to add any value compared to built-in Python modules.

@carlescufi carlescufi requested a review from pdgendt August 27, 2024 11:34
@pdgendt
Copy link
Collaborator

pdgendt commented Aug 28, 2024

See above - joblib looks nice but it does not seem to add any value compared to built-in Python modules.

Agreed, something like?

from multiprocessing.pool import Pool

# ...
        processes = self.args.jobs if self.args.jobs > 0 else None
        if processes == 1:
            for project in self.manifest.projects:
                project_update(project)
        else:
            pool = Pool(processes=processes)
            pool.map_async(project_update, self.manifest.projects)
            pool.join()

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 28, 2024

Agreed, something like?

Yes something like this.

@pdgendt
Copy link
Collaborator

pdgendt commented Sep 12, 2024

@RobertGalatNordic any update?

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 12, 2024

@pdgendt if you have time to spend on this, I think you could make pretty short work of a re-implementation of this that uses some built-in, multiprocessing module instead (assuming all old multiprocessing bugs that motivated the creation of joblib have been fixed)

@pdgendt
Copy link
Collaborator

pdgendt commented Sep 13, 2024

@pdgendt if you have time to spend on this, I think you could make pretty short work of a re-implementation of this that uses some built-in, multiprocessing module instead (assuming all old multiprocessing bugs that motivated the creation of joblib have been fixed)

I took a quick stab at it, but it's not as easy apparently, because of the shared memory stuff.

Comment on lines 895 to 896
help='''Use multiple jobs to paralelize update process.
Pass -1 to use all avaliable jobs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help='''Use multiple jobs to paralelize update process.
Pass -1 to use all avaliable jobs.
help='''Use multiple jobs to parallelize the update process.
Pass -1 to use all available jobs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

@kartben
Copy link

kartben commented Nov 29, 2024

rebased to address switch to pyproject.toml + @pdgendt comments re: typos

The projects from manifests were updated one by one.
The parallel aproach has potential to fully utilize
CPU and network connection to speed up the update.

Signed-off-by: Robert Gałat <robert.galat@nordicsemi.no>
@marc-hb
Copy link
Collaborator

marc-hb commented Dec 2, 2024

I took a quick stab at it, but it's not as easy apparently, because of the shared memory stuff.

The "shared memory stuff" is also the... messenger. It's telling us that this PR runs concurrently code that was never designed to. What could possibly go wrong? On top of all other issues already discussed above.

I took a deep dive in the code last weekend and I think an asyncio implementation should be possible with a reasonable amount of change. This may also help with #747 , #519 and maybe #338. I hope I can share a prototype this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance How long things take
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants