Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallel: rustc_data_structures: sync: implement RwLock as enum { Sync(parking_lot::RwLock), NoSync(RefCell) } #136772

Closed

Conversation

safinaskar
Copy link
Contributor

@safinaskar safinaskar commented Feb 9, 2025

parallel: rustc_data_structures: sync: implement RwLock as enum { Sync(parking_lot::RwLock), NoSync(RefCell) }

Hopefully this will make single threaded front end faster.

I carefully split changes into commits. Commit messages are self-explanatory. Squashing is not recommended.

cc "Parallel Rustc Front-end" #113349

r? SparrowLii

@rustbot label: +WG-compiler-parallel

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-parallel Working group: Parallelizing the compiler labels Feb 9, 2025
@the8472
Copy link
Member

the8472 commented Feb 9, 2025

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 9, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2025
…, r=<try>

[perf run] parallel: rustc_data_structures: sync: implement RwLock as enum { Sync(parking_lot::RwLock), NoSync(RefCell) }

Please, somebody, run perf.

parallel: rustc_data_structures: sync: implement `RwLock` as `enum { Sync(parking_lot::RwLock), NoSync(RefCell) }`

Hopefully this will make single threaded front end faster.

I carefully split changes into commits. Commit messages are self-explanatory. Squashing is not recommended.

cc "Parallel Rustc Front-end" rust-lang#113349

r? SparrowLii

`@rustbot` label: +WG-compiler-parallel
@bors
Copy link
Contributor

bors commented Feb 9, 2025

⌛ Trying commit 95ce6d0 with merge d71a46a...

@bors
Copy link
Contributor

bors commented Feb 9, 2025

☀️ Try build successful - checks-actions
Build commit: d71a46a (d71a46abf4c4d81c0449c3fe8930bad47e2b1bb3)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d71a46a): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.6% [0.2%, 3.5%] 10
Regressions ❌
(secondary)
0.4% [0.2%, 0.5%] 22
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.3%, -0.1%] 2
All ❌✅ (primary) 0.6% [0.2%, 3.5%] 10

Max RSS (memory usage)

Results (primary 1.4%, secondary 2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.4% [0.5%, 3.4%] 3
Regressions ❌
(secondary)
2.3% [1.8%, 2.6%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.4% [0.5%, 3.4%] 3

Cycles

Results (primary 3.2%, secondary 2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.2% [3.2%, 3.2%] 1
Regressions ❌
(secondary)
2.2% [2.2%, 2.2%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 3.2% [3.2%, 3.2%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 781.536s -> 779.774s (-0.23%)
Artifact size: 329.13 MiB -> 329.22 MiB (0.03%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 9, 2025
@safinaskar
Copy link
Contributor Author

Results look bad. But I think instruction count is not relevant at this particular case. As well as I understand it literally counts instructions. I. e. integer addition, atomic compare-and-exchange and memory barrier will all count as one instruction. So, it is expected that instruction count will be larger than before. Because of those match constructs. But I still hope that rustc will be faster, because I replaced (in single threaded case) atomic instructions with cheaper ones.

So, these are metrics we should look at:

  • cpu_clock
  • cpu_clock:u
  • cycles:u (as well as I understand it counts CPU clock count)
  • task_clock
  • task_clock:u
  • wall-time

Hopefully all them will be smaller. (I haven't looked at them yet. Yes, I share list above before actually seeing them to combat cherry picking ( https://en.wikipedia.org/wiki/Cherry_picking ))

@safinaskar
Copy link
Contributor Author

Side note: I want to attach likely to match arms. Is this possible?

@SparrowLii
Copy link
Member

Side note: I want to attach likely to match arms. Is this possible?

Yea, sure.

@safinaskar
Copy link
Contributor Author

Okay, now I looked at these metrics, and they are not good. But I don't give up. :) Most of these metrics contain single test only. For example, task-clock seems to contain syn-1.0.89 only. I'm trying to understand, what is going on

@safinaskar
Copy link
Contributor Author

Yea, sure

And how to do this?

@SparrowLii
Copy link
Member

Yea, sure

And how to do this?

You can refer to the impl of Lock here, it uses unlinkely though

…Lock"

parking_lot::RwLock doesn't have "impl Clone", so we should not have, either
…lot::RwLock), NoSync(RefCell) }

Also we remove ERROR_CHECKING, because RefCell already panics on borrow errors
@safinaskar safinaskar force-pushed the parallel-2025-02-08-11-55 branch from 95ce6d0 to 5397b39 Compare February 10, 2025 03:37
@safinaskar safinaskar changed the title [perf run] parallel: rustc_data_structures: sync: implement RwLock as enum { Sync(parking_lot::RwLock), NoSync(RefCell) } parallel: rustc_data_structures: sync: implement RwLock as enum { Sync(parking_lot::RwLock), NoSync(RefCell) } Feb 10, 2025
@safinaskar
Copy link
Contributor Author

safinaskar commented Feb 10, 2025

@SparrowLii , okay, I figured it out. I used cold_path ( #120370 ). Okay, so I changed my code to use cold_path and also did rebase.

Please, do perf run again

@SparrowLii
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 10, 2025
@bors
Copy link
Contributor

bors commented Feb 10, 2025

⌛ Trying commit 5397b39 with merge 3a40e21...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 10, 2025
…, r=<try>

parallel: rustc_data_structures: sync: implement RwLock as enum { Sync(parking_lot::RwLock), NoSync(RefCell) }

parallel: rustc_data_structures: sync: implement `RwLock` as `enum { Sync(parking_lot::RwLock), NoSync(RefCell) }`

Hopefully this will make single threaded front end faster.

I carefully split changes into commits. Commit messages are self-explanatory. Squashing is not recommended.

cc "Parallel Rustc Front-end" rust-lang#113349

r? SparrowLii

`@rustbot` label: +WG-compiler-parallel
@bors
Copy link
Contributor

bors commented Feb 10, 2025

☀️ Try build successful - checks-actions
Build commit: 3a40e21 (3a40e21c50106cd5cb202a36b6e7f6b7879f5806)

@rust-timer

This comment has been minimized.

@Kobzol
Copy link
Contributor

Kobzol commented Feb 10, 2025

Okay, now I looked at these metrics, and they are not good. But I don't give up. :) Most of these metrics contain single test only. For example, task-clock seems to contain syn-1.0.89 only. I'm trying to understand, what is going on

The rest of the results weren't significant. You can display the non-significant results using the Filters UI.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3a40e21): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.4% [0.2%, 1.3%] 10
Regressions ❌
(secondary)
0.4% [0.2%, 0.5%] 23
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.3%, -0.1%] 2
All ❌✅ (primary) 0.4% [0.2%, 1.3%] 10

Max RSS (memory usage)

Results (primary 0.0%, secondary 2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.6% [0.5%, 0.6%] 2
Regressions ❌
(secondary)
2.2% [1.9%, 2.5%] 5
Improvements ✅
(primary)
-1.1% [-1.1%, -1.1%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.0% [-1.1%, 0.6%] 3

Cycles

Results (secondary -1.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.8% [-1.8%, -1.8%] 1
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 779.273s -> 781.777s (0.32%)
Artifact size: 329.09 MiB -> 329.26 MiB (0.05%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 10, 2025
@safinaskar
Copy link
Contributor Author

Results look bad, again. It is difficult for me to properly interpret them.

Still I'm not giving up. :) I will wait for cold_path improvements ( #133852 )

@safinaskar
Copy link
Contributor Author

Okay, I don't like timer results, and I don't want to spend my time more. So, I'm closing this PR.

Theoretically, I think what we really should do is to measure wall clock time reliably. It is possible. For example, https://soft-dev.org/src/krun/ does this (they even have their own patched Linux kernel: https://github.com/softdevteam/krun-linux-kernel !). We should setup project-wide infra based on krun or something similar. But I personally don't want to do this, so I simply give up.

So, I will not in near future submit more PRs to parallel compiler. (But it is possible I will send few clean-up patches)

@safinaskar safinaskar closed this Feb 11, 2025
@safinaskar safinaskar deleted the parallel-2025-02-08-11-55 branch February 21, 2025 05:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-parallel Working group: Parallelizing the compiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants