Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wayland_vk: use FIFO if commit-timing and fifo protocols are available #15056

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Dudemanguy
Copy link
Member

A very long time annoyance with wayland was compositors indefinitely blocking our vo thread if the surface gets occluded in some way. We've worked around this by using mailbox and our own custom vsync function. Thankfully it looks like people are finally solving this and with these two protocols it should be possible to guarantee forward progress on vulkan which means all the workarounds we do shouldn't be needed. So we can just request fifo in this case as a default since all we want is standard vsync blocking.

It's not tested at the moment, but SDL essentially does the same thing.

@llyyr
Copy link
Contributor

llyyr commented Oct 11, 2024

We should probably wait a few months after a new Mesa release is cut with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26150

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Oct 11, 2024

I'm hoping compositor implementations would only signal clients if it knows the drivers can actually use the protocols. But I didn't really look at how it was implemented. We still have to wait for the next wayland-protocols release anyway.

@Dudemanguy Dudemanguy force-pushed the wayland-fifo branch 3 times, most recently from 043de5b to 3113c9a Compare October 12, 2024 04:49
@mahkoh
Copy link
Contributor

mahkoh commented Oct 12, 2024

I'm hoping compositor implementations would only signal clients if it knows the drivers can actually use the protocols.

There is no way for compositors to do this. I've ranted at length about this here: libsdl-org/SDL#10937 (comment)

I also don't see why the compositor should be in charge of telling an application what features a library supports. If application A wants to know what features library B supports, then it should ask library B instead of compositor C.

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Oct 12, 2024

Hmm that's unfortunate. I guess that means this will just have to wait some arbitrary time after the next mesa release. I guess the semantics of --wayland-disable-vsync should change a bit so there's a "force on" option as a fallback in case someone is on some really weird configuration like newish compositor version but old mesa.

@Dudemanguy Dudemanguy force-pushed the wayland-fifo branch 2 times, most recently from f8234a9 to e84aa5e Compare October 12, 2024 15:50
@Dudemanguy
Copy link
Member Author

Dudemanguy commented Oct 12, 2024

Reworked --wayland-disable-vsync to a --wayland-internal-vsync option so users have full control of the vsync behavior. Since 1.38 is out now, I bumped that as well so this is just blocked on the next mesa release as far as I'm concerned. We'll see what the final implementation will be but for now I also made using fifo depend on having wp_presentation_v2.

Copy link

github-actions bot commented Oct 12, 2024

Download the artifacts for this pull request:

Windows
macOS

@Dudemanguy Dudemanguy force-pushed the wayland-fifo branch 4 times, most recently from 6471b74 to 0841139 Compare October 15, 2024 23:07
@Dudemanguy
Copy link
Member Author

It looks like the mesa implementation will work with presentation time v1 so I loosened up the restriction a bit.

@llyyr
Copy link
Contributor

llyyr commented Oct 16, 2024

It looks like the mesa implementation will work with presentation time v1 so I loosened up the restriction a bit.

it'll work but only for fixed refresh rate outputs. It'll go back to the old behavior as soon as VRR is enabled. See: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26150/diffs?commit_id=e7aa09688bf20a5b4e664437f302762be606c783#8b9326d5e797a10c9d7bf33ddb5420c06e700768_2135_2270

implementing v2 is trivial for any compositor that'll go through the trouble of implementing fifo-v1 and commit-timing-v1, so I don't see any issue with requiring all 3

@Dudemanguy
Copy link
Member Author

Added it back in.

video/out/vulkan/context_wayland.c Outdated Show resolved Hide resolved
@Dudemanguy Dudemanguy force-pushed the wayland-fifo branch 2 times, most recently from 35a00d7 to 62f6cda Compare October 21, 2024 14:18
@Dudemanguy Dudemanguy marked this pull request as ready for review November 23, 2024 19:15
@Dudemanguy
Copy link
Member Author

The latest mesa supports this now, so this should be fairly safe to merge once we confirm a compositor implementation works as expected.

Copy link
Contributor

@kasper93 kasper93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@llyyr
Copy link
Contributor

llyyr commented Dec 11, 2024

This should wait until at least one of Mutter/Kwin/wlroots have finalized and merged their implementations.

@kasper93
Copy link
Contributor

kasper93 commented Dec 11, 2024

This should wait until at least one of Mutter/Kwin/wlroots have finalized and merged their implementations.

Do you expect any modifications on mpv side? For me it looks like we are dependent on protocol, so not sure why some implementations should block us.

@Dudemanguy
Copy link
Member Author

Nothing for us should change. But still I would prefer to make sure it works as expected before merging.

@llyyr
Copy link
Contributor

llyyr commented Dec 11, 2024

Do you expect any modifications on mpv side? For me it looks like we are dependent on protocol, so not sure why some implementations should block us.

I tested the wlroots implementation and it is objectively worse than mpv doing vsync internally. With this MR, mpv will default to using fifo/commit-timing if available which could be worse than the status quo.

If we have itchy fingers and want to merge it, please make it default to "yes" for now and change the default to auto later when we can test the finalized version of these implementations aren't worse than what mpv does.

@kasper93 kasper93 added the priority:on-ice may be revisited later label Dec 11, 2024
@kasper93
Copy link
Contributor

Adding on-ice label as it is blocked by compositor implementations apparently.

@kasper93
Copy link
Contributor

kasper93 commented Dec 11, 2024

Do you expect any modifications on mpv side? For me it looks like we are dependent on protocol, so not sure why some implementations should block us.

I tested the wlroots implementation and it is objectively worse than mpv doing vsync internally. With this MR, mpv will default to using fifo/commit-timing if available which could be worse than the status quo.

If we have itchy fingers and want to merge it, please make it default to "yes" for now and change the default to auto later when we can test the finalized version of these implementations aren't worse than what mpv does.

I don't want to start a philosophical discussion, but my point is that protocols and specifications exist for a reason (we implement according to protocol, not specific compositor). If a compositor's implementation is incomplete or bugged, it is not mpv's issue, nor should it be a concern for us in any way (it's on them to enable when ready). mpv provides a way to disable the feature if users encounter problems. Of course, I agree that waiting is a valid option, and I'm not pushing to merge anything, but I want to point out that mpv shouldn't be hindered by third-party issues. Also, delaying features only postpones testing in the wild, which could otherwise happen sooner. Imagine that compositor developers could use mpv to validate their implementation, but we are shifting the responsibility the other way around willingly.

@llyyr
Copy link
Contributor

llyyr commented Dec 11, 2024

my point is that protocols and specifications exist for a reason (we implement according to protocol, not specific compositor).

I agree completely, but we're not talking about specific compositor here. No compositor is implementing is correctly (or at all) in a released version right now. There are also edge cases with the protocol that need to be ironed out and we don't know how it might affect mpv (though it shouldn't). Once we default to using fifo/commit-timing, we lose control over changing the default until the next release cycle which in mpv terms means 6 months.

I want to point out that mpv shouldn't be hindered by third-party issues.

Feel free to merge this, I'm not blocking it. My only concern was that it could potentially be a regression but if we don't care about it then LGTM

Imagine that compositor developers could use mpv to validate their implementation, but we are shifting the responsibility the other way around willingly.

You already can without this PR, with --wayland-disable-vsync=no --vulkan-swap-mode=fifo. Test clients [1] [2] [3] also exist for this specific purpose. This PR only makes it use fifo by default.

Anyone building a compositor to test this in the wild will be fine passing two arguments to mpv.

@mahkoh
Copy link
Contributor

mahkoh commented Dec 11, 2024

No compositor is implementing is correctly (or at all) in a released version right now.

If you know of a bug in https://github.com/mahkoh/jay feel free to open an issue. That being said, I don't see how fifo could ever have better results than the existing mailbox code.

@llyyr
Copy link
Contributor

llyyr commented Dec 11, 2024

If you know of a bug in https://github.com/mahkoh/jay feel free to open an issue.

Occluded mpv surface progressed completely unthrottled on Jay, but that was last month and I haven't tested since then. I intended to report this but it slipped my mind. I'll open an issue if it's still a problem tomorrow or day after. thx for the work.

@mahkoh
Copy link
Contributor

mahkoh commented Dec 11, 2024

Occluded mpv surface progressed completely unthrottled on Jay

Hmm, that is surprising. In my implementation I have diverged a bit from the mainstream and made it so that even wp-fifo alone should be enough to throttle presentation to 40hz.

When I tested this mpv branch, I believe that I also confirmed that the throttling was working correctly. I have repeated this test just now with vkcube-wayland and it looks like it is still working:

~$ WAYLAND_DEBUG=1 vkcube-wayland 2>&1 | rg --line-buffered set_barrier | sed -u -E 's/^(.*?).......\].*/\1/' | uniq -c
     44 [3826
     40 [3827
     40 [3828
     40 [3829
     40 [3830
     40 [3831
     40 [3832
     40 [3833

I extract the "second" component from the wayland debug output and count how many wp_fifo.set_barrier events occur with the same timestamp.

I'll open an issue if it's still a problem tomorrow or day after. thx for the work.

Please do if you have a reproducer.

@llyyr
Copy link
Contributor

llyyr commented Dec 11, 2024

Please do if you have a reproducer.

I just tried it and this is a mpv bug related to suspended event. When mpv is suspended, it stops drawing new frames. I don't think this is appropriate when using fifo but I'm not sure how to fix it. This works fine when using legacy fifo or mailbox, but for some reason not with fifo-v1. mpv basically stops receiving any events in WAYLAND_DEBUG when it's not drawing frames when using fifo-v1.

Jay works perfectly if you specify --force-render or ignore the suspended xdg_toplevel event.

I'll let @Dudemanguy figure out what the right fix here is

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Dec 11, 2024

Well we certainly deliberately not draw any frames when mpv is suspended. But that shouldn't cause the surface to be permanently blocked. Or at least as far as I know. And when mpv is brought back into view, it should receive callbacks again, be visible and draw frames again. At least that's how it should work but maybe fifo + commit-timing changes this model in a way that makes this not viable.

@llyyr
Copy link
Contributor

llyyr commented Dec 12, 2024

But that shouldn't cause the surface to be permanently blocked. Or at least as far as I know. And when mpv is brought back into view, it should receive callbacks again, be visible and draw frames again.

It does, however during that time mpv thinks it's progressing unthrottled with video-sync=display-resample, so it's possible to be a bug entirely with mpv. If mpv is "suspended" for 30 seconds with display-resample, it'll "present" frames fast enough to reach eof on a 24 minute file.

@Dudemanguy
Copy link
Member Author

Does that mean we have to throttle ourselves if we aren't drawing?

@Dudemanguy
Copy link
Member Author

Does that mean we have to throttle ourselves if we aren't drawing?

Answering my own question here, but it's "yes". I forgot that I encountered similar behavior on x11 (didn't throttle then but just rendered anyway).

With the upcoming fixes to FIFO in wayland, it should be preferable to
use FIFO instead of our own hacky heuristic. This means
--wayland-disable-vsync should become a tristate option with an "auto"
behavior (not implemented in this commit). The semantics have to
slightly change so introduce --wayland-internal-vsync and deprecate
--wayland-disable-vsync.
A very long time annoyance with wayland was compositors indefinitely
blocking our vo thread if the surface gets occluded in some way. We've
worked around this by using mailbox and our own custom vsync function.
Thankfully it looks like people are finally solving this and with these
two protocols it should be possible to guarantee forward progress on
vulkan which means all the workarounds we do shouldn't be needed. So we
can just request fifo in this case as a default since all we want is
standard vsync blocking.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:on-ice may be revisited later
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants