How to implement non-blocking POSIX send in terms of stream.write? #441

Open
badeend opened this issue Jan 19, 2025 · 8 comments

Comments

@badeend
Contributor

badeend commented Jan 19, 2025

Am I correct in saying that, when stream.write returns BLOCKED, the callee continues to have access to the provided memory buffer until the write either finishes or is canceled?
If so, does this mean wasi-libc has to perform an intermediate copy into a private socket-level buffer first for every send/recv on non-blocking sockets? (in order to emulate the "readiness"-based POSIX async model)

@dicej
Collaborator

dicej commented Jan 19, 2025

Great question. I believe @lukewagner's plan was to allow zero-length stream.read and stream.write calls, which may be used to query whether the other end of the stream is ready. So a POSIX send call would first issue a zero-length stream.write, and if it returns BLOCKED, cancel it and return EAGAIN/EWOULDBLOCK. Otherwise, issue a non-zero length stream.write, which should not return BLOCKED (but which can be cancelled if it does).

However, this could lead to a livelock-type situation if both the read and write ends are repeatedly probing for readiness using zero-length calls, but failing to line up since both of them are canceling any call that returns BLOCKED. The only way to avoid that is if at least one side is willing to let a BLOCKED call complete. @lukewagner how do you think that would work if both ends are using POSIX send/recv?
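For concreteness, here's a rough C sketch of what that send() path could look like inside wasi-libc. The streams_write/streams_cancel_write bindings and the status enum are made-up stand-ins for whatever the real canonical-ABI intrinsics end up being, so treat this as an illustration of the control flow rather than an actual implementation:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

// Hypothetical bindings standing in for the canonical-ABI stream built-ins.
typedef enum { STREAM_COMPLETED, STREAM_BLOCKED, STREAM_CLOSED } stream_status_t;
stream_status_t streams_write(int32_t stream, const void *buf, size_t len,
                              size_t *written);
void streams_cancel_write(int32_t stream);

// Non-blocking send() built on a zero-length readiness probe.
ssize_t send_nonblocking(int32_t stream, const void *buf, size_t len) {
    size_t written = 0;

    // 1. Probe readiness with a zero-length write.
    if (streams_write(stream, NULL, 0, &written) == STREAM_BLOCKED) {
        // Reader isn't ready: cancel the probe and report would-block.
        streams_cancel_write(stream);
        errno = EWOULDBLOCK;
        return -1;
    }

    // 2. The reader signalled readiness, so the real write is expected to
    //    complete eagerly. (How to handle a BLOCKED result here — cancel it
    //    or let it run to completion — is exactly what the rest of this
    //    thread is about.)
    stream_status_t st = streams_write(stream, buf, len, &written);
    if (st == STREAM_CLOSED) {
        errno = EPIPE;
        return -1;
    }
    return (ssize_t)written;
}
```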

@lukewagner
Member

Yes, the plan is to relax the current length restrictions on stream.{read,write} so that zero-length reads/writes signal "readiness" of the other side. I'm mostly just waiting for someone to get deep enough into the weeds of libc to confirm that the approach makes sense.

To avoid the livelock scenario @dicej mentions, the general rule needs to be that, if you are using a zero-length read/write to probe readiness, once you hear back that "the other side is ready", you must follow that up with a non-zero-length read/write that you allow to complete (either by calling stream.{read,write} synchronously, which I think would be the preferred option for libc, or calling it asynchronously, and not cancelling if the return value is BLOCKED).

If you apply these rules to a vanilla scenario where there is a stream between two components that are both applying this non-blocking I/O technique, then you'd get the following sequence, which I think is what you'd expect:

  • The first component, call it A, tries to do a non-blocking write() on a file-descriptor which (upon creation) has the "ready" bit unset, and so libc makes a zero-length stream.write, which returns BLOCKED, so libc returns EWOULDBLOCK to the caller. A does other things until eventually waiting on a libc select() which proceeds to block on task.wait.
  • The second component, call it B, tries to do a non-blocking read() on a file-descriptor which (upon creation) has the "ready" bit unset, and so libc makes a zero-length stream.read which rendezvouses with A's and thus returns eagerly to B (while also marking A's stream-writable-end as having a pending event). Following the above rules, B's libc code now makes a synchronous stream.read with a non-zero-length buffer, which blocks because there is no non-zero-length buffer from A to rendezvous with (yet).
  • Because A's stream-writable-end has a pending event, the C-M runtime now switches back to A, with task.wait reporting that the zero-length write completed, which libc handles internally by setting the "ready" bit on the file descriptor and having select() return that file descriptor. A then performs a libc write(), which libc now (because of the "ready" bit) implements as a synchronous non-zero-length stream.write that rendezvouses with B, eagerly returning how much was written (in the range [1, buffer-length)) while also marking B's readable-stream-end as having a pending event. A's libc now clears the "ready" bit and returns the results to A, which then tries to do another write(), and the process repeats from A's POV.
  • Because B's readable-stream-end has a pending event, the C-M runtime now switches back to B, with task.wait returning the [1, buffer-length) result of the read. B's libc now clears the "ready" bit and returns the results to B, which tries to do another read(), and the process repeats from B's POV.

Thus, you do get "blocking", but only in guest-to-guest cases where, one way or another, you need to synchronously block and switch between A and B for streaming to work at all. In host-guest streaming scenarios, such blocking could be avoided by the host as an impl detail.
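To make the libc-side bookkeeping in that walkthrough a bit more concrete, here is a rough sketch of a per-file-descriptor "ready" bit on the read side. All names (streams_read, streams_read_sync, struct fd_state, ...) are hypothetical placeholders, not actual wasi-libc internals:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

// Hypothetical bindings: an async read (may return BLOCKED and stay pending)
// and a synchronous read that runs to completion.
typedef enum { STREAM_COMPLETED, STREAM_BLOCKED, STREAM_CLOSED } stream_status_t;
stream_status_t streams_read(int32_t stream, void *buf, size_t len, size_t *nread);
stream_status_t streams_read_sync(int32_t stream, void *buf, size_t len, size_t *nread);

struct fd_state {
    int32_t stream;   // readable stream end backing this fd
    bool    ready;    // set (e.g. by select()/task.wait handling) when the probe completes
    bool    probing;  // a zero-length read is currently pending
};

// Non-blocking read(): probe with a zero-length read until readiness is seen,
// then commit to a real read that is allowed to complete.
ssize_t fd_read_nonblocking(struct fd_state *fd, void *buf, size_t len) {
    size_t nread = 0;

    if (!fd->ready) {
        if (fd->probing) {                  // probe still pending; stay would-block
            errno = EWOULDBLOCK;
            return -1;
        }
        if (streams_read(fd->stream, NULL, 0, &nread) == STREAM_BLOCKED) {
            fd->probing = true;             // leave it pending; select() will
            errno = EWOULDBLOCK;            // flip `ready` when it completes
            return -1;
        }
        fd->ready = true;                   // probe completed eagerly
    }

    // "Ready" is a promise to follow up with a real read and let it finish.
    stream_status_t st = streams_read_sync(fd->stream, buf, len, &nread);
    fd->ready = false;
    fd->probing = false;
    if (st == STREAM_CLOSED)
        return 0;                           // EOF
    return (ssize_t)nread;
}
```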

I hope that helps, let me know if that sounds like it'd work or there are any problems.

@badeend
Contributor Author

badeend commented Jan 25, 2025

Thanks for the detailed explanation (as usual 🤠)

Unless I'm mistaken, this protocol seems susceptible to infinite blocking in the presence of false wake-ups. If one side of the stream reports that its 0-sized read/write is done, it must be very, very sure it actually has some data to read/write, otherwise the other side of the stream is toast. In the worst case new data never appears and the other side will never wake up again.
An example of false wake-ups surfaced on Zulip very recently, with the conclusion that they are generally impossible to avoid on POSIX.

Ultimately, I'll need more hands-on experience with all of this to fully understand all the possible interactions.

@lukewagner
Member

Yeah, that is a hazard. I'm not sure how likely it is to occur, but I guess I can see it happening hypothetically if code performs a non-blocking read() or write() and then never follows up with a blocking read or write. I think the rule would be that 0-length reads/writes must be considered as not just querying but also signalling readiness, and after one completes, you need to follow up with a non-0-length read/write. But if some particular piece of code was breaking this rule, I suppose a fallback option would be to give libc an option to perform bounded buffering in linear memory, using EWOULDBLOCK to exert backpressure.
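As a rough illustration of that bounded-buffering fallback (with made-up names; this isn't actual wasi-libc code), libc could keep a small per-socket buffer and use its fullness to exert backpressure:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define SOCK_BUF_CAP 4096   // arbitrary bound, for illustration only

struct sock_buf {
    uint8_t data[SOCK_BUF_CAP];
    size_t  used;           // bytes accepted but not yet flushed to the stream
};

// Accept as much as fits into the bounded buffer; once it is full, return
// EWOULDBLOCK instead of growing the buffer without bound.
ssize_t buffered_send(struct sock_buf *b, const void *src, size_t len) {
    size_t space = SOCK_BUF_CAP - b->used;
    if (space == 0) {
        errno = EWOULDBLOCK;
        return -1;
    }
    size_t n = len < space ? len : space;
    memcpy(b->data + b->used, src, n);
    b->used += n;
    // A stream.write draining b->data would be issued/awaited elsewhere.
    return (ssize_t)n;
}
```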

@badeend
Contributor Author

badeend commented Jan 28, 2025

As an optimization, the CM could additionally offer stream.try-read & stream.try-write built-ins. They would both take the same parameters and return the same types as the non-try variants, but with the added guarantee to the caller that the provided buffer is always released immediately after the built-in returns, regardless of the returned status.

  • If the try- method returns Complete, the data has immediately been copied, just like a regular read/write.
  • If the try- method returns Blocked, no data is copied, the buffer is immediately handed back to the caller, and the operation is for all intents and purposes equivalent to starting a 0-sized read/write as outlined above.

This reduces the number of operations on the happy path (where data is immediately ready) from 2 to 1.
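To illustrate why the happy path shrinks to one call, here is a minimal sketch of send() on top of a hypothetical try-write binding (names and signature are placeholders, not part of any actual proposal text):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

// Hypothetical "try" binding: never retains the caller's buffer after
// returning, regardless of status.
typedef enum { STREAM_COMPLETED, STREAM_BLOCKED, STREAM_CLOSED } stream_status_t;
stream_status_t streams_try_write(int32_t stream, const void *buf, size_t len,
                                  size_t *written);

ssize_t send_nonblocking_try(int32_t stream, const void *buf, size_t len) {
    size_t written = 0;
    switch (streams_try_write(stream, buf, len, &written)) {
    case STREAM_COMPLETED:
        return (ssize_t)written;  // data copied; a single built-in call total
    case STREAM_BLOCKED:
        // Buffer already handed back; per the proposal this behaves like a
        // pending 0-sized write for readiness purposes.
        errno = EWOULDBLOCK;
        return -1;
    case STREAM_CLOSED:
    default:
        errno = EPIPE;
        return -1;
    }
}
```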

Taking it even further; with these methods in place, there may not even be a need to expose 0-sized reads/writes to the guest.

@lukewagner
Member

Good idea! As a small tweak to consider, we could also add try as an alternative immediate to async in the existing {stream,future}.{read,write} built-ins, making it, I expect, a pretty small bit of extra work. try could also be used to optimize the buffering scheme I mentioned above. But first, regarding:

Taking it even further; with these methods in place, there may not even be a need to expose 0-sized reads/writes to the guest.

I think we still need 0-length reads/writes as the way for select() (et al) to wait for N distinct file descriptors without having any buffers for any of them. But with try, it could work like this (a rough sketch follows the list):

  • select() creates 0-length reads/writes for any stream end that doesn't already have a read/write outstanding, then waits for one to complete, then sets a "ready" bit on it (which only ends up being relevant for writable stream ends).
  • When doing a non-blocking libc read()/write(): libc first tries stream.{read,write}, which may complete eagerly. But if not:
    • If it's a write() and the "ready" bit is set, copy the caller-supplied buffer into an internal buffer, and issue an async stream.write with the internal buffer, eagerly returning to the C caller. select() is thus waiting for this (non-0-length) stream.write to complete before signalling readiness so that as an invariant the "ready" bit implies that the internal buffer is empty.
    • Fancier multiple-write()-buffering schemes are also possible.
    • Otherwise, return EWOULDBLOCK.
      • If we care about pathological cases where both sides are busy-waiting on non-blocking read()/write() (no intervening select() et al), set the "ready" bit if the internal buffer is now empty, using task.poll/waitable-set.poll (which both, incidentally, yield, allowing other components to make progress).
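Putting those pieces together, here is a rough sketch of the write() path under this scheme. As before, the bindings, the per-fd state, and the buffer size are hypothetical placeholders, not actual wasi-libc or Component Model API:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

typedef enum { STREAM_COMPLETED, STREAM_BLOCKED, STREAM_CLOSED } stream_status_t;

// Hypothetical bindings: an eager "try" write and an async write that is
// completed later via task.wait/waitable-set handling (not shown).
stream_status_t streams_try_write(int32_t stream, const void *buf, size_t len,
                                  size_t *written);
void streams_async_write(int32_t stream, const void *buf, size_t len);

struct fd_state {
    int32_t stream;
    bool    ready;             // set by select() when a 0-length write completed
    uint8_t internal[4096];    // writer-side buffer owned by libc
    size_t  internal_len;      // > 0 while an async stream.write is in flight
};

ssize_t fd_write_nonblocking(struct fd_state *fd, const void *buf, size_t len) {
    size_t written = 0;

    // 1. First attempt an eager try-write straight from the caller's buffer.
    if (streams_try_write(fd->stream, buf, len, &written) == STREAM_COMPLETED)
        return (ssize_t)written;

    // 2. Not eagerly writable. If select() signalled readiness and the
    //    internal buffer is free, copy into it and issue an async write.
    //    select() won't report readiness again until that write completes,
    //    preserving the invariant that "ready" implies an empty buffer.
    if (fd->ready && fd->internal_len == 0) {
        size_t n = len < sizeof(fd->internal) ? len : sizeof(fd->internal);
        memcpy(fd->internal, buf, n);
        fd->internal_len = n;
        streams_async_write(fd->stream, fd->internal, n);
        fd->ready = false;
        return (ssize_t)n;
    }

    // 3. Otherwise exert backpressure.
    errno = EWOULDBLOCK;
    return -1;
}
```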

To avoid spurious EWOULDBLOCK-after-select() on read(), this scheme requires changing #444 so that, when a 0-length read and write rendezvous, only the 0-length write completes, leaving the 0-length read pending until a non-0-length write comes along to complete it, at which point a try stream.read will succeed. Thus, once a 0-length read completes, the only way a subsequent non-blocking read() would return EWOULDBLOCK is if there is a stream.cancel-write, which I don't think libc would ever do itself, so it shouldn't happen unless shenanigans.

And I don't think this asymmetry is arbitrary either: if you're using 0-length reads in the first place, you're by definition not doing reader-side buffering (because if you were, you'd need to perform a non-0-length stream.read into an internal buffer and wait for that to complete before signalling readiness ahead of the read()). Thus, when you have two rendezvousing 0-length reads/writes, you always want the 0-length write to complete first so it can do any internal buffering if desired (which, unlike for read(), can happen during the write() call because the data is coming from the caller of write(), not the stream). In other words, the asymmetry in completion order derives from the directional flow of data.

Since buffering shouldn't happen in host-to-wasm, wasm-to-host, wasm-to-wasm synchronous, or wasm-to-wasm asynchronous-completion-based scenarios, and since a traditional OS piping between two processes also copies into an intermediate buffer (the pipe's kernel memory), this seems equivalent-or-better than the status quo and thus a pretty good default behavior. We could offer an advanced option (say, set by ioctl or setsockopt) to disable writer-side buffering, if someone wanted it (and couldn't use a synchronous or completion-based I/O approach).

WDYT?

@badeend
Contributor Author

badeend commented Feb 2, 2025

Good idea! As a small tweak to consider, we could also add try as an alternative immediate to async in the existing {stream,future}.{read,write} built-ins

Fine by me 👍
Maybe add a validation rule that the try option may only be used in combination with the async option?


I think we still need 0-length reads/writes as the way for select() (et al) to wait for N distinct file descriptors without having any buffers for any of them.

With the try variant in place I don't see a need for libc to ever need a 0-sized read/write.
If there's no active (try) stream.read/write in progress at the time of calling select, it can immediately return that file descriptor as being "ready". The next call to POSIX read or POSIX write then initiates a proper (try) stream.read/write with data.


when a 0-length read and write rendezvous, only the 0-length write completes, leaving the 0-length read pending until a non-0-length write comes along to complete it

Awesome! This seems like a simple but effective way to ensure both sides of the stream make progress, even if they're both readiness-based. Earlier you mentioned:

To avoid the livelock scenario, the general rule needs to be that, if you are using a zero-length read/write to probe readiness, once you hear back that "the other side is ready", you must follow that up with a non-zero-length read/write

With the asymmetric rendezvous mechanism in place, I presume this restriction is now lifted for the read side? I.e. the reader is allowed to only issue try reads?

Also, would it make sense to enforce this protocol for the writer? For example trap the writer if it attempts to do two try writes in a row on the same stream/future, without a non-0-sized non-try write in between?


Since buffering shouldn't happen in host-to-wasm, wasm-to-host, wasm-to-wasm synchronous or wasm-to-wasm asynchronous-completion-based scenarios (...)

Could you elaborate on the wasm-to-host aspect? Seems to me that with the livelock prevention protocol mentioned above, the libc-based guest would need an intermediate copy in case the preceding try write couldn't succeed immediately. Regardless of whether the other side is the host or another guest.


this seems equivalent-or-better than the status quo and so a pretty good default behavior

Aside from the potential double buffering (see previous point): Agree! 👌

@lukewagner
Member

With the try variant in place I don't see a need for libc to ever need a 0-sized read/write.
If there's no active (try) stream.read/write in progress at the time of calling select, it can immediately return that file descriptor as being "ready". The next call to POSIX read or POSIX write then initiates a proper (try) stream.read/write with data.

Ah, maybe we're thinking of different behavior for try? I was imagining that try either completes or returns "would block", without leaving any pending operation in progress in the "would block" case. Based on this, the 0-length async read/write is still needed for select() to register interest and learn about readiness once try returns "would block". But what were you thinking?

To avoid the livelock scenario, the general rule needs to be that, if you are using a zero-length read/write to probe readiness, once you hear back that "the other side is ready", you must follow that up with a non-zero-length read/write

With the asymmetric rendezvous mechanism in place, I presume this restriction is now lifted for the read side? I.e. the reader is allowed to only issue try reads?

Yep!

Also, would it make sense to enforce this protocol for the writer? For example trap the writer if it attempts to do two try writes in a row on the same stream/future, without a non-0-sized non-try write in between?

Hypothetically we could, but I worry this would have false positives (where maybe there were two (or three or ...) in a row for some ad hoc reason, but a bounded number, so no livelock).

Could you elaborate on the wasm-to-host aspect? Seems to me that with the livelock prevention protocol mentioned above, the libc-based guest would need an intermediate copy in case the preceding try write couldn't succeed immediately. Regardless of whether the other side is the host or another guest.

My thinking here is that an optimizing host would never do the wasm equivalent of a 0-length read; it would only signal "readiness" when it had buffer space, and thus after a wasm writer's 0-length write completes, the subsequent try write would definitely succeed.
