Comments on README #1

talex5 · 2023-07-19T13:57:32Z

By providing such key concurrent runtime facilities in scheduler independent form we can have an ecosystem of interoperable libraries, multiple schedulers, and avoid unnecessary community split.

The basic problem here is that you can't run the same code with different schedulers. Eio.Fiber specifies how the fibers are scheduled, and applications can rely on this. e.g.

let x = ref 0 in
Fiber.both
  (fun () -> incr x)
  (fun () -> incr x);
!x

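Under Eio, the above program always returns 2, because the two functions run in the same domain and neither yields part-way through. With a work-stealing scheduler that may run them on different domains, however, it is a data race.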
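To make the difference concrete, here is a minimal sketch using only the OCaml 5 standard library. `both_parallel` is a hypothetical helper (it is not part of Eio or Domainslib) standing in for a scheduler that runs the two functions on different domains. With a plain `ref` the increments race; with `Atomic` the result is always 2.

```ocaml
(* Hypothetical helper: run [f] on a fresh domain and [g] on the current one.
   This only illustrates parallel scheduling; it is not a real scheduler API. *)
let both_parallel (f : unit -> unit) (g : unit -> unit) : unit =
  let d = Domain.spawn f in
  g ();
  Domain.join d

(* Same shape as the example above, but now a data race: both domains do an
   unsynchronised read-modify-write on [x], so the result may be 1 or 2. *)
let racy_count () =
  let x = ref 0 in
  both_parallel (fun () -> incr x) (fun () -> incr x);
  !x

(* Safe under both cooperative and parallel scheduling. *)
let atomic_count () =
  let x = Atomic.make 0 in
  both_parallel (fun () -> Atomic.incr x) (fun () -> Atomic.incr x);
  Atomic.get x
```

Code written against a scheduler-independent API would have to look like `atomic_count`, even when it only ever runs under a single-domain scheduler.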

A more realistic example:

let a, b = Fiber.pair
  (fun () -> OpamFile.OPAM.read_from_string (Path.load (packages / "foo.opam")))
  (fun () -> OpamFile.OPAM.read_from_string (Path.load (packages / "bar.opam")))

In Eio, this loads foo.opam and bar.opam in parallel, and then parses them one at a time. In Domainslib, it crashes because opam isn't thread-safe.

It's also possible to have code that works under Domainslib but would fail under Eio (e.g. spawning two fibers and one blocks the domain waiting for something the other one will do).

So the problem I see is not so much one of how to allow this, but how to make it safe and without splitting the ecosystem. It would be possible to make another API (e.g. Generic.Fiber) for code that can be used either way, but writing against that is harder than writing against either more specific API.

In general, you can't automatically compose:

Code that assumes concurrent vs parallel scheduling.
Structured and non-structured concurrency (the result is just unstructured).
Capability safe and non-capability safe (the result is just non-capability safe).

Or perhaps you'd rather not have capabilities, because you feel that they are unnecessary or you'd rather wait for typed effects to provide much of the same ability with convenient type inference.

Do you have any examples of how this would work? Typed effects (as far as I understand them) are concerned with checking that the program won't fail due to an unhandled effect, whereas capabilities are about preventing security problems (particularly where a resource is accessed by something that had permission to access it, but for a different reason). There doesn't seem to be much overlap.

The example in my blog post was a web-server that is asked for https://example.com/../tls_config/server.key and returns its private TLS key. A typed-effects system will accept the code because it tries to access the file-system (as expected) and an IO handler is in scope. A capability system requires the web-server to say which capability it wants to use (to the static files directory or to the TLS configuration directory), which avoids this. You can't infer which one to use.

As another example, just last week I was converting an Lwt application to Eio. It's a service that allows users to request that it process some remote data, which it downloads to a cache directory. However, the code forgot to combine the cache directory path with the user-supplied name, and instead downloaded things to the server's root directory (a bug that hadn't been discovered previously)! I think a typed effects system would only report that the service writes to the disk, which is expected. A capability system won't let you use a string (pure data) as a file (mutable state), so the function would have to take the directory as an input and be explicit that it's combining its authority from the sys-admin with the name from the user. To see where the application might write, you can just start at the main entry-point, instead of having to read all the code.

In both cases, you could fix it by having the program perform different effects for e.g. reading TLS keys and reading static files, and then install effect handlers to deal with that, but that's way more work than just passing the directory as an argument, and you have to know there's a security problem ahead of time, which rather defeats the purpose.
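As a rough illustration of "just passing the directory as an argument", here is a minimal sketch. The `Dir` module and `static_file_path` are hypothetical and are not Eio's API; the check is purely lexical, whereas a real implementation would also have to deal with symlinks. The point is only that the handler is handed a directory value, not a bare string, and can only name files beneath it.

```ocaml
(* Hypothetical capability-style directory handle; NOT Eio's actual API. *)
module Dir : sig
  type t
  val of_path : string -> t  (* intended to be called only from main *)
  val resolve : t -> string -> (string, string) result
end = struct
  type t = string

  let of_path p = p

  (* Combine the directory with a user-supplied name, refusing absolute
     paths and any ".." component (lexical check only). *)
  let resolve dir name =
    let parts = String.split_on_char '/' name in
    if name = "" || name.[0] = '/' || List.mem ".." parts
    then Error ("refusing path outside directory: " ^ name)
    else Ok (Filename.concat dir name)
end

(* The handler is given only the directory it is meant to serve from. *)
let static_file_path ~static_dir request = Dir.resolve static_dir request

let () =
  let static_dir = Dir.of_path "/srv/www/static" in
  match static_file_path ~static_dir "../tls_config/server.key" with
  | Ok path -> print_endline ("would serve " ^ path)
  | Error msg -> print_endline msg
```

Because `Dir.t` is abstract, a function that is only given a `string` cannot turn it into file access on its own; the authority has to arrive as an argument.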

As the blog notes, you can put e.g. the root directory capability in fiber-local storage in your main function, allowing any code to access any file, effectively turning off the security system. Some people might want to do that if they have a lot of old code to port. It makes the code hard to audit, of course, but at least people can see from the main function what you've done. Typed effects would make it possible to track which libraries relied on that so you couldn't forget to add it. Though since most code needs changes anyway to remove the monadic binds, I find it easier to fix it up at the same time.


polytypic · 2023-07-20T10:02:21Z

how to make it safe

Briefly, the way I see it, the question here is one of semantics.

If you look at the documentation of domain-local-await and domain-local-timeout you can see that the documentation has many notes on what the semantics of these primitives are. The semantics are fairly carefully designed to allow a range of implementations for providers and to allow both providers and consumers to make some simplifying assumptions.

As described in the README, this is still WIP, and one of the many things missing is a semantic description of the draft primitives such as NestedParallelism.par and Fiber.spawn. For those primitives the semantics should answer questions such as

Is it guaranteed that actions will be run on different domains or threads?
Is it allowed that the actions be run on different domains or threads?

and possibly others. The answers to those questions determine what actions are safe for consumers when using the primitives and what implementations are allowed by providers.

Roughly, the intention is to specify a kind of relaxed semantics that will allow a range of implementations. So, when using those specific primitives, it will likely not be safe to assume either that the actions run on different domains or threads or that they do not. OTOH, to guarantee that an action will be run on a different thread, e.g. so that it is safe to block in the action in a scheduler-unfriendly manner, one would have to explicitly use, e.g., OCaml's threads.
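A minimal sketch of what such relaxed semantics mean for a consumer, using only the OCaml 5 standard library. The `PAR` signature and both implementations are assumptions made for illustration; they are not the actual signature of the draft `NestedParallelism.par`. The consumer is written so that it is correct whether the two actions run on one domain or on two.

```ocaml
(* Assumed interface: the two actions may or may not run on different domains. *)
module type PAR = sig
  val par : (unit -> 'a) -> (unit -> 'b) -> 'a * 'b
end

(* One allowed provider: run both actions on the current domain. *)
module Sequential : PAR = struct
  let par f g =
    let a = f () in
    let b = g () in
    (a, b)
end

(* Another allowed provider: run the first action on a fresh domain. *)
module Parallel : PAR = struct
  let par f g =
    let d = Domain.spawn f in
    let b = g () in
    (Domain.join d, b)
end

(* A consumer that makes no assumption either way: the two actions share
   no mutable state, so any provider gives the same answer. *)
module Consumer (P : PAR) = struct
  let sum_to n =
    let sum lo hi =
      let acc = ref 0 in  (* local to each call, never shared *)
      for i = lo to hi do acc := !acc + i done;
      !acc
    in
    let mid = n / 2 in
    let a, b = P.par (fun () -> sum 1 mid) (fun () -> sum (mid + 1) n) in
    a + b
end

module Seq_consumer = Consumer (Sequential)
module Par_consumer = Consumer (Parallel)
```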

The example you gave

let a, b = Fiber.pair
  (fun () -> OpamFile.OPAM.read_from_string (Path.load (packages / "foo.opam")))
  (fun () -> OpamFile.OPAM.read_from_string (Path.load (packages / "bar.opam")))

is kind of subtle in my opinion. It uses a primitive Fiber.pair that gives an impression of concurrency / parallelism, yet relies on sequential execution for safety. Using the kind of relaxed semantics primitives I am proposing here one would need to express the concurrent / parallel and sequential parts more explicitly.
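One way to express the two parts explicitly is to do the loads in a step that may run in parallel and the parses in a plainly sequential step afterwards. The sketch below uses only the standard library; `load`, `parse` and `load_both` are hypothetical stand-ins (for `Path.load`, a parser that is not thread-safe, and a scheduler-provided primitive respectively), not real APIs.

```ocaml
(* Stand-in for a load that is safe to run in parallel. *)
let load path = "contents of " ^ path

(* Stand-in for a parser that is NOT thread-safe: it mutates global state. *)
let parse_calls = ref 0
let parse s =
  incr parse_calls;
  String.length s

(* Possibly-parallel part: only the loads. A Domain is used here purely for
   illustration; a real scheduler would supply this primitive. *)
let load_both p1 p2 =
  let d = Domain.spawn (fun () -> load p1) in
  let s2 = load p2 in
  let s1 = Domain.join d in
  (s1, s2)

(* Sequential part: both parses happen on the calling domain, one after
   the other, once the loads have been joined. *)
let read_both p1 p2 =
  let s1, s2 = load_both p1 p2 in
  let a = parse s1 in
  let b = parse s2 in
  (a, b)
```

Written this way, the code no longer depends on how the scheduler happens to interleave the two functions.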

It is important to note that the point here is to give minimalistic primitives, which are actually only intended for direct use by library writers — not by application programmers. The intention is that higher level libraries, e.g. Kcas, Saturn, and others, will then provide higher level operations, e.g. parallel_map, that make it more convenient to express various concurrent programming patterns.
