
Peer selection governor FRP

Marcin Wójtowicz edited this page Nov 27, 2024 · 2 revisions

In progress

As part of making ouroboros-network independent of cardano-node and pursuing a more reusable diffusion implementation, the peer selection governor was identified as a component that is very tightly coupled to the current stack (provide reference). This is unfortunate, since it is a major component of the stack and its core functionality should, in principle, be more generic. The coupling is a consequence of early design goals which emphasized performance, among other things. Specifically, the governor can short-circuit to minimize the work it has to perform, and it runs only on demand (provide some references?). As the code base has grown, the added complexity has required numerous ad-hoc solutions which mixed cardano-specific functionality into its core.

The current implementation also has a peculiarity: its principal elements, the monitoring jobs, have to be specified in a particular order, and there are ad-hoc dependencies between these jobs which require further ad-hoc code to sequence their actions properly at run-time. This sequencing is expressed as explicit state updates, which are then read back in other monitoring actions or in the main loop. In effect, the governor is a curious blend of being very explicit and very implicit at the same time. It is very explicit in its handling of the state and its updates, and in firing off various activities in the main loop, but its actual behavior is not specified anywhere and can only be inferred from the pattern of state writes and reads.

By now, the peer selection governor is a reliable, performant and well-tested piece of code. Nevertheless, new applications with fresh monitoring jobs might run into new problems, principally because it is difficult to reason about the governor. In principle, the governor is a state machine, and by expressing it as such, with proper state and transitions, it may be tamed and possibly become easier to extend.

FRP was investigated, and in particular one library, Rhine, was identified as having a good blend of qualities. It has an arrow-based interface which is free of time leaks (time leaks are possible, but they have to be explicitly requested by code) and, with its novel approach of resampling buffers, it also avoids the duplicated work that is inherent to all the other arrow-based libraries. It is the only library that I am aware of that has gone through the effort of using an initial encoding of stream-based automata (unlike e.g. machines etc., which are recursive) and so should be aggressively optimized by GHC.
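
To make the distinction between the two encodings concrete, here is a minimal sketch using only `base`. The types `MealyRec` and `MealyInit` and all function names are hypothetical and heavily simplified (no monad, no clocks); they are not Rhine's actual types. In the recursive encoding every step returns a fresh automaton, while in the initial encoding the state is an explicit, hidden value and the step function is an ordinary non-recursive function, which is the shape that GHC can usually inline and specialize well.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Recursive encoding (hypothetical type): each step yields the next automaton.
newtype MealyRec a b = MealyRec { stepRec :: a -> (b, MealyRec a b) }

-- Initial encoding (hypothetical type): an existentially hidden state
-- together with a non-recursive step function over that state.
data MealyInit a b = forall s. MealyInit s (s -> a -> (b, s))

-- A running sum, written in both encodings.
sumRec :: Int -> MealyRec Int Int
sumRec acc = MealyRec $ \x -> let acc' = acc + x in (acc', sumRec acc')

sumInit :: MealyInit Int Int
sumInit = MealyInit 0 (\acc x -> let acc' = acc + x in (acc', acc'))

-- Drive a recursive automaton over a list of inputs.
runRec :: MealyRec a b -> [a] -> [b]
runRec _ []       = []
runRec m (x : xs) = let (y, m') = stepRec m x in y : runRec m' xs

-- Drive an initially encoded automaton; the only recursion is in the driver.
runInit :: MealyInit a b -> [a] -> [b]
runInit (MealyInit s0 step) = go s0
  where
    go _ []       = []
    go s (x : xs) = let (y, s') = step s x in y : go s' xs

-- Sequential composition just pairs the two states; no recursion is needed.
composeInit :: MealyInit a b -> MealyInit b c -> MealyInit a c
composeInit (MealyInit s0 f) (MealyInit t0 g) =
  MealyInit (s0, t0) $ \(s, t) a ->
    let (b, s') = f s a
        (c, t') = g t b
    in  (c, (s', t'))
```

For example, `runInit (composeInit sumInit sumInit) [1, 2, 3]` feeds the running sums `1, 3, 6` into a second running sum. Because `composeInit` is a plain function over a pair of states, a whole pipeline of such automata collapses into a single step function, which is my understanding of why this encoding is friendlier to the optimizer.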

The main loop was essentially moved into a Clock instance. There, it waits for events on a channel. These events can either step the governor network immediately or be queued on a priority queue to be fired later, when a timeout expires. Different sub-clocks were defined, corresponding to the events of interest. When a sub-clock ticks, only the necessary part of the network is run. Since all the interdependencies are explicit, we are free to compose the governor's activity as we see fit, without mingling ad-hoc code into any action, and so there is no explicit list of actions that needs to be defined in a particular order.
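
The scheduling decision described above can be modelled purely, as a rough sketch. Everything here is an assumption made for illustration: the `Tick` constructors, the `Event` type, the integer `Time` and the function names are hypothetical, and a sorted list stands in for the priority queue. The actual Clock instance and its channel handling are not shown.

```haskell
import Data.List (insertBy)
import Data.Ord (comparing)

-- Hypothetical tags, one per sub-clock of interest.
data Tick = TargetsChanged | PeerChurn | RetryConnect
  deriving (Eq, Show)

-- Abstract time; a real clock would use a proper time type.
type Time = Int

-- An event read from the channel: step now, or step after a delay.
data Event
  = FireNow Tick
  | FireAfter Time Tick
  deriving (Eq, Show)

-- Pending timers, kept sorted by deadline (stand-in for a priority queue).
newtype Timers = Timers [(Time, Tick)]
  deriving (Eq, Show)

emptyTimers :: Timers
emptyTimers = Timers []

-- Handle one event at time `now`: return the ticks to run immediately
-- and the updated set of pending timers.
handleEvent :: Time -> Event -> Timers -> ([Tick], Timers)
handleEvent _   (FireNow t)     timers     = ([t], timers)
handleEvent now (FireAfter d t) (Timers q) =
  ([], Timers (insertBy (comparing fst) (now + d, t) q))

-- Pop every timer whose deadline is at or before `now`, earliest first.
expire :: Time -> Timers -> ([Tick], Timers)
expire now (Timers q) =
  let (due, later) = span ((<= now) . fst) q
  in  (map snd due, Timers later)

-- Each tick selects only the part of the network that has to run
-- (the strings are placeholders for the actual sub-networks).
dispatch :: Tick -> String
dispatch TargetsChanged = "re-evaluate peer targets"
dispatch PeerChurn      = "churn peers"
dispatch RetryConnect   = "retry failed connections"
```

A loop built on this would alternate between `handleEvent` for whatever arrives on the channel and `expire` when the earliest deadline passes, calling `dispatch` on each resulting tick. Since a tick names exactly the sub-network to run, no global ordering of actions is needed.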

Jobs implemented:

- Monitoring lsj (ledger state judgement)
- Monitoring bootstrap peers
- Chasing known peers, established peers

Todo:

- Monitoring job pool
- Monitoring connections