Logbook.md

2024-09-06

Conformance testing of external implementations.

The new peras-conformance-test executable runs the quickcheck-dynamic tests against external (e.g., non-Haskell) implementations of the Peras protocol. It writes JSON requests to a pipe and reads the JSON response from another pipe.

The reference/prototype implementation of Peras has been packaged as peras-simulation-pipe for use with this external tester.

See Conformance for details.

2024-07-26

Adversarial behaviors for peras-markov

The Markov-chain simulation now supports "mix and match" configuration of adversarial behavior via a YAML input file.

  • Adversary's voting
    • NeverVote: the adversary never votes for a block.
    • AlwaysVote: the adversary always votes honestly.
    • VoteForAdversary: the adversary only votes for blocks on their preferred chain.
  • Whether the adversary keeps their chain private
    • NeverReveal: the adversary keeps their chain private.
    • AlwaysReveal: the adversary honestly reveals their chain.
    • RevealIfLonger: the adversary only reveals their chain to honest parties if it is the weightiest chain.
  • Whether the adversary adopts the honest chain
    • NeverAdopt: the adversary never adopts the honest chain.
    • AdoptIfLonger: the adversary adopts the honest chain if it is the weightiest.
  • "Split brain"
    • NoSplitting: the honest and adversarial nodes can communicate with one another throughout the simulation.
    • MkAdverseSplit Slot Slot: the honest and adversarial nodes do not communicate with one another during the specified (inclusive) slot interval.
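
The four behavior dimensions above combine freely in the YAML input. The following is only an illustrative sketch: the actual key names and encoding are whatever the simulator's YAML parser defines, so treat every identifier below as hypothetical.

```yaml
# Hypothetical sketch of a "mix and match" adversary configuration.
# Key names and the encoding of MkAdverseSplit are illustrative only.
adversaryVoting: VoteForAdversary   # NeverVote | AlwaysVote | VoteForAdversary
adversaryReveal: RevealIfLonger     # NeverReveal | AlwaysReveal | RevealIfLonger
adversaryAdopt:  AdoptIfLonger      # NeverAdopt | AdoptIfLonger
splitBrain:                         # NoSplitting, or an inclusive slot interval
  tag: MkAdverseSplit
  contents: [100, 200]
```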

Next steps:

  • unit tests
  • example scenarios
  • split-brain analysis

2024-07-25

Dynamic testing of peras-markov

Using quickcheck-dynamic for the Markov-chain simulator peras-markov posed a challenge because, for efficiency, Peras.MarkovSim.Types implements a minimalistic version of the Peras protocol: it omits hashes, represents certificates solely by their round number, represents the chain by its length and prefix, and handles votes implicitly. Nevertheless, it was possible to test this in the quickcheck-dynamic framework by using Peras.Prototype as the StateModel and then projecting that state model into the expected RunModel for peras-markov for the postcondition tests. In order to control the transitions of the Markov model, the perform function passes certainties instead of probabilities to the block-creation and fetching functions.
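
The projection idea can be illustrated with a minimal sketch. The types and field names below are simplified stand-ins, not the repository's actual API: the point is only that the detailed prototype state is collapsed to the few fields the Markov model tracks (chain length and latest certificate round) before postconditions are compared.

```haskell
-- Hypothetical sketch: projecting a detailed prototype state onto the
-- minimal state tracked by the Markov model. Types are illustrative.

data PrototypeState = PrototypeState
  { chainLength     :: Int  -- length of the preferred chain
  , latestCertRound :: Int  -- round number of the newest certificate seen
  }

data MarkovState = MarkovState
  { mChainLength :: Int
  , mCertRound   :: Int
  } deriving (Eq, Show)

-- | Keep only the fields the Markov model represents, discarding hashes,
-- vote sets, and everything else the prototype tracks.
project :: PrototypeState -> MarkovState
project s = MarkovState (chainLength s) (latestCertRound s)
```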

Several compromises were made in this testing because the Markov model does not explicitly represent votes and it only treats honest vs adversarial chains. Nevertheless, the dynamic tests verify the Markov chain's soundness. Additional static, unit tests are needed to fully cover the Markov chain's modeling of adversarial scenarios. (Note that the Markov-chain results have already been successfully compared to analytic computations when such computations are possible.)

In retrospect, this whole dynamic-testing process would have been easier if we had first created a formal model of the Markov-chain transitions and defined a projection operator from the Peras protocol to the Markov chain. It might even have been possible to prove the soundness of the Markov model.

2024-07-25

Providing reports in PDF format

Looked at ways to generate a PDF from the Docusaurus website. https://github.com/jean-humann/docs-to-pdf looks promising, but it works by browsing the rendered HTML, which is annoying given that the website is currently private. https://github.com/signcl/docusaurus-prince-pdf seems to work offline, but requires some pricey software. Ultimately I tried docs-to-pdf, but it does not work with file:// based URLs and the result it produces is not great.

So the best option seems to be good old pandoc, possibly with some filtering on the input to improve the result. After a while, I managed to get the tech report in PDF published to the staging web site, https://peras-staging.cardano-scaling.org/tech-report-2.pdf, and added some links to the reports' cover page.

Discussing CIPs w/ Consensus TWG

  • Details of the process w/in Intersect are still being worked out
  • CIP editors suggested we propose a CIP for "onboarding" the consensus project w/in the CIP process, which is not yet the case. Ledger and Plutus are already enrolled in the process and could serve as role models
  • We decided the Consensus TWG would start drafting such a CIP and push for its adoption w/o waiting for the other committees' decisions

2024-07-24

Monthly demo and review

Here are the slides and the recording of Peras's monthly demonstration review, which was well attended (particularly by the networking and consensus teams). The key questions and points of discussion follow:

  • There was a lot of interest in understanding the nature of the Byzantine pre-agreement part of the "alpha" version of Peras.
    • What does it add to the "pre-alpha" version?
    • How long does the pre-agreement phase last?
    • What happens when there is no agreement?
    • The caveat is that what is outlined in the research paper is subject to adjustment/revision.
  • Cooldown
    • What are the triggering conditions or thresholds for entering cooldown?
    • There was concern about the reputational risk of mistakenly entering cooldown.
    • What happens to the protocol in cases of "force majeure"?
  • Discussion and explanation of how Peras interacts with forking (honest or adversarial).
  • How are terms like "guarantee", "settlement", "finality", "fast", etc. defined?
    • What does one mean by "fast"?
    • What does one measure Peras against?
  • Are the metrics and computed probabilities applicable to the "real world"?
    • LD50 approach: Given specific adverse conditions, how long would one have to run Peras for there to be a 50-50 chance of a failure?
  • There was concern about the security parameter increasing.
    • The origins of the 2160-block Praos security parameter were discussed.
    • Changing that could be a mess for cardano-node implementation and testing.
    • Although the Peras security parameter should in principle be larger because of the boost, in practice the Praos parameter was set so generously large that Peras may not require it being increased.
  • We discussed at length the communication challenges around Peras.
    • Terms and metrics need to be defined precisely in discussions.
    • Adversarial scenarios can be complex and must also be discussed precisely.
    • Appropriate setting of parameters somewhat depends upon stakeholder use cases.
    • There may be two camps regarding Peras: faster finalization vs lower resource consumption.
  • We discussed the approach used in the Markov-chain simulation.
    • Neil has done related work and may undertake more.
  • Finally, we discussed the recent CrowdStrike debacle.

2024-07-23

Computation of margin and reach

Margin is the advantage of the penultimate tine of the blocktree and reach is the maximum advantage in the blocktree, where advantage is the excess weight of a tine relative to the public blocktree. The Markov-chain simulation can now compute this for the adversarial scenarios it models.

$ peras-markov margin-reach --help

Usage: peras-markov margin-reach [--epsilon DOUBLE] [--slots NATURAL]
                                 [--adversarial-stake FRACTION]
                                 --param-file FILE [--out-file FILE]
                                 [--progress]

  Compute the probability distribution of the margin and reach for a one-slot
  diffusion time.

Available options:
  --epsilon DOUBLE         Threshhold for discarding small probabilities.
                           (default: 1.0e-30)
  --slots NATURAL          Number of slots to simulate. (default: 1000)
  --adversarial-stake FRACTION
                           Fraction [%/100] of adversarial stake.
                           (default: 5.0e-2)
  --param-file FILE        Path to input YAML file containing the Peras protocol
                           parameters.
  --out-file FILE          Path to output TSV file containing the simulation
                           results. (default: "/dev/stdout")
  --progress               Show the progress of the simulation.
  -h,--help                Show this help text

The two plots below display margin and reach for example Peras protocol parameters at 20.0% and 22.5% adversarial stake.

  • The gaps (every 600 slots) in the margin plot result from the boosts that typically occur every 150 slots.
  • Note how the 22.5% adversary case suffers from a significantly lower margin compared to the 20.0% adversary case.
  • Similarly, reach trends larger (and with smaller boost probability at 10, which is the boost intensity) in the 22.5% adversary case.
Figure (margin): example margins at 20.0% and 22.25% adversarial stake. Figure (reach): example reaches at 20.0% and 22.25% adversarial stake.

2024-07-19

Publishing web site

Worked on publishing a private (password-protected) version of Peras' website for ease of review and feedback from various teams, as we are planning to make it public in the forthcoming weeks as part of a communication campaign around Peras. It turned out to be surprisingly annoying because, by default, GitHub does not allow such private sites with GH Pages. I ended up doing the following:

  • Create a small VM on GCP (see the terraform descriptor) with a fixed IP address
  • Assign the peras-staging.cardano-scaling.org domain name to this IP manually
  • Configure this machine to run nginx with the website using Let's Encrypt using propellor
  • Create a job on the publish-site branch using rsync to push the built docusaurus website to this machine
  • Configure a staging environment with the secrets needed to do so

This feels quite clunky but hopefully is a temporary solution. Another, possibly simpler, approach would have been to pack the website into a docker image along with an nginx server and then deploy this container to Cloud Run using a similar technique to the one we use for the https://peras-simulation.cardano-scaling.org service.

Also added some documentation to the simulator's UI.

2024-07-10

Healing time

Here are empirical results for the scaling behavior of the healing time, which determines the chain-ignorance parameter $R$. We had vaguely conjectured that it scales as the square of the boost, but were unsure. The analytic results below indicate that it scales super-exponentially. Perhaps that makes sense because the number of slots needed to overcome the boost is related to the asymptotic behavior of the gamma function.

Consider a 5% active slot coefficient and 20% adversarial stake. The probability of not overcoming a boost $B = 1$ by slot 1000 happens to be 4.73e-6. The table below shows how many slots one has to wait until achieving a similar probability at higher boosts $B$.

| Boost $B$ | Slots for healing |
|----------:|------------------:|
| 1 | 1000 |
| 10 | 1551 |
| 100 | 5731 |
| 1000 | 39762 |

Change the adversarial stake to 49% and consider not overcoming a boost B = 1 by slot 43200: the probability here is 0.178. The table below shows how many slots one has to wait until achieving a similar probability.

| Boost $B$ | Slots for healing |
|----------:|------------------:|
| 1 | 43200 |
| 10 | 59900 |
| 100 | 189000 |
| 1000 | 1227000 |
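
As a sanity check on numbers like these, the "not yet healed" probability can be approximated under a deliberately simplified model: honest and adversarial block production are independent per-slot Bernoulli trials, and the boost is overcome once the honest lead exceeds $B$ blocks. This is a hypothetical sketch, not the analysis used for the tables above, and the exact dynamic programming below is only practical for small slot counts.

```haskell
-- Simplified, illustrative model (NOT the analysis behind the tables above):
-- track the distribution of the honest lead H_n - A_n over n slots, where in
-- each slot the honest parties mint with probability pH and the adversary
-- with probability pA, independently.

-- | Per-slot minting probability for stake fraction sigma with
-- active-slot coefficient f.
mintProb :: Double -> Double -> Double
mintProb f sigma = 1 - (1 - f) ** sigma

-- | Distribution of the lead after n slots, as (lead, probability) pairs.
leadDistribution :: Double -> Double -> Int -> [(Int, Double)]
leadDistribution pH pA n = iterate step [(0, 1)] !! n
  where
    step dist =
      collect
        [ (d + dh - da, p * qh * qa)
        | (d, p)   <- dist
        , (dh, qh) <- [(1, pH), (0, 1 - pH)]
        , (da, qa) <- [(1, pA), (0, 1 - pA)]
        ]
    collect xs =
      [ (d, sum [p | (d', p) <- xs, d' == d]) | d <- dedup (map fst xs) ]
    dedup = foldr (\x acc -> if x `elem` acc then acc else x : acc) []

-- | Probability that the honest lead has not yet exceeded the boost b
-- after n slots, i.e. the chain has "not healed" in this toy model.
notHealed :: Double -> Double -> Int -> Int -> Double
notHealed pH pA b n = sum [p | (d, p) <- leadDistribution pH pA n, d <= b]
```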

Is there theoretical guidance for setting the boost $B$? Or is there an intuitive estimate for an appropriate boost strength? It looks like $B = 10$ only modestly increases the healing time, but as the boost gets larger, the healing time increases disproportionately.

Is this analysis even relevant to setting $B$? It was inspired by the following comment in the paper. The worry is that the parties (honest and adversarial) would boost an adversarial block and that a competing fork of honest blocks would have to overcome that.

> During the initial "healing" phase of the cooldown period, parties continue with standard Nakamoto block creation until the potential advantage of $B$ that the adversary could gain with a certificate is neutralized.

2024-07-08

Adversarial testing for Peras

  • Four possible areas
    1. Equivocations
      • blocks
      • votes
      • certificates
    2. Delayed receipt
      • chains
      • votes
    3. Non-receipt
      • chains
      • votes
    4. Invalid cryptographic proofs, garbled data, etc.
  • We can test for the first three in the protocol conformance test, but should wait on the node conformance test for the fourth.
  • Approach
    • Equivocation
      • Update Agda executable spec to handle equivocation.
      • Add action(s) for receipt of equivocated information (block/vote/certificate).
      • Add generators that use the model state to create equivocated information.
    • Delayed receipt or non-receipt (= non-prompt delivery)
      • Add an action for non-prompt delivery.
      • Add generators for creating non-prompt chains and votes.
  • Other discussion
    • Negative tests are not generally useful for protocol state-machine testing because protocols do not throw error messages.
    • No blacklisting of nodes in the current specification, so this does not need testing.
    • Consider having separate soundness proofs for an honest vs a dishonest environment.

Determinism in Peras

  • The protocol in the Peras paper is not deterministic in at least these areas:
    • Selection of preferred chain when there is a tie.
    • Sequencing of operations.
  • The executable specification necessarily becomes deterministic.
  • QuickCheck Dynamic could handle non-determinism (somewhat awkwardly) through these mechanisms:
    • Have the state model be aware of non-determinism and track it.
    • Use symbolic variables to track and account for the presence of non-determinism.
  • However, for Peras conformance tests we want determinism:
    • This has to be specified in a way that different implementation languages can handle easily.
    • This needs to be documented in the version of the specification realized for conformance tests.
    • The latest version of the specification in the tech report adds rules to force determinism:
      • Choose the preferred chain whose tip's block hash is smallest.
        • QUESTION: Does this open up an opportunity for grinding attacks?
      • Fetch, then create blocks, then vote.
  • The non-determinism in the fundamental protocol can be identified via completeness proofs.
    • The proof would be conditioned upon the existence of functions that make the executable specification deterministic: "completeness up to x".
    • For example, "for all possible functions that select the preferred chain among chains of equal length, the test specification is complete".
    • An instantiation of the executable specification would use a particular tie-breaking function: i.e., implementations would not be free to break ties differently.
    • The proof would explicitly identify all areas of non-determinism.

2024-07-04

Fixing conformance tests

Started tracking the reason why our first conformance test is failing. The test reports that the node has voted where it should not have.

  • I have added traces on both sides to check the round number and they are identical
  • The Test module does not call the tickNode function, only the voting function, which always votes. Moved the guard from tickNode to the voting function, which now takes a slot and not a round. This makes the logic in tickNode more uniform and simple, but the test still fails
  • Seems like we don't update the clock in the Model? => No: in the perform of the test, we first increase the clock before choosing to vote or not. This is consistent with the postcondition, where we check whether or not we should vote according to the destination state, not the initial state.
  • Tried to remove shrinking, which might have some "side-effects", but adding noShrinking to the property does not work; I still see Assertion failed (after 6 tests and 2 shrinks):

So it appears the issue stems from the voting logic in the model not taking the blocks into account. Pursued a lead but it was a red herring: the block selection in the Model takes perasT into account but the implementation does not (it only checks perasL); however perasT = 0, so this is not the source of the problem.

Tracing the vote creation process in the model shows that the preferredChain is always empty, even though some blocks have been diffused.

This is obviously wrong:

Checking blocks in [[MkBlock {slotNumber = MkSlotNumber {getSlotNumber = 2}, creatorId = -4, parentBlock = "0300040200020201", certificate = Nothing, leadershipProof = "0402040302000303", signature = "0101010200010404", bodyHash = "0303020400020002"},MkBlock {slotNumber = MkSlotNumber {getSlotNumber = 1}, creatorId = -4, parentBlock = "0000000000000000", certificate = Nothing, leadershipProof = "0101000003010400", signature = "0300040200020201", bodyHash = "0201000302040101"}],[MkBlock {slotNumber = MkSlotNumber {getSlotNumber = 1}, creatorId = -4, parentBlock = "0000000000000000", certificate = Nothing, leadershipProof = "0101000003010400",signature = "0300040200020201", bodyHash = "0201000302040101"}],[]], preferred = []

The preferred chain should not be empty in this case!

Got to the bottom of one issue: the maximumBy function had its branches inverted, which led to it actually being minimumBy! Fixing that leads me into a different issue, with an interesting failure:

           -- round: 1
           --      got: [Vote {round     = 1
                               creator   = 1
                               blockHash = "0602080800000107"
                               proofM    = "8cacfd4bd1f4a6b1"
                               signature = "bad333f8e2548088"}]
           -- expected: [Vote {round     = 1
                               creator   = 1
                               blockHash = "0605000300020205"
                               proofM    = "8cacfd4bd1f4a6b1"
                               signature = "cd9830e7358506cd"}]

so it's probably now just a problem in choosing which of 2 chains is longest in case of ties on the weight.

Haskell's maximumBy function favors the last element in a list:

> maximumBy (compare `on` snd) [(1,1), (2,1), (3,1)]
(3,1)

which is consistent with Peras.Util's version:

> U.maximumBy (0,0) (compare `on` snd) [(1,1), (2,1), (3,1)]
(3,1)

Finally got to the bottom of it: known chains are defined in the implementation as

    let chains' = chains `Set.union` newChains

whereas in Model we just use a [Chain] so this is an ordering issue.

We can't require the implementation to behave like a list, but we can refine the comparison function used to find the best chain. So I resorted to implementing a specific comparison function to ensure chain comparisons match, enforcing an ordering when chain weights are equal: in case of a tie on weight, compare slots, then creatorId, then block hash.
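
That tie-breaking order can be sketched as follows. The Chain type and its fields are simplified stand-ins for the repository's actual types; only the shape of the comparison is the point here.

```haskell
import Data.Ord (comparing)

-- Simplified stand-in for the real chain type; fields are illustrative.
data Chain = Chain
  { weight     :: Int     -- chain weight (length plus boosts)
  , tipSlot    :: Int     -- slot of the tip block
  , tipCreator :: Int     -- creator id of the tip block
  , tipHash    :: String  -- hash of the tip block
  } deriving (Eq, Show)

-- | Compare by weight first; on ties, fall back to slot, then creator id,
-- then block hash, so every implementation agrees on the "best" chain
-- regardless of the order in which chains were received.
compareChains :: Chain -> Chain -> Ordering
compareChains a b =
     comparing weight a b
  <> comparing tipSlot a b
  <> comparing tipCreator a b
  <> comparing tipHash a b
```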

To enable coverage, I added the following to cabal.project.local (coverage is not supported for internal libraries):


package *
  coverage: True
  library-coverage: True

package dns
  coverage: False
  library-coverage: False

Then running the test generates a coverage report, which shows that the Voting implementation module is fully covered by our conformance test 🎉, as well as the Model code.

Important

Some takeaways from this experiment:

  • Reimplementing even basic functions on the model side should be done with great care!
  • We need the soundness proof to guarantee the behaviour of the model is consistent with the formal specification
  • Writing a test model and running it is very useful to identify blind spots and implicit assumptions, and to ensure proper coverage of the implementation

2024-07-03

Playing with faster convolutions

We've been discussing ways to improve both the accuracy and the speed of ΔQ model computations, and I have played a bit with the idea of using existing fast linear algebra operations to do so. I did a little spike specifically targeted at computing the convolution of 2 Double-valued vectors, comparing 3 different approaches:

  1. Direct convolution computation using 0-padded vectors and standard vector operations
  2. FFT-based convolution using a fork of a straightforward Haskell implementation of FFT and DFT
  3. hmatrix-based convolution
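
For reference, approach 1 (direct convolution) can be sketched in a few lines; this is an illustrative list-based version, not the vector-based code used in the spike:

```haskell
-- Direct O(n*m) convolution of two Double-valued vectors, written with plain
-- lists for clarity (the spike used 0-padded vectors and vector operations).
convolve :: [Double] -> [Double] -> [Double]
convolve xs ys =
  [ sum [ (xs !! i) * (ys !! (k - i)) | i <- [max 0 (k - m) .. min n k] ]
  | k <- [0 .. n + m] ]
  where
    n = length xs - 1
    m = length ys - 1
```

For example, `convolve [1,2] [3,4]` yields `[3.0,10.0,8.0]`, matching the coefficients of the polynomial product (1 + 2x)(3 + 4x).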

Quick benchmark results are clear: hmatrix is 100x faster than the naive convolution and 2000x faster than the FFT one.

benchmarking Direct convolution (100 elements)
time                 2.029 ms   (2.023 ms .. 2.039 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 2.038 ms   (2.032 ms .. 2.052 ms)
std dev              30.91 μs   (12.10 μs .. 48.87 μs)

benchmarking hmatrix convolution (100 elements)
time                 19.09 μs   (19.06 μs .. 19.11 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 19.10 μs   (19.07 μs .. 19.11 μs)
std dev              58.11 ns   (40.92 ns .. 81.04 ns)

benchmarking FFT convolution (100 elements)
time                 47.46 ms   (47.38 ms .. 47.55 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 47.49 ms   (47.40 ms .. 47.83 ms)
std dev              314.3 μs   (59.48 μs .. 579.6 μs)

Even for a relatively large number of points, matrix convolution is fast enough for our purposes:

benchmarking Matrix convolution/hmatrix convolution (100 elements)
time                 18.00 μs   (17.96 μs .. 18.08 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 18.43 μs   (18.28 μs .. 18.57 μs)
std dev              522.4 ns   (475.5 ns .. 568.1 ns)
variance introduced by outliers: 31% (moderately inflated)

benchmarking Matrix convolution/hmatrix convolution (1000 elements)
time                 897.3 μs   (893.0 μs .. 903.9 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 893.4 μs   (891.2 μs .. 897.1 μs)
std dev              9.988 μs   (7.214 μs .. 15.93 μs)

benchmarking Matrix convolution/hmatrix convolution (10000 elements)
time                 93.03 ms   (81.34 ms .. 109.7 ms)
                     0.959 R²   (0.908 R² .. 0.989 R²)
mean                 102.4 ms   (93.90 ms .. 112.9 ms)
std dev              15.28 ms   (11.56 ms .. 20.99 ms)
variance introduced by outliers: 43% (moderately inflated)

It seems straightforward to implement an hmatrix-based backend for ΔQ that should provide fast modelling capabilities, possibly resorting to even more efficient low-level implementations of convolution if needed to handle very large models. Another interesting option would be to use something like Futhark, which can blend seamlessly with Haskell code and provides GPU-based optimised vector computations.

This experimental code is available in PR #165.

2024-06-21

Fixing docker build

  • The docker build stopped working after adding a dependency on cardano-crypto-class, because it has native dependencies on libsodium, libblst, and libsecp256k1.
  • Added the dependencies to the build image, following instructions from the cardano-node wiki documentation

2024-07-02

Edinburgh Workshop report & short-term planning

  • We have completed and published the workshop report
  • In the wake of those discussions, we have created a few GitHub issues for short-term actions pursuant to this quarter's goals:
    • 164 Define a communication plan for Peras
    • 163 Conformance model-based tests for voting
    • 162 Review adversarial scenario analysis
    • 161 Complete technical report #2
    • 160 Make code repository public
    • 159 Publish full web site
    • 128 UI improvements for simulator/visualizer
    • 98 Post-retrospective actions
    • 97 Write a draft CIP

2024-06-20

New adversarial scenarios

  • Attack on common-prefix

Voting process as defined in Leios (and Mithril)

In the Leios paper, stake-based voting is defined similarly to how it's done in Mithril:

VoteCount(x, s):
  (1) y := H(x)
  (2) cnt := Σ_{i ∈ [m]} 1[ H(y, i) < T(s) ]
  (3) return cnt

where $T$ maps the probability $1 - (1 - f)^{\sigma}$, for some stake ratio $\sigma$, to an integer target.

In other words, just like we do in Mithril, a voter runs a lottery by hashing some base value (VRF output) with an index $m$ times and counting the number of "winning" indices. The underlying check is exactly the same as the leader election, comparing a ratio (the random value derived from the hash) to the function $1 - (1 - f)^{\sigma}$.

I had a hard time understanding the comment in the previous function (note that the comment from the ledger code has been copied verbatim to the Mithril STM library), as it details how the above function is transformed to provide a more efficient computation.

Given $q = 1 / (1 - p)$ and $c = \log(1 - f)$, then

$$ \begin{array}{rl} & p < 1 - (1 - f)^{\sigma} \\ \Leftrightarrow & 1 - p > (1 - f)^{\sigma} \\ \Leftrightarrow & \frac{1}{1-p} < (1 - f)^{-\sigma} \\ \Leftrightarrow & q < e^{\log(1 - f)\times (-\sigma)} \\ \Leftrightarrow & q < e^{-\sigma \times c} \\ \end{array} $$

The $q$ can be efficiently computed as the fraction $vmax / (vmax -v )$ and the exponentiation approximated through a Taylor series.
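
The equivalence above can be checked directly with a small sketch. The function names are illustrative, and the real implementations approximate the exponentiation over exact fractions via a Taylor series rather than using floating point as done here:

```haskell
-- Illustrative check of the eligibility condition p < 1 - (1 - f)^sigma,
-- rewritten as q < e^(-sigma * c) with q = 1 / (1 - p) and c = log (1 - f).
-- Real implementations avoid Double and approximate exp via a Taylor series.

-- | Naive form of the check.
eligibleNaive :: Double -> Double -> Double -> Bool
eligibleNaive f sigma p = p < 1 - (1 - f) ** sigma

-- | Rewritten form used for efficient computation.
eligibleFast :: Double -> Double -> Double -> Bool
eligibleFast f sigma p = q < exp (negate sigma * c)
  where
    c = log (1 - f)  -- negative constant, precomputable
    q = 1 / (1 - p)
```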

Better vote diffusion ΔQ model

  • Building on previous work, we want to model the expected delay to reach a quorum. We know the newest version of the ΔQ library based on polynomials is computationally intensive and cannot handle this kind of modelling
  • Using the old version of ΔQ based on numerical sampling, we introduce a NToFinish combinator to model the fact that we only take into account some fraction of the underlying model. In our case, we model the case where we only care about the first 75% of votes that reach a node.
  • The model works as follows:
    • We start from a base uniform distribution of single MTU latency between 2 nodes, assuming a vote fits in a single TCP frame
    • We then use the expected distribution of path lengths for a random graph with an average degree of 15 to model the latency distribution across the network
    • We then apply the NToFinish 75 combinator to this distribution to compute the expected distribution to reach 75% of the votes (quorum)
    • An important assumption is that each vote diffusion across the network is expected to be independent from all other votes
  • We also want to take into account the verification time of a single vote, which we do in 2 different ways:
    • One distribution assumes a node does all verifications sequentially, one vote at a time
    • Another assumes all verifications can be done in parallel
    • Of course, the actual verification time should be expected to be in between those 2 extremes
  • Verification time assumption is based on some experiments documented in Vote benchmarks where a single vote's verification time is expected to be about 160μs.

This yields the following graph:

Vote diffusion

This graph tends to demonstrate that vote diffusion should be non-problematic, with a quorum expected to be reached in under 1s most of the time.
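
Under the independence assumption above, the NToFinish idea reduces to a binomial tail: if a single vote's arrival time has CDF F, the probability that a quorum of at least k out of n votes has arrived by time t is the probability of at least k successes with success probability F(t). A hypothetical sketch (function names are illustrative, not the ΔQ library's API):

```haskell
-- Illustrative sketch of the "NToFinish" idea under the independence
-- assumption: the probability that at least k of n i.i.d. votes have
-- arrived by time t, given the single-vote arrival CDF.

-- | Binomial coefficient as a Double (fine for the small n used here).
binom :: Int -> Int -> Double
binom n k = product [fromIntegral (n - k + i) / fromIntegral i | i <- [1 .. k]]

-- | CDF of the time at which a quorum of k out of n votes has arrived.
quorumCDF :: Int -> Int -> (Double -> Double) -> Double -> Double
quorumCDF k n cdf t =
  sum [binom n i * p ^ i * (1 - p) ^ (n - i) | i <- [k .. n]]
  where
    p = cdf t
```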

2024-06-19

New adversarial scenarios

  • No healing from adversarial boost (i.e., "healing time")
  • No honest block (i.e. "chain-quality time")

2024-06-18

New adversarial scenarios

  • No honest quorum in round
  • Adversarial quorum
  • No certificate in honest block
  • Adversarial chain receives boost
    • Variant 2

Votes details

  • Started working on detailing the voting algorithm, data structures, and certificates process.
  • Interestingly, things are more complicated when one looks at them closely enough.
  • I want to provide some detailed code and benchmarks, so I had to look at how VRF voting is done within the node, and import actual VRF and KES functions from cardano-base
  • Currently a bit confused by the committee election (sortition) algorithm, but I found a good and simple explanation in the Algorand paper so will just use that for the time being
  • Also need to detail the stake-based certificate construction algorithm in ALBA

Modeling vote diffusion

  • Worked w/ Brian on modeling vote diffusion with old version of ΔQ library.
  • We reused the data from the previous model about the 1-hop delay and distance distribution in a small graph, assuming a degree distribution of 15
  • Assume a vote fits in a single MTU so there's a single transmission for each vote
  • Also assumes that verifying a single vote (VRF + KES) takes 1.3ms (from cardano-base benchmarks), so verifying 1000 votes takes 1.3s and on average a vote is verified after 0.65s
  • We consider the delay probability distribution function to represent the fraction of votes that are available after some delay, taking into account the topology of the network and validation time.
  • We assume that network contention does not impact votes diffusion
  • This model (see picture) below seems to demonstrate vote diffusion and validation can happen quite quickly. The 75% line represents the expected time to reach a quorum which is around 1.5s.
  • We note that the round length will be on the order of 120-150 seconds, which provides ample time for voting to happen and a quorum to be reached
    • Note that it could quite interesting to validate this empirically using netsim or PeerNet

Vote diffusion & validation

2024-06-17

New adversarial scenario

The new adversarial scenario in the 2nd tech report partially answers the question "What is the probability that an adversarial chain receives the next boost?". (Work is in progress on a more sophisticated analysis.)

Per-round probability of a dishonest boost when the active-slot coefficient is 10%.

ΔQ Modeling for Votes

Quickly tried to replace the custom SimplePolynomials with the poly package, but of course it does not have the convolve function that does the core of the logic, computing the convolution of polynomials over 2 intervals in 3 pieces. It does, however, have DFT computations that could be useful, as it seems more efficient to compute the convolution of polynomials with an FFT, but I am out of my depth here and would need more research.

Decision: For the time being, we are going to use the existing numerical MC approximation based library.

Troubleshooting ΔQ

Plotting the ΔQ distribution for Peras currently takes forever, seemingly when computing nWayConvolve for a "large" number of different possible degrees. Reducing the number to 2 produces a graph; going to try with an increasing number of items.

Computing the ΔQ basic diffusion model with 3 hops takes 20s. With 4 hops, it goes up to:

cabal run
553,97s user 237,95s system 46% cpu 28:39,40 total

Sampling size in Utilities plotting module does not have an impact on execution time.

Trying to run the ΔQ plotting with profiling on, to see if there's something obvious we could fix relatively easily. Unsurprisingly, most of the time is spent constructing polynomials during convolution: it blows up as we need to construct the cartesian product of lists multiple times.

  multihops                         Main                                                      plot-dqsd.hs:45:1-53                                                          973           1    0.0    0.0    99.8   99.9
   multiHop                         Main                                                      plot-dqsd.hs:43:1-45                                                         1011           3    0.0    0.0    99.8   99.9
    <+>                             PWPs.IRVs                                                 src/PWPs/IRVs.hs:317:1-54                                                    1129           3    0.0    0.0    99.8   99.9
     makePDF                        PWPs.IRVs                                                 src/PWPs/IRVs.hs:(114,1)-(115,33)                                            1131           6    0.0    0.0     0.0    0.0
     <+>                            PWPs.Piecewise                                            src/PWPs/Piecewise.hs:(271,1)-(276,35)                                       1130           3   37.4   51.8    99.8   99.9
      object                        PWPs.Piecewise                                            src/PWPs/Piecewise.hs:57:5-10                                                1136    85592371    1.3    0.0     1.3    0.0
      basepoint                     PWPs.Piecewise                                            src/PWPs/Piecewise.hs:56:5-13                                                1137    70365240    2.8    0.0     2.8    0.0
      plusPD                        PWPs.PolyDeltas                                           src/PWPs/PolyDeltas.hs:(60,1)-(63,32)                                        1154    59194673   34.0   28.1    34.0   28.1
      getPieces                     PWPs.Piecewise                                            src/PWPs/Piecewise.hs:68:30-38                                               1132      146481    0.0    0.0     0.0    0.0
      convolvePolyDeltas            PWPs.PolyDeltas                                           src/PWPs/PolyDeltas.hs:(168,1)-(190,80)                                      1135       61137    0.1    0.1     0.8    0.9
       aggregate                    PWPs.PolyDeltas                                           src/PWPs/PolyDeltas.hs:(150,1)-(156,50)                                      1138      128436    0.0    0.1     0.0    0.1
       zeroPoly                     PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:47:1-21                                        1148       72798    0.0    0.0     0.0    0.0
        makePoly                    PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:44:1-21                                        1149       72798    0.0    0.0     0.0    0.0
       scalePD                      PWPs.PolyDeltas                                           src/PWPs/PolyDeltas.hs:(116,1)-(117,27)                                      1147       24411    0.0    0.0     0.0    0.0
        scalePoly                   PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:78:1-43                                        1150       24411    0.0    0.0     0.0    0.0
       shiftPoly                    PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:(197,1)-(204,64)                               1151       24411    0.1    0.1     0.1    0.1
        scalePoly                   PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:78:1-43                                        1152       24411    0.0    0.0     0.0    0.0
       convolvePolys                PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:(154,1)-(192,121)                              1139       16400    0.5    0.5     0.6    0.6
        scalePoly                   PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:78:1-43                                        1141      121416    0.0    0.0     0.0    0.0
        zeroPoly                    PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:47:1-21                                        1142       85557    0.0    0.0     0.0    0.0
         makePoly                   PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:44:1-21                                        1143       85557    0.0    0.0     0.0    0.0
        makeMonomial                PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:66:1-82                                        1144       79592    0.0    0.0     0.0    0.0
         zeroPoly                   PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:47:1-21                                        1145       11911    0.0    0.0     0.0    0.0
          makePoly                  PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:44:1-21                                        1146       11911    0.0    0.0     0.0    0.0
      mergePieces                   PWPs.Piecewise                                            src/PWPs/Piecewise.hs:(77,1)-(85,64)                                         1133       48825   17.4   19.2    23.5   19.2
       object                       PWPs.Piecewise                                            src/PWPs/Piecewise.hs:57:5-10                                                1153   118291696    6.1    0.0     6.1    0.0
       getPieces                    PWPs.Piecewise                                            src/PWPs/Piecewise.hs:68:30-38                                               1134       48825    0.0    0.0     0.0    0.0
      makePoly                      PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:44:1-21                                        1156           3    0.0    0.0     0.0    0.0

Refining the profiling to get more details on plusPD, gives:

     <+>                                        PWPs.Piecewise                                            src/PWPs/Piecewise.hs:(271,1)-(276,35)                                       1160           3   36.9   51.8    99.8   99.9
      object                                    PWPs.Piecewise                                            src/PWPs/Piecewise.hs:57:5-10                                                1166    85592371    1.2    0.0     1.2    0.0
      basepoint                                 PWPs.Piecewise                                            src/PWPs/Piecewise.hs:56:5-13                                                1167    70365240    2.8    0.0     2.8    0.0
      plusPD                                    PWPs.PolyDeltas                                           src/PWPs/PolyDeltas.hs:(60,1)-(63,32)                                        1194    59194673    5.7    5.6    34.3   28.1
       PWPs.SimplePolynomials.addPolys          PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:100:1-8                                        1196    29621749   16.9   13.1    28.6   22.5
        PWPs.SimplePolynomials.trimPoly         PWPs.SimplePolynomials                                    src/PWPs/SimplePolynomials.hs:57:1-8                                         1197    29621749   11.8    9.4    11.8    9.4

Top cost centres are

COST CENTRE                     MODULE                 SRC                                     %time %alloc

<+>                             PWPs.Piecewise         src/PWPs/Piecewise.hs:(271,1)-(276,35)   36.9   51.8
mergePieces                     PWPs.Piecewise         src/PWPs/Piecewise.hs:(77,1)-(85,64)     17.2   19.2
PWPs.SimplePolynomials.addPolys PWPs.SimplePolynomials src/PWPs/SimplePolynomials.hs:100:1-8    16.9   13.2
PWPs.SimplePolynomials.trimPoly PWPs.SimplePolynomials src/PWPs/SimplePolynomials.hs:57:1-8     11.9    9.4
object                          PWPs.Piecewise         src/PWPs/Piecewise.hs:57:5-10             7.6    0.0
plusPD                          PWPs.PolyDeltas        src/PWPs/PolyDeltas.hs:(60,1)-(63,32)     5.7    5.6
basepoint                       PWPs.Piecewise         src/PWPs/Piecewise.hs:56:5-13             2.8    0.0

It seems clear that polynomial computations are the bottleneck, so a more efficient library like poly would perhaps help?

2024-06-14

Markov-chain simulations

We added full-coverage property-based tests for the Markov-chain simulations. In particular, the simulation results are now checked against exact analytic results for the expected probability distributions.

More investigations into ΔQ upgrade

Trying to plot a simple graph for block diffusion which combines header + block diffusions yields something which does not make sense either. Even the simple oneMTU distribution is wrongly plotted: the y scale stops below 0.3 and the x scale goes up to 1.2.

The CDF is computed correctly:

  CDF
    ( Pieces
        { getPieces =
            [ Piece{basepoint = 0.0, object = Ph (Poly [0.0])}
            , Piece{basepoint = 0.3333333333333333, object = H 0.0 1.2e-2}
            , Piece{basepoint = 0.3333333333333333, object = Ph (Poly [1.2e-2])}
            , Piece{basepoint = 0.6666666666666666, object = H 1.2e-2 6.9e-2}
            , Piece{basepoint = 0.6666666666666666, object = Ph (Poly [6.9e-2])}
            , Piece{basepoint = 1.0, object = H 6.9e-2 0.268}
            , Piece{basepoint = 1.0, object = Ph (Poly [0.268])}
            ]
        }
    )

so the problem seems to be in the plotting functions?

Looking at the asDiscreteCDF function which is used in the plotting to compute points. The CDF I get from it has a last segment which goes beyond 1 and has a constant value.

It feels as if the components of the pairs are inverted: the x value should be the (accumulated) time whereas the y value should be the probability mass, but it looks like makeCDF is doing the opposite:

cd =
  Pieces
    { getPieces =
        [ Piece{basepoint = 0.0, object = Ph (Poly [0.0])}
        , Piece{basepoint = 0.3333333333333333, object = H 0.0 1.2e-2}
        , Piece{basepoint = 0.3333333333333333, object = Ph (Poly [1.2e-2])}
        , Piece{basepoint = 0.6666666666666666, object = H 1.2e-2 6.9e-2}
        , Piece{basepoint = 0.6666666666666666, object = Ph (Poly [6.9e-2])}
        , Piece{basepoint = 1.0, object = H 6.9e-2 0.268}
        , Piece{basepoint = 1.0, object = Ph (Poly [0.268])}
        ]
    }

Turns out the way to compute a distribution from a QTA has the two components of the pair swapped: it's the time first and then the probability. This makes sense but of course led to confusion when upgrading.
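To pin down the convention (simplified types here, not the dqsd API): a QTA is a list of (time, cumulative probability) pairs, time first, and the plot must keep x = time and y = probability mass.

```haskell
-- Hedged sketch of the (time, probability) convention; the real QTA type
-- lives in the dqsd libraries.
type QTA = [(Double, Double)]  -- (time, cumulative probability), time FIRST

-- Evaluate the step-function CDF described by a QTA at a given time.
cdfAt :: QTA -> Double -> Double
cdfAt qta t = last (0 : [p | (x, p) <- qta, x <= t])

main :: IO ()
main = print (map (cdfAt [(1 / 3, 1.2e-2), (2 / 3, 6.9e-2), (1.0, 0.268)]) [0, 0.5, 2])
-- [0.0,1.2e-2,0.268]
```

With the components swapped, the "CDF" would instead run along the probability axis, which is exactly the nonsensical plot observed above.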

Computing the distribution for a complicated expression does not crash after swapping x/y but it loops.

OK, the problem is still there but with different values:

plot-dqsd: Invalid polynomial interval width: (0.0,8.4e-2,0.367,0.367)
CallStack (from HasCallStack):
  error, called at src/PWPs/PolyDeltas.hs:167:32 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.PolyDeltas
  convolvePolyDeltas, called at src/PWPs/PolyDeltas.hs:189:23 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.PolyDeltas
  convolveIntervals, called at src/PWPs/Piecewise.hs:276:43 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.Piecewise
  <+>, called at src/PWPs/IRVs.hs:317:26 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.IRVs
  <+>, called at src/DeltaQ/PWPs.hs:58:14 in dqsd-piecewise-poly-4.0.0.0-inplace:DeltaQ.PWPs
  convolve, called at src/DeltaQ/Model/DeltaQ.hs:202:28 in dqsd-piecewise-poly-4.0.0.0-inplace:DeltaQ.Model.DeltaQ
  $dmnWayConvolve, called at src/DeltaQ/PWPs.hs:54:10 in dqsd-piecewise-poly-4.0.0.0-inplace:DeltaQ.PWPs
  nWayConvolve, called at plot-dqsd.hs:48:17 in main:Main

Feels like the disaggregate function might be the actual culprit, where it creates a zero-length interval for a Pd polynomial. The problem is in the interplay of disaggregate and convolveIntervals: in the latter, I added some traces and assertions which show we can produce a sequence of Pd for the same point, leading to this zero-width interval issue.

https://github.com/abailly-iohk/dqsd-piecewise-poly/blob/3fdab53911386c1641cf0acc1c8e4afda416a672/src/PWPs/PolyDeltas.hs#L183

Here is the data that's causing the duplicate points:

plot-dqsd: Duplicate basepoints generated in convolution: [(0.0,Pd (Poly [0.0])),(0.367,Pd (Poly [5.486968449931412e-2])),(0.367,Pd (Poly [0.0]))], lg=0.155, ug=0.15500000000000003, lf=0.212, uf=0.212

So it indeed seems this is a case of rounding issues with Doubles: the two sums should not be equal, but they end up being so because some minute quantity gets dropped.
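The effect is easy to reproduce in a toy example (hypothetical helper, not the PWPs API): two endpoint sums that are mathematically equal differ in the last bit of a Double, so strict equality sees two distinct basepoints; a tolerance-based dedup is one possible way out.

```haskell
-- Approximate equality with a relative tolerance (threshold is an arbitrary
-- choice for illustration).
nearlyEqual :: Double -> Double -> Bool
nearlyEqual a b = abs (a - b) <= 1e-9 * max 1 (max (abs a) (abs b))

-- Drop basepoints that are within tolerance of the previous kept one.
dedupBasepoints :: [(Double, a)] -> [(Double, a)]
dedupBasepoints [] = []
dedupBasepoints (x : xs) = x : dedupBasepoints (dropWhile (nearlyEqual (fst x) . fst) xs)

main :: IO ()
main = do
  print (0.1 + 0.2 == 0.3)  -- False: classic Double rounding
  print (map fst (dedupBasepoints [(0.3, 'a'), (0.1 + 0.2, 'b'), (0.5, 'c')]))
  -- [0.3,0.5]
```

Whether to dedup by tolerance or to normalize the sums before aggregation is a design choice; the sketch only shows that strict `==` on Doubles is the trap.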

Tried to fix the issue in the aggregate function by removing duplications for the Pd case:

  describe "Aggregate" $ do
    it "discard 0 polynomial when points are equal" $ do
      let deltas1 =
            [ (1.0 :: Double, Pd (Poly [0.0]))
            , (2.0, Pd (Poly [1.0]))
            , (2.0, Pd (Poly [0.0]))
            ]
          deltas2 =
            [ (1.0 :: Double, Pd (Poly [0.0]))
            , (1.5 :: Double, Pd (Poly [2.0]))
            , (2.0, Pd (Poly [0.0]))
            , (2.0, Pd (Poly [1.0]))
            ]

      aggregate deltas1
        `shouldBe` [(1.0, Pd (Poly [0.0])), (2.0, Pd (Poly [1.0]))]
      aggregate deltas2
        `shouldBe` [(1.0, Pd (Poly [0.0])), (1.5 :: Double, Pd (Poly [2.0])), (2.0, Pd (Poly [1.0]))]

No more crash but it now loops.

2024-06-13

ΔQ Upgrade & Debugging

We want to model the votes diffusion and certificates construction impact using ΔQ, particularly to understand how long we can expect reaching a quorum would take. The dqsd-piecewise-poly library is expected to provide a more accurate way of modeling and computing CDF from ΔQ model, so I wanted to upgrade the existing code (in peras-delta-q) to this new version.

The library actually depends on dqsd-classes (https://github.com/DeltaQ-SD/dqsd-classes), which supposedly provides an interface while keeping the piecewise polynomial implementation "hidden". As working with 2 foreign, unpublished, and unfinished libraries is cumbersome, I decided to

  1. Fork the ΔQ polynomial based computation from Peter Thompson.
  2. Incorporate the dqsd-classes package in order to make it self-standing, and implement Peras' model there
  3. "Vendor" the library in peras repository to be able to iterate faster
  4. Remove old copy-pasted code from peras-delta-q and only keep the Peras-relevant part

It's really unclear what the relationship between the various modules and packages is. The interface has changed and there are no more funny symbols, which is good, but the classes seem a bit disconnected from the implementation.

I should probably use the plotCDF et al. functions from https://github.com/abailly-iohk/dqsd-piecewise-poly/blob/16d2c2f6913ffb85153d6cc6be0107d2c379342d/src/DeltaQ/Model/Utilities.hs#L62 which provides a higher-level interface than trying to do it myself?

I finally managed to get the old ΔQ models (e.g. basic diffusion of headers with and without certs) to compile with the newest version of dqsd-poly, but I got an error:

plot-dqsd: Invalid polynomial interval width
CallStack (from HasCallStack):
  error, called at src/PWPs/PolyDeltas.hs:175:34 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.PolyDeltas

Added some more details to the error (e.g. a textual description of the problem) to understand where it comes from; input data should probably be sanitized. The problem stemmed from the initial (0,0) in the QTA list, which is not needed anymore.

I managed to get a better call stack by sprinkling HasCallStack constraints up the call tree, and I have another error:

plot-dqsd: Invalid polynomial interval width: (0.0,2.0,4.05,4.05)
CallStack (from HasCallStack):
  error, called at src/PWPs/PolyDeltas.hs:172:32 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.PolyDeltas
  convolvePolyDeltas, called at src/PWPs/PolyDeltas.hs:194:23 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.PolyDeltas
  convolveIntervals, called at src/PWPs/Piecewise.hs:259:43 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.Piecewise
  <+>, called at src/PWPs/IRVs.hs:317:26 in dqsd-piecewise-poly-4.0.0.0-inplace:PWPs.IRVs
  <+>, called at src/DeltaQ/PWPs.hs:58:14 in dqsd-piecewise-poly-4.0.0.0-inplace:DeltaQ.PWPs
  convolve, called at plot-dqsd.hs:29:7 in main:Main
  ⊕, called at plot-dqsd.hs:54:15 in main:Main

We spend some time with BB and JC debugging the error, going deeper into the library.

I finally got a failing test for mergePieces when there are more than 2 mergeable pieces in a row.

The test says:

    it "keeps merging mergeable pieces" $ do
      let pieces :: Pieces Double (PolyDelta Double)
          pieces =
            Pieces
              [ Piece 1 (Pd (Poly [1]))
              , Piece 2 (Pd (Poly [1]))
              , Piece 3 (Pd (Poly [1]))
              , Piece 4 (Pd (Poly [2]))
              ]
      mergePieces pieces `shouldBe` Pieces [Piece 1 (Pd (Poly [1])), Piece 4 (Pd (Poly [2]))]

and of course it fails:

PWPs.Piecewise
  Merge pieces
    keeps merging mergeable pieces [✘]

Failures:

  test/PWPs/PiecewiseSpec.hs:14:26:
  1) PWPs.Piecewise, Merge pieces, keeps merging mergeable pieces
       expected: Pieces {
                   getPieces = [Piece {
                   basepoint = 1.0,
                   object = Pd (Poly [1.0])
                 }, Piece {
                   basepoint = 4.0,
                   object = Pd (Poly [2.0])
                 }]
                 }
        but got: Pieces {
                   getPieces = [Piece {
                   basepoint = 1.0,
                   object = Pd (Poly [1.0])
                 }, Piece {
                   basepoint = 3.0,
                   object = Pd (Poly [1.0])
                 }, Piece {
                   basepoint = 4.0,
                   object = Pd (Poly [2.0])
                 }]
                 }

  To rerun use: --match "/PWPs.Piecewise/Merge pieces/keeps merging mergeable pieces/" --seed 1252471448

I fixed the code to make the test pass, but the code still crashes.
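The shape of the fix can be sketched as follows (simplified Piece type, hypothetical code, not the actual PWPs implementation): merging must compare each piece against the last piece already merged, so that runs of three or more equal objects collapse, not just adjacent pairs.

```haskell
-- Simplified Piece: the object is just a coefficient list standing in for
-- Pd (Poly ...).
data Piece = Piece {basepoint :: Double, object :: [Double]} deriving (Show, Eq)

-- A right fold keeps comparing the current piece with the head of the
-- already-merged suffix, so whole runs collapse to the earliest basepoint.
mergePieces :: [Piece] -> [Piece]
mergePieces = foldr step []
 where
  step p (q : qs) | object p == object q = p : qs  -- merge, keep earliest basepoint
  step p acc = p : acc

main :: IO ()
main = print (mergePieces [Piece 1 [1], Piece 2 [1], Piece 3 [1], Piece 4 [2]])
```

On the failing example above this yields the expected two pieces with basepoints 1 and 4.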

I ended up handling the zero-width interval case for polynomial segments explicitly in the code. This fixes the issue but led to further errors down the line when displaying the CDF. I fixed those too, simply replacing a >= sign with >, and it finally generated a graph, which still does not make sense.

2024-06-12

Discussing details of votes and certificates

  • ALBAs is current best choice for Certificates

    • Main question is Committee size, we can assume it will be in the order of the number of SPOs
    • Next step: Benchmark certificates size and proving/verifying time with different committee size and quorums
  • Votes

    • Stick with header-like construction
    • We should use a special prefix for VRF
    • $nonce = epoch nonce \Vert "vote" \Vert round$
    • we could change block signing (Nakamoto) to $nonce = epoch nonce \Vert "block" \Vert slot$, but it is probably fine to keep it as is
  • ΔQ discussion

    • instead of a generic scenario, hardwire a particular scenario to get started
    • how to model a global behaviour?
    • model the impact of stake w/ distance -> define a scenario where we have different distribution of stakes for some number of upstream nodes with their distance as a random variable
  • We want to simulate preagreement step and the behaviour of the network with various "L common prefix" situations

    • assume validity and consistency in the preagreement
    • define adversarial strategy with private chain attack -> have a chain without L common prefix with positive margin
    • L ~ 60 slots (~ 3 blocks)
    • BA takes time, requires multiple rounds
    • T ~ 100s
    • U => 120-150
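The VRF domain separation discussed in the Votes bullet above could be sketched like this (the encoding is purely illustrative, not the real Cardano serialization):

```haskell
-- Illustrative only: distinct literal prefixes ("vote" vs "block") keep the
-- two VRF lotteries from producing correlated outputs for the same nonce.
voteNonce :: String -> Integer -> String
voteNonce epochNonce r = epochNonce <> "vote" <> show r

blockNonce :: String -> Integer -> String
blockNonce epochNonce slot = epochNonce <> "block" <> show slot

main :: IO ()
main = putStrLn (voteNonce "a1b2" 42)  -- a1b2vote42
```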

Markov-chain scenario for ambiguous candidate blocks

We began analyzing the scenario where an adversary maintains a fork that contains a candidate block for voting (i.e., at least $L$ slots old on its preferred chain). We use the Markov-chain simulation to examine the situation where, at the start of a new round, a block of age $L + U$ contains a certificate and an adversary builds their own chain upon that. If the adversary succeeds in building a block in the next $U$ slots, then there would be a candidate-block proposal both on the honest chain and the adversarial one.

In the preliminary results below (only lightly QA'ed), the round length is $U = 150$ slots, the block-selection offset is $L = 6$ slots, the honest party has 75% of the stake, the adversarial party has 25% of the stake, the active slot coefficient is 5%, and network diffusion occurs in one slot.

| Outcome | Probability |
| --- | --- |
| At least one adversarial candidate block & Adversary chain longer | 3.07 % |
| At least one adversarial candidate block & Chains of equal length | 3.14 % |
| At least one adversarial candidate block & Honest chain longer | 78.64 % |
| No adversary candidate block | 15.15 % |

Hence, there is a 3% chance that some voters in the new round would prefer the block on one fork while others would prefer the block on the other fork.
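As a back-of-the-envelope sanity check (a sketch, not the peras-markov code): with active-slot coefficient f and adversarial stake q, the per-slot chance of an adversarial block is roughly f·q, so the chance of no adversarial block at all in the U slots of a round is about (1 − f·q)^U.

```haskell
-- Rough independence approximation; the Markov simulation tracks the full
-- joint chain state, so exact agreement is not expected.
noAdversaryBlock :: Double -> Double -> Int -> Double
noAdversaryBlock f q u = (1 - f * q) ^ u

main :: IO ()
main = print (noAdversaryBlock 0.05 0.25 150)  -- ≈ 0.15, close to the 15.15 % above
```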

Additional findings:

  • The symbolic computations are much slower than the numeric ones when the number of slots is high.
  • Convolution of probability densities can be used to convert from simulations that use a block-based clock to an equivalent simulation that uses a slot-based clock.
    • A block-based clock uses fewer computing resources.
    • A block-based clock also makes scenario initialization easier.
    • Some analyses are easier with a slot-based clock.
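The clock-conversion idea in the findings above can be sketched numerically (simplified, hypothetical code): the slot offset at which the k-th block appears is the k-fold convolution of the inter-block delay distribution (geometric under Praos leadership), so a block-indexed simulation can be mapped back to slot time.

```haskell
-- Discrete convolution of two probability mass functions given as
-- zero-indexed lists; convolving a delay pmf with itself k-1 times gives
-- the pmf of the k-th block's slot offset.
convolve :: [Double] -> [Double] -> [Double]
convolve xs ys =
  [ sum [xs !! i * ys !! (k - i) | i <- [0 .. k], i < length xs, k - i < length ys]
  | k <- [0 .. length xs + length ys - 2]
  ]

main :: IO ()
main = print (convolve [0.5, 0.5] [0.5, 0.5])  -- [0.25,0.5,0.25]
```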

Formal specification in Agda

The IOG Agda prelude provides syntax for writing inference rules in the following style:

    data VoteInRound : RoundNumber  T  Type where

      Regular :  {r t} 
        let
          pref  = preferredChain t
          cert′ = latestCertSeen t
        in
        ∙ r ≡ roundNumber cert′ + 1       -- VR-1A
        ∙ cert′ PointsInto pref           -- VR-1B
          ───────────────────────────────
          VoteInRound (MkRoundNumber r) t

      AfterCooldown :  {r c t} 
        let
          cert⋆ = latestCertOnChain t
          cert′ = latestCertSeen t
        in
        ∙ c > 0
        ∙ r ≥ roundNumber cert′ + R       -- VR-2A
        ∙ r ≡ roundNumber cert⋆ + c * K   -- VR-2B
          ───────────────────────────────
          VoteInRound (MkRoundNumber r) t

2024-06-11

Haskell code for voting probabilities

See analytics/voting.ipynb for details of how to compute the probability that a party will be allocated a given number of votes.
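As a hedged sketch of the kind of computation involved (the independent-seat committee model here is an assumption; see the notebook for the actual derivation): if a committee of n seats is drawn by independent lotteries and a party holds stake fraction s, the number of votes allocated to it follows a binomial distribution.

```haskell
-- Binomial coefficient over Integers to avoid overflow for large committees.
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- Probability that a party with stake fraction s gets exactly k of n votes.
votesProb :: Integer -> Double -> Integer -> Double
votesProb n s k = fromIntegral (choose n k) * s ^^ k * (1 - s) ^^ (n - k)

main :: IO ()
main = print (votesProb 1000 0.01 10)  -- exactly 10 of 1000 seats at 1% stake
```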

Markov-chain computations of adversarial scenarios

The new peras-markov implements a framework for Markov-chain computations of adversarial scenarios. Two scenarios are included so far:

  • The adversary builds an alternative chain separate from the honest chain and then later reveals it.
  • The adversary aims to keep active two chains of nearly equal length.

The computations may be expressed as symbolic expressions, rational numbers, or real numbers.

The example below is for the scenario where the adversary builds a chain separately from the honest parties. The delta in the table is the length of the honest chain minus the length of the adversarial chain. We generate ten blocks and compute the probabilities, given that the honest party controls p fraction of the stake and the adversary controls q.

| Delta | Probability |
| --- | --- |
| -10 | q¹⁰ |
| -8 | 10⋅p⋅q⁹ |
| -6 | 45⋅p²⋅q⁸ |
| -4 | 120⋅p³⋅q⁷ |
| -2 | 210⋅p⁴⋅q⁶ |
| 0 | 252⋅p⁵⋅q⁵ |
| 2 | 210⋅p⁶⋅q⁴ |
| 4 | 120⋅p⁷⋅q³ |
| 6 | 45⋅p⁸⋅q² |
| 8 | 10⋅p⁹⋅q |
| 10 | p¹⁰ |
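The table above is a binomial race: each of the n = 10 blocks is honest with probability p (adversarial with q = 1 − p), and delta is honest blocks minus adversarial blocks. A numeric sketch (the symbolic version in peras-markov keeps p and q as expressions):

```haskell
-- Binomial coefficient over Integers.
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- P(delta = d) after n blocks when each block is honest with probability p.
deltaProb :: Integer -> Double -> Integer -> Double
deltaProb n p d
  | odd (n + d) || k < 0 || k > n = 0
  | otherwise = fromIntegral (choose n k) * p ^^ k * (1 - p) ^^ (n - k)
 where
  k = (n + d) `div` 2  -- number of honest blocks

main :: IO ()
main = print (deltaProb 10 0.5 0)  -- 252/1024 ≈ 0.246, matching the 252⋅p⁵⋅q⁵ row
```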

This framework can be extended with scenarios to compute finality probabilities, in order to compare Praos vs Peras. Boosts can be included in the analyses, too.

2024-06-11

Quviq Meeting

There are currently 3 slightly incompatible references:

  • Agda specification
  • Haskell Reference implementation
  • Research Paper -> underspecification, simplifying assumption

How to connect all? this can be done in 3 different ways:

  1. QC model (paper) -> align spec + code
  2. QC model aligned with code -> align spec
  3. QC model aligned with spec -> align code
  • Quviq team is going for 2, as it makes more sense than trying to understand the paper which should be the job of the R&D team
  • Current implementation is in Haskell for simplicity's sake, but it will be ported to Agda once mature enough
  • What's hard is not the Agda side, it is to have a runnable QC model
  • Focusing on the "node should vote right now", feeding it blocks and ticks

2024-06-10

Next steps with formal specification

Goal:

  • We want quickcheck tests for "arbitrary" implementations (eg. in various languages)

Questions:

  • Do we want an "Executable spec"?

    • ie. Agda code from which we could generate runnable Haskell
    • Observation from the past: the proofs might be too hard
    • Test language is sound w.r.t. full specification but not complete
    • Having an executable specification gives you completeness + reference implementation for free
    • Andre's experience from Ledger: Just write an executable specification, leave the proofs for later
      • There's a central proof that's maintained over time
    • Pros: completeness, ref impl for free
    • Cons: proofs are harder
  • Without exec spec:

    • Pros: simpler to prove, more abstract
    • Cons: need to pay attention to completeness of test models
  • Do we want a "Reference implementation"?

    • yes, we need at least for simulation
    • we also need that to "test the quickcheck tests"
  • What FastBFT is doing?

    • Implementing Agda2Rust => goal is to have an executable specification
    • everything is decidable
  • Formal spec is similar to the research paper

Decision:

  • We won't write an Executable spec per se, only in the form of q-d tests

  • We focus on the Voting logic

  • Current small example of trace is already eating a lot of memory in Agda

    • Compile Agda with Haskell profiling on???

Deploying Peras simulation

Spent some time adding automatic deployment of our Peras simulation server to a well-known address so that people can benefit from it. It's been quite a journey, as is often the case with cloud-based deployments when it's not one's day-to-day job, but it's kind of working with some degree of IaC and automation.

Here is a quick summary of what was done:

  • A docker image is built in CI, and pushed to 2 different docker registries:
    • Github's registry, which makes it available under Packages for manual download
    • A custom GCloud Artifacts Registry, because Cloud Run actions cannot pull from ghcr.io
      • There's a concept of Remote Registry that can be used to expose ghcr.io
  • It's all deployed under iog-hydra GCloud project with some dedicated service account
  • The server is deployed as a single-container Cloud Run service with a custom domain mapping from https://peras-simulation.cardano-scaling.org

The following picture summarizes the overall workflow:

Peras Server Deployment

2024-06-07

Simulation and visualization

Upgraded simulation server:

  • Handle multiple clients
  • Control via WebSockets
  • Optional basic authentication
  • Added preagreement termination bound
  • More realistic defaults for visualization

2024-06-06

Constraints on Peras parameters

Created a table of constraints on Peras parameters for use in simulation studies and UIs that involve setting the protocol parameters. Some of these are from the draft Peras paper but others were determined analytically or intuitively. We do not have precise values for the healing, chain-quality, and common-prefix times because they depend upon the level of security desired for the chain.

2024-06-05

Simulation visualization

We built a rudimentary user interface for controlling, visualizing, and debugging simulations. Several ambiguities and/or bugs have already been found through its use.

2024-06-02

Restartable Peras simulations

  • The SimConfig type contains complete information for starting a simulation from an arbitrary state.
    • Start and end times
    • Protocol parameters
    • Which parties are slot leaders for particular rounds.
    • Which parties are committee members for particular rounds.
    • The initial PerasState for each party.
    • What pending chains and votes will be diffused in future slots.
  • peras-simulate now outputs a SimConfig that can be edited and input into peras-simulate to continue the simulation.
  • peras-visualize converts simulation traces to GraphViz DOT files.
    • Additional visualization and log-processing options can be added to this.
    • The next option to be added will convert the log to CSV files.
  • The underlying simulation and visualization functions can easily be incorporated into peras-server.

2024-05-31

Dynamic QuickCheck for Haskell prototype

The Peras.Abstract.Protocol.Node.Model module implements quickcheck-dynamic models for the Haskell prototype.

  • Presently, the prototype is used as the StateModel, but the code is organized so that we can drop in an Agda-generated model when that is ready.
  • Even prior to that, we can add dynamic logic properties to Peras.Abstract.Protocol.Node.ModelSpec to test whether the prototype exhibits the behavior outlined in the paper or in Agda. An example would be a liveness property.
  • We can also create additional state models for fetching, block-creation, and voting, and then test the prototype (as a run model) against those.
  • Thus, the conformance tests might end up including a general-purpose check of faithfulness to the spec and several focused models.

Findings:

  1. The usage of MonadSTM significantly complicated the construction of the state and run models. It is a serious problem for RunModel because there exists no MonadSTM Gen, which is needed for the dynamic-logic tests, but implementing that instance would be laborious. (This is another facet of the earlier problems we had using random numbers in MonadSTM.)
  2. Similarly, we probably should define a newtype wrapping a monad transformer stack that has instances such as MonadError PerasError and MonadState. This would let us remove the runExceptT and ExceptT functions that are scattered throughout the model code.
  3. We could consider having the Actions return a [PerasLog]: on one hand that would provide deeper observability, but on the other hand the run model would be less of a black box.
  4. It is unclear how closely the Agda-generated code will align with the Actions in the node model. We'll need closer coordination to align the Agda with the Haskell actions.
  5. It would be possible to revive the Agda Peras.QCD.Node.Conformance executable specification and use that as the model for testing the prototype.
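Finding 2 above could look like the following sketch (the names PerasM, PerasError, etc. are hypothetical; the real code would wrap an ExceptT/StateT stack and derive the mtl classes, whereas this version is hand-rolled to stay dependency-free):

```haskell
-- Hypothetical stand-ins for the real Peras types.
data PerasError = GenericError String deriving (Show, Eq)
newtype PerasState = PerasState {currentRound :: Int} deriving (Show, Eq)

-- One wrapper combining state and errors, so runExceptT no longer needs to
-- be scattered through the model code.
newtype PerasM a = PerasM {runPerasM :: PerasState -> (Either PerasError a, PerasState)}

instance Functor PerasM where
  fmap f (PerasM m) = PerasM $ \s -> let (r, s') = m s in (fmap f r, s')

instance Applicative PerasM where
  pure x = PerasM $ \s -> (Right x, s)
  PerasM mf <*> PerasM mx = PerasM $ \s ->
    case mf s of
      (Left e, s') -> (Left e, s')
      (Right f, s') -> let (r, s'') = mx s' in (fmap f r, s'')

instance Monad PerasM where
  PerasM m >>= f = PerasM $ \s ->
    case m s of
      (Left e, s') -> (Left e, s')
      (Right x, s') -> runPerasM (f x) s'

-- Example operations: error throwing and state update in one monad.
throwPeras :: PerasError -> PerasM a
throwPeras e = PerasM $ \s -> (Left e, s)

nextRound :: PerasM ()
nextRound = PerasM $ \s -> (Right (), s {currentRound = currentRound s + 1})

main :: IO ()
main = print (runPerasM (nextRound >> nextRound) (PerasState 0))
```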

In summary, we’ve considered and investigated three prominent approaches to link formal specification in Agda to dynamic property-based tests in Haskell.

  1. Relational spec → make it decidable → use that executable version for testing.
    • Pro: yields an executable specification in Haskell.
    • Con: requires decidable versions of each small step.
  2. Relational spec → formulate test properties → prove that test properties conform
    • Pro: yields properties in Haskell
    • Con: no executable specification in Haskell
    • We are pursuing this now.
  3. Relational spec + executable spec → prove their equivalence
    • Pro: yields an executable specification in Haskell
    • Con: consistency and completeness proofs may be difficult

Publishing Peras Website

  • Started work on a docusaurus based website for Peras that will be aimed at sharing the team's progress and findings, and various documents, specifications, and tools we produce.
  • Website is currently published at https://input-output-hk.github.io/peras-design but I plan to move it to peras.cardano-scaling.org
  • It currently hosts the technical report, an early version of the Agda specification, and some unfinished introduction text, so definitely room for improvements

2024-05-30

Removed obsolete code

The following top-level folders have been removed:

  • peras-iosim
  • peras-quickcheck
  • peras-rust
  • peras_topology

Haskell prototype

  • Did some minor renaming modifications to clarify the 2 different running modes: as a single node against an (adversarial) environment, and as multiple nodes interacting
    • The former is useful to more easily model complex and powerful adversaries that would control a significant fraction of the stake, at the cost of some complexity in the generation of chains and votes
    • The latter is useful to understand and observe the behaviour of the protocol
  • Also moved this code to own peras-simulation package
    • We should probably trash peras-iosim?
    • Not sure how to fix the underlying nix stuff, but at the time of this writing cache.iog.io yields error 500 anyway so there's not much I can do at this stage

Planning

  • Cleanup
    • rm -fr peras-iosim
    • rm -fr peras-quickcheck
    • rm -fr peras-rust
    • fix nix build
    • move test-demo -> quickcheck-dynamic
    • keep peras-delta-q & analytics
    • skim through the Logbook to ensure everything can be published
  • Brian to take over weekly updates
  • popularisation of Peras?
    • Interactive web site -> part of the job
  • tests
    • Brian -> write unit tests to explore corner cases with an eye towards porting them to Agda
    • Quviq has started working on the formal spec <-> QCD bridge
    • Yves doing liaison w/ Quviq => QCD bridge
  • Dealing w/ downloading & storage votes/certificates
    • what happens when you sync up
      • need full certificates history
      • need certificates in blocks
      • votes
      • => make a proposal and see what comes out from the community
  • people complaining about CIP not being accessible
    • viz + formal spec + diagrams
  • equivocation?
    • switch in testing/prototype -> accept multiple votes / reject equivocated votes
  • what to do with ties in chains (eg. every slot you could change chain which is silly)?
    • could just write a unit/qcd test for that?
  • preagreement
    • need communication to agree not to vote

2024-05-29

Formal specification in Agda

In the small-step semantics, in order to ensure that all steps of the protocol are executed and that missed voting is detected by the semantics, I split the NextStep constructor into two, differentiating whether the step also transitions to a new voting round. In the case of a new voting round, we require that votes for the current round are present if the previous round was successful. This is required to correctly build up the voting string explained in the paper (see rule HS-II).

    data _↝_ : State → State → Set where

      Deliver : ∀ {m}
        → h ⊢ M [ m ]⇀ N
          --------------
        → M ↝ N

      CastVote :
          h ⊢ M ⇉ N
          ---------
        → M ↝ N

      CreateBlock :
          h ⊢ M ↷ N
          ---------
        → M ↝ N

      NextSlot :
          Delivered M
        → NextSlotInSameRound M
          ---------------------
        → M ↝ tick M

      NextSlotNewRound :
          Delivered M
        → LastSlotInRound M
        → RequiredVotes M
          ---------------
        → M ↝ tick M

A similar issue with respect to block creation was pointed out last week when relating the test spec to the formal specification. A condition similar to the one for voting needs to be added for block creation.

Discussion items for next steps

Several significant items need discussion among the prototyping team, as they have the potential to delay or block progress.

  • Prototype review and testing: Since, for the time being, we'll be relying on the Haskell prototype for the conformance tests . . .
    • Should we do a line-by-line code review to ensure it matches Figure 2 of the draft paper?
    • Should we prioritize creating the quickcheck-dynamic actions for simulation?
      • The actions can be used as a DSL for defining single- and multi-node honest and adversarial scenarios.
      • They would also test that implementations have the same observable effects as the reference implementation.
    • Aside from the quickcheck-dynamic model-based testing, should we also create a full suite of unit tests that focus on edge cases for the protocol?
      • Should these be backported to Agda?
      • Or should they be written in Agda and ported to Haskell?
  • Equivocation: The draft paper says little about how to handle equivocation.
    • Should we leave the prototype without any equivocation checks, since the paper's Figure 2 does not have any?
    • Or should we add a universal rule that the first chain or vote received always supersedes any subsequent equivocated ones?
  • Leadership and membership schedules in the prototype: It is still a bit awkward in the prototype to set predetermined schedules for slot leadership and voting committee membership.
    • Embedding the leadership/membership in the Party type is not practical for long or ad-hoc scenarios.
    • Could the leadership/membership be expressed via a DSL like in the Agda Peras.SmallStep.Execution?
    • Should externally generated leadership/membership proofs simply be handed to the blockCreation/voting functions?
  • Vote weights: A party may have multiple votes because they can win the stake lottery several times in the same round. Shall we coordinate soon on including this in the Agda Vote type and in the Haskell prototype?
  • Preagreement: When should we tackle coding preagreement in the prototype and conformance tests? The preagreement process might result in significant message traffic and delays, so we cannot just ignore it.
  • Pseudo-code vs Agda in CIP: Do we now have sufficient stakeholder feedback to settle on how to present the Peras protocol in the CIP?
  • Diffusion: The prototype's current diffusion schema is lossless and has a fixed delay for delivering messages.
    • What variants on a fixed delay should we implement?
  • Visualization: We're reaching the limits of convenient usage of GraphViz for visualizing the chain evolution. Should we develop a lightweight web-based visualizer for the PerasLog output? If so, should it be based on a standard JavaScript library or done via SVG in WASM Rust?
  • Stakeholder-facing software: What software will we develop and deploy for stakeholders to experiment with the protocol?
  • Cryptography implementations: The conformance tests will need implementations of signatures, hashes, proofs, and certificates. What is the priority for implementing these?

Haskell prototype

  • The Haskell prototype now fully implements (aside from preagreement) the May version of the Peras protocol. The implementation has been cleaned up and more thoroughly documented.
  • Its tracing facility is used to generate GraphViz diagrams of the blockchain, votes, and certificates.
  • It can be run in multi-node mode.
  • Logging has been expanded to include initialization, preagreement, and the internal logic of voting and block creation.
  • The diffusion capabilities have been revised to allow for chains and votes to be delayed before delivery. This makes it possible to do "worst-case" network scenarios where every message is delayed by delta.
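
The delayed-delivery diffusion described above can be sketched as a simple pending-message buffer. This is a minimal sketch under assumed types (`Slot`, `Pending`, `diffuse`, `deliver` are all hypothetical names, not the prototype's actual code):

```haskell
type Slot = Integer

-- A message held back until its delivery slot.
data Pending msg = Pending { deliverAt :: Slot, payload :: msg }
  deriving (Show, Eq)

-- Queue a message sent in slot 'now' for delivery 'delta' slots later.
diffuse :: Slot -> Slot -> msg -> [Pending msg] -> [Pending msg]
diffuse delta now m pending = Pending (now + delta) m : pending

-- Split out the messages that are due by the current slot.
deliver :: Slot -> [Pending msg] -> ([msg], [Pending msg])
deliver now pending =
  ( [ payload p | p <- pending, deliverAt p <= now ]
  , [ p | p <- pending, deliverAt p > now ] )
```

Setting the same `delta` for every message reproduces the "worst-case" scenario where everything arrives exactly Δ slots late.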

2024-05-27

Haskell Prototype

  • Added executable which can be run as cabal run peras, with everything currently hardcoded (eg. no parameters).
  • The executable runs a single node in an infinite loop, simulating the Environment through some scenario.
    • Current scenario is very simple: Input a new chain every 20 slots, and 10 votes for the block at cutoff window distance $L$ from start of the round every slot within a given round
  • Sprinkled Tracer calls at every step relevant for the protocol, outputting JSON formatted log entries on standard out.
    • I was able to verify that certificate creation, quorum gathering, and voting logic work (eg. a trace is emitted as expected)
  • Tried to simplify logs too
{"round":98,"slot":1961,"tag":"Tick"}
{"chains":[{"bodyHash":"","certificate":null,"creatorId":43,"leadershipProof":"864cceeef469dcfd","parentBlock":"065f88496d4fc64d","signature":"d265acba13971427","slotNumber":1960}],"tag":"NewChainAndVotes","votes":[{"blockHash":"5b798fcb7fc22d8b","creatorId":0},{"blockHash":"5b798fcb7fc22d8b","creatorId":1},{"blockHash":"5b798fcb7fc22d8b","creatorId":2},{"blockHash":"5b798fcb7fc22d8b","creatorId":3},{"blockHash":"5b798fcb7fc22d8b","creatorId":4},{"blockHash":"5b798fcb7fc22d8b","creatorId":5},{"blockHash":"5b798fcb7fc22d8b","creatorId":6},{"blockHash":"5b798fcb7fc22d8b","creatorId":7},{"blockHash":"5b798fcb7fc22d8b","creatorId":8},{"blockHash":"5b798fcb7fc22d8b","creatorId":9}]}
{"chain":{"bodyHash":"","certificate":null,"creatorId":43,"leadershipProof":"864cceeef469dcfd","parentBlock":"065f88496d4fc64d","signature":"d265acba13971427","slotNumber":1960},"tag":"NewChainPref"}
{"round":98,"slot":1962,"tag":"Tick"}
...
{"chains":[],"tag":"NewChainAndVotes","votes":[{"blockHash":"5b798fcb7fc22d8b","creatorId":70},{"blockHash":"5b798fcb7fc22d8b","creatorId":71},{"blockHash":"5b798fcb7fc22d8b","creatorId":72},{"blockHash":"5b798fcb7fc22d8b","creatorId":73},{"blockHash":"5b798fcb7fc22d8b","creatorId":74},{"blockHash":"5b798fcb7fc22d8b","creatorId":75},{"blockHash":"5b798fcb7fc22d8b","creatorId":76},{"blockHash":"5b798fcb7fc22d8b","creatorId":77},{"blockHash":"5b798fcb7fc22d8b","creatorId":78},{"blockHash":"5b798fcb7fc22d8b","creatorId":79}]}
{"quorums":[{"blockRef":"5b798fcb7fc22d8b","round":98}],"tag":"NewCertificatesFromQuorum"}
{"certificate":{"blockRef":"5b798fcb7fc22d8b","round":98},"tag":"NewCertPrime"}
...
{"tag":"CastVote","vote":{"blockHash":"d265acba13971427","creatorId":42,"proofM":"f68b0b711289e4b0","signature":"a7f7fb277efb6148","votingRound":100}}

2024-05-23

Ledger specification

Going quickly through the Conway spec paper: https://drops.dagstuhl.de/storage/01oasics/oasics-vol118-fmbc2024/OASIcs.FMBC.2024.2/OASIcs.FMBC.2024.2.pdf. The approach seems very consistent with what we have been trying to do in Peras:

  • Define underlying fundamental structures over which the Agda specification is parameterised (in this case, sets)
  • Define a declarative transition system along with various properties
  • Turn the declarative TS into a computational TS, providing functions to actually produce new state from old state and transition
  • Use the reference implementation for conformance testing

WASM specification paper: https://www.cl.cam.ac.uk/~caw77/papers/mechanising-and-verifying-the-webassembly-specification.pdf which formalises the language in Isabelle

Peras v2 Prototype

  • The strategy is to follow as closely as possible the procedure's definition from draft paper, writing unit/property tests along the way to help progress
  • Started working on the Voting module, then I realised that we were using IO instead of io-classes, so I converted the code to the latter, which will make testing easier.
    • I wonder if we should not make everything pure and then tie the various parts together with a monadic "node", which leads me to think we should actually write that code in Agda and generate the Haskell part.
  • Got confused on the committee membership selection process, I tried to reuse preexisting code from peras-iosim but of course it's based on different structure and does not fit in new version of protocol
  • NOTE: Using type aliases for type signatures of functions is not convenient for navigation: it adds one level of indirection when searching, one step to go to the function's definition and then another step to go to the type definition
  • Getting a strange result in the test: Whatever the value of the party id and round are, the test passes but it's not supposed to!
    • I can see the diffuseVote function is called even though the isMembership should return false 🤔
    • Made some progress on voting after wasting 20 minutes staring at the code before realising I had inverted an if/then/else condition.
  • Working on the voting conditions, trying to find the most expressive approach, introducing detailed "errors" to represent the various circumstances in which a node would not vote
    • Struggling to structure the boolean expression for voting, a disjunction of conjunctions.
    • I would like to keep them explicitly in the code as they are written in the paper, and <|> seemed promising but it's not working of course: PerasError is not a Monoid and why would it be one?
    • We want to know why a node does not vote, if only to ensure we are properly testing our "reference" implementation
  • Not sure if our current representation of Chain is the best one: a list of arbitrary blocks is not a chain; we should guarantee the blocks are linked. Not sure why we chose that representation, perhaps because it's easier to work with in Agda? But I don't think we use that in Agda...
    • We should make our PerasState abstract (e.g. a typeclass or record of functions) and instantiate it as needed with different concrete implementations while keeping a "higher-level view", e.g. how it's described in the paper.
    • Header/Body split logic does not need to percolate upstream, and we can diffuse certificates instead of single votes while retaining the protocol's view that we propagate chains and single votes
  • What about concurrency and deadlocks in the algorithm? Should we need a proof that it's deadlock-free. Crypto proofs don't prove it's deadlock free => safety gap, potential DoS?
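
The "disjunction of conjunctions with explicit errors" idea above can be sketched without `<|>`, since `PerasError` is not a Monoid. This is a hypothetical sketch (the names `VotingError`, `shouldVote`, and the VR-1/VR-2 rules as opaque checks are assumptions, not the prototype's code):

```haskell
-- Hypothetical reasons why a node would not vote.
data VotingError
  = NotInCommittee
  | VR1AFailed  -- e.g. the previous round did not reach quorum
  | VR1BFailed  -- e.g. the candidate block does not extend the certified block
  | VR2AFailed
  | VR2BFailed
  deriving (Show, Eq)

type Check = Either VotingError ()

-- All conditions in a conjunction must hold; the first failure is reported.
conjunction :: [Check] -> Check
conjunction = sequence_

-- Try the first alternative; on failure fall back to the second, so the
-- reported error comes from the last alternative tried.
orElse :: Check -> Check -> Check
orElse a b = either (const b) Right a

-- Vote iff (VR-1A and VR-1B) or (VR-2A and VR-2B) holds for a committee member.
shouldVote :: Bool -> Check -> Check -> Check -> Check -> Check
shouldVote member vr1a vr1b vr2a vr2b
  | not member = Left NotInCommittee
  | otherwise  = conjunction [vr1a, vr1b] `orElse` conjunction [vr2a, vr2b]
```

This keeps the rule structure as written in the paper while still surfacing a descriptive error when a node declines to vote.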

2024-05-22

Peras Weekly

  • Goals for this week:
    • Define and sketch property for conformance testing (work w/ Quviq)
    • "Naive" Prototype for latest version of protocol
  • Dates and location for the meeting confirmed => need to move forward as David is on sick leave

Prototype implementation

We paired on defining interfaces for the Peras protocol. The design emphasizes interfaces (types and functions) that correspond closely to the algorithm of the draft paper. We opted for this approach in contrast to a more sophisticated implementation.

So far the following modules have been provisionally implemented:

  • Fetching
  • Diffusion
  • Cryptographic placeholders

2024-05-21

Meeting w/ Quviq

Next steps:

  1. Share the full paper
  2. Define the property we want to test
  3. Tour of Agda code to know where to plug-in
  4. Sketch the testing model together

Clarifying Agda/Haskell interactions

The following picture attempts to clarify the relationship between Agda and Haskell as it's been explored recently:

Agda-Haskell Interactions

  • The Agda code relies on the Agda standard library, which obviously provides much better support for proofs than agda2hs and Haskell's Prelude
  • Therefore the Haskell code needs to depend on this stdlib code, which is problematic for standard types (e.g. numbers, lists, tuples, etc.)
  • The Agda code separates Types from Implementation in order to ease generation
  • Types are generated using agda2hs to provide unmangled names and simple structures on the Haskell side
  • The implementation is generated using GHC to enable full use of the stdlib, with top-level interface functions generated with unmangled names
  • Types are also taken into account when compiling with GHC, but they are only "virtual", i.e. the compiler makes the GHC-generated code depend on the agda2hs-generated types
  • Hand-written Haskell code can call unmangled types and functions

Chain weight in block headers

Just as the signing of the Praos header increases a node's confidence in the validity of the length claim, and thus allows it to select a chain (and pass it to downstream peers) before downloading all of its block bodies, we could add the chain's weight to the header as a simple 64-bit field, making it possible to select a chain without immediately verifying the certificates, which would be provided with the bodies.

header =
  [ header_body
  , body_signature : $kes_signature
  ]

header_body =
  [ block_number     : uint
  , slot             : uint
  , weight           : uint
  , prev_hash        : $hash32 / null
  , issuer_vkey      : $vkey
  , vrf_vkey         : $vrf_vkey
  , vrf_result       : $vrf_cert ; replaces nonce_vrf and leader_vrf
  , block_body_size  : uint
  , block_body_hash  : $hash32 ; merkle triple root
  , operational_cert
  , [ protocol_version ]
  ]

This would avoid any risk of compromising block diffusion time, while also avoiding, or at least significantly reducing, DoS and similar attacks that stem from unverified or unverifiable weight-based chain selection.
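
A minimal sketch of weight-based tip selection using only header fields, as proposed above. The `Header` type and field names here are hypothetical placeholders (the real CDDL header has many more fields):

```haskell
import Data.Word (Word64)
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical header carrying the claimed cumulative chain weight.
data Header = Header
  { blockNumber :: Word64
  , slot        :: Word64
  , weight      :: Word64  -- 64-bit weight claim, signed along with the header
  } deriving (Show, Eq)

-- Select the tip whose header claims the greatest weight, breaking ties by
-- block number (longest chain), analogous to Praos selecting by length alone.
selectTip :: [Header] -> Maybe Header
selectTip [] = Nothing
selectTip hs = Just (maximumBy (comparing weight <> comparing blockNumber) hs)
```

Because the weight claim is covered by the header signature, a node can run this selection before downloading bodies, deferring certificate verification just as Praos defers body validation.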

2024-05-20

Dynamic QuickCheck for new Agda+Haskell workflow

The new tests Peras.ChainWeightSpec check both (1) that the implementation being tested matches the output of the specification and (2) that an Agda property Peras.SmallStep.Experiment.propNeverShortens holds.

Overall findings from the Agda+Haskell experiment are the following:

  1. It is feasible to use non-agda2hs properties in Haskell tests by careful use of Agda pragmas to generate MAlonzo and agda2hs code. The approach avoids ever referencing mangled names, but it does require some indirection and a separation of interface vs implementation.
  2. The situation would be far simpler if one could use a pure Agda state-machine testing framework instead of having to export code to Haskell and then test there.
  3. The StateModel and RunModel of quickcheck-dynamic seem a little out of sync with this use case where the model and its properties are known to be correct because of Agda proofs and only the implementation is being tested.

2024-05-17

Formal specification

Changed the reflexive transitive closure of the small-step semantics to a List-like syntax similar to what has been proposed in test-demo. This allows execution steps to be expressed in a concise way and could be used as a DSL for specifying test cases and attack scenarios. An example of the new syntax is shown below:

      _ : initialState ↝⋆ finalState
      _ =    NextSlot empty  -- slot 1
          ↣ CreateBlock (honest refl refl isBlockSignature isSlotLeader)
          ↣ Deliver (honest refl (here refl) BlockReceived)
          ↣ NextSlot empty  -- slot 2
          ↣ CastVote (honest refl refl isVoteSignature refl isCommitteeMember (Regular vr-1a vr-1b))
          ↣ Deliver (honest refl (here refl) VoteReceived)
          ↣ NextSlot empty  -- slot 3
          ↣ ∎

The example above shows only the execution path and all the details are omitted. See Peras.SmallStep.Execution for the full example.

Proof of principle for calling unmangled Agda code from Haskell

The module Peras.ChainWeight demonstrates how Agda code that uses the standard library (not agda2hs) can be called from a Haskell function via stable, unmangled names (types and functions) generated by agda and agda2hs. The basic recipe is as follows:

  1. The data types used by Haskell clients must use Haskell.Prelude and be generated by agda2hs via the {-# COMPILE AGDA2HS ... #-} pragma.
    • Types involving natural numbers must be hand-coded using {-# FOREIGN AGDA2HS ... #-} because they compile to Integer in MAlonzo but to Natural in agda2hs.
    • Fields may not contain identifiers that violate Haskell naming requirements.
  2. Those types are used as the concrete implementation in the very module where they are defined via the {-# COMPILE GHC ... = data ... #-} pragma.
  3. Functions that are called by Haskell are annotated with the {-# COMPILE GHC ... as ... #-} pragma.
    • Every argument must be of a type that was generated with {-# COMPILE AGDA2HS #-} or is a basic numeric type or unit.
    • Functions cannot have arguments using natural numbers, tuples, Maybe etc.
    • Functions may contain identifiers that violate Haskell naming requirements.
  4. The agda --compile --ghc-dont-call-ghc --no-main command generates mangled Haskell under MAlonzo.Code.Peras, except that it uses the unmangled types in Peras and has unmangled function names.

Peras.ChainWeight uses both unmangled types and unmangled functions.

Thus, two kinds of indirection are needed to avoid dealing with mangled names:

  1. Use agda2hs to generate Haskell-friendly types that are used as the concrete implementation in Agda.
  2. Separate the implementation, which typically is not friendly to Haskell, from the function exported via agda2hs.

2024-05-16

Design of voting layer

As we are focusing our investigation and prototyping efforts on the voting layer, I have sketched a somewhat detailed design of what this independent voting layer would look like: https://miro.com/app/board/uXjVNNffmyI=/?moveToWidget=3458764589087032554&cot=14

Some notes:

  • The orange parts describe what a test driver would look like, and what kind of messages it needs to input and monitor
  • The voting layer can be conceived as a "basic" chain follower, at least insofar as we are only interested in the voting logic
    • Inclusion of the latest certificate can be dealt with later by exposing an interface for the Nakamoto component to query it for inclusion in the forged block
  • This model includes both votes and certificates, but the latter is an optimisation
  • Votes diffusion is very similar to transaction diffusion: Each peer maintains a "vote pool" which is populated with new votes fetched from upstream peers
  • As voting proceeds in rounds and nodes need to know whether or not there's a chain of quorums, or a cooldown period, maintaining a round-based index is important
  • Votes are tied to blocks on the preferred chain of the node so should the chain be rolled back, votes need to be discarded, and new votes are only accepted if they point to the preferred chain
  • When a rollback is signaled, votes can be dropped and new votes need to be fetched
    • The node needs to cancel any pending request
    • a new request for votes after some round number is issued
    • vote pool is repopulated with those new votes
  • there is an inherent synchronisation problem here between the blocks and the votes: If we don't have the blocks votes are pointing to, they can't be added to the mempool, but chain selection depends on votes!
    • the chain following part needs to be consistent with the state of voting, e.g. only provide roll-forward/roll-backward events that are consistent with the weight of the chain as given by votes
    • new certificates/quorums need to be sent to the Nakamoto layer for chain selection
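
The round-indexed vote pool with rollback handling described above can be sketched as follows. All names here (`VotePool`, `insertVote`, `rollbackTo`) are hypothetical placeholders, not the prototype's code:

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type Round = Integer
type BlockHash = String

-- Votes indexed by voting round, so quorum chains and cooldowns
-- can be checked round by round.
newtype VotePool = VotePool (Map Round [BlockHash])
  deriving (Show, Eq)

emptyPool :: VotePool
emptyPool = VotePool Map.empty

-- Insert a fetched vote under its round.
insertVote :: Round -> BlockHash -> VotePool -> VotePool
insertVote r h (VotePool m) = VotePool (Map.insertWith (++) r [h] m)

-- On a rollback affecting round r, discard votes for rounds >= r;
-- fresh votes for those rounds must then be re-fetched from upstream peers.
rollbackTo :: Round -> VotePool -> VotePool
rollbackTo r (VotePool m) = VotePool (fst (Map.split r m))
```

`Map.split r` drops the pivot key itself, which matches the rule that votes at the rollback round are no longer tied to the preferred chain and must be discarded.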

While working on the voting layer, I had the annoying realisation that while diffusing votes and blocks independently works in theory, it is problematic in practice because of the interdependence of votes and blocks:

  • a node could be flooded with irrelevant votes if they don't have a way to verify they are valid, eg. they are votes for a block on preferred chain
  • a node could be flooded with irrelevant blocks if they don't have a way to ensure they are selecting the right chain, which requires the votes
  • In Praos, risks are reduced because you download chains based on their length which can be easily asserted with headers only. And it's hard to forge a header, so if a header is validated the odds its body is valid are very high.
  • But in Peras, you really need those votes to make the decision (or the certificates but we agree this is an optimisation).
  • This is not new, and something we have already discussed several times, but from a practical standpoint it makes things really annoying to implement and is a significant departure from the high-level specification

2024-05-14

MAlonzo version of executable specification

We used agda to generate MAlonzo-style Haskell code for the experimental Peras executable specification, Peras.QCD.Node.Specification. A new quickcheck-dynamic test compares the MAlonzo version against the agda2hs version: these tests all pass, but the MAlonzo version runs slower, likely because it involves more than two hundred Haskell modules.

  • agda2hs version: 4m 45.206s (250 tests)
  • MAlonzo version: 10m 46.280s (250 tests)

We initially attempted the following workflow in order to avoid manually coding translation between normal Haskell and MAlonzo:

  1. Compile using agda --ghc-dont-call-ghc.
  2. Compile using agda2hs, so that the various {-# COMPILE AGDA2HS ... #-} types were generated as normal Haskell.
  3. Compile using agda --compile, so that the agda2hs-generated code could be re-used as implementations via {-# COMPILE GHC <name> = data ... #-} pragma.

Two situations block this workflow:

  • The Maybe type differs between MAlonzo and agda2hs.
    • The {-# COMPILE GHC ... = data ... #-} pragma requires that all types in the data structure also use that pragma. However, it is impossible (without modifying the generated MAlonzo code) to add that pragma to Maybe.
    • One cannot work around this by defining one's own Maybe a type because the type parameter a results in an extra erasable argument to the generated Just constructor. That makes it incompatible with the Haskell base library's Maybe.
  • agda generates natural numbers as Haskell Integer whereas agda2hs generates them as Natural.

A workaround for the above blockers involves manually writing code to cast between the MAlonzo representation and the agda2hs representation. That more-or-less involves the same amount of work as just working with the agda-generated MAlonzo types, so we just do the latter. The following fragment of Peras.QCD.Node.Impl.MAlonzo illustrates the essence of interfacing MAlonzo code to normal Haskell:

runState' :: T_State_10 -> T_NodeModel_8 -> ([S.Message], T_NodeModel_8)
runState' action node =
  let
    Tuple.C__'44'__24 x s = d_runState_18 action $ unsafeCoerce node
   in
    (toMessage <$> unsafeCoerce x, unsafeCoerce s)

instance Monad m => PerasNode MAlonzoNode m where
  initialize params party =
    pure . runState' (d_initialize_8 (fromParams params) (fromVerificationKey party))
  fetching chains votes =
    pure . runState' (d_fetching_136 (fmap fromBlock <$> chains) (fromVote <$> votes))
  blockCreation txs =
    pure . runState' (d_blockCreation_160 txs)
  voting weight =
    pure . runState' (d_voting_250 $ toInteger weight)

Overall findings:

  • It is practical to integrate MAlonzo code directly into quickcheck-dynamic tests, though this is slightly fragile with respect to modification of the underlying Agda.
    • Changes to Agda types or functions cause the names of the subsequent, corresponding MAlonzo types or functions to be renumbered.
  • Handwritten code is needed to marshal MAlonzo to normal Haskell and vice versa.
    • The use of the {-# COMPILE GHC ... as ... #-} pragma saves a small amount of work.
    • The {-# COMPILE GHC ... = data ... #-} pragma can be used only in cases that don't use MAlonzo.Code.Haskell.Prim types such as Maybe, tuples, etc. (List is okay because Agda uses Haskell native lists.)
    • Any type that uses {-# COMPILE GHC ... = data ... #-} must also use it transitively in all of its component types.
    • Any type that transitively uses natural numbers must be marshalled manually.
  • The MAlonzo code runs at roughly half the speed of the corresponding agda2hs code.

Note that the experimental Peras executable specification has not yet been reviewed against the draft paper or otherwise subjected to quality assurance. As we discussed today, one possible path forward is to make decidable some small- and/or big-step parts of the formal spec and then to use the generated MAlonzo in quickcheck-dynamic.

Votes structure

  • Assuming we use existing keys and crypto available to a cardano-node, we have the following breakdown for the structure of a vote:

    | Field | Size (bytes) |
    | --- | --- |
    | round number | 8 |
    | cold vkey | 32 |
    | block hash | 32 |
    | VRF proof | 80 |
    | VRF vkey | 32 |
    | KES signature | 448 |
    | op cert | 102 |
    | total | 734 |
  • OpCert is needed because that's the only way to verify the KES vkey and therefore the KES Signature

  • The KES signature is required if we don't register some specific key pair for the purpose of Peras voting, which would introduce its own set of problems

  • VRF benchmarks in cardano-crypto-test gives a mean VRF proof time of 100μs
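
The byte counts above can be sanity-checked with a trivial sum (field names and sizes are copied from the breakdown; this is just arithmetic, not protocol code):

```haskell
-- Per-field byte sizes of a vote, assuming existing cardano-node keys/crypto.
voteFieldSizes :: [(String, Int)]
voteFieldSizes =
  [ ("round number", 8)
  , ("cold vkey", 32)
  , ("block hash", 32)
  , ("VRF proof", 80)
  , ("VRF vkey", 32)
  , ("KES signature", 448)
  , ("op cert", 102)
  ]

-- Total serialized size of one vote, in bytes.
voteSize :: Int
voteSize = sum (map snd voteFieldSizes)  -- 734 bytes
```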

Quviq meeting

  • Only Max, Ulf is at Agda implementors workshop 😄

  • What are the next steps?

  • Whole spectrum of conformance testing -> need to define what we think is conformance

    • specification = draft paper
    • agnostic to diffusion, focus on how you obey the rules of the protocol
    • distinguish pure protocol from propagation logic
  • Social aspects => not fun to implement a very detailed spec

    • "conformance without over specification"
  • make the existing specification readable

  • small step semantics -> big step semantics + glue code for execution -> MAlonzo for qcd

    • most code in spec is not decidable
    • pb w/ Agda2hs = lot of work to prove correspondence -> avoid extra proof by making the spec decidable and do direct extraction w/ MAlonzo
    • do qcd purely in Agda? probably cheaper to do it in Haskell
    • Lean4 is closer to a programming language
  • connect small steps semantics to limited subset of test semantics

    • focusing on executability of a small subset for decidability and executability
  • PBT of human readable pseudo-code?

  • property to test: quorum

    • if a node gets enough votes in a round, it issues a certificate
    • if a node does not get enough votes in a round, it enters cooldown
    • => prove property at the spec level -> ensure the test does exercise that property
    • restricting the proof of the property on the actions (subset of the smallsteps semantics)
  • define small steps/big steps/actions difference

    • small steps = all transitions, including non-observable ones
    • big steps = observable group of transitions
  • Going forward:

    • Take Yves' small-step semantics
    • Connect to a qc-d model using our technique and malonzo
      • The property is still that the messages really match
    • Actions should be a subset of the protocol, some possibilities:
      • reaching a quorum (in every round where you have a quorum there should be a certificate)
      • voting
    • Make as many simplifying assumptions about the network as possible
  • Next steps:

    • Code review on Brian's code
    • Meet with Max + Ulf on Tuesday for concrete expression of "quorum formation"
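
The quorum property discussed above ("in every round where you have a quorum there should be a certificate") can be phrased as a pure oracle that a quickcheck-dynamic postcondition could apply to a trace. Everything here (the types, the function name, the threshold) is a hypothetical sketch, not the actual conformance-test code:

```haskell
type Round = Int

-- Given the quorum threshold, the vote count observed per round, and the
-- rounds for which a certificate was issued, check that every round
-- reaching quorum produced a certificate.
quorumYieldsCertificate :: Int -> [(Round, Int)] -> [Round] -> Bool
quorumYieldsCertificate quorum roundVotes certRounds =
  and [ r `elem` certRounds | (r, votes) <- roundVotes, votes >= quorum ]
```

The dual property (a round without quorum triggers cooldown rather than a certificate) could be phrased the same way over the trace.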

Next steps

| Domain | Action |
| --- | --- |
| Conformance testing | Executable specification based on MAlonzo to be used in running conformance tests |
| | Express Peras properties as conformance tests (with help from Quviq?) |
| | Clean up previous attempts in peras-quickcheck |
| Simulations | ? |
| Networking | Model votes & certificates propagation w/ ΔQ |
| | Test newest version of ΔQ library |
| | Prototype voting layer in Haskell |
| Votes & certificates | Define votes structure |
| | Define certificates structure |
| Formal specification | Model voting characteristic string |
| Outreach | Better understand stakeholders' needs |
| Project | Organise in-person workshop |

Why formal methods?

  • Formal verification = formally specifying and proving properties of real world programs
  • Formal verification is hard
    • requires highly trained specialists (eg. PhD level or more) which are few and expensive
    • significant investment in time
    • Formal verification of concurrent and distributed systems is super hard and this is essentially 80% of what we are working on
  • The tools, languages, methods are "primitive" when compared to "standards" in software engineering
  • Research is not using those tools and languages (yet?)
    • researchers are not trained on using them anyway
    • because it's hard and takes time, it would significantly increase TTP (time to publication)
  • Engineering is not using those tools (yet?)
    • same reasons as above, but worse because proving anything about real world distributed systems is super hard
  • If we want Formal methods to have an impact in IOG and the wider Cardano community, we need an integrated strategy:
    • Define why/where/what we want to apply FM to
    • Clarify who we are formally proving things for
    • Train (and retrain) people
    • Invest in tools

Some notes taken while reading QED at Large, focusing on the "Why proof engineering matters?" chapter

  • Some examples of successful large-scale program verification:

    • CompCert, which is used in embedded systems and aeronautics and offers commercial support, 35 kLOC, 87% proofs

    • seL4 with application in aviation/automotive

      The central artefact is the Haskell prototype of the kernel. The prototype is derived from the (informal) requirements and embodies the design and implementation of algorithms that manage the low-level hardware details. It can be translated automatically into the theorem prover Isabelle/HOL to form an executable, design-level specification of the kernel. The abstract, high-level, functional specification of the kernel can be developed manually, concurrently and semi-independently, giving further feedback into the design process.

      The original Haskell prototype is not part of the proof chain.

      The correspondence established by the refinement proof ensures that all Hoare logic properties of the abstract model also hold for the refined model.

      Proofs about concurrent programs are hard, much harder than proofs about sequential programs.

      We simplify the problem further by implementing interrupt points via polling, rather than temporary enabling of interrupts.

      seL4 Design Process

      Proof Layers in seL4

    • In distributed systems: formalization and proof of Raft properties

    • BoringSSL is used in Chrome and includes verified crypto in C

  • Challenges in User productivity:

    Bourke et al. (2012) outline challenges in large-scale verification projects using proof assistants: (1) new proof engineers joining the project, (2) expert proof engineering during main development, (3) proof maintenance, and (4) social and management aspects. They highlight three lessons: (1) proof automation is crucial, (2) introspective tools for quickly finding facts in large databases are important for productivity, and (3) tools that shorten the edit-check cycle increase productivity, even when sacrificing soundness.

  • Oldies but goldies from Social Processes and Proofs of Theorems and Programs

    It is argued that formal verifications of programs, no matter how obtained, will not play the same key role in the development of computer science and software engineering as proofs do in mathematics. Furthermore the absence of continuity, the inevitability of change, and the complexity of specification of significantly many real programs make the formal verification process difficult to justify and manage. It is felt that ease of formal verification should not dominate program language design.

2024-05-13

Current situation for Peras

  • Conformance testing
    • Toy example from Quviq demonstrating how to relate a formal model to a test model, deriving Haskell from Agda and proving some equivalence of the test model w.r.t. the formal model

      research paper ~~> Agda specification <==> Agda testable specification --> Haskell executable specification --> Dynamic QuickCheck properties
      
    • Autonomous spec & conformance test against faithful and invalid node, showing a derivation path of QC properties from paper, through a "pseudo-code" executable specification:

      research paper ~~> Agda executable specification --> Haskell executable specification --> Dynamic QuickCheck properties
      
    • "Praos"-only conformance test of foreign code using Netsim

    • "Praos"-only conformance test of Haskell code

    • Conformance test against the Peras optimal model

  • Agda formal specification
    • Formalised the Peras protocol in small-step semantics along the lines of the PoS-NSB paper, proving a knowledge-propagation property
    • Started work on proving theorem 4.1 from the Peras paper, which defines how the voting string is derived from an execution of the protocol. The goal is to align formal proof techniques with the model used by researchers
  • ΔQ model
  • Simulation(s)
    • Implemented Haskell simulator for older version of Peras algorithm and ran various scenarios:
      • Happy path scenario with various sizes (up to 100 nodes?)
      • Split-brain scenario showing the impact of votes on chain selection
      • Network congestion impact
    • various Analytical scenarios in R
  • Voting & Certificates

2024-05-10

Voting string analysis in Agda

For the voting string analysis I investigated different approaches to representing a voting string in Agda. The first attempt implemented a formal language as described in Equational Reasoning about Formal Languages in Coalgebraic Style. Following this approach we can specify regular expressions defining valid voting strings:

data Alphabet : Set where
  ⒈ : Alphabet
  ? : Alphabet
  🄀 : Alphabet

⟦_⟧ : ∀ {i} → Alphabet → Lang i Alphabet
ν ⟦ _ ⟧ = false
δ ⟦ ⒈ ⟧ ⒈ = ε
δ ⟦ ⒈ ⟧ ? = ∅
δ ⟦ ⒈ ⟧ 🄀 = ∅
δ ⟦ ? ⟧ ⒈ = ∅
δ ⟦ ? ⟧ ? = ε
δ ⟦ ? ⟧ 🄀 = ∅
δ ⟦ 🄀 ⟧ ⒈ = ∅
δ ⟦ 🄀 ⟧ ? = ∅
δ ⟦ 🄀 ⟧ 🄀 = ε

⋯ = (⟦ ⒈ ⟧ + ⟦ ? ⟧ + ⟦ 🄀 ⟧)*

HS-II = ⋯ · ⟦ ⒈ ⟧

test : (🄀 ∷ ? ∷ 🄀 ∷ ⒈ ∷ []) ∈ HS-II ≡ true
test = refl

For voting strings we probably don't need the full power of regular expressions, and therefore we tried to build them directly as an inductive data type as follows:

data Σ : Set where
  ⒈ : Σ
  ? : Σ
  🄀 : Σ

VotingString = Vec Σ

module _ ⦃ _ : Params ⦄ where
  open Params ⦃...⦄

  infix 3 _⟶_
  data _⟶_ : ∀ {n} → VotingString n → VotingString (suc n) → Set where

    HS-I : [] ⟶ [] , ⒈

    HS-II-? : ∀ {n} {σ : VotingString n}
      → σ , ⒈ ⟶ σ , ⒈ , ?

    HS-II-1 : ∀ {n} {σ : VotingString n}
      → σ , ⒈ ⟶ σ , ⒈ , ⒈

    HS-III : ∀ {n} {σ : VotingString n}
      → σ , ? ⟶ σ , ? , 🄀

    HS-IV : ∀ {n} {σ : VotingString n}
      → 1 ≤ L
      → L ≤ K
      → ((σ , ⒈ , ?) ++ replicate L 🄀) ⟶
        ((σ , ⒈ , ?) ++ replicate L 🄀) , 🄀

    ...

The second solution is quite readable and easier to use in proofs.

QCD conformance test via Agda executable specification

The Peras.QCD.ConformanceSpec implements the basic faithfulness test for node implementations. Examples are provided for a perfectly faithful node and a buggy one. This demonstrates the feasibility of the following workflow:

research paper --> Agda executable specification --> Haskell executable specification --> Dynamic QuickCheck properties

2024-05-09

Executable specification in Agda

Here are the results of the experiment to encode the Peras protocol definition from the draft paper as an Agda executable specification that compiles to Haskell under agda2hs and can be used with quickcheck-dynamic.

The four main protocol operations are listed below, but helper functions are omitted. The operations are expressed monadically so that the recipe reads as pseudo-code. There are still opportunities for syntactic sugar that would make the code more readable, but dramatic improvements probably are not feasible in this approach. Perhaps a more readable approach would be to express this in a rigorously defined, standardized pseudo-code language, which could be compiled to Agda, Haskell, Rust, Go, etc.

Some ambiguities in the draft specification were also identified:

  1. Step 4 of Fetching does not specify which chain to prefer when more than one chain has the same weight. In previous versions of the protocol, the preferred chain would only be updated if a weightier one was seen. In this version, when several chains have the maximal weight a node might keep switching preferred chains each slot.
  2. Step 2 VR-1B of Voting does not define the meaning of "extends the block", nor does the text of the paper. From context, "extends" must be equivalent to "has a descendant block".
  3. Voting is silent on how to handle the fact that a party may have many votes.
  4. The "vote" data structure does not record the number of votes that the party has cast in the round.
  5. Step 2 of Procedure is incomplete.
  6. Also, it seems that Certs should be initialized with the genesis certificate instead of the empty set.

Other lessons learned:

  • Bugs in agda2hs slowed development work and required awkward workarounds. (Issues for these have been created in the agda2hs repository.)
  • Lenses improve readability.
  • Using a List for the "set" data structure of the paper creates inefficiencies in the implementation.
    • Set invariants are not trivially enforced.
    • Access and query functions are slow.
  • It might be difficult to prove that this executable specification matches the properties that are being formally proved.
  • Even though the Agda code is written to look imperative, it has quite a few artifacts of functional style that could be an impediment to some implementors.
    • It might be better to use let statements instead of ← pure $. Unfortunately, it would be quite difficult to design an assignment operator to replace monadic let in Agda.
    • The functional style avoids introducing lots of intermediate variables, but maybe that would be preferable to using functions as modifiers to monadic state (e.g., _≕_ : Lens' s a → (a → a) → State s ⊤).
    • The use and pure functions could be eliminated by defining operators (including logical and arithmetic ones) that hide them.
  • Overall, the Agda code is more verbose than the textual specification.
  • It might be difficult to create Agda code that is simultaneously easily readable by mathematical audiences (e.g., researchers) and software audiences (e.g., implementors).
  • Quite a bit of boilerplate (instances, helper functions, lenses, State monad, etc.) are required to make the specification executable.
  • Creating a full eDSL might be a better approach, but that would involve significantly more effort.

Next steps (order might vary) that should be discussed before proceeding further:

  • Convert to literate Agda.
  • Implement quickcheck-dynamic node-oriented conformance tests using this executable specification.
  • Implement similar network-oriented conformance tests.
  • Test example native implementations in Haskell and Rust.
  • Evaluate suitability for inclusion either in a CIP or as an appendix or supplement.
  • Revise when Peras paper is finalized.
  • Implement cryptographic functions in Agda.

Fetching

-- Enter a new slot and record the new chains and votes received.
fetching : List Chain → List Vote → NodeOperation
fetching newChains newVotes =
  do
    -- Increment the slot number.
    currentSlot ≕ addOne
    -- Update the round number.
    u ← peras U
    now ← use currentSlot
    currentRound ≔ divideNat now u
    -- Add any new chains and certificates.
    updateChains newChains
    -- Add new votes.
    votes ≕ insertVotes newVotes
    -- Turn any new quorums into certificates.
    newCerts ← certificatesForNewQuorums
    certs ≕ insertCerts newCerts
    -- Make the heaviest chain the preferred one.
    boost ← peras B
    heaviest ← heaviestChain boost <$> use certs <*> use chains
    preferredChain ≔ heaviest
    -- Record the latest certificate seen.
    updateLatestCertSeen
    -- Record the latest certificate on the preferred chain.
    updateLatestCertOnChain
    -- No messages need to be diffused.
    diffuse

Block creation

-- Create a new block.
blockCreation : List Tx → NodeOperation
blockCreation txs =
  do
    -- Find the hash of the parent's tip.
    tip ← use preferredChain ⇉ chainTip
    -- Fetch the lifetime of certificates.
    a ← peras A
    -- Fetch the current round.
    round ← use currentRound
    -- Fetch the latest certificate and the latest on the chain.
    certPrime ← use latestCertSeen
    certStar ← use latestCertOnChain
    -- Check whether a certificate exists from two rounds past.
    penultimate ←
      use certs                                     -- Fetch the certificates.
        ⇉ takeWhile (noMoreThanTwoRoundsOld round)  -- For efficiency, since the list is sorted by decreasing round.
        ⇉ any (twoRoundsOld round)                  -- Check if any certificates are two rounds old.
    -- Check that the latest certificate has not expired.
    unexpired ← pure $ round <= certificateRound certPrime + a
    -- Check that the latest certificate is newer than the latest on the chain.
    newer ← pure $ certificateRound certPrime > certificateRound certStar
    -- Determine whether the latest certificate should be included in the new block.
    cert ← pure (
             if not penultimate && unexpired && newer
               then Just certPrime
               else Nothing
           )
    -- Forge the block.
    block ← signBlock <$> use currentSlot <*> use creatorId <*> pure tip <*> pure cert <*> pure txs
    -- Extend the preferred chain.
    chain ← use preferredChain ⇉ extendChain block
    -- Diffuse the new chain.
    diffuse ↞ NewChain chain

Voting

-- Vote.
voting : NodeOperation
voting =
  do
    -- Select block.
    selected ← selectBlock
    case selected of λ where
      -- There was no preagreement block.
      Nothing →
        do
          -- No messages if preagreement does not yield a block.
          diffuse
      -- There was a preagreement block.
      (Just block) →
        do
          -- Fetch the current round.
          round ← use currentRound
          -- Fetch the chain-ignorance and cool-down durations.
          r ← peras R
          k ← peras K
          -- Check whether the latest certificate is from the previous round.
          vr1a ← use latestCertSeen ⇉ oneRoundOld round
          -- Check whether the block ends the chain indicated by the latest certificate.
          vr1b ← extends block <$> use latestCertSeen <*> use chains
          -- Check whether the certificate is in the chain-ignorance period.
          vr2a ← use latestCertSeen ⇉ inChainIgnorance round r
          -- Check whether the cool-down period has ended.
          vr2b ← use latestCertOnChain ⇉ afterCooldown round k
          -- Determine whether to vote.
          if vr1a && vr1b || vr2a && vr2b
             then (
               -- Vote.
               do
                 -- Sign the vote.
                 vote ← signVote round <$> use creatorId <*> pure block
                 -- Record the vote.
                 votes ≕ (vote ∷_)
                 -- Diffuse the vote.
                 diffuse ↞ NewVote vote
              )
             else (
               -- Do not vote.
               do
                 -- No message because no vote was cast.
                 diffuse
             )

SelectBlock

-- Select a block to vote for.
selectBlock : NodeState (Maybe Block)
selectBlock =
  do
    -- Fetch the cutoff window for block selection.
    l ← peras L
    -- Fetch the current slot.
    now ← use currentSlot
    -- Find the newest block older than the cutoff window.
    use preferredChain               -- Fetch the preferred chain.
      ⇉ chainBlocks                  -- List its blocks.
      ⇉ dropWhile (newerThan l now)  -- Ignore the blocks that are in the cutoff window.
      ⇉ foundBlock                   -- Report the newest block found, if any.

2024-05-07

Team Session

  • Welcomed Hans to the team!

  • Started drafting CIP for Peras:

    • Used HackMD to collaboratively edit a markdown document for the CIP
    • Also listed related tasks in issue #97
  • Discussion about recent benchmarks on ALBA and whether the variation in time to build a proof could be used as an attack vector

    • Idea: Given the low cost of building certificates it would be easy for an upstream peer to build them on the fly and share them with a synchronizing peer downstream
    • In the normal case, certificates form an unbroken chain, one per round voting on some block. Their weight could be accumulated in some ways in order to alleviate the need to keep a lot of them around while not in cooldown.
    • There's no need to store all certificates while not in cooldown, only the ones relevant to the "best chain" -> second order "pruning"?
  • Another quick discussion, on the topic of specification documents and language. The idea of having a formal "pseudo-code" language is appealing but seems like a huge effort

  • Here is an example algorithm from the Introduction to Reliable and Secure Distributed Computing book

    Pseudo-code for a distributed-systems example

2024-05-06

Team retrospective

We followed the Liked/Lacked/Learned retrospective process, yielding the following output:

Liked/Lacked/Learned

Then we spent some time grouping the various items into "domains" and defining actionable items for each of those groupings

Retrospective Actions

Optimising ALBAs

Spent some time optimising ALBAs, implementing the depth-first search suggested by Pyrros to speed up construction of the proof. This turned out 1/ to be quite simple to implement and 2/ to provide a dramatic boost in performance.

2024-05-01

Probabilities of various Peras scenarios

The document analytics/analytics-1.md derives formulae for several Peras scenarios involving adversaries.

  • Adversaries refuse to vote in a round, causing a failure to reach a quorum.
  • A non-zero probability of entering a cool-down period results in the chain being in cool-down a fraction of the time it operates.
  • Adversarial block-producers refuse to include certificates in blocks, letting them expire instead. (This would only affect the operation of the protocol during times when a certificate must be recorded on the chain.)
  • Adversaries both forge a block and later muster a quorum to vote for it.

QuickCheck tests could be constructed to test that the chain dynamics conform to these analytic expressions when adversaries act accordingly.
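As an illustration of the first scenario, here is a minimal sketch (hypothetical code, not taken from analytics/analytics-1.md) of the probability of failing to reach a quorum, assuming a simple binomial committee model: each of the n committee votes is honest independently with probability p, and a quorum requires at least tau votes.

```haskell
-- Hypothetical sketch: probability of no honest quorum under a binomial
-- committee model.  All names and the model itself are illustrative
-- assumptions, not the formulae from analytics-1.md.
module Main where

-- Binomial coefficient, computed multiplicatively as a Double.
choose :: Int -> Int -> Double
choose n k = product [fromIntegral (n - k + i) / fromIntegral i | i <- [1 .. k]]

-- Probability of exactly k honest votes out of n.
pmf :: Int -> Double -> Int -> Double
pmf n p k = choose n k * p ^ k * (1 - p) ^ (n - k)

-- Probability that fewer than tau honest votes arrive, i.e. no honest quorum.
noHonestQuorum :: Int -> Double -> Int -> Double
noHonestQuorum n p tau = sum [pmf n p k | k <- [0 .. tau - 1]]

main :: IO ()
main = print (noHonestQuorum 25 0.8 19)
```

A QuickCheck property could then compare the observed quorum-failure frequency in a simulation against such a closed form.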

  • No honest quorum in round: line plot and contour plot of no honest quorum
  • Fraction of time in cool down: fraction of time spent in cool-down periods
  • No certificate in honest block: line plot and contour plot of no certificate in honest block
  • Adversarial block with adversarial quorum

2024-04-29

Crypto benchmarks

Raw benchmarks for ALBA

| Name | Mean | MeanLB | MeanUB | Stddev | StddevLB | StddevUB |
|---|---|---|---|---|---|---|
| Hashing/hashing len=256 | 5.81e-7 | 5.71e-7 | 5.85e-7 | 1.93e-8 | 1.01e-8 | 3.77e-8 |
| Proving 100/64 λ = 8 | 7.24e-3 | 7.19e-3 | 7.45e-3 | 2.43e-4 | 5.38e-5 | 5.31e-4 |
| Proving 100/64 λ = 9 | 8.84e-3 | 8.79e-3 | 8.88e-3 | 1.19e-4 | 7.41e-5 | 1.71e-4 |
| Proving 100/64 λ = 10 | 1.07e-2 | 1.07e-2 | 1.08e-2 | 1.15e-4 | 9.23e-5 | 1.48e-4 |
| Proving 100/128 λ = 8 | 7.46e-3 | 7.41e-3 | 7.50e-3 | 1.39e-4 | 9.07e-5 | 2.32e-4 |
| Proving 100/128 λ = 9 | 9.04e-3 | 9.01e-3 | 9.06e-3 | 8.13e-5 | 6.39e-5 | 1.12e-4 |
| Proving 100/128 λ = 10 | 1.12e-2 | 1.11e-2 | 1.12e-2 | 1.61e-4 | 9.70e-5 | 2.74e-4 |
| Proving 100/256 λ = 8 | 7.12e-3 | 6.79e-3 | 7.57e-3 | 1.09e-3 | 7.48e-4 | 1.58e-3 |
| Proving 100/256 λ = 9 | 7.50e-3 | 7.38e-3 | 7.66e-3 | 3.97e-4 | 3.24e-4 | 4.68e-4 |
| Proving 100/256 λ = 10 | 9.52e-3 | 9.44e-3 | 9.67e-3 | 2.89e-4 | 1.80e-4 | 3.87e-4 |
| Proving 200/64 λ = 8 | 1.14e-2 | 1.12e-2 | 1.17e-2 | 5.79e-4 | 3.24e-4 | 9.20e-4 |
| Proving 200/64 λ = 9 | 1.36e-2 | 1.35e-2 | 1.37e-2 | 2.01e-4 | 1.31e-4 | 3.27e-4 |
| Proving 200/64 λ = 10 | 1.70e-2 | 1.66e-2 | 1.81e-2 | 1.58e-3 | 3.02e-4 | 2.93e-3 |
| Proving 200/128 λ = 8 | 1.15e-2 | 1.14e-2 | 1.16e-2 | 3.07e-4 | 2.34e-4 | 4.85e-4 |
| Proving 200/128 λ = 9 | 1.37e-2 | 1.35e-2 | 1.42e-2 | 6.58e-4 | 1.90e-4 | 1.30e-3 |
| Proving 200/128 λ = 10 | 1.71e-2 | 1.70e-2 | 1.71e-2 | 1.46e-4 | 9.06e-5 | 2.33e-4 |
| Proving 200/256 λ = 8 | 1.13e-2 | 1.12e-2 | 1.18e-2 | 5.40e-4 | 1.13e-4 | 1.15e-3 |
| Proving 200/256 λ = 9 | 1.37e-2 | 1.36e-2 | 1.39e-2 | 2.90e-4 | 1.16e-4 | 5.40e-4 |
| Proving 200/256 λ = 10 | 1.67e-2 | 1.66e-2 | 1.69e-2 | 2.46e-4 | 1.34e-4 | 3.61e-4 |
| Proving 300/64 λ = 8 | 1.72e-2 | 1.70e-2 | 1.75e-2 | 6.46e-4 | 2.95e-4 | 9.78e-4 |
| Proving 300/64 λ = 9 | 2.17e-2 | 2.11e-2 | 2.37e-2 | 2.06e-3 | 9.34e-4 | 3.92e-3 |
| Proving 300/64 λ = 10 | 2.60e-2 | 2.51e-2 | 2.91e-2 | 3.19e-3 | 7.36e-4 | 6.51e-3 |
| Proving 300/128 λ = 8 | 1.67e-2 | 1.64e-2 | 1.79e-2 | 1.12e-3 | 5.36e-4 | 2.09e-3 |
| Proving 300/128 λ = 9 | 1.99e-2 | 1.98e-2 | 2.00e-2 | 2.09e-4 | 1.09e-4 | 3.38e-4 |
| Proving 300/128 λ = 10 | 2.55e-2 | 2.53e-2 | 2.60e-2 | 7.56e-4 | 4.00e-4 | 1.16e-3 |
| Proving 300/256 λ = 8 | 1.66e-2 | 1.66e-2 | 1.67e-2 | 1.68e-4 | 9.74e-5 | 2.88e-4 |
| Proving 300/256 λ = 9 | 1.99e-2 | 1.99e-2 | 2.01e-2 | 2.22e-4 | 1.01e-4 | 3.25e-4 |
| Proving 300/256 λ = 10 | 2.59e-2 | 2.56e-2 | 2.66e-2 | 8.47e-4 | 2.02e-4 | 1.67e-3 |

Raw benchmarks table for cardano-crypto stuff

| Name | Mean | MeanLB | MeanUB | Stddev | StddevLB | StddevUB |
|---|---|---|---|---|---|---|
| DSIGN/Ed25519/genKeyDSIGN | 1.06e-5 | 1.06e-5 | 1.07e-5 | 9.35e-8 | 4.82e-8 | 1.77e-7 |
| DSIGN/Ed25519/signDSIGN | 1.14e-5 | 1.14e-5 | 1.15e-5 | 1.95e-7 | 1.09e-7 | 3.46e-7 |
| DSIGN/Ed25519/verifyDSIGN | 3.27e-5 | 3.26e-5 | 3.28e-5 | 3.35e-7 | 2.17e-7 | 4.73e-7 |
| DSIGN/EcdsaSecp256k1/genKeyDSIGN | 7.72e-8 | 7.69e-8 | 7.77e-8 | 1.28e-9 | 9.28e-10 | 1.96e-9 |
| DSIGN/EcdsaSecp256k1/signDSIGN | 2.03e-5 | 2.02e-5 | 2.05e-5 | 4.42e-7 | 1.74e-7 | 8.52e-7 |
| DSIGN/EcdsaSecp256k1/verifyDSIGN | 2.29e-5 | 2.29e-5 | 2.30e-5 | 1.82e-7 | 9.27e-8 | 3.71e-7 |
| DSIGN/SchnorrSecp256k1/genKeyDSIGN | 3.10e-8 | 3.07e-8 | 3.14e-8 | 1.22e-9 | 8.93e-10 | 1.86e-9 |
| DSIGN/SchnorrSecp256k1/signDSIGN | 3.14e-5 | 3.12e-5 | 3.19e-5 | 1.01e-6 | 5.93e-7 | 1.69e-6 |
| DSIGN/SchnorrSecp256k1/verifyDSIGN | 2.50e-5 | 2.50e-5 | 2.53e-5 | 4.72e-7 | 1.18e-7 | 9.74e-7 |
| HASH/Blake2b_224/hashWith | 3.58e-7 | 3.51e-7 | 3.78e-7 | 3.60e-8 | 1.26e-8 | 6.51e-8 |
| HASH/Blake2b_224/decodeHash | 8.32e-8 | 8.01e-8 | 8.83e-8 | 1.43e-8 | 1.01e-8 | 2.30e-8 |
| HASH/Blake2b_256/hashWith | 3.39e-7 | 3.39e-7 | 3.40e-7 | 2.73e-9 | 1.90e-9 | 3.86e-9 |
| HASH/Blake2b_256/decodeHash | 7.09e-8 | 7.07e-8 | 7.13e-8 | 7.90e-10 | 3.37e-10 | 1.49e-9 |
| KES/Sum6KES/genKey | 1.25e-3 | 1.25e-3 | 1.26e-3 | 1.61e-5 | 1.35e-5 | 1.95e-5 |
| KES/Sum6KES/signKES | 1.26e-3 | 1.25e-3 | 1.27e-3 | 2.32e-5 | 1.76e-5 | 3.08e-5 |
| KES/Sum6KES/verifyKES | 1.25e-3 | 1.24e-3 | 1.29e-3 | 6.23e-5 | 1.17e-5 | 1.30e-4 |
| KES/Sum6KES/updateKES | 1.24e-3 | 1.23e-3 | 1.25e-3 | 2.71e-5 | 1.98e-5 | 4.49e-5 |
| KES/Sum7KES/genKey | 2.45e-3 | 2.44e-3 | 2.46e-3 | 1.97e-5 | 1.61e-5 | 2.87e-5 |
| KES/Sum7KES/signKES | 2.48e-3 | 2.46e-3 | 2.60e-3 | 1.30e-4 | 2.59e-5 | 3.12e-4 |
| KES/Sum7KES/verifyKES | 2.46e-3 | 2.45e-3 | 2.47e-3 | 3.64e-5 | 2.77e-5 | 5.04e-5 |
| KES/Sum7KES/updateKES | 2.42e-3 | 2.36e-3 | 2.51e-3 | 2.51e-4 | 1.91e-4 | 3.72e-4 |
| KES/CompactSum6KES/genKey | 1.24e-3 | 1.23e-3 | 1.26e-3 | 4.41e-5 | 2.62e-5 | 7.32e-5 |
| KES/CompactSum6KES/signKES | 1.26e-3 | 1.24e-3 | 1.27e-3 | 5.96e-5 | 3.26e-5 | 8.88e-5 |
| KES/CompactSum6KES/verifyKES | 1.27e-3 | 1.26e-3 | 1.27e-3 | 1.68e-5 | 1.27e-5 | 2.29e-5 |
| KES/CompactSum6KES/updateKES | 1.28e-3 | 1.27e-3 | 1.28e-3 | 2.02e-5 | 1.41e-5 | 3.18e-5 |
| KES/CompactSum7KES/genKey | 2.53e-3 | 2.51e-3 | 2.54e-3 | 3.93e-5 | 2.76e-5 | 6.38e-5 |
| KES/CompactSum7KES/signKES | 2.54e-3 | 2.52e-3 | 2.55e-3 | 3.38e-5 | 2.38e-5 | 5.44e-5 |
| KES/CompactSum7KES/verifyKES | 2.53e-3 | 2.51e-3 | 2.54e-3 | 4.35e-5 | 2.76e-5 | 7.28e-5 |
| KES/CompactSum7KES/updateKES | 2.55e-3 | 2.54e-3 | 2.58e-3 | 6.05e-5 | 3.82e-5 | 1.03e-4 |
| VRF/SimpleVRF/genKey | 5.39e-7 | 5.06e-7 | 6.30e-7 | 1.71e-7 | 7.26e-8 | 3.22e-7 |
| VRF/SimpleVRF/eval | 1.19e-2 | 1.17e-2 | 1.27e-2 | 1.03e-3 | 5.47e-5 | 2.12e-3 |
| VRF/SimpleVRF/verify | 1.02e-2 | 1.01e-2 | 1.03e-2 | 1.87e-4 | 1.13e-4 | 2.70e-4 |
| VRF/PraosVRF/genKey | 1.18e-5 | 1.16e-5 | 1.26e-5 | 1.25e-6 | 1.30e-7 | 2.63e-6 |
| VRF/PraosVRF/eval | 1.00e-4 | 1.00e-4 | 1.02e-4 | 2.17e-6 | 7.78e-7 | 4.41e-6 |
| VRF/PraosVRF/verify | 1.02e-4 | 9.79e-5 | 1.20e-4 | 2.56e-5 | 2.91e-6 | 5.55e-5 |

AB on ALBA Certificates

I spent some time building a Haskell implementation for ALBAs, here are some notes related to this experience.

  • It uses libsodium to provide hashing function, just like cardano-base. I did not want to depend on cardano-crypto-class as hashing is the only function I need, but this would be a good idea later on to handle voting and signatures

  • Hashing is done with Blake2b256. My initial implementation took about 1μs/hash which is 3x what cardano-crypto-class provides. I had a look at the implementation there and it relies on unsafeCreate and useAsCStringLen which provide much better performance. For some reason, my code is still 30% slower than cardano-crypto-class at ~420ns 🤷

    • Perhaps SHA256 would be enough as a hashing function?
  • The code now implements, or is supposed to implement, optimisation from section 3.2 of the paper:

    • The input set of elements are hashed and their modulus $n_p$ is compared to the iteratively computed hashes and modulus of the selected proof tuples
    • On a test run for 200 items, with 70/30 honest to faulty ratio, and security parameter 8, it computes 35400 hashes which seems consistent with the theoretical complexity of $\lambda^3 . n_p$
  • It also implements the optimisation on computation of parameter $d$ from section C.1 of the paper but this does not seem to have a significant impact on running time

  • The code as it stands checks that the number of signatures needed ($u$) is consistent with what's provided in the paper. In the case of 1000 signatures, for a ratio of 60/40 and a λ of 128, this gives $u=232$.

    • Given a KES signature size of 448 bytes, this gives us a certificate size of 103936 bytes, which is about the size of a block
  • The biggest issue with the current implementation lies in how the size of the set of items to select from increases over rounds while constructing the proof. Here is an example for the parameter given above, showing the length of the selection set which needs to be paired with input set:

    length = 35446
    length = 95810
    length = 159506
    length = 265489
    length = 441223
    length = 734386
    ...
    

    The values of the protocol inner parameters are: u = 234, d = 21192, q = 8.373238874261325e-3

    The value of $d$ is quite large and therefore leads to fast increase in the size of the set.

  • This implementation can generate a proof for 300 items with a λ of 10 and honest fraction of 80% in a bit over 20ms
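The certificate-size arithmetic mentioned above (u KES signatures of 448 bytes each) can be double-checked with a trivial sketch; the helper name here is hypothetical, not from the actual codebase.

```haskell
-- Hypothetical helper: naive certificate size as u KES signatures,
-- ignoring any framing or compression.
certificateBytes :: Int -> Int -> Int
certificateBytes u kesSigBytes = u * kesSigBytes

main :: IO ()
main = print (certificateBytes 232 448)  -- 103936 bytes, roughly one block
```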

Optimising the ALBA code was a bit of a journey; it took me a while to figure out how to remove the "exponentially increasing" calls to hash while constructing the proofs iteratively. I had to resort to GHC's profiling support and manually insert SCC pragmas all over the place to get a sense of where the time was spent. The biggest time consumer was allocation, which was up to 10s of GBs for a relatively small run, and is now about 150MB for the aforementioned parameters, which seems quite reasonable.

  • I had to look at cardano-base's codebase to understand what I was doing wrong with the call to libsodium: Under the hood it's the exact same code that's called, but I was using allocators from Foreign which apparently are quite inefficient
  • I dropped use of Integer in computations in favor of Word64 to ensure fast modulus for bytestrings. It's possible that native code would be even faster here
  • At some point I have tried to dump GHC core in order to understand why my code was allocating so much data, but it wasn't very helpful at a glance
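The Word64 trick mentioned above can be sketched as follows (hypothetical names; the real implementation may differ): interpret the first 8 bytes of a hash as a big-endian Word64 and reduce it modulo $n_p$, staying entirely in machine words instead of Integer.

```haskell
-- Hypothetical sketch of computing a fast modulus of a hash prefix.
import Data.Bits (shiftL, (.|.))
import Data.Word (Word64, Word8)

-- Interpret (up to) the first 8 bytes as a big-endian Word64.
bytesToWord64 :: [Word8] -> Word64
bytesToWord64 = foldl (\acc b -> acc `shiftL` 8 .|. fromIntegral b) 0 . take 8

-- Reduce the hash prefix modulo np without touching Integer.
hashMod :: Word64 -> [Word8] -> Word64
hashMod np = (`mod` np) . bytesToWord64

main :: IO ()
main = print (hashMod 1000 [0, 0, 0, 0, 0, 0, 1, 0])  -- 256
```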

2024-04-17

Meeting PNSol on Vote network modelling

  • Seems like the vote diffusion should not be problematic on sunny days, so the modelling and thinking effort should be focused on "rainy days", i.e. what happens under heavy load such as CPU load (also possibly network load?). These are the circumstances in which backpressure should be applied

Some key questions to answer to:

  • How much computation do we do on each vote?
  • How much computation do we do on a certificate?
  • What kind of backpressure do we need to bake in?
  • ND: "do these things form a monoid" -> pack things together

Interesting observation:

  • We could build certificate to reduce amount of data transferred, eg. trading CPU time (building certificate) for space and network bandwidth consumption

2024-04-15

  • Working on technical report, filling in section on quickcheck-dynamic example protocol from Quviq
  • Also trying to provide a better intuition about the protocol and rework the introductory section following our discussions in Paris
  • Some questions for researchers related to committee selection and the number of messages we need to broadcast across the network

2024-03-30

New QuickCheck-Dynamic model

A new quickcheck-dynamic model was created for closer and cleaner linkage between code generated by agda2hs and the Haskell and Rust simulations. The model has the following features:

  • Separation between the "idealized" votes, certificates, blocks, and chains of the specification and the "realized" ones generated by agda2hs and used in the simulations.
    • The idealized version ignores some details like signatures and proofs.
    • It would be possible to remove this separation if the real behavior were fully deterministic (including the bytes of signatures and proofs).
    • However, this exercise demonstrates the feasibility of having a slightly more abstract model of a node for use in Test.QuickCheck.StateModel.
  • A NodeModel with sufficient detail to implement Peras.
    • Ideally, this would be generated by agda2hs.
  • Executable specification for the node model.
    • Ideally, this would be generated by agda2hs.
  • A class PerasNode representing the abstract interface of nodes.
  • An instance StateModel NodeModel that uses the executable specification for state transitions and includes generators for actions, constrained by preconditions.
  • An instance (Monad m, PerasNode n m) => RunModel NodeModel (RunMonad n m) that runs actions on a PerasNode and checks postconditions.
  • An instance PerasNode ExampleNode embodying a simple, intentionally buggy, node for exercising the dynamic logic tests.
  • A simple property for the example node.

The example property simply runs a simulation using ExampleNode and checks the trace's conformance to the executable specification. Because the example node contains a couple of intentional bugs, we expect the test to fail. Shrinkage reveals a parsimonious series of actions that exhibits one of the bugs.

$ cabal run test:peras-quickcheck-test -- --match "/Peras.OptimalModel/Example node/Simulation respects model/"

Peras.OptimalModel
  Example node
    Simulation respects model [✔]
      +++ OK, failed as expected. Assertion failed (after 4 tests and 1 shrink):
      do action $ Initialize (Peras {roundLength = 10, quorum = 3, boost = 0.25}) 0
         action $ ANewChain [BlockIdeal {hash = "92dc9c1906312bb4", creator = 1, slot = 0, cert = Nothing, parent = ""}]
         action $ ATick False True
         pure ()

Finished in 0.0027 seconds
1 example, 0 failures

Instead of migrating peras-iosim and peras-netsim, we might start with a clean slate and use a more TDD-focused approach that builds out from the Agda and Dynamic QuickCheck specifications. Those legacy prototype codebases can be mined for lessons learned and code fragments in the new, cleaner framework.

Mini state machine refactoring of peras-iosim

The branch bwbush/march-sm contains work in progress for a major refactoring of peras-iosim that will use miniature state machines for each node's upstream/downstream channels and that will implement the full Peras protocol. The key features are:

  • Pure functions for each clause in the March pseudo-code specification for the Peras protocol.
  • Each node-to-node channel is implemented as a composition of state machines.
    • Chains under evaluation or fully evaluated.
    • Block bodies requested and successfully fetched.
    • Certificates requested and successfully fetched.
    • Votes seen.
  • A similar node-level state machine aggregates the activity of the individual channels so, for example, a block body is only requested from one peer instead of every peer that mentions it.
  • InEnvelope and OutEnvelope are redesigned for clean usage.
  • Time is managed via the network's centralized priority queue, which dispatches messages over the channels.
  • No use of threading or STM.
    • The design is compatible with this, but the complexity outweighs the value that would be added.
    • IOSim is single-threaded anyway, so implementing sophisticated parallel simulation would only have benefits outside of IOSim.
  • Overall the node interface has been simplified.
    • Each node receives a single message at a time.
    • The node outputs a sequence of timestamped messages to specified recipients.
    • The result of each message-handling call also includes statistics on CPU and bandwidth usage.
    • Structured Events (including ad-hoc debug tracing) are observable.
    • Outgoing results are tagged with a guaranteed minimum time before which a node will not send any more messages.
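The simplified node interface described above can be sketched roughly as follows. All names here are invented for illustration; the actual peras-iosim types differ.

```haskell
-- Hypothetical sketch of the simplified node interface: one message in,
-- a batch of timestamped messages out, plus resource statistics and a
-- guaranteed quiet period.
type NodeId = Int
type Time   = Double

-- Resource usage reported for each message-handling call.
data Stats = Stats { cpuSeconds :: Double, bytesSent :: Int }
  deriving Show

-- Result of handling one incoming message.
data NodeResult msg = NodeResult
  { outgoing   :: [(Time, NodeId, msg)] -- timestamped messages to recipients
  , stats      :: Stats
  , quietUntil :: Time                  -- node sends nothing before this time
  } deriving Show

-- A node is a pure step function over its own state.
newtype Node s msg = Node
  { step :: s -> Time -> msg -> (s, NodeResult msg) }

-- Example: a node that counts messages and echoes each one to node 0.
echoNode :: Node Int String
echoNode = Node $ \count now msg ->
  ( count + 1
  , NodeResult [(now, 0, msg)] (Stats 0.001 (length msg)) (now + 0.01)
  )
```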

This branch is abandoned in favor of a TDD approach using QuickCheck Dynamic, but lessons learned and design principles will be incorporated into a future faithful simulation of the March version of the Peras protocol.

Congestion experiment

We conducted a coarse study to exercise peras-iosim in a simulation experiment involving network congestion. It was meant to check capabilities in these areas:

  • simulation/analysis workflow
  • scalability and performance
  • observability

It was a full-factorial experiment in which bandwidth and latency were varied on a small network with semi-realistic Peras parameters. Each block had its maximum size.

  • 250 nodes with fivefold connectivity
  • ~25 committee members
  • Full blocks (90.112 kB)
  • Latency from 0.25 s to 1.00 s
  • Bandwidth from 8 Mb/s to 400 Mb/s

The experiment is not realistic in several aspects:

  • Memory pool and other non-block/non-vote messages not modeled
  • February version of Peras protocol
  • Simulation not validated - not suitable for making conclusions about Peras

Several findings were apparent:

  • A threshold is readily detectable at a bandwidth of ~20 Mb/s.
  • Non-block and non-vote messages such as those related to the memory pool must be accounted for in congestion.
  • The event logging and statistics system easily supports analyses such as this.
  • Data on node processing times is needed.

The following diagram shows the cumulative bytes received by nodes as a function of network latency and bandwidth, illustrating the threshold below which bandwidth is saturated by the protocol and block/vote diffusion.

Cumulative bytes received by nodes as a function of network latency and bandwidth

2024-03-27

AB on quickcheck-dynamic

Trying to unpack the script shrinking process, calling internal functions directly.

test = do
  let dynformula = runDL initialAnnotatedState dl
      dynlogic = unDynFormula dynformula 10
  dlTest <- QC.generate $ generate chooseNextStep dynlogic 1 initialAnnotatedState 10
  let shrunk = shrinkScript dynlogic (getScript dlTest)
      tested = makeTestFromPruned dynlogic <$> shrunk

  print dlTest
  print $ take 5 tested

yields something like:

do action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Assert 10
   pure ()

[
   pure ()
,do action $ Incr
   action $ Incr
   pure ()
,do action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   pure ()
,do action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   pure ()
,do action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   action $ Incr
   pure ()
]

i.e. the test is correctly shrunk

When verboseChecking the test, I realise that the shrunk actions do not contain any Assert statement, which means there is no way they can fail. This is to be expected because Assert is marked as restricted and is therefore pruned from the generated test cases

  • Isn't the problem that we don't shrink the restricted actions?

Looking a bit more closely into Issue31 makes me wonder whether this is a case where integrated shrinking would make sense and simplify a lot of things. The technical manifestation of the problem here is that we cannot shrink a trace because the Assert action is restricted and never gets generated independently, so once we try to shrink a failing test, it produces only valid traces.

The action needs to be restricted because it depends on the state at runtime: when we generate an initial trace, the value is fixed and is never changed when we shrink. If we kept the sequence of generated values and then shrank the whole sequence rather than individual values, we would find the case we are interested in.

We could explicitly model the dependency on state at the level of the expression, and therefore be able to detect those dependencies more easily and act accordingly, e.g. recompute the rest of the expression.
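As a toy illustration of that idea (hypothetical code, not the quickcheck-dynamic API): keep the trace as plain data, shrink the whole sequence, and recompute the state-dependent Assert argument afterwards instead of keeping the value fixed at generation time.

```haskell
data Act = Incr | Assert Int deriving (Eq, Show)

-- recompute the state-dependent Assert argument from the actions before it
refit :: [Act] -> [Act]
refit = go 0
  where
    go _ []                = []
    go n (Incr : rest)     = Incr : go (n + 1) rest
    go n (Assert _ : rest) = Assert n : go n rest

-- shrink by dropping leading actions, refitting the assertion each time,
-- so every shrunk trace still ends in a (now consistent) Assert
shrinkTrace :: [Act] -> [[Act]]
shrinkTrace acts = [ refit (drop k acts) | k <- [1 .. length acts - 1] ]
```

With this scheme a shrunk trace like `[Incr, Assert 1]` stays failing-or-passing for the right reason, instead of becoming vacuously valid because the original Assert was pruned.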

Team Meeting

Discussing header/body split:

  • how does header forwarding work?
  • you can't fully adopt a chain before seeing the block body => the certificate could be just part of the body
  • What if the certificate was part of a transaction? Could even be verified as another smart contract?
  • Would imply that the node depends on the state of the ledger or a single tx but that's already the case?

What are the true requirements for Midnight? Fast settlement does not mean anything: In what context? Under which conditions?

Concordium also has deterministic finality, but then they switched to BFT??

Workshop agenda:

Monday afternoon:

  • Refactoring together -> adjust interfaces and actual code

Tuesday/Wednesday:

  • Overview of finality across blockchains. How do other chains handle that?
  • What does finality really mean?
    • => we need a technical definition to be specified/tested
    • => something we can observe?
  • Under what circumstances does Peras provide benefits over Praos?
    • => turn those into proper QC tests
    • => we need to build a compelling case for implementing Peras
    • => Check w/ Trym? we need a product perspective
  • Quviq => remote meeting to discuss their model
  • Walkthrough of formal specification
    • Proof principles?
    • Discuss future work/next steps/properties
  • Aligning crypto proofs/agda proofs/QC properties
    • downstream contributions -> make engineers able to tweak the spec and understand it to be able to change it
  • Rust node?
  • Value demonstration for PI planning?
    • need to raise awareness => fairly strong statements about the project
    • we have spent $X, we have some results, we need to spend $Y to get more info => go/no-go
    • Frame things in terms of options => use the risk section (development time, SPO investment)

2024-03-26

AB on ΔQ

Read Cardano | Timeliness Constraints report from Neil and Peter which analyses the problem of timeliness of transaction inclusion in blocks in order to provide (probabilistic) guarantees to DApps on "time-to-chain" bounds. This is highly relevant to various applications that are sensitive to the time their transaction is included in the chain, eg. oracles or DEXes.

The paper explicitly mentions "finality" as being a different goal (but it's unclear how this is much different?) and explicitly mentions Peras and the impact it would have on the proposed solution:

Affects settlement times in a probabilistic way, but doesn’t impact inclusion times. Settlement times are lower under low system load, but reduced settlement times will depend on a collection of criteria that can be influenced by both load and mischievous/adversarial activity. This may impact the market by changing when it is desirable to pay for priority treatment (in a dynamic way).

The proposed solution is to create a futures market to front-load SPOs' mempools: a party interested in time-to-chain guarantees can buy a future slot from a market maker, who is responsible for making prices in relationship with SPO consortia that provide the front-loading capability. The inclusion guarantee is tied to a certain time window, and the probability depends on a consortium's share of stake: the larger the share, the smaller the time window can be.

Meeting on ΔQ w/ PNSol

  • Number of active SPOs is ~3000 but the number of more important nodes is 1000
  • => increase the degree
  • => measure the numbers for the number of block size

Notes about the diagrams:

  • Label the axis!!

  • Ensure consistency of colors

  • Karl -> fractions of blocks diffused w/in 3s

  • linear condition on the // operator

  • need to combine them later

  • header/body split => header/body fastpath

    • typical -> pipelining
    • normal operation worst case
    • worst case w/ adversarial
  • analyse by case -> number of forks => throughput will be reduced

  • ND -> forking model

    • cert will increase height battles
  • important interval = beginning of slot where block is minted to start of minting of next block

  • Peras is not about txs -> about blocks

    • tx finality under load vs. block finality under forking
  • forking occurs a bit more frequently => blocks are bigger, longer to diffuse, critical time longer => more chances of forking

  • votes stabilise the system -> block size diffusion dominates under load => impact of certificates is smaller

2024-03-25

AB on ΔQ

Thinking of modelling the whole journey of a tx through ΔQ:

  1. tx gets (or not) into a local mempool at a node (could be rejected because mempool is full but this should not happen in leaf nodes)
  2. tx propagates across the network to other mempools -> a tx has max size of 16kB so we can apply network hops diffusion model for this size => need diffusion time for various size increments
     a. need to include the effect of network congestion here: if mempools are full, they won't pull new txs, which will sit idle for a longer period
  3. tx gets included in a block by a BP => includes block forging time
  4. forged block gets propagated across the network
     a. reuse previous analysis for block diffusion
     b. need to take into account a different topology -> more nodes but less well connected?
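A crude first composition of the four stages, pending a full ΔQ model, is to reduce each to an improper random variable summarized by a success probability and a mean delay; sequential composition then multiplies the probabilities and sums the delays. A hedged sketch (all numbers below are invented placeholders):

```haskell
-- Each stage reduced to (success probability, mean delay in seconds).
-- A real ΔQ model would convolve full delay CDFs instead of summing means.
data Stage = Stage { pSuccess :: Double, meanDelay :: Double }
  deriving (Eq, Show)

journey :: [Stage]
journey =
  [ Stage 1.00  0.05   -- 1. admission to the local mempool
  , Stage 0.99  0.50   -- 2. diffusion to other mempools
  , Stage 1.00 10.00   -- 3. inclusion in a forged block
  , Stage 0.95  3.00   -- 4. diffusion of the forged block
  ]

-- sequential composition of independent stages
composeSeq :: [Stage] -> Stage
composeSeq ss = Stage (product (map pSuccess ss)) (sum (map meanDelay ss))
```

This already exposes the shape of the model: the overall success probability degrades multiplicatively while delays accumulate additively.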

Installing scipy, as it seems Python is the de facto standard for various scientific tools and graph algorithms

Using iperf3 to measure throughput between my machine and cardano.hydra.bzh

% iperf3 -c cardano.hydra.bzh -p 5002
Connecting to host cardano.hydra.bzh, port 5002
[  7] local 10.1.18.185 port 58792 connected to 95.217.84.233 port 5002
[ ID] Interval           Transfer     Bitrate
[  7]   0.00-1.00   sec  23.1 MBytes   194 Mbits/sec
[  7]   1.00-2.01   sec  4.62 MBytes  38.7 Mbits/sec
[  7]   2.01-3.01   sec  3.88 MBytes  32.5 Mbits/sec
[  7]   3.01-4.00   sec  12.8 MBytes   107 Mbits/sec
[  7]   4.00-5.00   sec  19.1 MBytes   160 Mbits/sec
[  7]   5.00-6.00   sec  19.9 MBytes   167 Mbits/sec
[  7]   6.00-7.00   sec  18.2 MBytes   153 Mbits/sec
[  7]   7.00-8.00   sec  17.0 MBytes   143 Mbits/sec
[  7]   8.00-9.01   sec  16.6 MBytes   139 Mbits/sec
[  7]   9.01-10.00  sec  18.8 MBytes   158 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  7]   0.00-10.00  sec   154 MBytes   129 Mbits/sec                  sender
[  7]   0.00-10.06  sec   153 MBytes   128 Mbits/sec                  receiver

Going through networkit user guide: https://networkit.github.io/dev-docs/notebooks/User-Guide.html looking for a function to compute distribution of distances between nodes ->

>>> import networkit as nk
>>> import matplotlib.pyplot as plt
>>> dist = nk.distance.APSP(ergG)
>>> dist.run()
>>> x = dist.getDistances()
>>> plt.hist(x)

The latter's resulting graph is ugly, as it plots the x-axis with all the (float) values, which does not make sense here

2024-03-23

ΔQ Modelling

IETF RFC for Quality of Outcome measure: https://www.ietf.org/archive/id/draft-ietf-ippm-qoo-00.html#name-motivation

2024-03-22

ΔQ modelling

This website provides measured latency between various AWS regions: https://www.cloudping.co/grid/p_90/timeframe/1D Here is the 90th percentile table measured over 1 day

Location code
Africa (Cape Town) af-south-1
Asia Pacific (Hong Kong) ap-east-1
Asia Pacific (Tokyo) ap-northeast-1
Asia Pacific (Seoul) ap-northeast-2
Asia Pacific (Osaka) ap-northeast-3
Asia Pacific (Mumbai) ap-south-1
Asia Pacific (Singapore) ap-southeast-1
Asia Pacific (Sydney) ap-southeast-2
Canada (Central) ca-central-1
EU (Frankfurt) eu-central-1
EU (Stockholm) eu-north-1
EU (Milan) eu-south-1
EU (Ireland) eu-west-1
EU (London) eu-west-2
EU (Paris) eu-west-3
Middle East (Bahrain) me-south-1
SA (São Paulo) sa-east-1
US East (N. Virginia) us-east-1
US East (Ohio) us-east-2
US West (N. California) us-west-1
US West (Oregon) us-west-2
Destination/Source Region af-south-1 ap-east-1 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2 ca-central-1 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 eu-west-3 me-south-1 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2
af-south-1 6.91 320.79 355.13 349.39 362.82 240.69 280.08 388.8 238.12 154.32 181.63 161.58 161.98 152.85 147.82 196.18 339.34 224.95 234.08 282.67 286.48
ap-east-1 353.5 6.53 57.86 39.89 51.19 101.3 48.54 135.74 206.96 201.8 233.83 191.88 226.92 216.34 209.57 133.79 318.68 222.54 193.63 157.52 148.71
ap-northeast-1 357.04 57.0 4.71 35.35 11.73 141.16 78.76 110.47 160.16 234.39 258.72 219.67 221.13 229.31 237.72 171.11 272.37 169.69 150.19 108.93 98.67
ap-northeast-2 342.64 40.49 38.28 5.22 27.85 128.82 78.77 142.83 183.28 236.07 265.7 222.74 239.39 250.87 244.29 166.76 295.54 199.51 171.02 135.3 121.67
ap-northeast-3 398.91 51.17 11.86 25.53 3.62 140.46 81.46 120.97 159.35 237.84 264.16 225.78 216.82 228.49 253.2 166.88 270.03 175.86 148.67 112.21 102.67
ap-south-1 268.63 97.02 142.17 128.78 140.23 4.68 61.88 155.04 219.75 125.12 152.89 110.42 141.55 133.98 125.89 39.05 319.87 205.78 213.54 233.2 220.49
ap-southeast-1 282.58 48.35 82.29 86.11 78.36 62.72 6.94 94.74 219.48 159.42 190.41 148.99 184.7 176.48 168.9 96.19 331.65 248.93 219.59 172.8 166.73
ap-southeast-2 392.73 137.1 112.98 144.36 121.0 156.23 96.11 6.89 201.35 252.08 284.6 246.8 258.17 267.91 285.22 189.59 314.17 200.23 191.46 142.6 144.18
ca-central-1 242.97 207.51 158.07 183.04 161.26 223.67 219.54 204.45 5.82 111.34 111.69 115.67 74.73 86.1 99.29 177.77 127.08 17.38 28.3 81.89 62.99
eu-central-1 158.05 200.13 229.96 234.17 235.68 122.82 161.43 251.53 106.91 8.34 36.87 13.96 28.71 17.43 14.87 94.74 206.26 95.8 101.64 152.45 155.12
eu-north-1 187.44 229.11 261.62 263.05 264.26 155.96 192.34 285.89 111.09 36.63 4.33 43.98 44.81 32.98 41.33 125.79 230.51 121.28 129.72 179.36 159.09
eu-south-1 167.73 190.82 222.26 221.93 225.87 111.5 149.82 242.25 117.29 15.73 45.78 4.36 37.15 27.89 21.85 113.29 213.83 102.82 110.54 161.34 162.03
eu-west-1 167.71 227.12 215.85 236.98 214.11 141.46 182.17 255.87 68.85 28.81 42.96 35.81 2.88 14.32 21.02 98.81 177.99 69.75 81.52 128.82 118.34
eu-west-2 158.81 219.75 227.55 249.59 224.92 135.17 175.83 266.33 80.41 18.66 33.61 26.71 14.78 2.43 11.93 89.61 204.08 94.1 100.42 150.42 131.47
eu-west-3 150.88 209.63 238.55 242.59 250.9 129.82 168.13 279.51 99.42 13.52 40.68 21.41 19.63 11.74 3.78 86.85 195.77 84.78 92.87 142.96 144.63
me-south-1 202.09 134.32 171.71 163.1 166.78 41.95 95.69 189.9 175.19 91.86 123.33 114.27 103.52 89.85 91.1 6.49 275.74 166.02 171.5 221.72 221.71
sa-east-1 342.64 317.45 272.49 291.29 270.53 320.97 332.45 313.25 125.59 207.67 230.07 214.59 178.57 206.02 195.6 277.95 4.51 118.14 126.25 175.74 176.07
us-east-1 229.81 218.75 171.08 193.94 178.66 205.98 252.04 203.12 18.65 94.55 120.76 101.77 73.54 97.77 90.4 166.61 119.75 6.37 14.79 67.03 65.94
us-east-2 237.95 192.85 151.43 173.89 154.12 218.68 219.75 190.71 31.46 101.6 135.62 114.36 83.48 106.93 92.94 169.45 130.57 12.37 8.97 52.83 58.4
us-west-1 287.61 158.41 109.41 133.19 109.51 229.13 171.3 140.01 81.31 153.2 179.39 159.43 128.08 150.25 143.83 219.65 173.88 63.3 55.26 4.17 22.01
us-west-2 289.92 147.82 98.54 123.17 100.34 221.72 162.92 142.49 63.42 154.99 160.0 161.99 118.65 132.59 144.09 223.0 175.19 67.6 51.93 23.32 5.75

Formula relating TCP throughput, window size and latency:

TCP throughput (bits/s) = TCP window size (bits) / Roundtrip latency (s)
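
As a quick sanity check on this formula, a minimal Haskell sketch; the window size and RTT below are illustrative assumptions, the RTT roughly matching the 90th-percentile eu-west-1 -> us-east-1 entry in the table above:

```haskell
-- Throughput (Mbit/s) for a given TCP window (kB) and roundtrip time (ms).
-- With decimal units, kB * 8 / ms conveniently comes out in Mbit/s.
throughputMbps :: Double -> Double -> Double
throughputMbps windowKB rttMs = windowKB * 8 / rttMs

main :: IO ()
main =
  -- e.g. a 64 kB window over a ~70 ms RTT (roughly eu-west-1 -> us-east-1)
  print (throughputMbps 64 70)
```

On such a transatlantic link a single default-sized window caps throughput at well under 10 Mbit/s, which is why window scaling (or parallel connections) matters for block diffusion.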

owamp is a standard tool for measuring roundtrip latency: https://software.internet2.edu/owamp/owping.man.html

  • Feeling a bit like a hen in front of a knife, not sure what to do next with ΔQ
  • Will try to write some simple program to be able to quickly generate various models in Haskell, at least to be able to iterate over future models

2024-03-20

Improvements to Haskell and Rust testing and simulation

  • Improved fidelity of the node's network interface, so the bwbush/diffusion branch is ready for a congestion experiment, but still uses the February version of the pseudo-code.
  • Fixed Haskell and Rust on main so CI builds, tests, and simulations pass.
  • Merged main into bwbush/diffusion branch to create bwbush/diffusion-ci branch.

Block diffusion and resource tracking in peras-iosim

The diffusion branch now contains a generally faithful representation of pull-oriented block diffusion that mimics real-life block diffusion. Additionally, it now handles the latency and bandwidth of links, along with the compute time at nodes. A provisional set of network, message, and CPU parameters provides coarsely representative modeling of delays in processing and transmission.

AB - Work on ΔQ

The test comparing the tabulation from a dx on a CDF with the raw QTA is poised to fail on values that lie at the boundary of the resolution of dx! Experimented with increasing the resolution: the test always fails at low resolutions, but it passes when the resolution is set to 0.001

Now trying to replicate the graphs representing header + body diffusion, which implies the convolution of 2 CDFs, but it seems it does not work

  • The convolution is the operation needed to compute the PDF of the sum of 2 independent random variables. Given 2 CDFs, the resulting CDF can be computed by first computing the PDFs from the CDFs through differentiation, then doing the convolution, then integrating the resulting PDF to get a CDF.
  • Some link showing computation of a general convolution in Haskell: https://jip.dev/posts/naive-convolution-in-haskell/
  • As it's possible to compute a convolution faster using FFT, seems like this page could be useful: https://www.skybluetrades.net/blog/2013/11/2013-11-13-dat
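
Concretely, the differentiate-convolve-integrate pipeline can be sketched on uniformly sampled CDFs. A minimal sketch, not the code in the repo; the naive O(n*m) convolution follows the linked blog post:

```haskell
-- Delay distributions sampled on a uniform grid of step dt.
type Samples = [Double]

-- differentiate a sampled CDF into a PDF (assumes the CDF starts at 0)
pdfOfCdf :: Samples -> Samples
pdfOfCdf cdf = zipWith (-) (tail cdf) cdf

-- naive discrete convolution of two sampled PDFs
convolve :: Samples -> Samples -> Samples
convolve xs ys =
  [ sum [ xs !! i * ys !! (k - i)
        | i <- [max 0 (k - m) .. min k n] ]
  | k <- [0 .. n + m] ]
  where n = length xs - 1
        m = length ys - 1

-- integrate a sampled PDF back into a CDF
cdfOfPdf :: Samples -> Samples
cdfOfPdf = scanl1 (+)

-- CDF of the sum of two independent delays (e.g. header then body transfer)
sequentially :: Samples -> Samples -> Samples
sequentially a b = cdfOfPdf (convolve (pdfOfCdf a) (pdfOfCdf b))
```

For instance, two delays each uniform over two grid points combine into the expected triangular 1/4-1/2-1/4 shape.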

Going to design models using original code from Neil, as it seems the numerical computations in Artem's code are not working. Got stuck trying to model probabilistic choice among the various path lengths possible, as it seems the expression I am constructing naively does not make sense.

Here are the CDFs I get for block diffusion for 2 different average degrees (10 and 15).

The DeltaQ expression takes into account data from section 4 of DQSD paper:

  • Block size of 64kB
  • Uniform distribution of RTT between short, medium, and long distance pairwise connections
  • Header/Body sequential transfer (eg. first request header, then request body)
  • Distribution of number of hops for block transfer depending on graph with 10 and 15 average degree (note the current valency target for cardano-node is 20)

Unfortunately the expression is not readable when shown:

∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ (3458764513820541⇋573001987789602947) ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ (826860891585223⇋8180338363155769) ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ (844875290094705⇋281024616747919) ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ 
(1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ (9006298534815517⇋900719925475) ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣1.2e-2⎤ (1⇋2) δ⎣6.9e-2⎤ (1⇋1) δ⎣0.268⎤ ⊕ ∅ (0⇋1) δ⎣2.4e-2⎤ (1⇋2) δ⎣0.143⎤ (1⇋1) δ⎣0.531⎤

so I need to resort to visual inspection of the output graph for validation :(

2024-03-19

AB - Work on ΔQ

Copied Artjom's code into the peras repo to start working on a ΔQ model for Peras. The idea is to gradually refactor the existing code into something that can be more widely shared and published as an open-source repository, adding tests and utilities, and integrating Peter's piecewise polynomial model. As this code provides a way to generate both CDF diagrams and outcome graphs (albeit through slightly different languages that should be unified), this will be a good basis for sharing results and simulations.

Implementing a function for transforming a list of Quantitative Time assessment values, eg. a monotonically increasing list of pairs of probability/value, into an A expression, and struggling to define a proper test for that. My current idea is to generate arbitrary QTAs and, given an arbitrary value, check the CDF computed from the expression gives a probability that's consistent with the value and the original list of QTA
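
The property I'm after can be sketched like this (hypothetical code with invented names, not the repo's fromQTA): interpret the QTA as a step CDF and check that each listed value recovers its listed probability.

```haskell
-- A QTA as a monotonically increasing list of (cumulative probability, value)
type QTA = [(Double, Double)]

-- probability that the outcome is <= x, reading the QTA as a step function
cdfAt :: QTA -> Double -> Double
cdfAt qta x = last (0 : [ p | (p, v) <- qta, v <= x ])

-- the property: at each QTA point, the step CDF recovers the listed probability
consistent :: QTA -> Bool
consistent qta = and [ cdfAt qta v == p | (p, v) <- qta ]
```

A QuickCheck property would then generate arbitrary monotone QTAs and arbitrary probe values, comparing `cdfAt` against the CDF computed from the generated expression.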

There's something wrong with the way tabulate is computed for NCDF. It says:

  tabulate :: t -> R t -> [(I t, R t)]

where the first element should be in the unit interval and the second should span the possible values for which we are defining probabilities. But it seems that what's computed is the opposite!

Looks like tabulate should indeed be defined as :

tabulate :: t -> R t -> [(R t, I t)]

This raises an interesting "design" issue: even though we have strong types in the definition of the interfaces, we are practically using Double in both places, which makes the distinction irrelevant! We are using tabulate directly to draw diagrams, though, so it should have raised a flag that the x and y values are not in the right position...

Still struggling to get my tests right, though, i.e. to define a good property test for the fromQTA function. I need to work with intervals and sum values, which is more or less what the code already does, and it's a bit annoying. I think I need to construct intervals from the simple (Prob, Value) list I have so that I can know where a random value falls.

Team meeting

  • Discussing ΔQ modelling, network data

    Arguments sound like navigating in the XVth century -> you're gonna fall off a cliff

  • Observability layer for the node -> logs all the state transitions

  • Agenda:

    1. March protocol update
    2. Node/QC interface
    3. Agenda for next week's demo

Agda model:

  • Andre is working on a UC framework in Agda -> used by Genesis

  • 2 different approaches, crypto research approach + CT expression

  • Weekly FM consensus meeting

  • Keeping dangling votes around in local state was a mistake, local state = BlockTree

  • Messages : receive vote, cert, chain

    • validity of messages is a property of the block tree -> discarded if invalid, eg. no state change
  • Chain weight function is simpler

  • Switch to not using google doc pseudo-code anymore and use only Agda

  • Some issues w/ cryptographic primitives we need to highlight in the tech report:

    • committee selection -> how should it be done in practice?
    • certificate generation and verification -> need new keys? interactive?
  • Refactoring on simulation code

    • PerasNode is a typeclass/interface
    • NodeContext -> provide various static utilities (including tracer)
    • Abstract the details of the protocol state machine
    • Wiring / node wrapper
    • NodeStats -> computed, monoidal
    • NodeResult -> wakeup time => represent the node's business , node is abstracted away from STM stuff, it's a pure state machine
    • plugged into any environment -> network layer would deal with all this
    • Protocol -> contains the pure function primitives of the protocol
  • Network:

    • contains a single queue of all messages for all nodes, single threaded loop pulling one message, letting the node react and updating its state
    • Events -> Compute event, interesting to have some timeline of the node's actions
  • March Monthly demo agenda:

    • Updated protocol (with certs) -> show diagram
      • highlight some "new" properties of the protocol, eg. smaller headers + certs verif/generation
      • simpler chain weight and chain decision -> only change on new chain/certs (eg. no need to decide on every new vote)
    • Update Agda model -> BlockTree + focus on properties
    • Block congestion experiment? => still on Feb protocol version
    • Quviq's early work -> a short presentation (ask Max/Ulf)
    • DeltaQ -> early model for Praos (+ Peras)
      • real world data from Markus/pooltool?
    • Draft Technical report

Formal specification in Agda

  • March version of Peras specification in Agda
  • The blocktree is the only local state. It is an abstract data type specified by properties

2024-03-18

Innovation stream steering committee

Shared a few issues with the steering committee regarding relationship between Engineering and Research in the context of innovation projects.

  1. How to improve communication between the 2 departments in order to shorten feedback loop and ensure we both keep up with each other's work?
  2. How to make the formal modelling effort useful both for engineering and for research?
  3. How to maximize reuse of tools across teams and departments (eg. not duplicate effort in implementing network simulation)?

AB - Understanding newer version of Peras

Trying to understand better the newest version of the protocol, I drew some diagram:

Peras with Certificates

Also shared my understanding of the potential benefits of ΔQ analysis:

  • My understanding is that the Delta-Q methodology was applied during the development of Shelley and led the engineering team behind this development to fix the critical parameter Δ to 5s. This parameter is very important because the security argument of Ouroboros Praos, as far as I understand it, rests on a partial-synchrony assumption about network diffusion, and the actual value of k is derived from this bound
  • Section 4 of the aforementioned paper describes what I think is the process that led to this definition, namely the modeling of various characteristics of the network at a high level:
    • Average time of TCP transmission for various block sizes gives a baseline for a single hop diffusion assuming uniform distribution of various classes of distances (small, medium, long), yielding a CDF for one block on one hop
    • The characteristics of the network (number of nodes, connectivity of each node) yields a probability distribution for the number of hops a single block needs to go through to travel between any 2 arbitrary pair of nodes
    • The Delta-Q language provides combinators and their semantics to build a global cumulative distribution function from those building blocks that tells us what to expect depending on various packet sizes, using some numerical operations on the individual CDFs
    • This modeling also takes into account the probability of failure yielding so-called Improper Random Variables and a global distribution of the probability of failure
  • In the case of Shelley, I think this modeling was done somewhat informally, and one of the things I have asked @neil and @Peter (PNSol) is to dig up or rebuild a Delta-Q model for Praos, as we'll need it to contrast Praos with and without Peras
  • While I must confess I was initially quite puzzled by Delta-Q and it did not "click", reading the paper and poking around the https://pnsol.com website for other articles led to some kind of "illumination", and ultimately helped me appreciate the potential benefits of this approach. Having the ability to quickly model the expected performance profile of a complex networked system is, IMHO, invaluable for quickly weeding out infeasible solutions.
  • This does not preclude other approaches; in particular I think Delta-Q modeling complements more "traditional" network simulation approaches, and each reinforces the other: if the simulation results are wildly off w.r.t. Delta-Q, that's strong evidence something might be off in our reasoning, in either or both models.
    • In Peras, we are also building simulations as they help us refine our understanding of the system we plan to build, and they are amenable to finer grained analysis and experiments. For example, simulations would allow us to test the system under various adversarial behaviours in a more precise way than the Delta-Q model would allow
    • Moreover, it could be the case that we can merge both approaches in order to simplify and scale up the simulation's handling of messages: If we have a Delta-Q model as an IRV, we can use that model as a function to compute the delay (and possible loss) for all messages emitted by nodes in the simulation without having to explicitly simulate individual links.
  • It might be the case Delta-Q is not suitable for modelling FastBFT, that's a call I am unable to make of course. And that's perfectly fine.
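
The per-hop / hop-count composition described in the first bullet can be illustrated with a toy expected-value calculation. A minimal sketch: the one-hop delays reuse the constants visible in the ΔQ expression earlier in this log, while the hop-count weights are invented, and a real ΔQ model would combine whole CDFs rather than means.

```haskell
-- Uniform mixture of short/medium/long one-hop delays (seconds), paired
-- with an invented hop-count distribution.
oneHop :: [(Double, Double)]          -- (probability, delay in seconds)
oneHop = [(1/3, 0.012), (1/3, 0.069), (1/3, 0.268)]

hopCounts :: [(Double, Int)]          -- (probability, number of hops)
hopCounts = [(0.4, 1), (0.4, 2), (0.2, 3)]

meanOneHop :: Double
meanOneHop = sum [ p * d | (p, d) <- oneHop ]

-- expected end-to-end delay: hops are sequential, so means add per hop
expectedDelay :: Double
expectedDelay = sum [ ph * fromIntegral n * meanOneHop | (ph, n) <- hopCounts ]
```

Even this crude mean-only version shows how the hop-count distribution, set by the network's degree, dominates the end-to-end figure.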

2024-03-17

Redesign of Peras IOSim

The bwbush/diffusion branch contains a major upgrade of peras-iosim:

  1. The PerasNode typeclass and the SomeNode type separate interface from implementation.
  2. Peras.IOSim.Nodes.Honest implements the honest node.
  3. NodeState has been split into Node, NodeContext, NodeResult, and PerasNode, so that static, dynamic, and result fields are separated.
  4. Detailed event logging is provided by Event and NodeStats.
    • CPU resource consumption is now computed and logged.
    • Bytes transmitted and received are tracked.
    • Slot leadership and committee membership are logged.
    • Rollbacks are recorded.
    • The sending, receiving, and dropping of messages are stored.
  5. Since IOSim is single-threaded anyway, the use of STM has been eliminated, though it can be cleanly reintroduced if needed.
  6. Time progresses in sub-increments of a slot.

Next steps:

  • Rework peras-quickcheck and peras-rust so they are compatible.
  • Finish implementing proper diffusion of blocks.

2024-03-13

Formal specification in Agda

  • The formal specification in Agda is inspired by the work of Søren Eller Thomsen and Bas Spitters summarized in the research paper "Formalizing Nakamoto-Style Proof of Stake". In this paper a formalization for a PoS protocol like Ouroboros Praos is presented together with the proofs for safety and liveness. For the proofs they assume 2/3 of the parties to be honest, whereas using the characteristic string technique 1/2 is sufficient. This raises the following questions:

    • Can the technique described in the paper be extended easily and used for Peras as well with the 2/3 honest bound?
    • Do we need a formalization of the characteristic string technique anyway, in order to express the 1/2 honest bound?
  • The dangling votes are now kept explicitly in the local state. Another approach would be to add the dangling votes to the block tree as well and therefore hide them in the abstraction. Formalizing both approaches and showing that they are equivalent might be used as an argument that the dangling votes are not needed.

Meeting w/ Quviq

We go through a tour of the codebase with Quviq.

Next steps: design a simple example of how a state machine in Agda could be reused in q-d to model an actual system's behaviour

Meeting w/ PNSol

While discussing the details of Peras, we realise we need to make the protocol clearer, visualising how it unfolds over time in a more precise way than how it's currently done.

Note: the maximum header size of 1400 bytes is baked into the constitution of Cardano => need a constitutional change to increase it!

If the header increases beyond an MTU, we'll be unable to meet the 5s threshold (need to make that more precise through ΔQ modeling)

In practice:

  • diffusion is 400-600ms for small blocks
  • no forks longer than 3 blocks since shelley
  • code is optimised for the worst case => adversary cannot do much, system needs to do less work than the adversary
  • Grinding attacks => large number of k

Tuning the ΔQ analysis depending on the header size => increase it and show the impact => quantify the opportunity

  • Having a predictable window for vote casting might be an attack vector

  • We should avoid averaging over a round and stick to worst case analysis

  • Even a single larger header every minute => increase faults

  • Karl has a Dapp provider providing "canary" for settlement time

  • https://pooltool.io => provides observation of forks (survivor bias?)

  • We could do an offline analysis from mainchain to use as a baseline for current work

  • Isn't a case to be made that Praos "dominates" Peras from a game theory perspective on the worst case behaviour?

  • If the voting window is too big in the past => Praos will already have settled it

  • current valency ~ 20 => probability of split brain is negligible

  • if a whole continent's internet fails we have other problems to cater for than Cardano's settlement time

Next steps:

  • Provide details of the timeline of messages flowing between nodes to PNSol
  • Draft a ΔQ analysis

Plan for Code:

  • How to present the tool to potential users -> Jupyter notebook
  • Backend -> piecewise polynomials -> lacks thorough testing
  • Provide a consistent API?
  • link language for modelling and simulating => Artjoms' code

PeerSim meeting

  • PeerSim is the ancestor of PeerNet and was exclusively cycle-driven

  • Peernet: https://github.com/PeerNet/PeerNet

    • provide pluggable protocol/node
    • event driven only
    • deterministic and reproducible simulations
    • interaction is only through message passing (use real world/synthetic latency)
    • defines "adversaries" Control class
    • can simulate several protocols at the same time
  • Simulation mode = simulated time

  • Emulation mode = 1 thread/node -> scale down experiment, actual time, can identify actual race conditions, non-deterministic

  • Net mode = real-world execution and actual network connection

    • uses UDP message transport
    • can scale to 100s of 1000s of nodes
  • lots of simulation on block dissemination in Cardano

    • take into account the validation time
    • used a ΔQ model -> break a block into packets and use round-trip time to distribute the blocks at that time

Discussing how we could collaborate?

  • Share a common simulation description language?
  • Generate JS code from Agda code to interpret it in the JVM (!)

Next steps: Share code about block diffusion simulations

Weekly meeting

We discuss the expected property of Peras and how to formulate it: Under good circumstances, a tx will be adopted much faster, eg. Txs are buried under enough weight that even a large adversarial party won't overthrow the chain. It could be the case that "the adversary" is dormant, eg. it behaves as honest node(s) until there's an opportunity that arises.

We need to write scenarios under different regimes and emphasize the behaviour of the system while transitioning as the core difference is when going from good to bad: Peras won't rollback/fork as deep as Praos.

Practical engineering question is: How much? And what's the value of this property?

2024-03-11

Designs for sync protocol

  1. Simple handoffs between client and server
    • ➕ Closely corresponds to Agda Message
    • ➕ Client could use blocking calls to tidily process messages
    • ➖ FetchChain does not stream, so another FetchChain request must be made for each subsequent block header in the new chain
    • ➖ Cannot handle FetchVotes or FetchBlocks when multiple hashes are provided, so all queries must be singletons
  2. Messy multiplexing
    • ➕ Similar to how we currently use incoming and outgoing STM channels
    • ➖ Incoming messages won't be in any particular order
    • ➖ Client needs to correlate what they received to what they are waiting for, and why - maybe use futures or promises with closures
  3. Sequential mini-protocols
    • ➕ Reminiscent of the production Ouroboros design
    • ➖ Client needs to Cancel and re-query when they want a different type of information
  4. Parallel mini-protocols
    • ➕ Separate threads for each type of sync (header, vote, block)
    • ➖ Client needs to orchestrate intra-thread communication
  5. Constrained fetching
    • ➕ Supports the most common use case of fetching votes and bodies right after a new header is received
    • ➕ Reduces to a request/replies protocol if the protocol's state machine is erased or implicit
| Design 1 | Design 2 | Design 3 | Design 5 |
| --- | --- | --- | --- |
| Simple handoffs | Multiplexing | Mini-protocols | Constrained fetching |

These highlight some key issues:

  • FetchVotes and FetchBlocks trigger multiple responses, as FetchChain may also do.
  • Three types of information are being queried (headers, votes, blocks) in only a semi-predictable sequence.
  • The DoS attack surface somewhat depends upon when the node yields agency to its peers.
  • Pull-based protocols involve more communication than push-based ones.

A simple request/replies protocol would avoid this complexity:

  • If we abandon the idea of a typed protocol based on a state machine, we can just have one request receive many replies.
  • That's quite similar to the fifth design, but with no Next* request
  • We can similarly abandon the notion of when the client or server has agency.
  • If the client sends a new request before the stream of responses to the previous request is complete, then responses will be multiplexed.
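The request/replies shape above can be sketched in a few lines. This is purely illustrative: the `Request`/`Reply` names and the explicit `Done` end-of-stream marker are assumptions for the sketch, not the actual protocol types.

```rust
// Illustrative request/replies protocol: one request, many replies,
// no typed state machine and no notion of agency.
#[derive(Debug, PartialEq)]
enum Request {
    FetchVotes(Vec<u64>), // hashes of the votes we want
}

#[derive(Debug, PartialEq)]
enum Reply {
    Vote(u64),
    Done, // explicit end-of-stream marker instead of a Next* request
}

// A server simply answers a request with zero or more replies plus Done.
fn serve(req: &Request) -> Vec<Reply> {
    match req {
        Request::FetchVotes(hashes) => hashes
            .iter()
            .map(|h| Reply::Vote(*h))
            .chain(std::iter::once(Reply::Done))
            .collect(),
    }
}

fn main() {
    let replies = serve(&Request::FetchVotes(vec![1, 2, 3]));
    assert_eq!(replies.len(), 4); // three votes + Done
    assert_eq!(replies.last(), Some(&Reply::Done));
    println!("{:?}", replies);
}
```

With this shape, multiplexing falls out naturally: a client that sends a second request before the first `Done` arrives simply interleaves the two reply streams.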

The state-machine representation of Ouroboros and its handling of agency were for proof and security that maybe aren't in scope for Peras prototyping.

Team sync

Interesting metrics for Peras:

  • Fraction of the time (chain) spent in cooldown period
  • Forks statistics
  • What fraction of cast votes reached the slot leader?
  • How many votes expired?

AB - Rust node #64

There's a problem with the way netsim defines the Socket interface and how we model network interactions:

  • At the node level we have InEnvelope and OutEnvelope data structures that represent incoming and outgoing messages respectively
  • At the network level, one is transformed into the other
  • .... but a Socket in netsim has only one type of Messages transported!

Catch-up Rust code with changes in main, back to sane state: The network runs and some messages are dispatched but we don't observe any chain being formed.

Now propagates NewChain messages to downstream peers and adopts chains when receiving them!

  • Seems like spawning many, many threads without stopping them is not a good idea, as the test fails with thread spawn failing! Not very surprising as the stop() method for the network is commented out...
  • Successfully stops nodes, but the test fails. Adding noShrinking helps to avoid an endless loop and shows the problem is pretty simple: we always retrieve an empty chain from all nodes, which fails the test's condition

Got a working praos rust node that passes one instance of the NetworkModel test.

  • The inner workings are quite complicated as there are a lot of different threads interacting through queues and netsim sockets.
  • Need to move to the actix framework to simplify the nodes' code and let them use the netsim socket directly instead of having to handle rx/tx queues.

Now the NetworkModelSpec tests pass: I had to introduce a delay for each tick on the driver's side to prevent the queues from being flooded and to give each node time to react. This should be replaced by better queue discipline, applying back pressure to callers when the queues are full.
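The "better queue discipline" mentioned here is essentially what a bounded channel gives for free. A minimal sketch with std's `mpsc::sync_channel`, whose `send` blocks the producer once the buffer is full (capacity and message type are arbitrary for illustration):

```rust
use std::sync::mpsc;
use std::thread;

// Send `n` ticks through a bounded channel of the given capacity; `send`
// blocks the producer whenever the buffer is full, applying back pressure.
fn run_bounded(n: u64, capacity: usize) -> Vec<u64> {
    let (tx, rx) = mpsc::sync_channel::<u64>(capacity);
    let producer = thread::spawn(move || {
        for slot in 0..n {
            tx.send(slot).expect("consumer hung up");
        }
        // tx is dropped here, which ends the consumer's iterator.
    });
    let received: Vec<u64> = rx.iter().collect();
    producer.join().unwrap();
    received
}

fn main() {
    // No ticks are lost even though the channel only buffers two at a time.
    assert_eq!(run_bounded(10, 2), (0..10).collect::<Vec<u64>>());
    println!("ok");
}
```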

2024-03-09

"Split-Brain" Experiment

This first "split-brain" experiment with peras-iosim involves running a network of 100 nodes with fivefold connectivity for 15 minutes, but where nodes are partitioned into two non-communicating sets between the 5th and 10th minute. The nodes quickly establish consensus after genesis, but split into two long-lived forks after the 5th minute; shortly after the 10th minute, one of the forks is abandoned as consensus is reestablished.

Nodes are divided into two "parities" determined by whether the hash of their name is an even or odd number. When the network is partitioned, only nodes of the same parity are allowed to communicate with each other: see Peras.IOSIM.Experiment.splitBrain.
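The parity rule above can be sketched as follows. This is an illustrative re-expression in Rust, assuming an arbitrary hash of the node name; the actual experiment lives in the Haskell code at Peras.IOSIM.Experiment.splitBrain.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Parity of a node, derived from the hash of its name: even-hash and
// odd-hash nodes form the two halves of the split-brain experiment.
fn parity(name: &str) -> bool {
    let mut h = DefaultHasher::new();
    name.hash(&mut h);
    h.finish() % 2 == 0
}

// During the partition window, only nodes of the same parity may talk.
fn may_communicate(a: &str, b: &str, partitioned: bool) -> bool {
    !partitioned || parity(a) == parity(b)
}

fn main() {
    assert!(may_communicate("N1", "N2", false)); // no partition: always allowed
    assert!(may_communicate("N1", "N1", true));  // same name, same parity
    println!("N1 parity: {}", parity("N1"));
}
```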

| Metric | Blocks | Slots |
| --- | --- | --- |
| Length of discarded chain at slot 1000 | 75 | 1000 |
| Length of dominant chain at slot 1000 | 66 | 1000 |
| Number of blocks in discarded chain after slot 1000 | 1-3 | 18-137 |
| Number of blocks after slot 1000 to reach quorum | 18 | 304 |

Forking and reestablishment of quorum in Peras split-brain experiment

Findings:

  1. The complexity of the forking, voting, and cool-down in the Peras results highlights the need for capable visualization and analysis tools.
  2. The voting boost can impede the reestablishment of consensus after a network partition is restored.
  3. It would be convenient to be able to start a simulation from an existing chain, instead of from genesis.
  4. VRF-based randomization makes it easier to compare simulations with different parameters.
  5. Even though peras-iosim runs aren't particularly fast, we probably don't need to parallelize them because typical experiments involve many executions of simulations, which means we can take advantage of CPU resources simply by running those different scenarios in parallel.
  6. The memory footprint of peras-iosim is small (less than 100 MB) if tracing is turned off; with tracing, it is about twenty times that, but still modest.

2024-03-08

Message Interface and Event Logging

  • Agda types for messages and events are now essentially faithful to the chain-sync protocol.
  • Implemented event logging in peras-iosim using the contra-trace approach.
    • All sent and received messages are logged.
    • Arbitrary JSON can also be logged.
    • The CLI can either collect or discard events.

AB - Rust Node

Struggling with the handling of random seed for network generation and execution.

  • When we create the network, we pass Parameters which contain a u64 seed, and also another seed which does not make sense.
  • This u64 needs to be transformed into a StdGen in order to be useful for generating data, but we cannot change the type of randomSeed in Parameters because it needs to be serialisable.

The model for Socket in netsim is UDP rather than TCP, eg. not connection oriented but message oriented -> no need to connect nodes

Do I really need to separate the Node from the NodeHandle?

  • Tried to get rid of the split between Node and NodeHandle but got stuck again on the thread-spawn issue: the self reference needs to be moved, and this makes it unusable afterwards.

Seems like actix is really the way to go for serious actor-like modelling

  • Will try to have a first crude version of the Netsim-based network working and then refactor to use actix.

Running the test now does not crash but eats up 500% CPU!

  • Seems like it works, I can see the nodes starting/stopping and the message being broadcast but of course nothing happens because the nodes don't receive the message! I need to actually wire the socket handling in the network code.

2024-03-07

Impact of Uneven Distribution of Stake

Here is a rough analysis of the effect of staking centralization and multi-pool operation upon the Peras committee size.

Let's say that we set the lottery probability under the assumptions of 22.8e15 lovelace being staked equally by 3029 stake pools so that the committee size is 200. The formula for the probability that a stake pool will be on the committee is $p_\text{committee} = 1 - (1 - p_\text{lottery})^\text{stake}$, where $p_\text{lottery} = 9.103828801926284 \cdot 10^{-15}$ is the probability that the holder of a single lovelace will be elected to the committee and $\text{stake}$ is the number of lovelace the pool has staked.
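Plugging the numbers above into that formula gives a quick numerical check (a sketch; `ln_1p`/`exp_m1` are used only to keep the tiny probabilities accurate in `f64`):

```rust
// Mean committee size when `pools` pools share `total_stake` equally and
// each lovelace wins the lottery with probability `p_lottery`.
fn mean_committee(p_lottery: f64, total_stake: f64, pools: f64) -> f64 {
    let stake = total_stake / pools;
    // p_committee = 1 - (1 - p_lottery)^stake, computed stably in f64 as
    // -expm1(stake * ln1p(-p_lottery)).
    let p_committee = -(stake * (-p_lottery).ln_1p()).exp_m1();
    pools * p_committee
}

fn main() {
    // The numbers from the text: 22.8e15 lovelace, 3029 pools, target size 200.
    let c = mean_committee(9.103828801926284e-15, 22.8e15, 3029.0);
    assert!((c - 200.0).abs() < 2.0); // uniform stake lands near the target
    println!("mean committee size ≈ {:.1}", c);
}
```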

The actual staking distribution is not uniform over stakepools. If we use historical data from epoch 469, the mean committee size would be 173.3. So, the uneven staking levels reduce the committee size by 13%.

However, some multi-pool operators run many stake pools. Binance, for instance, runs several dozen pools. Using the data at https://www.statista.com/statistics/1279280/cardano-ada-biggest-staking-pool-groups/, we compute that counting only unique operators in the committee would reduce its mean size to just 76.6, where the multi-pool operators control 13.9 votes and the single-pool operators control 62.9 votes.

| pool | votes |
| --- | --- |
| SPO | 62.92786759845929 |
| Binance | 30.489730311076144 |
| New | 7.882970991288584 |
| Adalite | 9.32793551428378 |
| 1PCT | 9.973457954408731 |
| New Girl | 5.53677941289975 |
| MS | 5.646352580388863 |
| Etoro | 6.114357535725741 |
| Emurgo | 5.838590839441291 |
| EVE | 5.022310339902703 |
| New Guy | 5.3873765779532 |
| Wave | 4.834000742214385 |
| LEO | 3.95591191080253 |
| CCV | 2.8533879053049582 |
| Bloom | 2.8009939694400066 |

Thus, the concentration of coins in multi-pool staking operations radically shrinks the committee size.

YH Agda formalization

  • Refactoring of how dangling votes are handled in Agda: Rather than delegating the dangling votes to the blocktree (abstract data type), they are kept explicitly in the local state and taken into account when needed
    • This helps with the separation of local state and what is considered on chain, i.e. in the blocktree

Peras Netsim-based Simulation

Finished work on tallying rx/tx bytes per node in the IOSim network simulation. This won't be anywhere near accurate right now as we don't really receive and send blocks, but I just want to make sure we track these and display them.

Getting back to work on peras-rust: now I want to integrate the netsim library and model a network of nodes in order to pass the NetworkModelSpec test. Looking at the fastbft prototype to get some inspiration on how netsim is used there.

Got sidetracked fixing a discrepancy between the local environment using agda2hs 1.3, which is unreleased, vs. Nix/CI which uses 1.2. There were also a bunch of generated types that were not up to date anymore, which led to more fixes in the code.

Then ran into disk storage shortage again for CI, and had to nix-garbage-collect -d to reclaim some space, plus spend some time to fix it in the longer run. This page lists some configuration tweaks to enable auto-optimization which I shall try to set up: https://nixos.wiki/wiki/Storage_optimization

nix.optimise.automatic = true;
nix.optimise.dates = [ "03:45" ];

Seems like the option I want is min-free as the others are relevant for nixOS only

Configured nix to have min-free of 1GB and then had to reload daemon

sudo launchctl stop org.nixos.nix-daemon
sudo launchctl start org.nixos.nix-daemon

Worked out top-down as usual:

  • Implemented the Haskell side of the netsim network interface, defining an IO Simulator structure, then the Haskell wrappers and ultimately the C stubs. The main question at this stage was about marshalling more complex data structures, eg Topology and Parameters, between Haskell and C/Rust; now settling for ToJSON/FromJSON conversions, but this is slow and cumbersome so we might want a better way to share the data structures, probably defining sensible Storable instances that could cross the FFI boundary easily
  • Then implemented the Rust side, starting from the foreign "C" functions exposed by the library and started implementing lifecycles functions
  • Main concern is how to properly construct the simulation's network and manage the nodes. ce-fastbft uses an actor library to handle the dispatching of messages which seems like a good idea, so I probably would want to move the NodeHandle to be a proper actor. I need to build the links between nodes according to the Topology and Parameters given by the test driver
  • A next hurdle will be to retrieve a node's best chain on demand, which implies access to the node's internal states => store it as part of the Idle messages returned by a node when it's done processing

Peras IOSim Improvements

  • Implemented handling SomeBlock message.
  • Revised to use the latest version of the Vote and Message types.

2024-03-06

Thorough Validation in Peras IOSim

  • Implemented validation of incoming chains, blocks, and votes in peras-iosim.
  • Discarding of invalid votes when computing chain weight.
  • Detection and discarding of equivocated votes.
  • Construction of type BlockTree = Data.Tree Block from [ChainState].

The additional rigorous checking of validity throughout the node's processing adds a significant computational burden. We'll need aggressive memoization for some validation-related computations in order to have efficient Haskell and Rust nodes. Naive implementations will call chainWeight, voteInRound, quorumOnChain, and validVote repeatedly with the same input; new messages require recomputing small portions of these, but many of the previous computations can be retained. The new indexing in ChainState already made things quite a bit more efficient, but we'll probably have to add memoization to it, too. We need to evaluate appropriate techniques for this because we won't want the memoization table to grow too large over the course of a simulation.
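One cheap shape for that memoization, sketched with a hypothetical `chain_weight` keyed by the tip's hash (the real computation and key type would differ, and a production cache would also need an eviction policy so the table doesn't grow unboundedly):

```rust
use std::collections::HashMap;

// Hypothetical expensive computation: weight of the chain ending at `tip`.
fn chain_weight_uncached(tip: u64, calls: &mut u32) -> u64 {
    *calls += 1; // count real computations so callers can observe caching
    tip * 3 + 1 // stand-in for the actual weight calculation
}

// Memoized wrapper: recompute only for tips we have not seen before.
fn chain_weight(tip: u64, cache: &mut HashMap<u64, u64>, calls: &mut u32) -> u64 {
    if let Some(&w) = cache.get(&tip) {
        return w;
    }
    let w = chain_weight_uncached(tip, calls);
    cache.insert(tip, w);
    w
}

fn main() {
    let mut cache = HashMap::new();
    let mut calls = 0u32;
    let w1 = chain_weight(42, &mut cache, &mut calls);
    let w2 = chain_weight(42, &mut cache, &mut calls);
    assert_eq!(w1, w2);
    assert_eq!(calls, 1); // the second lookup hit the cache
    println!("weight = {}, computations = {}", w1, calls);
}
```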

AB+BB Pairing

  • Worked on refining the Topology descriptor and aligning its representation in Rust and Haskell
  • Added a messageDelay and reliability on each NodeLink represented in the topology file
    • These could be further abstracted away later on using ΔQ distribution?
  • Added golden tests for the relevant types
    • We added those manually to peras-hs but the types should probably live in Agda and get generated
    • We reuse the golden tests on the Rust side to ensure consistency

Team meetings

Discussing the problem of dangling votes and equivocation -> vote tracking is problematic in general in the model

  • When looking at the history of the chain, you don't know if someone cheated -> decision is local

Also: How many votes can be in the dangling set?

  • About 2x committee size ~ an unbounded number of rounds in theory, but in practice 2 or 3 at most. As a node can only vote once per round, this bounds the number of votes to the committee size times the time it takes to forge a new block including all those votes

Weekly peras meeting

Q: We need to distinguish worst case vs. average case

  • Peras improves the average case, not the worst case -> this is an interesting question
  • But what does worst case mean? -> there's a difference b/w worst-case block load (eg. from current 50% to 80+%) and the adversarial worst case (eg. adversary power close to 50%?)

Discussing the situation of dangling votes -> could be the case that we don't need the dangling votes for selecting chain

  • Sandro => will check whether it's still the case we need it in the algorithm

Dangling votes counting for block selection can have an adversarial effect: a node can flip-flop between blocks because arriving votes change the decision about which chain is heaviest

  • This could be a form of attack from an upstream adversary?
  • It's also a problem from an implementation pov because it implies there's more state to manage and carry around, and more blocks/votes to cache for possibly a long time

2024-03-05

Peras IOSim improvements

Further improved validation logic in peras-iosim along with the monad stack.

  • Fixed drawing of network topology
  • Switch to using VRF-based random numbers
  • Enumeration type Invalid for tracking different kinds of validation failures
  • Exception handling within protocol functions
  • Easy collection of MonadSay output

Also,

  • Revised UML activity diagram, based on feedback from Sandro
  • Fixed LD_LIBRARY_PATH in Nix shell

On ΔQ

Played a bit over the week-end with existing ΔQ libraries, trying to reproduce the basic charts from the paper. I have tried various codebases:

  • https://github.com/DeltaQ-SD/dqsd-piecewise-poly is supposed to be the latest implementation by Peter, based on piecewise polynomials implementing the full spectrum of ΔQ language operators (eg. convolution, all-to-finish, any-to-finish). It kind of works, but when I tried to generate the CDF at 50 sample points for a distribution built from 5 convolutions of a basic transfer function, all the probabilities were set to 0

  • Peter suggested I try https://github.com/DeltaQ-SD/dqsd-wip-jacob which is a reimplementation by an intern, but it lacked some basics (eg. convolve = undefined in the source) and seemed unfinished. It could be that some code has not been pushed or merged. It provides some type classes along with a symbolic backend and a polynomials backend to implement the ΔQ language semantics

  • I ended up using https://github.com/abailly-iohk/pnsol-deltaq-clone which is historical code from Neil implementing the language with step "functions" and random sampling using the statistics package. This code works and I was able to produce the following graph:

    ΔQ example

  • There is some code in Agda, Haskell, and Python that's been written by Artjoms in this repo to support a follow-up paper on Algebraic reasoning but it's unclear what state it's in and I haven't tried to model the examples with it. It seems it supports a numpy-based backend for easy plotting, approximating the distribution with a vector of sample values, plus a Haskell version (slow?)
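The core ΔQ operation these libraries implement, convolution of two delay distributions, is easy to sketch numerically: represent each distribution as a probability mass over equal time steps and convolve. This is a toy sketch, not any of the libraries' APIs:

```rust
// Discrete convolution of two delay distributions given as probability
// masses over equal time steps: the delay of doing one step then the other.
fn convolve(a: &[f64], b: &[f64]) -> Vec<f64> {
    if a.is_empty() || b.is_empty() {
        return Vec::new();
    }
    let mut out = vec![0.0; a.len() + b.len() - 1];
    for (i, pa) in a.iter().enumerate() {
        for (j, pb) in b.iter().enumerate() {
            out[i + j] += pa * pb;
        }
    }
    out
}

fn main() {
    // Two hops, each taking 0 or 1 step with equal probability:
    // total delay is 0, 1 or 2 steps with probabilities 1/4, 1/2, 1/4.
    let hop = [0.5, 0.5];
    let two_hops = convolve(&hop, &hop);
    assert_eq!(two_hops, vec![0.25, 0.5, 0.25]);
    // Probability mass is conserved by convolution.
    assert!((two_hops.iter().sum::<f64>() - 1.0).abs() < 1e-12);
    println!("{:?}", two_hops);
}
```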

2024-03-04

Rewrite of peras-iosim

A major refactoring and rewrite of peras-iosim:

  • Sub-algorithms for Peras protocol are now cleanly separated into pure or monadic functions.
  • The implementation is now putatively faithful to the Peras pseudo-code.
    • The pseudo-code seems incorrect for two aspects of voteInRound.
      • A quorum will never be reached unless we define quorumOnChain(_, _, 0) = True.
      • The ellipses in the pseudo-code document should not be present, since otherwise a quorum can fail permanently.
    • The "weight to cut off for common prefix" is not yet implemented, but it is mostly inconsequential.
  • The new ChainState type efficiently indexes blocks, votes, dangling blocks/votes, and rounds, while also tracking the preferred chain.
    • Implementation runs faster.
    • Much smaller memory footprint.
  • Fixed handling of the simulation-termination condition.
    • The peras-quickcheck-tests run much faster and never deadlock.
  • A proxy for the VRF has been added.
  • Added support for SVG graphics.

Next steps:

  • Update UML diagrams.
  • Properly model block and vote diffusion via a "pull" process instead of the current "push" one. This is critical for any realistic performance study.
  • Add unit tests, especially for ChainState functions.
  • Implement the population of BlockTree and revise observability.
  • Define language-independent schemas for scenario definition, observability, and visualization.
  • Evaluate adding exception types specific to each kind of validation failure that can occur in the pseudo-code.
  • Revise time management in simulation and parallelize its execution.

2024-03-01

YH Agda formalization

  • Adjusted the small-step semantics to include relations per party. This diverges from the PoS-NSB paper, but has the following advantages:
    • No need to keep track of the global execution-order
    • No need for the PermParties rule, permutations of the messages are sufficient
    • No need of the Fold relation
    • Progress does not have to be tracked
  • Added an initial weight function for the chain

AB on peras-rust

Rust node is panicking with some incoming messages:

called `Result::unwrap()` on an `Err` value: Error("unknown variant `inMessage`, expected `Stop` or `SendMessage`", line: 1, column: 12)

Need to check the serialization of InEnvelope and OutEnvelope as well

Rust code fails to deserialise the following message:

Failed to deserialise message: {"inMessage":{"contents":1,"tag":"NextSlot"},"origin":null,"tag":"InEnvelope"}

Then more issues serializing messages:

called `Result::unwrap()` on an `Err` value: Error("cannot serialize tagged newtype variant Message::NextSlot containing an integer", line: 0, column: 0)

This is surprising as the compiler does not balk at the serde annotations. According to this documentation

This representation works for struct variants, newtype variants containing structs or maps, and unit variants but does not work for enums containing tuple variants. Using a #[serde(tag = "...")] attribute on an enum containing a tuple variant is an error at compile time.

However it does not fail compilation 🤷 Seems like I need Adjacently tagged variants
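For reference, serde's adjacently tagged representation (`#[serde(tag = "tag", content = "contents")]`) encodes a newtype variant like `NextSlot(1)` as `{"contents":1,"tag":"NextSlot"}`, matching the Haskell-side envelope above. Here is a dependency-free sketch of that wire shape, hand-rolled so it runs without the serde crate (the real code just uses the attribute):

```rust
// A cut-down version of the node's message type.
enum Message {
    NextSlot(u64),
    Stop,
}

// Hand-rolled encoder producing the adjacently tagged JSON shape that
// serde emits for `#[serde(tag = "tag", content = "contents")]`.
fn encode(msg: &Message) -> String {
    match msg {
        Message::NextSlot(slot) => format!(r#"{{"contents":{},"tag":"NextSlot"}}"#, slot),
        Message::Stop => r#"{"tag":"Stop"}"#.to_string(),
    }
}

fn main() {
    // Matches the payload seen inside the failing envelope above.
    assert_eq!(encode(&Message::NextSlot(1)), r#"{"contents":1,"tag":"NextSlot"}"#);
    println!("{}", encode(&Message::NextSlot(1)));
}
```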

Solved the Haskell $\rightarrow$ Rust side at least for NextSlot messages, now solving the other direction:

       Failed to deserialise received message ("{\"tag\":\"Idle\",\"timestamp\":\"2024-03-01T08:12:24.653742Z\",\"source\":\"N1\",\"bestChain\":{\"blocks\":[],\"votes\":[]}}"): Error in $.source: parsing Peras.Message.NodeId(MkNodeId) failed, expected Object, but encountered String

Turned NodeId into a newtype upon code generation and refreshed the golden files and serialization to use deriving newtype, still need to fix some variant names

Now, seems like there's an additional level of indirection in the OutEnvelope's messages. After fixing (some) serialization issues, I now get a genuine and interesting error:

  test/Peras/NodeModelSpec.hs:25:5:
  1) Peras.NodeModel, Netsim Honest node, mints blocks according to stakes
       Assertion failed (after 2 tests):
         do var0 <- action $ Tick 1861
            action $ ForgedBlocksRespectSchedule [var0]
            pure ()

         Actual: 5, Expected:  18.610000000000003
         Stake: 1 % 10, active slots: 0.1, Slot: 1861

Another interesting error this time in the CI: https://github.com/input-output-hk/peras-design/actions/runs/8108712754/job/22162439028?pr=45#step:5:147: the size of the buffer allocated to receive messages from the node is too small

There's something fishy going on in the way we get the best chain from a node, as test fails with the following error:

         Chain: [{"creatorId":1,"includedVotes":[],"leadershipProof":"a04ce0a663426aa1","parentBlock":"0000000000000000","payload":[],"signature":"472682854cfeb3e1","slotNumber":283},{"creatorId":1,"includedVotes":[],"leadershipProof":"a04ce0a663426aa1","parentBlock":"0000000000000000","payload":[],"signature":"472682854cfeb3e1","slotNumber":283},{"creatorId":1,"includedVotes":[],"leadershipProof":"a04ce0a663426aa1","parentBlock":"0000000000000000","payload":[],"signature":"472682854cfeb3e1","slotNumber":283},{"creatorId":1,"includedVotes":[],"leadershipProof":"a04ce0a663426aa1","parentBlock":"0000000000000000","payload":[],"signature":"472682854cfeb3e1","slotNumber":283},{"creatorId":1,"includedVotes":[],"leadershipProof":"a04ce0a663426aa1","parentBlock":"0000000000000000","payload":[],"signature":"472682854cfeb3e1","slotNumber":283},{"creatorId":1,"includedVotes":[],"leadershipProof":"a04ce0a663426aa1","parentBlock":"0000000000000000","payload":[],"signature":"472682854cfeb3e1","slotNumber":283}]

eg. these are all the same blocks, but tracing the handle_slot function in Rust shows the chain actually grows correctly. Could it be that the Node's best_chain changes are not visible from the Idle message?

  • Well, no, tracing the Idle messages sent shows the correct and changing chain
  • Actually, it's a problem with the ordering of the chain produced by the node: In Haskell, the tip is expected to be at the head of the chain, whereas in Rust it's pushed at the end!

There were actually 2 sorting issues:

  • One in rust, which is fixed by inserting at the head for the best_chain
  • One in NodeModel where we cons-ed the variables and therefore produced a chain whose chunks were ordered but which was not sorted overall

Team sync

  • Discussing logs and analysis/visualization
  • Everything we want to put into a browser should be in Rust
  • need to write an ADR about serialization
  • Create a Plutus contract to do Peras voting on-chain?
  • Q: What if we did not take into account the dangling votes when computing the weight of the chain?
  • Organise a presentation/workshop w/ Quviq on agda2hs
  • Generation -> standardise and write an ADR

2024-02-29

AB on peras-rust

Trying to wrap my head around how to design the Rust node, seems like the way to go in Rust is to have:

  • A pair of channels that can be cloned and shared between threads -> let (tx, rx) = mpsc::channel();
  • Run the thread moving references ownership : let thread = thread::spawn(move || { rx.receive()...} )
  • Collect the threads higher up to join on them?

Found nice writeup on how to use PBT in Rust: https://www.lpalmieri.com/posts/an-introduction-to-property-based-testing-in-rust/: uses fake and quickcheck

Found a reasonably good structure for the peras node, after a couple hours struggling with borrow checker and ownership constraints:

  • The state is maintained in a Node structure that's owned by the thread started with Node::start()
  • Communication with the Node is done through a NodeHandle that contains some channels and exposes methods send() and receive() to dispatch and retrieve messages
  • The node can be stopped through the NodeHandle by sending a "poison pill" message which will stop the thread
  • Found the take() method from Option which takes ownership of the content of the Option
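The structure described above can be sketched in a few lines (simplified types; the real node carries protocol state rather than a tick counter):

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

enum Command {
    Tick,
    Stop, // the "poison pill" that shuts the node thread down
}

// The handle owns the channel and the join handle; the node state itself
// lives inside the spawned thread.
struct NodeHandle {
    tx: Sender<Command>,
    thread: Option<JoinHandle<u64>>,
}

impl NodeHandle {
    fn start() -> NodeHandle {
        let (tx, rx) = mpsc::channel();
        let thread = thread::spawn(move || {
            let mut ticks = 0u64; // stand-in for the node's state
            while let Ok(cmd) = rx.recv() {
                match cmd {
                    Command::Tick => ticks += 1,
                    Command::Stop => break,
                }
            }
            ticks
        });
        NodeHandle { tx, thread: Some(thread) }
    }

    fn send(&self, cmd: Command) {
        self.tx.send(cmd).expect("node thread is gone");
    }

    // Option::take moves the JoinHandle out of the struct so we can join.
    fn stop(&mut self) -> u64 {
        self.send(Command::Stop);
        self.thread.take().expect("already stopped").join().unwrap()
    }
}

fn main() {
    let mut node = NodeHandle::start();
    node.send(Command::Tick);
    node.send(Command::Tick);
    assert_eq!(node.stop(), 2);
    println!("node processed 2 ticks");
}
```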

Also started to use quickcheck to test some simple properties of the leader election function, mostly as an exercise. Seems like the Rust quickcheck does not have all the nice reporting functions I am used to with Haskell's.

Now working on the core algorithm for Praos block forging in order to be able to forge a chain. Once a node can forge a block and emit the corresponding NewChain message, I will ensure the NodeModel test passes before moving to NetworkModel and use netsim.

The formula to compute slot leadership is given as

1 - (1 - activeSlotCoefficient') ** (fromIntegral staked / fromIntegral total)

but this seems wrong: if activeSlotCoefficient is 1, then we always produce a block no matter what our share is.

  • Actually, this is a core property of Praos: The leadership check is independent of the slot and only depends on the node's stake
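A quick numerical check of that property (the stake values here are illustrative):

```rust
// Praos leader-election probability: 1 - (1 - f)^(stake/total),
// where f is the active-slot coefficient.
fn leadership_probability(f: f64, stake: f64, total: f64) -> f64 {
    1.0 - (1.0 - f).powf(stake / total)
}

fn main() {
    // With f = 1 the node leads every slot regardless of its stake share,
    // which is the edge case noted above.
    assert_eq!(leadership_probability(1.0, 1.0, 10.0), 1.0);
    // With f = 0.1 and a 10% stake share, the node leads roughly 1% of slots.
    let p = leadership_probability(0.1, 1.0, 10.0);
    assert!(p > 0.0104 && p < 0.0105);
    println!("p = {:.6}", p);
}
```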

Trying to move the active_coefficient in the Node structure but it ripples all over the place. What I need is to pass a parameters structure that will have sensible defaults.

2024-02-28

AB - Agda ↔ Haskell ↔ Rust matching

Trying to write some tests in Rust to check serialisation with golden files in Haskell is ok.

  • Seems a bit clunky to reference files from the parent directory but not sure I have a choice... Going to write the tests in the peras_node test section

Needed to change the way some data structures are serialised in Orphans.hs: all newtypes were serialised with explicit fields, which does not seem useful at this stage -> made all those types produce JSON via Bytes in order to directly represent them as hex strings.

Added the hex dependency on the Rust side to deserialise hex strings into vectors of bytes. This requires transforming type aliases into structs, but serde provides a transparent annotation that makes it trivial to recover Haskell's newtype behaviour
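The hex round trip for such a newtype is small enough to sketch with std only (the real code uses the `hex` crate plus serde's transparent annotation; `Hash` here is an illustrative stand-in):

```rust
// Newtype wrapper mirroring a Haskell `newtype` over a byte string.
#[derive(Debug, PartialEq)]
struct Hash(Vec<u8>);

// Encode the bytes as the lowercase hex string used in the golden files.
fn to_hex(h: &Hash) -> String {
    h.0.iter().map(|b| format!("{:02x}", b)).collect()
}

// Decode a hex string back into bytes; None on odd length or bad digits.
fn from_hex(s: &str) -> Option<Hash> {
    if s.len() % 2 != 0 {
        return None;
    }
    let bytes: Result<Vec<u8>, _> = (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16))
        .collect();
    bytes.ok().map(Hash)
}

fn main() {
    let h = Hash(vec![0xa0, 0x4c, 0xe0]);
    assert_eq!(to_hex(&h), "a04ce0");
    assert_eq!(from_hex("a04ce0"), Some(h));
    assert_eq!(from_hex("zz"), None);
    println!("round trip ok");
}
```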

I am trying to change the Agda code to use Vote Hash instead of Vote Block but this fails. Is there a way to force the conversion in Agda2Hs for a specific field?

  • There are Rewrite rules in Agda2hs but not sure if they work with converting expressions. Perhaps just need to define an alias?

    This somehow works:

    rewrites:
      - from: "Peras.Chain.VoteB"
        to: "Vote Hash"
    

    but unfortunately the Hash datatype is not imported, need to import it explicitly?

Seems like I cannot force the explicit import of a symbol that's not used on the Agda side. Might need some cleverer trick, but for now let's just manually modify the Haskell file...

BB fixed some errors in the nix build, which was not able to find the golden files: they need to be added to the cabal descriptor to be picked up by nix build.

2024-02-26

Simulation experiment

Tracking and reporting of rollbacks has been added to peras-iosim:

  1. Which nodes rolled back
  2. At what slot this occurred
  3. How many slots were rolled back
  4. How many blocks were rolled back
  5. The weight/length of the old and new chains

Voting propagation and graphics were also improved.

A reasonable set of protocol parameters and network configuration was set for a 100-node network with a mean committee size of 10. Here is the formula for setting the mean committee size: $$ P_\text{lottery} = (1 - 1 / C)^{C / S} $$ where $C$ is the mean committee size and $S$ is the total stake in the system.

Findings from the simulation runs highlight the impracticality of blindly running simulations with realistic parameters and then mining the data:

  • The simulation results are strongly dependent upon the speed of diffusion of messages throughout the network, so a high-fidelity model for that is required.
  • Both Peras and Praos are so stable that one would need very long simulations to observe forks of more than one or two blocks.
  • Only in cases of very sparse connectivity or slow message diffusion are longer forks seen.
  • Peras quickly stabilizes the chain at the first block or two in each round, so even longer forks typically do not last beyond that point.
  • Hence, even for honest nodes, we need a mechanism to inject rare events such as multi-block forks, so that the effect of Peras can be studied efficiently.

YH - Agda formalization

  • Refactoring of the data model in Agda. This requires adjusting all dependent modules

2024-02-23

Metric and visualization

  • Added observation of the whole tree of blocks to peras-iosim.
  • Revised chain visualization to use tree data type of observed blocks.
  • Design for computing metrics such as frequency of forks.

AB - Rust Node

Writing a custom Setup file to automatically build the peras-rust project with cargo, inspired by: https://github.com/CardanoSolutions/cardano-node/blob/release/8.1.2/cddl/Setup.hs

  • Curiously it cannot find the peras.h header file even though I have added it in includeDirs? -> the correct parameter to use is includes, not includeDirs: https://cabal.readthedocs.io/en/3.4/cabal-package.html#pkg-field-includes
  • Need to move the FFI code to the library component of the package, not the test component, as it's the former that will depend on Rust code. This is debatable and perhaps I should stick to having the code in test/?

Need to restrict the scope of the state that's returned in the Idle message in order to avoid sneaking implementation details from the Haskell node into it. The fact that Idle is part of the OutEnvelope type, and therefore of the interface between the network simulator and the node, is problematic. Or is it just an artefact stemming from the fact that NodeModelSpec is trying to test at too low a level for its own good?

Annoyingly, the default JSON serialisation provided by serde is not compatible with Aeson's:

Failed to deserialise received message ("{\"Idle\":{\"timestamp\":1708594079,\"source\":\"N1\",\"currentState\":{}}}"): Error in $: key "timestamp" not found

Trying to remove the NodeState from the Idle message, but it does not work as expected: I use the chainsSeen in the NetworkState to store a map from nodes to preferred chains, but it's not updated for all nodes, e.g. nodes which don't have downstream consumers won't send a NewChain message! I need to change the way the NetworkState's chainsSeen is updated: it's now updated in the Idle handler as well and is a map from NodeId to Chain. This should not be a problem for plotting the final graph, but it will lose forks that get abandoned by a node in favour of a longer/better chain, which might not be what we want.

After changes in the Setup.hs file, cabal build peras-quickcheck fails with:

% cabal build peras-quickcheck
Build profile: -w ghc-9.6.3 -O0
In order, the following will be built (use -v for more details):
 - peras-quickcheck-0.1.0.0 (lib:peras-quickcheck, test:peras-quickcheck-test) (configuration changed)
Warning: peras-quickcheck.cabal:55:1: Ignoring trailing fields after sections:
"extra-doc-files"
Configuring peras-quickcheck-0.1.0.0...
Error: setup: Missing dependency on a foreign library:
* Missing (or bad) C library: peras_rust
This problem can usually be solved by installing the system package that
provides this library (you may need the "-dev" version). If the library is
already installed but in a non-standard location then you can use the flags
--extra-include-dirs= and --extra-lib-dirs= to specify where it is.If the
library file does exist, it may contain errors that are caught by the C
compiler at the preprocessing stage. In this case you can re-run configure
with the verbosity flag -v3 to see the error messages.

Error: cabal: Failed to build peras-quickcheck-0.1.0.0. The failure occurred
during the configure step.

i.e. it cannot find the DLL built by the peras-rust project.

Seems like the confHook of cabal checks the existence of the library and fails if it does not exist -> this blog post uses the preconfigure hook to ensure the Rust dependency is built.

2024-02-22

YH - Formalisation in Agda

  • Added a new property to the block tree asserting that equivocal votes cannot be added

Nix maintenance

  • Fixed Nix derivations for Rust-Haskell linkages.
  • Added quickcheck test to CI.

2024-02-21

YH - Formalisation in Agda

  • Refactored the CollisionFree predicate and adjusted the proof of the property that if the global state is collision-free, all previous states have been collision-free as well.
  • Initial model of Peras in Agda (currently in separate branch)

Peras IOSim

  • Implemented quorum logic in PseudoPeras protocol.
  • Fixed orphan JSON instance for bytes.
  • Main returns exit code explicitly.
  • Cleaned up github CI.
  • Activity and sequence diagrams.

Meeting on ΔQ

ΔQ is not static: it can be a function of system load, for example, or of adversarial behaviour; e.g. one can transform a ΔQ by some function and parameterise it. Operators could even be extended if needed. If we can model an adversary as a ΔQ and then hypothesise about adversarial power, we can construct a new ΔQ as a probabilistic alternative between being connected to an adversary or not.

PT did recent experimental work on how to represent probability distributions as piecewise polynomials; it needs a proper project to gain momentum and concrete work. There's an embryonic library representing the ΔQ algebra.

Cardano ΔQ model in Neil's head => being able to answer questions about the network behaviour

Several ways of using ΔQ:

  • ΔQ distribution in the simulation
  • Using it for design/screening => ex-ante analysis; we have a 10-12 dimensional space => sampling for scenarios
  • Exposing infeasibility => fail early

Next steps:

  • Outcome model for Peras => whiteboarding session
  • Plug in some estimates (uniform distribution)
  • Build a model with a tool

AB - Rust node

Fleshing out the interface of the RustNode, eg. how it handles messages.

Writing the FFI functions on the Rust side, trying to build a library without any implementation

Managed to call into the Rust library:

Peras.NodeModel
  Netsim Honest node
    mints blocks according to stakes [ ]thread '<unnamed>' panicked at src/lib.rs:10:5:
not implemented
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5

Not sure how to handle the dependency between the peras-quickcheck and peras-rust libraries at build time 🤔? The latter would need to be installed somewhere accessible for use by the former; perhaps provide some extra flags at build time?

2024-02-20

Pseudo-Peras Protocol

Finished migrating the pseudo-Peras protocol from random-forks to peras-iosim.

  • Updated peras-iosim to track voting.
  • Implemented voting logic.
  • Flagged potential ambiguities in Peras specification.
    • Clarification needed before implementing quorums.
  • Revised visualization to display each vote.
  • Added simulation execution to CI.

AB - Implement Rust Node

Starting work to implement a simple honest Peras node in Rust that I can connect to the NodeModel. As usual, I would like to start from the top so I need to decouple the NodeModel from the actual implementation of a Node in Haskell, and then implement a Haskell driver connecting to a Rust node through netsim's FFI.

Trying to generalise runPropInIOSim as used in the tests to an arbitrary runner; the typechecker does not like the forall s. PropertyM (IOSim s) a type and cannot unify it with PropertyM m (). Gave up trying to parameterise the Actions -> Property function with a monadic function; will just implement 2 functions in 2 different properties, which will be just fine.

Trying to introduce a runPropInNetsim function which would run in some Netsim m monad transformer. This does not work because there are still MonadSTM constraints on the RunModel for NodeModel so I first need to cut a proper interface that's independent of io-classes.

Scaffolded a test runner in netsim and some simple Rust code, not linked yet. Looking at what the good practices for Cargo.lock are at https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html; seems like it's a good idea at first to commit it.

Revisiting Rust code and how to connect with Haskell code through C FFI. I probably don't need netsim.h directly to interact with a rust node through q-d. I should rather define my own interface, expose that and use netsim under the hood to orchestrate the process, but there's no value in using netsim-ffi directly.

What I need is an interface that exposes these functions:

  , sendMessage :: InEnvelope -> m ()
  , receiveMessage :: m OutEnvelope

but for Rust

Got sidetracked while setting up the peras-rust project as rust-analyzer failed to run properly with LSP => needed a rustup update, which gave me rustc 1.76.0. Worked out a test runner that takes a PropertyM (RunMonad IO) a and runs actions, creating and destroying an internal node in the process. Given we are working in IO, it's good hygiene to ensure that we don't leak resources when starting/stopping nodes living in Rust, so better to wrap everything in some kind of simple bracket-style function.

Reading again CAPI guide: https://downloads.haskell.org/ghc/9.0.1/docs/html/users_guide/exts/ffi.html?highlight=capiffi#extension-CApiFFI

Pairing on Agda Network Model

Pairing on the Agda network model, we want to prove that a ⊆ b -> a ⨉ a ⊆ b ⨉ b. It turned out to be surprisingly challenging, because the functions and types used did not lend themselves to nice reduction. YH finally found a solution later using some existing properties of list membership:

   open import Data.List.Membership.Propositional.Properties

    ⊆→⊆-cartesianProduct : ∀ {a b} → a ⊆ₘ b → cartesianProduct a a ⊆ₘₓ cartesianProduct b b
    ⊆→⊆-cartesianProduct {a} a⊆b x =
      let x₁ , x₂ = ∈-cartesianProduct⁻ a a x
       in ∈-cartesianProduct⁺ (a⊆b x₁) (a⊆b x₂)

which replaced 20+ lines of complicated and unfinished proof by induction over various cases of a and b.

Made me think that perhaps we could use postulate or trustMe more liberally at this stage of the code, to focus on the types that matter. I have the feeling that spending an afternoon proving that a ⊆ b -> (a x a) ⊆ (b x b) is not the best use of our time, just like spending time defining what a transaction is or implementing real VRF in the prototype.

2024-02-19

Nix CI and Development Environment

Fixed Nix CI and development environment to handle literate Agda and an Agda dependency on agda2hs.

AB fixing Common Prefix

Trying to fix error in NetworkModelSpec, I first struggle with shrinking problems as I end up with traces without any Tick which is problematic as this means there won't be anything but a Genesis chain in all nodes.

  • Fixed it by adding a clause in the property, namely that if all nodes' chains are Genesis, this is fine as it means no one made progress. Anyhow, this is not the correct commonPrefix property.
  • Tried to fiddle with DynLogicModel instance and function restricted which restricts which value is generated with anyAction (I think).

Ended up with a chain that's supposed to fail the property but apparently does not, the problem being that the Show instance yields something that's not parseable by Read:

err =
  [ Cons (Block{slotNumber = 49, creatorId = 7392 b2bd2853, includedVotes = (), leadershipProof = f2a6ab5f8122, payload = [], signature = 06 f34b7da9fd}) (Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis)
  , Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis
  , Cons (Block{slotNumber = 49, creatorId = 7392 b2bd2853, includedVotes = (), leadershipProof = f2a6ab5f8122, payload = [], signature = 06 f34b7da9fd}) (Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis)
  , Cons (Block{slotNumber = 49, creatorId = 7392 b2bd2853, includedVotes = (), leadershipProof = f2a6ab5f8122, payload = [], signature = 06 f34b7da9fd}) (Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis)
  , Cons (Block{slotNumber = 49, creatorId = 7392 b2bd2853, includedVotes = (), leadershipProof = f2a6ab5f8122, payload = [], signature = 06 f34b7da9fd}) (Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis)
  , Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis
  , Cons (Block{slotNumber = 49, creatorId = 7392 b2bd2853, includedVotes = (), leadershipProof = f2a6ab5f8122, payload = [], signature = 06 f34b7da9fd}) (Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis)
  , Cons (Block{slotNumber = 49, creatorId = 7392 b2bd2853, includedVotes = (), leadershipProof = f2a6ab5f8122, payload = [], signature = 06 f34b7da9fd}) (Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis)
  , Cons (Block{slotNumber = 49, creatorId = 7392 b2bd2853, includedVotes = (), leadershipProof = f2a6ab5f8122, payload = [], signature = 06 f34b7da9fd}) (Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis)
  , Cons (Block{slotNumber = 49, creatorId = 7392 b2bd2853, includedVotes = (), leadershipProof = f2a6ab5f8122, payload = [], signature = 06 f34b7da9fd}) (Cons (Block{slotNumber = 44, creatorId = ece20dec9edc, includedVotes = (), leadershipProof = 0 faf57e3c126, payload = [], signature = c63cff5266ee}) Genesis)
  ]

First issue: There are missing double-quotes in the Show instance for Bytes

Trying to have lawful Read/Show instances for Chain in order to investigate with copy/paste why the sample chains do not validate the commonPrefix property; got hit by a weird error when trying to derive the Tx instances via Bytes:

src/Peras/Orphans.hs:221:1: error: [GHC-18872]
    • Couldn't match representation of type ‘BS8.ByteString’
                               with that of ‘Tx’
        arising from a use of ‘ghc-prim-0.10.0:GHC.Prim.coerce’
    • In the expression:
        ghc-prim-0.10.0:GHC.Prim.coerce
          @(Int -> ReadS Bytes) @(Int -> ReadS Tx) (readsPrec @Bytes)
      In an equation for ‘readsPrec’:
          readsPrec
            = ghc-prim-0.10.0:GHC.Prim.coerce
                @(Int -> ReadS Bytes) @(Int -> ReadS Tx) (readsPrec @Bytes)
      When typechecking the code for ‘readsPrec’
        in a derived instance for ‘Read Tx’:
        To see the code I am typechecking, use -ddump-deriv
      In the instance declaration for ‘Read Tx’
    |
221 | deriving via Bytes instance Read Tx
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Found the issue: Tx was declared as data Tx, which breaks the representational equivalence of newtypes.

  • Need to add an annotation on Agda code to ensure it produces a newtype

Looks like the IsString instances for some types are invalid!

  • Finally got to the bottom of the problem: the fromString function was delegating to read, but this is wrong, as read for a String expects double-quotes whereas fromString is given the String value directly, without the double-quotes!

Getting back to the main problem, there's definitely something wrong with my commonPrefix function as demonstrated by some unit tests I copy/pasted from the failures observed in NetworkModelSpec!

Fixed the commonPrefix computation: the Agda code was not recursing when the 2 values were unequal, which was a remnant of the previous Haskell computation where the prefix was computed by first reversing the block list 🤦
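The corrected logic boils down to the longest common prefix of the chains, read from genesis forwards; a minimal sketch of that computation (representing each chain as a list of blocks, oldest first, is an assumption here):

```python
def common_prefix(chains):
    """Longest common prefix of several chains, each given as a list of
    blocks ordered from genesis onwards."""
    if not chains:
        return []
    prefix = []
    # zip stops at the shortest chain; walk block by block until a mismatch.
    for blocks in zip(*chains):
        if all(b == blocks[0] for b in blocks[1:]):
            prefix.append(blocks[0])
        else:
            break
    return prefix

assert common_prefix([["g", "a", "b"], ["g", "a", "c"]]) == ["g", "a"]
assert common_prefix([["g", "a"], ["g", "a"]]) == ["g", "a"]  # the c₁ ≡ c₂ case
```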

2024-02-16

Odds & Ends

  • Added CLI interface to peras_topology.
  • Refactored and cleanly isolated random-number generation in peras-iosim.

YH code extraction for property based tests

  • For extracting properties from Agda to Haskell we can use a type similar to the Equal type from the agda2hs examples. The constructor for Equal takes a pair of items and a proof that those are equal. When extracting to Haskell the proof gets erased. We can use this idea for extracting properties to be used with QuickCheck.
prop-genesis-in-slot0 : ∀ {c : Chain} → (v : ValidChain c) → slot (last c) ≡ 0
prop-genesis-in-slot0 = ...

propGenesisInSlot0 : ∀ (c : Chain) → (@0 v : ValidChain c) → Equal ℕ
propGenesisInSlot0 c v = MkEqual (slot (last c) , 0) (prop-genesis-in-slot0 v)

Notes on serialization

Serialisation from Agda

  • serde can mimic Aeson serialisation -> configuration on the Rust side
  • serialisation should be CBOR
  • should we start from the CDDL or the code? Probably want both, as long as we respect the specification

AB - commonPrefix in Agda

Trying again to implement commonPrefix in Agda, this time depending on Haskell equality instances

  • need to add --erasure in the agda-lib as a flag to be able to import agda2hs

Looks like the Agda version of commonPrefix does not work as expected, investigating what's wrong... It's an interesting exercise as I would like to write properties about that code, eg. something like:

commonPrefixEq : {t : Set } -> ⦃ eqt : Eq t ⦄ -> (c₁ c₂ : Chain t) ->  (c₁ ≡ c₂) -> (commonPrefix (c₁ ∷ c₂ ∷ []) ≡ c₁)
commonPrefixEq = {!!}

where c1 and c2 would be Arbitrary on the Haskell side.

Here is the property I write in QC:

spec :: Spec
spec =
  prop "(c₁ ≡ c₂) -> (commonPrefix (c₁ ∷ c₂ ∷ []) ≡ c₁)" propCommonPrefixSelf

propCommonPrefixSelf :: Property
propCommonPrefixSelf =
  forAllShrink arbitrary shrink $ \c ->
    commonPrefix @() [c, c] == c

I am not too far off but still stuck with the scope of type variables in where clauses; I don't know how to generate the needed forall t. for nested definitions in a where clause. Alternatively, erasing the type ascriptions could work 🤔

  • Found the issue after adding some property to check asChain (asList c) == c which would also be great to guarantee in Agda!
  • There was an additional reverse when returning the prefix list of blocks, which did not make sense!

Now I have an interesting error in the properties!

Test suite quickcheck-model-test: RUNNING...

Peras.NetworkModel
  Chain progress [✘]

Failures:

  test/Peras/NetworkModelSpec.hs:40:33:
  1) Peras.NetworkModel Chain progress
       Falsified (after 4 tests and 3 shrinks):
         do action $ ChainsHaveCommonPrefix [var3,var4,var5,var6,var7,var8,var9,var10,var11,var12]
            pure ()

         FailureEvaluation Variable var12 is not bound!
         CallStack (from HasCallStack):
           error, called at src/Test/QuickCheck/StateModel.hs:252:14 in qckchck-dynmc-3.3.1-99b3f6b4:Test.QuickCheck.StateModel
         Exception thrown while showing test case:
           Variable var12 is not bound!
           CallStack (from HasCallStack):
             error, called at src/Test/QuickCheck/StateModel.hs:252:14 in qckchck-dynmc-3.3.1-99b3f6b4:Test.QuickCheck.StateModel

This error was caused by the lack of a HasVariables instance needed to properly shrink; now I have another error:

Peras.NetworkModel
  Chain progress [✘]

Failures:

  test/Peras/NetworkModelSpec.hs:40:33:
  1) Peras.NetworkModel Chain progress
       Falsified (after 19 tests and 8 shrinks):
         do var62 <- action $ ObserveBestChain N1
            var63 <- action $ ObserveBestChain N2
            var64 <- action $ ObserveBestChain N3
            var65 <- action $ ObserveBestChain N4
            var66 <- action $ ObserveBestChain N5
            var67 <- action $ ObserveBestChain N6
            var68 <- action $ ObserveBestChain N7
            var69 <- action $ ObserveBestChain N8
            var70 <- action $ ObserveBestChain N9
            var71 <- action $ ObserveBestChain N10
            action $ ChainsHaveCommonPrefix [var62,var63,var64,var65,var66,var67,var68,var69,var70,var71]
            pure ()

         FailureEvaluation Map.!: given key is not an element in the map
         CallStack (from HasCallStack):
           error, called at libraries/containers/containers/src/Data/Map/Internal.hs:617:17 in containers-0.6.7:Data.Map.Internal
         Exception thrown while showing test case:
           Map.!: given key is not an element in the map
           CallStack (from HasCallStack):
             error, called at libraries/containers/containers/src/Data/Map/Internal.hs:617:17 in containers-0.6.7:Data.Map.Internal

Which seems to mean the nodes are not available to observe. Given the number of variables involved, the trace took a while to shrink!

2024-02-15

Rust-based generator of randomized network topologies

The peras_topology Rust crate contains a rudimentary implementation of a randomized network generator, like that in the peras-iosim Haskell package.

  • Reads YAML-formatted Parameters describing the network.
  • Outputs a JSON file with the Topology.
  • The serialization formats are compatible with peras-iosim.
  • Added build of peras_topology to the CI.

This raises several issues related to Rust:

  1. How do we want to organize our Rust crate structure?
  2. Should we try to generate the Rust types directly from Agda?
    • How mature is IOG's work on agda2rust?
  3. Which tools do we want to code in Rust vs Haskell?
  4. Is the default rustfmt sufficient as a code-style standard, for the time being?

YH working on code extraction using agda2hs

ΔQ library

Did some brushing up of PNSol's DeltaQ library, as it's easier for me to understand things once they are written down as code. It would be interesting to have a look at how they manage the probability distributions and integrate that into our network modelling.

AB on NetworkModeling

Shaving some yaks while trying to implement the "common prefix" property for the NetworkModel. I implemented a couple of functions for the Chain type in Agda, and got sidetracked looking for shortcuts to do some goal-driven development; could not find one to fill in the body of a function given its type.

  • Wrote asChain and asList functions to convert between Chain and [Block] as the two types are isomorphic.
  • I would like to implement all pure functions related to chain manipulation in Agda. Going to not waste time right now to do that but something to keep in mind

There is something mind-boggling about the runNetwork function that mesmerizes me! I can't find a good way to start cutting it into pieces to separate the step, stop and observation functions!

(few hours later)

Managed to connect the network model to the network runner, at least for the step part; now I need to retrieve the preferred chain from each node. The problem is that a node's state is pretty much opaque, e.g. it's initialised and then updated in a separate thread; do we need a new kind of message?

Pairing on NetworkModel's last mile

Thinking about time, we realise that time management and message delivery are tricky. The Network code transforms individual delays of messages into a statistical delivery of messages, which is currently very simple, e.g. a 300ms delay over a 1s slot => 30% of messages get delayed to the next slot.

This seems to be a nice link to the ΔQ formalism because we could use a (partial) CDF as a function to manage the delays of messages and then use q-d to check the behaviour of the model under various assumptions.

  • no need to reinvent the wheel, we could leverage their formalism for handling statistical distributions

We have some nice monitoring in place to report what the chains look like after a quickcheck test run. Here is a report for 10 samples with a 1/20 active slot coefficient, which demonstrates that the observed behaviour is consistent with what we expect from a Praos chain:

Peras.NetworkModel
  Chain progress [✔]
    +++ OK, passed 10 tests.

    Action polarity (360 in total):
    100.0% +

    Actions (360 in total):
    64.7% +Tick
    32.5% +ObserveBestChain
     2.8% +ChainsHaveCommonPrefix

    Prefix density (10 in total):
    50% <= 6%
    40% <= 5%
    10% <= 10%

    Prefix length (10 in total):
    70% <= 100
    20% <= 400
    10% <= 300

Struggling again to convert the commonPrefix function into an Agda function; forgot that the scope of implicit variables comes from pattern matching on the LHS. Now I need to understand equality...

Seems like porting my simple prefix function to Agda is a non-trivial effort:

  • Agda does not have builtin structural equality (?) nor typeclasses so one needs to define equality for specific types
  • There is a decidable equality provided by the builtins but it's not erased. According to Orestis one needs erased decidability, which is like https://github.com/agda/agda2hs/blob/master/lib/Haskell/Extra/Dec.agda
  • it's a bit sad as this function seems simple enough :(

2024-02-14

BB on testing statistical properties

  1. Ran and statistically analyzed an experiment using this Jupyter notebook to verify the correctness of slot leadership and block production for the PseudoPraos protocol.
  2. Researched QuickCheck-based approaches for handling approximate equality and statistical properties.
    • There don't seem to be special-purpose libraries for this.
    • QuickCheck and its extensions provide some control over coverage and sampling, but this doesn't address the fundamental problem of determining when a failure is a valid statistical outlier vs an error.
  3. Revised QuickCheck test for block production to use a three-sigma confidence interval. Eventually, we'll need to make the sigma level tunable.
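The three-sigma acceptance test described above amounts to checking the observed block count against a binomial model; a sketch with a tunable sigma level (the names and exact acceptance shape are mine):

```python
import math

def within_sigma(blocks: int, slots: int, p_lead: float, n_sigma: float = 3.0) -> bool:
    """Accept iff the observed block count lies within n_sigma standard
    deviations of the binomial expectation slots * p_lead."""
    mean = slots * p_lead
    sigma = math.sqrt(slots * p_lead * (1.0 - p_lead))
    return abs(blocks - mean) <= n_sigma * sigma

assert within_sigma(5, 100, 0.05)        # exactly the expectation
assert not within_sigma(20, 100, 0.05)   # far outside three sigma
```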

Other revisions to peras-iosim:

  • Added an explicit parameter for total stake to Peras.IOSim.Simulate.Types.Parameters.
  • Changed the serialization of Peras.Chain so that it doesn't deeply nest JSON, since that causes most JSON parsers to crash for long chains.

AB on Network Modeling

Reading about Rust's FFI: https://www.michaelfbryan.com/rust-ffi-guide/setting_up.html

  • ByteBuffer is a type exported from ffi_support library which means foreign code will need to link to this library too, which is annoying.
  • There's a PR #9 baking to build a proper FFI library

Looking at runNetwork code to understand how I can instrument it for inclusion in a quickcheck model. runNetwork is opaque, eg. it runs a predefined network until some slot is reached.

AB on ΔQ

Trying (again) to understand what ΔQ is and how to use it; found this paper on PNSol's website that sheds some light on it: the idea is to relate SLAs to Quantitative Timeliness Attenuation, which is then expressed as an "improper CDF" incorporating expectations about delivery time and loss probability. For example, one can say:

  • 50% delivered under 100ms
  • 95% delivered under 200ms
  • 2% packet loss

This defines a crude distribution function with steps, and then actual ΔQ measurements can be made to assess whether these are met.

Also, individual components' ΔQ profiles can be composed to yield a global measure.

Original paper

Going through the first sections of https://www.mdpi.com/2073-431X/11/3/45 to understand how it's been applied to Cardano block diffusion

Problem is stated as:

Starting from blockchain node A, what is the probability distribution of the time taken for a block to reach a different node Z when A and Z are picked at random from the graph?

Then the next paragraph says "the graph is random with some limited node degree N", which seems to imply we assume a random graph. But is that really the case? Is the graph really a random graph (with all the properties defined by Barabasi et al.) or is it rather a power-law graph? What would that change?

Delay distributions can be combined sequentially (through a convolution):

Definition on the continuous domain is: $$ (f * g)(t) := \int_{-\infty}^{\infty} f(t-\tau)\,g(\tau)\,d\tau $$

which translates to finite domains as: $$ h(n) = (f \star g)(n) = \sum_{i=-\infty}^{\infty} f(n-i)\,g(i) $$

Found this Algorithm in 1D (which is what's interesting for us): https://www.algorithm-archive.org/contents/convolutions/1d/1d.html
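On a finite grid the discrete convolution above is a direct double loop; a sketch for combining two delay distributions given as probability mass functions (lists indexed by time step):

```python
def convolve(f, g):
    """Discrete convolution: h[n] = sum_i f[n - i] * g[i]."""
    h = [0.0] * (len(f) + len(g) - 1)
    for n, fn in enumerate(f):
        for i, gi in enumerate(g):
            h[n + i] += fn * gi
    return h

# Two identical one-hop delays compose into a two-hop delay distribution.
hop = [0.5, 0.5]                 # 50% delivered now, 50% one step later
two_hops = convolve(hop, hop)
assert two_hops == [0.25, 0.5, 0.25]
assert abs(sum(two_hops) - 1.0) < 1e-12  # total probability is preserved
```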

Refinement & alternatives exploration

ΔQ can be applied to refine steps in a process, but also explore alternative designs:

  • Impact of a distribution of path length on block diffusion time
  • Impact of header/body split on overall diffusion

For example:

For the ‘Near’ peers shipping a 64 kB block, this means an intensity of 42.7 Mb/s (8 × 64000/(0.024 − 0.012))

  • 0.024 = time to ship a full block between 2 nodes in the same DC
  • 0.012 = latency between 2 nodes in same DC (RTT)
  • 8 = bits per byte, converting the 64 kB block size to bits
  • 64000 = size of a block

Not sure what this number means? A bandwidth lower bound for a node?
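Reading the 8 as bits per byte makes the arithmetic work out: 8 × 64000 bytes sent over the 12 ms left after subtracting the round-trip latency gives roughly the quoted 42.7 Mb/s:

```python
bits = 8 * 64_000              # 64 kB block expressed in bits
transmit_time = 0.024 - 0.012  # total ship time minus RTT, in seconds
intensity_mbps = bits / transmit_time / 1e6

assert abs(intensity_mbps - 42.7) < 0.05  # ≈ 42.7 Mb/s, matching the paper
```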

Comparison between ΔQ analysis and simulation

Aims at showing that simulation is intractable, as it would need too many samples to provide similar information.

  • Feeling that the argument about the number of graphs to generate is a strawman: there's a huge number of them of course, but you don't really need to cover them all; you only want to ensure your sample has some variety while staying consistent with the expected characteristics of the network.
  • you can reduce the search space to relevant graphs and leave pathological ones aside

Semantics of ΔQ

Δ0 is a partial CDF, i.e. a CDF with some cutoff below 1 to represent failures

ΔQ(Δ0) = Δ0
ΔQ(A -> B) = ΔQ(A) * ΔQ(B) -- convolution
ΔQ(A // B) = p(A)ΔQ(A) + p(B)ΔQ(B) -- alternative w/ proba
ΔQ(∃(A | B)) = ΔQ(A) + ΔQ(B) - ΔQ(A) x ΔQ(B) -- exclusive one outcome
ΔQ(∀(A | B)) = ΔQ(A) x ΔQ(B) -- all outcomes
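On a discretized time grid, these combinators (other than sequencing, which is the convolution above) are pointwise operations on the partial CDFs; a sketch, where the first-to-finish/all-to-finish reading of ∃/∀ is my interpretation of the formulas above:

```python
def first_to_finish(cdf_a, cdf_b):
    """ΔQ(∃(A|B))(t) = A(t) + B(t) - A(t)·B(t): done when either is done."""
    return [a + b - a * b for a, b in zip(cdf_a, cdf_b)]

def all_to_finish(cdf_a, cdf_b):
    """ΔQ(∀(A|B))(t) = A(t)·B(t): done only when both are done."""
    return [a * b for a, b in zip(cdf_a, cdf_b)]

def choice(p_a, cdf_a, cdf_b):
    """ΔQ(A // B) = p(A)·A + p(B)·B, with p(B) = 1 - p(A)."""
    return [p_a * a + (1.0 - p_a) * b for a, b in zip(cdf_a, cdf_b)]

a = [0.5, 1.0]  # half the outcomes done at step 0, all by step 1
b = [0.0, 1.0]  # everything done at step 1
assert first_to_finish(a, b) == [0.5, 1.0]
assert all_to_finish(a, b) == [0.0, 1.0]
assert choice(0.5, a, b) == [0.25, 1.0]
```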

How can we apply ΔQ to the problem of finality in a blockchain? We first need to define finality and settlement: is it the time it takes for a transaction to be considered final by one node? By all nodes?

A survey paper about networking in cryptocurrencies: https://arxiv.org/pdf/2008.08412.pdf

2024-02-13

YH - Formal specification

  • Introduced CollisionFree predicate
  • Refactored relation on global state
  • Proving the property that when a state is collision-free, all previous states have been collision-free as well

IOSim enhancements

  • Improved faithfulness of slot-leader selection.
  • Refactored to use MonadRandom throughout.

AB - Model-Based Testing a node

Trying to express block production property as a ratio of number of blocks over current slot for a single node, got a failed test:

  2) Peras.NodeModel Honest node mints blocks according to stakes
       Assertion failed (after 5 tests):
         do var0 <- action $ Tick 45
            var1 <- action $ Tick 73
            var2 <- action $ Tick 35
            var3 <- action $ Tick 19
            action $ ForgedBlocksRespectSchedule [var3,var2,var1,var0]
            pure ()

         trace:

         Produced 4 blocks in 172 slots
  • First, the good news is that the test works and we actually produce blocks and can check those
  • Second, the condition is an equality but it should be a bounded-interval check
  • I like the fact the failure produces a useful trace, but in this case it does not shrink

Tweaking the property checked, taking into account the actual stake of the node which is randomly generated from max stake. I am surprised by the discrepancy between what I compute as expected and what's actually produced:

 Assertion failed (after 2 tests):
   do var0 <- action $ Tick 85
      action $ ForgedBlocksRespectSchedule [var0]
      pure ()

   trace:

   Actual: 17, Expected:  7.769
   Stake: 457 % 500, active slots:  0.1, Slot: 85

Trying to scale the number of slots in each action to see if this has an impact on the difference in the number of blocks produced

Looks like the discrepancy is not very surprising, as the formula for slot leadership is currently computed as:

  let -- expected total stake in the system: each of the peers holds
      -- on average half of the maximum stake
      expectedStake = fromIntegral $ peerCount * maximumStake `div` 2
      -- per-unit-of-stake win probability, chosen so that the whole stake
      -- wins a slot with probability equal to the active-slot coefficient
      pSlotLottery = 1 - (1 - activeSlotCoefficient') ** (1 / expectedStake)
      -- probability that a party holding `currency` units wins at least once
      p = 1 - (1 - pSlotLottery) ^ currency
   in (<= p) `first` uniformR (0, 1) gen
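As a sanity check on the numbers above, the expected block count can be computed directly from the standard Praos leadership probability $\phi_f(\sigma) = 1 - (1 - f)^{\sigma}$, where $f$ is the active-slot coefficient and $\sigma$ the party's relative stake. This is a hedged sketch, not the simulator's code:

```haskell
-- Praos-style probability that a party with relative stake sigma
-- leads a given slot, with active-slot coefficient f.
leaderProb :: Double -> Double -> Double
leaderProb f sigma = 1 - (1 - f) ** sigma

-- Expected number of blocks minted by that party over a number of slots.
expectedBlocks :: Double -> Double -> Int -> Double
expectedBlocks f sigma slots = fromIntegral slots * leaderProb f sigma
```

With f = 0.1, sigma = 457/500 and 85 slots this yields roughly 7.8, in line with the 7.769 expectation logged above, which suggests the expectation itself is sound and the surplus of actual blocks (17) comes from the leadership check.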

Ensemble discussion

We need to start experimenting with adversarial node/network behaviour, to be able to mix honest nodes and various types of malicious nodes in a single test run and simulation. The big question is: what does it mean to be adversarial? How does it impact the chain and the network? Perhaps a simple and generic answer is that dishonest nodes are the ones producing "malicious" blocks, whose content we do not really care about?

:right_arrow: We want to mark blocks produced by adversarial nodes and check how many such blocks the chain ultimately contains

This leads to the interesting question of whether or not the mempool should be out of scope? We think so but perhaps there are expectations we are not aware of?

Network Simulator

  • Tools to simulate and emulate protocols

  • 2 different multiplexers -> green vs. native threads (tokio)

    • sockets/handles for sending/receiving messages
    • implement policies/latencies
    • in Cardano -> nodes come slower to each other
  • Alan -> working on Simplex, no imminent consumer of it

    • Fast BFT/HotStuff/Simplex -> need some kind of network simulator
    • have something, but not as complete as what netsim aims
    • Rust/Scala
  • easy to port and use Wasm -> native threads converted to JS

  • handling time? -> global clock

  • metrics? => latency, bandwidth

2024-02-12

IOSim enhancements

  1. Enabled optional profiling of Haskell code, via Nix.
  2. Fixed memory leak caused by a bug in io-classes-1.3.1.0.
  3. Finished provisional algorithm for routing.
  4. Improved visualization of chain forking.

Haskell package maintenance

  1. Moved Haskell files generated by agda2hs into folders separate from the Agda source, in the peras-hs package.
  2. Relocated Peras.Orphans from peras-iosim to peras-hs.
  3. Added Haskell package builds to CI.
  4. Added formatting pre-commit check for Haskell and Nix files.
  5. Turned on StrictData language extension for peras-hs and peras-iosim.
  6. Temporarily disabled peras-quickcheck in cabal.project and .github/workflows/ci.yaml, due to missing netsim library.

AB - Connecting quickcheck & Rust

Having "fun" calling Rust library from Haskell through FFI.

Managed to use dummyFFI from netsim library which is just an echo, but it definitely works. Now need to better understand how the library is supposed to work, possibly implementing some very simple node.

AB - Connecting Quickcheck & peras-iosim

  • Split peras-quickcheck's model in 2, one NetworkModel and one NodeModel which should respectively assert properties of a whole network, and of a single node
  • Run model test against a single Peras.IOSim.Node with the goal of checking the node respects leader's schedule which is a very basic property but not so simple to express and useful to check
    • Property is not implemented yet; I would like to express it as some expected number of blocks produced during a time interval
    • Had some discussion with John Hughes on how to express that in DL: the number of blocks is not directly accessible, as it's an output of the SUT and only accessible to the DL as Variables
    • Currently this needs to be done as an action with a postcondition that can resolve the variables, but it might be possible and useful to extend DL to resolve variables as an expression, in order to avoid having to modify the model only for the purpose of observing some computation result

Team Discussion

The quality of the Network simulation is important

  • We want both an idealised network with statistical regularities, and a faithful model of the actual network -> realistically inferred from the existing network

  • Network modelling is a big risk area for the project as its quality and the ability to run simulations and experiments will be critical to produce meaningful and useful results

  • => meet netsim implementation team to better align on goals and where we want to go

  • Take the Agda types as is + incorporate in the simulation

    • How faithful is the translation?
  • Goal of the week:

    • Express knowledge propagation property in Agda?
    • Perhaps leader schedule respect would be an even simpler property to state and test?
    • Translate to Haskell
    • Run q-d over implementation in both Haskell and Rust

2024-02-10

Peras IOSim

Mostly finished refactoring of peras-iosim to incorporate functionality of random-forks.

  • Additions and changes
    1. Various refactorings.
    2. Implemented a Praos-like protocol.
    3. Command-line options.
    4. YAML input and JSON output.
    5. Visualization.
  • Next steps
    • Diagnose and fix memory leak.
    • Migrate Peras algorithm from random-forks.
    • Implement logging via a suitable logging monad.

2024-02-09

Peras IOSim

Continued refactoring of peras-iosim to incorporate functionality of random-forks.

  • Additions and changes
    1. Integrated peras-iosim with Agda-derived types.
    2. Implemented orphan instances.
    3. Implemented JSON serialization.
    4. Aligned usage of Agda types.
    5. Implemented network routing.
    6. Implemented random-network generation.
  • Next steps
    • Read YAML files as input.
    • Migrate Praos algorithm from random-forks.
    • Migrate Peras algorithm from random-forks.
    • Implement logging via a suitable logging monad.
    • Migrate visualization from random-forks.

YH - Model in Agda

  • Moved the initial implementation of the semantics into a separate module: SmallStep.agda
  • Similar for the proofs: SmallStep/Propeties.agda
  • The small-step semantics implement closely the semantics proposed in the PoS-NSB paper

AB - Quickcheck Model

Adding a self-hosted runner on my Mac OS machine for Peras in order to benefit from caching. Configuration is straightforward, but unfortunately it's not possible to share runners between repositories on a case-by-case basis: they have to be defined as organisation-level runners and then can be shared.

  • Initial build time is 12m57s
  • Rebuild took 1m9s 🎉

Got a first stab at ModelSpec compiling and "running", trying to express chain progress property as:

chainProgress :: DL Network ()
chainProgress = do
  anyActions_
  getModelStateDL >>= \Network{nodeIds} -> do
    bestChains <- forM nodeIds (action . ObserveBestChain)
    let (settled, unsettled) = commonPrefix bestChains
    DL.assert "settled prefix is non-empty and unsettled suffixes are short" $
      all (((< 42) . length) . asList) unsettled
        && not (null (asList settled))

Need to look at the actual definition in the Praos or PoS-NB paper perhaps...

One issue is that this can't work as is: an action SomeAction where SomeAction :: Action MyState a returns a Var a, which is just a symbolic reference to a value and not the value itself. It actually does not make sense for the Model to attempt to track all the chains produced; it's simpler to just track the elapsed time (slots) and compute how long the common prefix should be

Modified model to check observation on the postcondition of the RunModel but now the test loops!

  • Sprinkling traces does not reveal anything of interest, seems like it's stuck somewhere between computing next state and performing the action on the run model 🤔
  • Struggling to understand why my q-d run is hanging, trying to run it within IO instead of IOSim but the type tetris is annoying...

Managed to write a function to run actions in IO, inspired by runPropertyStateT

  • I don't understand much of what I am doing here as it's really unclear what's going on in the PropertyM, but it compiles.
  • It still hangs however -> compiling with profiling to be able to dump some traces statistics

Thanks to profiling I found out where my error was 🤦 : the baseNodes function uses unfoldr to generate node ids and then takes 10, but then it generates node ids using unfoldr again without any stopping condition, which of course never terminates
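The fix is to bake the stopping condition into the unfoldr step itself. A minimal reconstruction of the bug and its fix (the real baseNodes details are hypothetical here):

```haskell
import Data.List (unfoldr)

newtype NodeId = NodeId Int
  deriving (Eq, Show)

-- Infinite generator: harmless when consumed with `take 10`, but it hangs
-- forever if any later code tries to consume the whole list.
allNodeIds :: [NodeId]
allNodeIds = unfoldr (\n -> Just (NodeId n, n + 1)) 0

-- Fixed version: the stopping condition lives inside the unfoldr step,
-- so the resulting list is finite and safe to consume in full.
baseNodes :: Int -> [NodeId]
baseNodes count = unfoldr step 0
  where
    step n
      | n < count = Just (NodeId n, n + 1)
      | otherwise = Nothing
```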

Team sync

Discussing whether or not to check-in generated code?

  • Checking in would be useful and then have some check in place in the CI
  • it's probably ok either way now

What about Praos Genesis?

  • could have impact on our work as the criterion for chain extension is different
  • also has impact on the networking stuff

First simulation should focus on committee size -> most important number in the system, needs intuition on what's the right value

2024-02-08

Nix, Agda, and Haskell housekeeping

  1. Reconciled Haskell package dependencies.
  2. Switched to IOGX-style flake.
  3. Included pre-built Agda standard library.
  4. Added nix build .#peras for building the Peras library in Agda and Haskell.
  5. Nix-based CI.
  6. Cabal file for Haskell code generated by agda2hs.

Issue: the Nix-based CI test takes ~28 minutes, but this will be far faster when caching is enabled.

Pairing session - Agda modelling of messages

Some resources on agda2hs:

Together with AB we started to work on the extraction of the model from Agda to Haskell using agda2hs

  • The tool requires putting the AGDA2HS pragma on all data types that are exported
  • We renamed constructors to be upper case and type parameters to be lower case as required in Haskell
  • Data types that used the standard library had to be adjusted. For example we changed the Block data type that used set to a parameterized data type:
open import Data.Tree.AVL.Sets renaming (⟨Set⟩ to set)

...

record Block (t : Set) : Set where
  field slotNumber : Slot
        blockHeight : Word64
        creatorId : PartyId
        parentBlock : Hash
        includedVotes : t
        leadershipProof : LeadershipProof
        payload : List Tx
        signature : Signature

Block⁺ = Block (set HashO)
  • The import of Data.ByteString in Haskell is possible using the FOREIGN pragma

We then tried to parameterise the Block type on the Agda side to be able to inject whatever concrete type we want, but agda2hs does not like our kind for the parameter -> it needs to be a Set, or a simple type-level expression. There should be a way to map types/imports from Agda to HS?

We have some kind of strategy to go from Agda, sticking to "simple" types wherever we need them, but it's not entirely satisfactory as we are losing information in the process. Then we had an interesting discussion with another agda2hs user:

I found that if I had something in Agda that couldn't turn into Haskell then I was going the wrong way - you need to figure out what it looks like in Haskell and then try to handle that in Agda, rather than fitting the Haskell to the Agda, if you see what I mean?

At least you are doing proofs over the Haskell code, not an abstract model that may or may not relate to the real implementation. Agda2hs seems to come with a bunch of Haskell.Prelude related modules that mirror the Haskell modules. For anything bespoke I just used postulate (and, if I had kept going, would have re-implemented them as "Haskell in Agda")

I would suggest that agda2hs should be renamed to hsInAgda :slightly_smiling_face: I like that agda2hs keeps almost exactly what I wrote in the .agda file when it converts. I also found that you could do nice things with syntax declarations if you want to use unicode in the agda file. So I defined functions with long Haskell names but could give them unicode symbols. Wherever I used the unicode symbol subsequently, agda2hs dropped in the long name.

2024-02-07

Team session

Q: How does the Praos security model relate to the PoS-NSB paper's model? It seems the latter is simpler and has stronger assumptions, thus leading to weaker guarantees

Some high-level questions being discussed:

  • What are the next steps on simulations?
  • Could be good to have "foundation" code in Agda and then evaluate simulation tools always using the same code, in order to ensure interoperability
  • JSON serialisation is hard in Agda; it needs to be done at the generated-code level
  • Write down the types together?

What is a good property to express?

  • How to express finality?
  • What are block diffusion properties and their impact on finality => impact of the number of peers on the speed of finality

Exploring simulation tools:

Some hypothesis about Peras protocol itself:

  • if $A$ is too low, cooldown periods happen too often
  • Later discussion w/ researchers told us $A$ is not so important because the protocol is geared towards optimistic cases, so it can be small

Cool idea we played a bit with: have a visualisation up and running as a demo so that people can tweak the parameters to see how the protocol behaves. Then we can gamify it, asking people to share their favourite parameter sets and run more thorough simulations with those parameters. It will also be a nice way to engage with the community and have people understand the benefits and tradeoffs of this new protocol extension.

Code generation:

  • Trying to write a simple model of the message types in Agda and generate HS code
  • Default generator yields unreadable, encoded Haskell -> not super useful for embedding in QC, so we'd rather use agda2hs, which generates pretty readable code

Possible experiment:

  • build a slurp node
  • Run the node in various datacenters connected to other peers
  • store the chain-following data -> reconstruct the progress of the chain at various points in time
  • Is this not what the CF is doing with their Cardano network monitoring project? => ask for more information

Peras simulation using IOSim

  • Began work on IOSim-based Peras simulator.
    • A couple of days from now, this will replace the random-forks model.
    • Based on ideas from Hydra simulator.
    • Initially use a simple centralized barrier for synchronizing clocks.
      • A more sophisticated barrier algorithm can be implemented later, if needed.
    • Status
      • Foundational types for Praos.
      • Partial implementation of network.
    • Next steps
      • Basic Praos-like protocol.
      • Visualization of output.
      • Integrate with Agda-generated types.
  • Notes and ideas
    • Study other network simulators (6+ candidates already)
    • Review design document for Cardano network.
    • Consider how $\Delta$Q-analysis can be used with the simulation.
    • The random-forks simulator runs 3000 nodes for one hour of simulated time in two seconds of wall-clock time.
    • Statistical design of experiments will be needed (OAT sensitivities, then LHS or OA designs).
    • Talk to Karl Knutson of CF about network statistics
    • Eventually, publish a web-based version of the simulator.

2024-02-06

Simplfied Praos/Peras Simulation/Visualization

  • Changes
    • Removed use of IO, in order to accommodate IOSim.
    • Refactored random-number generation.
    • Refactored types into separate module.
    • Refactored graph visualization into separate module.
    • Added rounds and voting.
  • Not yet implemented
    • Cooling-off period.
    • The code is still hacky and disposable.
    • Vertical alignment of blocks by slot in visualization.
    • Display of voting-round boundaries in visualization.
  • Observations
    • StatefulGen and IOSim don't easily coexist.
    • Visualizing block production, voting, and forking is practical for many nodes and slots.
    • The Peras protocol can equally well be simulated via a centrally managed or a distributed simulation, but a distributed simulation seems cleanest and closest to the protocol.
  • Next steps
    • Consider whether it is worthwhile further developing the model, or whether it would be better to re-architect the model in Agda or in a simulation framework.
    • Document visualization lessons-learned and next-generation design.
Example chain Example topology
chain topology

YH

Reading the PoS-NSB paper and Coq code

  • Use of math-comp library in Coq takes getting used to
  • Sketched the relation on what states are reachable, i.e. the small steps semantics and the transitive closure in Agda
    data _↝_ : GlobalState → GlobalState → Set where
      Deliver : ∀ {s ms ps} →
        ⟪ s , Ready , ms , ps ⟫ ↝
          (let gs = List.foldr party_rcv_step_world ⟪ s , Ready , ms , ps ⟫ ps
           in record gs { progress = Delivered })
      Bake : ∀ {s ms ps} →
        ⟪ s , Delivered , ms , ps ⟫ ↝
          (let gs = List.foldr party_bake_step_world ⟪ s , Delivered , ms , ps ⟫ ps
           in record gs { progress = Baked })
      NextRound : ∀ {s ms ps} →
        ⟪ s , Baked , ms , ps ⟫ ↝ ⟪ incrementSlot s , Ready , ms , ps ⟫
      PermParties : ∀ {s p ms ps} →
        ⟪ s , p , ms , ps ⟫ ↝ ⟪ s , p , ms , permParties ps ⟫
      PermMsgs : ∀ {s p ms ps} →
        ⟪ s , p , ms , ps ⟫ ↝ ⟪ s , p , permMessages ms , ps ⟫

AB - Test Model

Reading PoS-NSB paper let me think about how to generate meaningful tests:

  • The Model could generate a valid sequence of transactions, inject them in arbitrary nodes, and verify all nodes converge to a valid maximal chain
  • This requires modelling txs which is annoying
  • We could instead have the Model just play the role of the network/diffusion layer for blocks (and later votes), selecting which of the blocks each node has made available to broadcast, possibly also injecting rogue blocks. This removes the need to forge transactions and puts the model in the shoes of the adversary, interposing itself between otherwise honest nodes

Trying to get a working environment (again), but I have issues with HLS/LSP: the code action for filling missing variables did not work, so I upgraded to the latest available lsp-haskell, but now I get another error:

Symbol’s value as variable is void:
lsp-haskell-plugin-cabal-code-actions-on

Managed to properly configure auto-formatting for Haskell in my local environment for Peras, such that it picks up the right configuration file. I ditched the lsp-format-code action because it's borked, as per haskell/haskell-language-server#411

Pushed a first version of a test model, along the lines sketched above. The model's actions are very simple:

  • Tick to advance the slot for all nodes,
  • ObserveChain to retrieve the current best chain from a node.

Then when performing Tick on the real nodes, they will produce some Blocks which will be broadcast to other nodes, possibly later with some delays or loss...

A basic property that could be interesting to state as our first test would be the Common Prefix property:

do anyActions_
   getState >>= \nodes -> do
     chains <- mapM (action . ObserveBestChain) nodes
     assert $ all ((< k) . lengthDivergingSuffix) chains

i.e. no node's potential fork is deeper than the security parameter k, or equivalently all nodes share a common prefix.
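A minimal sketch of such a common-prefix computation (names and shape hypothetical; chains ordered oldest block first, and unlike the commonPrefix used in chainProgress this variant returns only the shared prefix):

```haskell
-- Longest common prefix of a set of chains, each ordered oldest-first.
commonPrefix :: Eq a => [[a]] -> [a]
commonPrefix [] = []
commonPrefix chains = foldr1 prefix2 chains
  where
    -- pairwise longest common prefix
    prefix2 xs ys = map fst (takeWhile (uncurry (==)) (zip xs ys))
```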

When we express properties for Peras, we could have this property provide a tighter bound than k in the "steady state", or fallback to k when confronted with adversarial behaviour.

2024-02-05

BB

Initial work in progress on a crude simulation of peers, chains, and forks:

  • Generates a reasonable but random topology of peers.
  • Simplified election of slot leaders.
  • Simplified lottery of voting committee.
  • Forking of chain and message passing to downstream neighbors.
  • The chaining is a work in progress, with several glaring but easily remedied deficiencies.
  • The code is unpolished and hacky.
  • The next step would be to add an approximation of the Peras protocol.

This is a disposable model, just for exploration and building intuition.

Example chain Example topology
chain topology

AB

Trying to assess what would be most useful to start working on for Peras.

We basically have a few moving pieces that need to fit together:

  • A formal (eg. Agda-based) model of the protocol, along the lines of Formalizing Nakamoto-Style Proof of Stake
    • need to analyse the original paper and the modeling and proof techniques used
    • need to port those to Agda
  • An executable encoding of this formal model that's suitable for use by quickcheck-dynamic (or similar tool) to generate test traces for the protocol
    • requires to be able to generate something from Agda?
    • requires some kind of modeling of the environment of an instance of the protocol -> need to be done at the formal level too?
  • Some driver/glue code to be able to actually test an executable (prototype) against this specification/environment
  • An actual prototype
    • need to be "embeddable" in order to maximise testing capabilities and minimise time, eg. use some form of IPC/shared memory instead of relying on full-blown network communications
    • abstracts away the details of the communication, e.g. assumes some form of async message-passing from a network layer

Some interesting article defining what it means to formalise a Nakamoto consensus: https://allquantor.at/blockchainbib/pdf/stifter2018agreement.pdf

Looking at generated code from Agda:

  • HS code is just unreadable: names and symbols are encoded, variables are just numbered with a single character prefix, etc.
  • Not sure what this MAlonzo thing is? -> https://wiki.portal.chalmers.se/agda/Docs/MAlonzo says it's the standard compiler backend that generates Haskell from Agda code

As expected, generated code should be treated as opaque and "Sealed" and not easily amenable to extensions and modifications

Perhaps I should just start focusing on a simple aspect, e.g. the chain-weight algorithm. The algorithm is rather complicated, as it involves both chains and votes, and it is pretty critical for the system. The other option is to go top-down and just ignore the details of chain weight for the moment, e.g. write the pure protocol state-machine model and build the full chain from that?
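For building intuition only, here is a heavily simplified sketch of what a chain-weight computation might look like, assuming each block contributes one unit of weight and each vote on a block adds a fixed boost b (the per-vote boost parameter discussed in the protocol estimates). The Block type is a hypothetical stand-in, not the real one:

```haskell
-- Hypothetical minimal block: we only track how many votes it attracted.
newtype Block = Block { votesOnBlock :: Int }

-- Chain weight: one unit per block, plus a boost b per vote on any block.
chainWeight :: Double -> [Block] -> Double
chainWeight b chain =
  fromIntegral (length chain)
    + b * fromIntegral (sum (map votesOnBlock chain))
```

Comparing chains then means comparing weights instead of plain lengths, which is exactly where the votes enter the fork-choice rule.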

Looking at AFLNet article which leverages AFL fuzzer to work on protocol/network systems, not sure how this is done or if there's actual introspection of the fuzzed system as AFL does

Retrieving https://github.com/AU-COBRA/PoS-NSB and adding it as a submodule for Peras, then trying to build it. My memories of Coq are a bit blurry, but it seems necessary to understand how they formalised the problem if we hope to replicate it for Peras

Ensemble session

Topology of the network

A good experiment to evaluate it:

  • Spin-up nodes all over the network and check propagation time
  • Would allow us to create realistic simulation
  • build an intuition about the structure of the network
  • how much of the protocol survives depending on the topology of the network

We might want to start asking now to experts what the actual topology looks like:

  • Start w/ network team
  • -> Karl Knutsson + Markus Gufler
  • Galois ? -> CNSA project -> ?

Key metrics we might want to evaluate for running Peras:

  • bytes storage => Disk
  • bytes in RAM => Mem
  • nbr + size of messages => impact on bandwidth
  • pooltool.io -> measure latency

first-order -> network details independent, use distribution (ΔQ)

  • used dQ in Marlowe to reason about degradation and back-pressure
  • used fixed size queues -> tell the client to wait

partial deployment of Peras?

which formal model?

Some estimates for the protocol params

Some values for the parameters

  • $T$: the length (in slots) of a voting round --> order of a few network deltas
  • $K$: the length of a cooldown period (in voting rounds) --> large enough so that order of $b\times n + k$ blocks are produced in time $T\times K$, where $k$ is the current common-prefix parameter
  • $L_{low}$, $L_{high}$: define vote-candidate window -->
    • $L_{high}$: some constant so that we think there is common prefix in practice if we cut off blocks newer than $\mathit{currentTime} - L_{high}$;
    • $L_{low}$: should in theory be security parameter to guarantee that there exists a block in the interval $[L_{low}, L_{high}]$
  • $A$: max. age for including vote --> security parameter to ensure honest votes get included (because of bad chain quality of Praos)
  • $\tau$: number of votes required for quorum --> $\frac{3}{4}n + 2\delta$, for some $\delta > 0$
  • $b$: weight boost per vote --> prob. some small constant
  • $W$: weight to cut off for common prefix --> order of security parameter; should be informed by the fact that R successful voting rounds give error $e^{-2\delta\times b\times n\times R}$
  • $n$: committee size: order of security parameter to avoid adversarially dominated committees; we're also hoping to make it work for smaller committee sizes at some point
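Combining the last two items, simple algebra turns the error bound into a requirement on the number of successful voting rounds: to reach a target failure probability of $2^{-k'}$ for some security level $k'$ (a hypothetical target, just for illustration),

$$e^{-2\delta b n R} \le 2^{-k'} \iff R \ge \frac{k' \ln 2}{2 \delta b n},$$

so the required number of rounds shrinks as the boost $b$, the committee size $n$, or the quorum margin $\delta$ grow.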

2024-01-31

Peras Meeting

  • Cardano Vision workshop -> CH + Research plan for next years

  • Intermediate solutions to speed up finality -> low hanging fruits

    • CH ask about anti-grinding?
    • Peter also asked what happened to anti-grinding?
  • grinding = making a block makes it possible to manipulate the future lottery to make more blocks

    • introduce some PoW to calculate nonce over the epoch
    • no work on the Peras R&D team
    • could be good to still track it here and ensure it's making progress and tackled by someone?
  • Peras paper: Alex, Aggelos, Christian, Peter, Sandro

    • targeting CCS but it's a bit soon
    • draft of the paper?
    • what kind of support? -> depends on the venue
    • could be good to have some kind of artifact
  • Formal model ?

    • Best known approach so far : https://arxiv.org/abs/2007.12105
    • We need to start somewhere -> Write Sandro's algorithm in Agda
    • Could then port proof techniques from Coq