Testing: Exposing internals and utilities #681

hackaugusto · 2023-02-03T13:36:27Z

This discussion gathers the pros and cons of exposing testing utilities and the internals of data structures and algorithms for testing purposes.

Some of the drivers behind this are reuse of testing code across our different projects and easier testing.

vlopes11 · 2023-02-03T14:02:47Z

I see two embedded questions:

How should we name functions that assume an argument is already checked?

One example of this is the tiered sparse Merkle tree. When we compute the hash of a node from an ordered list, after we deserialize the list from disk, we can assume it was serialized in order, so we don't need to re-order it every time. However, if the serialization was faulty, we will end up computing invalid nodes.

We can have two methods, get_max_tier_node_value and get_max_tier_node_value_from_sorted. We could alternatively use the _unchecked suffix, and that would be reasonable, but in the standard library this suffix is commonly used where there is potential memory unsafety. I would recommend a different suffix here, so that we don't mix the subject of memory safety with the logical behavior of the program.

The pro of this approach (having two methods, one checked and one unchecked) is that the user can skip redundant operations when they are certain the argument meets the conditions (e.g. the list was serialized in sorted order).

The con is that we increase the likelihood of bugs, since the expected sort strategy and the one implemented for serialization might diverge. We can mitigate this by requiring that every "unchecked" method ship with a checked counterpart that takes the argument in arbitrary form and transforms it into the expected form (in this example, sorts the list).

Another con is that we increase API complexity. Having a single get_max_tier_node_value is a nice thing, but might be more expensive for some cases.
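The checked/unchecked pairing described above could look roughly like the sketch below. The two method names come from this comment; the argument type, the fold_entries helper, and the bodies are assumptions made up for illustration and are not the actual tiered sparse Merkle tree code.

```rust
/// Hypothetical stand-in for the node hash: it folds the entries in order,
/// so the result depends on the order of the list.
fn fold_entries(entries: &[(u64, u64)]) -> u64 {
    entries.iter().fold(0u64, |acc, &(key, value)| {
        acc.wrapping_mul(31).wrapping_add(key ^ value.rotate_left(17))
    })
}

/// Checked variant: accepts the list in any order and sorts it first.
pub fn get_max_tier_node_value(mut entries: Vec<(u64, u64)>) -> u64 {
    entries.sort_unstable();
    get_max_tier_node_value_from_sorted(&entries)
}

/// Opt-out variant: the caller guarantees that `entries` is already sorted.
/// A broken guarantee yields a wrong value, not memory unsafety.
pub fn get_max_tier_node_value_from_sorted(entries: &[(u64, u64)]) -> u64 {
    // Only evaluated in debug builds, so release builds skip the O(n) check.
    debug_assert!(entries.windows(2).all(|w| w[0] <= w[1]));
    fold_entries(entries)
}
```

Because the checked method delegates to the opt-out one, both share a single implementation of the actual computation, which limits the room for divergence.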

Functions that might transition the component into an invalid state.

This is often desirable for stress testing, as we might want to check whether the component will panic under certain conditions. However, these should never be exposed by the API because, regardless of the user input, the component shouldn't ever be invalid.

This differs clearly from the previous item, which concerns the output of a function: the output will be invalid if the input is invalid. This case concerns the internal state of the component, which should always be valid.

We can benefit from such a risky API in some cases, if we assume certain behavior of a dependency and couple the internals of the component to these assumptions. But before we do that, I think we should have clear answers to the following questions:

Why can't we just create a new API on the dependency?

One possible answer to that question is "this is an external library that we don't control". For example, something may be defined in winterfell, and we can't really change that code because the change might have an impact elsewhere.

Another possible answer is that the dependency itself won't benefit from exposing such an API, as it might just move the problem of the possibly inconsistent state into the dependency. In that case, it is desirable to move these unchecked functions as far downstream as possible, covered by robust tests. They are often ad-hoc routines and should be treated as such. The biggest con of this approach is that we create a hard coupling between versions, making it much harder to upgrade to the latest versions, as the internals of the dependency might change and break the previous assumption.

Why can't we create an intermediate component that will handle these states?

Creating components for everything can sometimes be overkill, but this is the ideal solution.

Is the dependency API insufficient and should be refactored?

If we face such needs, it is very likely the dependency API is poor. It should, ideally, provide safe and checked options to manipulate its state, covering all use-cases.

Is this meant to only test a critical edge case of the current API?

This case would be a solid yes for an internals feature flag. #[cfg(test)] on its own will not work, because items gated by it are not compiled when the crate is built as a dependency, so the test API is not available downstream. The easiest option is to control this with #[cfg(any(test, feature = "internals"))].
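A minimal sketch of that gate follows. The Stack type and its methods are hypothetical and exist only to illustrate the attribute. It also assumes the crate declares an internals feature in its Cargo.toml, which a downstream crate would enable on its dev-dependency.

```rust
/// Hypothetical component whose invariant is `items.len() <= capacity`.
pub struct Stack {
    items: Vec<u64>,
    capacity: usize,
}

impl Stack {
    pub fn new(capacity: usize) -> Self {
        Self { items: Vec::new(), capacity }
    }

    /// Public API: refuses to break the invariant and hands the item back.
    pub fn push(&mut self, item: u64) -> Result<(), u64> {
        if self.items.len() == self.capacity {
            return Err(item);
        }
        self.items.push(item);
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Escape hatch that can break the invariant. It is compiled only for
    /// this crate's own tests or when the `internals` feature is enabled.
    #[cfg(any(test, feature = "internals"))]
    pub fn push_unchecked(&mut self, item: u64) {
        self.items.push(item);
    }
}
```

With the feature disabled, push_unchecked does not exist in the compiled crate, so regular users cannot reach it by accident.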

2 replies

hackaugusto Feb 11, 2023
Author

For naming. I would just follow the API guidelines:

> The convention is to mark these opt-out functions with a suffix like _unchecked or by placing them in a raw submodule.

https://rust-lang.github.io/api-guidelines/dependability.html#dynamic-enforcement-with-opt-out

I would also only introduce this after measuring a significant impact on performance. For the given example it is most likely not worth it: if we are reading the data from disk, we already go over every element, which is O(n), and there are algorithms that are O(n) when the data is already sorted (e.g. insertion sort), so we don't incur significant costs even when sorting while reading. (It may make sense; again, it would need benchmarking.)
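To make the O(n) claim concrete, here is a small insertion sort that counts its comparisons. The function name and the u64 element type are assumptions for illustration. On already sorted input of n elements it performs n - 1 comparisons; on reversed input it performs n(n - 1)/2.

```rust
/// Insertion sort that also reports how many comparisons it performed.
pub fn insertion_sort_counting(values: &mut [u64]) -> usize {
    let mut comparisons = 0;
    for i in 1..values.len() {
        let mut j = i;
        // Walk the new element left until its predecessor is not larger.
        while j > 0 {
            comparisons += 1;
            if values[j - 1] <= values[j] {
                break;
            }
            values.swap(j - 1, j);
            j -= 1;
        }
    }
    comparisons
}
```

This only shows the asymptotic behavior; whether the extra pass matters in practice is still a question for benchmarks.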

> Another con is that we increase API complexity. Having a single get_max_tier_node_value is a nice thing, but might be more expensive for some cases.

That is usually the case with performance: great performance usually breaks encapsulation, which is why we should do back-of-the-envelope estimations and benchmark.

> This is often desirable for stress testing, as we might want to check whether the component will panic under certain conditions. However, these should never be exposed by the API because, regardless of the user input, the component shouldn't ever be invalid.

I don't understand this. If the public API would transition to an invalid state, we can just use that on the test. If there isn't an API to do that, then there is nothing to test. What am I missing?

But I think we are talking about different goals. My initial motivation was exposing the testing helpers to downstream crates, so that we don't have to re-implement testing infrastructure.

vlopes11 Feb 13, 2023

> For naming. I would just follow the API guidelines:
>
> > The convention is to mark these opt-out functions with a suffix like _unchecked or by placing them in a raw submodule.
>
> https://rust-lang.github.io/api-guidelines/dependability.html#dynamic-enforcement-with-opt-out

SGTM!

> I would also only introduce this after measuring a significant impact on performance. For the given example it is most likely not worth it: if we are reading the data from disk, we already go over every element, which is O(n), and there are algorithms that are O(n) when the data is already sorted (e.g. insertion sort), so we don't incur significant costs even when sorting while reading. (It may make sense; again, it would need benchmarking.)

The example is mostly an abstraction. Sorting a set can be cheap, but if we check that a set is sorted on every tick of the VM, this can easily become expensive. The abstraction is the assumption that we make about the input (that it is ordered, because we know the storage will always write and read it sorted), and the logical assumption is the key thing here. We can assert it via some form of type safety (like creating a SortedList type), or via the API (our current approach).

We need to balance what we benchmark with what we define logically. Of course, ideally we would have production simulations and benchmarks of everything, but this might generate a big development overhead.
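The type-safety option could be sketched as follows. Only the SortedList name comes from this comment; the u64 element type, the methods, and the max_value consumer are assumptions for illustration.

```rust
/// Hypothetical wrapper: holding a `SortedList` is proof the data is ordered.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortedList(Vec<u64>);

impl SortedList {
    /// Sorts arbitrary input once, at construction time.
    pub fn new(mut values: Vec<u64>) -> Self {
        values.sort_unstable();
        Self(values)
    }

    /// Accepts the input only if it is already sorted (a single O(n) check),
    /// for example right after reading it from storage.
    pub fn try_from_sorted(values: Vec<u64>) -> Option<Self> {
        if values.windows(2).all(|w| w[0] <= w[1]) {
            Some(Self(values))
        } else {
            None
        }
    }

    pub fn as_slice(&self) -> &[u64] {
        &self.0
    }
}

/// A consumer can rely on the ordering without re-checking it on every call.
pub fn max_value(list: &SortedList) -> Option<u64> {
    list.as_slice().last().copied()
}
```

The ordering is verified once where the value is created, and every later call (for instance, on each VM tick) pays nothing for it.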

> > This is often desirable for stress testing, as we might want to check whether the component will panic under certain conditions. However, these should never be exposed by the API because, regardless of the user input, the component shouldn't ever be invalid.
>
> I don't understand this. If the public API would transition to an invalid state, we can just use that on the test. If there isn't an API to do that, then there is nothing to test. What am I missing?

Let's assume two components, A and B:

- B contains A
- B is a critical component and shouldn't panic

In this case, it might be worthwhile to override the state of A with an invalid state, and stress test B to assert that, regardless of the alleged properties of A, it will not panic. However, if the API of A does not allow such a transition, then this test is impossible.
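The A/B scenario can be sketched like this. Everything below is hypothetical: the divisor invariant, the method names, and the share operation are made up to illustrate the idea. In a real crate set_divisor_unchecked would sit behind the internals gate; the gate is left out here so that the sketch compiles on its own.

```rust
/// Component A. Its public constructor enforces `divisor != 0`.
pub struct A {
    divisor: u64,
}

impl A {
    pub fn new(divisor: u64) -> Option<Self> {
        if divisor == 0 {
            None
        } else {
            Some(Self { divisor })
        }
    }

    /// Overrides the state without checking the invariant.
    /// Assumption: a real crate would gate this behind a test-only feature.
    pub fn set_divisor_unchecked(&mut self, divisor: u64) {
        self.divisor = divisor;
    }
}

/// Component B contains A and must never panic.
pub struct B {
    a: A,
}

impl B {
    pub fn new(a: A) -> Self {
        Self { a }
    }

    /// Uses `checked_div`, so an invalid A yields `None` instead of a panic.
    pub fn share(&self, total: u64) -> Option<u64> {
        total.checked_div(self.a.divisor)
    }
}
```

A stress test can then build an A, force it into the invalid state, hand it to B, and assert that B reports an error value rather than panicking.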

> But I think we are talking about different goals. My initial motivation was exposing the testing helpers to downstream crates, so that we don't have to re-implement testing infrastructure.

Yes, IMO we should go with the #[cfg(any(test, feature = "internals"))] option.

hackaugusto · 2023-02-14T17:13:17Z

2023/02/14: We had a discussion and decided to start by exposing the Process struct for testing.
