Using the new `t_start`/`t_unit` attributes in `RandomVariable`, remove padding whenever possible (hopefully everywhere). #216

gvegayon · 2024-06-26T19:00:15Z

Implementation notes

The main question right now is when to apply the time adjustments (i.e., padding, shifting, etc.). Note that it is not possible to request all calls to RandomVariable.sample() to output data of the same length as the observed data. Take the seeding process for example, which starts before the observation process, so it has a different length.

During RV instantiation

We could have the RVs be time-aware right during instantiation. The following points should be considered for this approach:

The benefit: Mostly computational as we would reduce the number of calculations and may be able to pre-allocate some data (unsure about this last bit).
Requirement(s): We would need to add a new argument for all constructors, namely, model_t_unit and n_obs, to adequately broadcast/aggregate the RV to match the model. So that way, all RandomVariable.sample() calls would generate the same length array.
Pitfall(s): You need to have a clear idea of the full-time range of the model before running the model. Adding a seeding process or extending a time series beyond observed data becomes problematic.

What happens if the length of the output of a call to RandomVariable.sample() is not easy to get before calling Model.run()? Is this case relevant? I can't think of a specific method/model in which this could happen, but if it does, it would be a pain.

During model setup

During the setup of the model, which is automatically called as a function of the Model.run() method, we could extract t_unit and t_start from all RVs and store that information for adjusting the RVs output as needed.

The benefit: It would still be done as a init step, so it would save computations making the whole thing faster.
Requirement(s): Require a way to list all RVs instances within a model. This wouldn't need to be recursive. As long as RVs within the model sample method are identified, it should be OK. An alternative would be to add a way to record RVs for the model; this could be added to the Model.validate() method.

Using the retrieved information, the model could wrap RandomVariable.sample to broadcast/pad/summarize data as needed. This could be implemented via a decorator.
Pitfall(s): Like the previous method, it would probably be complicated to anticipate the final size of all arrays involved in the process.

During model run

This would be the simplest way to approach the issue. During run time, the model is fully aware of the length of the RVs as we can directly inspect the resulting arrays by calling RandomVariable.sample():

Benefit(s): Less complicated implementation. We adjust/query the array sizes when needed. This is also less automagical, which could be an issue.
Requirement(s): The resulting tuples and namedtuples from the RandomVariable.sample() method are not time-aware. Therefore, if we wanted to sync these arrays, we would have to do so using both the RV member and the output from the sample function. Following the time.qmd vignette, the actual implementation would be along these lines (an awkward implementation):
```
Rt_aligned, infections_aligned = align([
    [self.rt_rv, Rt],
    [self.inf_rv, infections]
])
```
To avoid a clunky implementation, we may need to create a wrapper class of the output from RandomVariable.sample(). That way, the previous code would be reduced to:
```
Rt_aligned, infections_aligned = align([Rt, infections])
```
Pitfall(s): Even if we can live with that way of always passing both the RV and the array, we would still need to deal with potential computational bottle-necks, as the calculations adjusting size may be relatively inefficient.

The text was updated successfully, but these errors were encountered:

gvegayon · 2024-07-03T23:00:00Z

(all content now moved to the main section of the issue)

gvegayon · 2024-07-09T18:53:15Z

@CDCgov/multisignal-epi-inference-devs, I just finished a first pass on the notes about the implementation of the time-aware component. Please take a look and let me know what you think and if I'm missing something. Once I get some feedback, I can start working on a proposal for approaching this issue. I am taking more time to think about this problem as it could possible have a big impact across the package.

damonbayer · 2024-07-09T19:52:13Z

Thanks for writing this up, @gvegayon!

I am inferring that you think it is important or desirable to have all RV's output objects of the same length. Is that correct? If so, why? For example, why would you want the overdispersion parameter of a observation process to have the same length as the number of latent infections? Is the idea that all of these parameters could or should be time-varying?

Quotes that I read that gave me this impression:

Note that it is not possible to request all calls to RandomVariable.sample() to output data of the same length as the observed data.

So that way, all RandomVariable.sample() calls would generate the same length array.

gvegayon · 2024-07-09T20:54:05Z

Great question, @damonbayer! That would be my first first approach. As we have done in other places with the model, matching lengths and padding with 0/NaN just works. We could try to be more fancy about it, but keeping all of the same length should make things easier, I believe. Let me think more about it.

damonbayer · 2024-07-09T22:33:52Z

Happy to discuss more in detail, but, overall, I think some combination of "During model setup" and "During model run" will be the best approach.

gvegayon · 2024-07-10T15:43:26Z

Thanks for today's discussion, @damonbayer and @dylanhmorris. This is what I got out of it:

Not everything needs to be of length $n$. Mainly, it would be good to allow scalars to stay as scalars. Everything else will be length-adjusted as needed (so not always).
Since the information about time is relevant to the sample output, we will bring it closer by creating a NameTuple instance called TimeArray. This will replace the Jax arrays, allowing us to pass along information about time to returns of RandomVariable.sample().
Again, regarding standardizing length, not all need to be $n$. As I implement this, I will keep array resizing to a minimum. We will have a better idea of how good/bad would be to have everything matching in length.
There will be an issue when retrieving stored sites. Internally, TimeArray will hold all the needed information, but the stored arrays won't have access to time-related information. So, for instance, if RandomVariable.sample() returns a biweekly array, outside of the model call we won't be able to much it just yet. This is a shortcoming that we will need to address in the future.

gvegayon · 2024-07-25T15:07:42Z

As a note for our future selves: We decided to call these SampledValue instead of TimeArray. The PR associated with this issue ensures all RandomVariable.sample() calls return tuple/namedtuple of SampledValue.

damonbayer · 2024-07-26T20:42:23Z

@dylanhmorris @gvegayon We did not address padding in #262

damonbayer · 2024-08-27T14:31:49Z

Closing this for now, as padding is not really used in any of the models, and I am not convinced it should be.

gvegayon mentioned this issue Jun 26, 2024

Discuss time handling (related to data class) #185

Closed

6 tasks

gvegayon changed the title ~~Using the new attributes, remove padding whenever possible (hopefully everywhere).~~ Using the new timeseries_* attributes in RandomVariable, remove padding whenever possible (hopefully everywhere). Jun 26, 2024

gvegayon added this to the 🐏 Micropachycephalosaurus milestone Jun 26, 2024

damonbayer modified the milestones: 🐏 Micropachycephalosaurus, N Sprint Jul 9, 2024

gvegayon linked a pull request Jul 11, 2024 that will close this issue

Implements SampledValue #262

Merged

damonbayer added the pyrenew related to pyrenew internals label Jul 12, 2024

damonbayer modified the milestones: N Sprint, O sprint Jul 19, 2024

dylanhmorris changed the title ~~Using the new timeseries_* attributes in RandomVariable, remove padding whenever possible (hopefully everywhere).~~ Using the new t_start/t_unit attributes in RandomVariable, remove padding whenever possible (hopefully everywhere). Jul 25, 2024

dylanhmorris closed this as completed in #262 Jul 26, 2024

damonbayer reopened this Jul 26, 2024

damonbayer modified the milestones: 🦆 Olorotitan, P Sprint, Q Sprint Aug 15, 2024

damonbayer closed this as not planned Won't fix, can't repro, duplicate, stale Aug 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using the new `t_start`/`t_unit` attributes in `RandomVariable`, remove padding whenever possible (hopefully everywhere). #216

Using the new `t_start`/`t_unit` attributes in `RandomVariable`, remove padding whenever possible (hopefully everywhere). #216

gvegayon commented Jun 26, 2024 •

edited

Loading

gvegayon commented Jul 3, 2024 •

edited

Loading

gvegayon commented Jul 9, 2024

damonbayer commented Jul 9, 2024 •

edited

Loading

gvegayon commented Jul 9, 2024

damonbayer commented Jul 9, 2024

gvegayon commented Jul 10, 2024

gvegayon commented Jul 25, 2024

damonbayer commented Jul 26, 2024

damonbayer commented Aug 27, 2024

Using the new t_start/t_unit attributes in RandomVariable, remove padding whenever possible (hopefully everywhere). #216

Using the new t_start/t_unit attributes in RandomVariable, remove padding whenever possible (hopefully everywhere). #216

Comments

gvegayon commented Jun 26, 2024 • edited Loading

Implementation notes

During RV instantiation

During model setup

During model run

gvegayon commented Jul 3, 2024 • edited Loading

gvegayon commented Jul 9, 2024

damonbayer commented Jul 9, 2024 • edited Loading

gvegayon commented Jul 9, 2024

damonbayer commented Jul 9, 2024

gvegayon commented Jul 10, 2024

gvegayon commented Jul 25, 2024

damonbayer commented Jul 26, 2024

damonbayer commented Aug 27, 2024

Using the new `t_start`/`t_unit` attributes in `RandomVariable`, remove padding whenever possible (hopefully everywhere). #216

Using the new `t_start`/`t_unit` attributes in `RandomVariable`, remove padding whenever possible (hopefully everywhere). #216

gvegayon commented Jun 26, 2024 •

edited

Loading

gvegayon commented Jul 3, 2024 •

edited

Loading

damonbayer commented Jul 9, 2024 •

edited

Loading