-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using the new t_start
/t_unit
attributes in RandomVariable
, remove padding whenever possible (hopefully everywhere).
#216
Comments
timeseries_*
attributes in RandomVariable
, remove padding whenever possible (hopefully everywhere).
(all content now moved to the main section of the issue) |
@CDCgov/multisignal-epi-inference-devs, I just finished a first pass on the notes about the implementation of the time-aware component. Please take a look and let me know what you think and if I'm missing something. Once I get some feedback, I can start working on a proposal for approaching this issue. I am taking more time to think about this problem as it could possible have a big impact across the package. |
Thanks for writing this up, @gvegayon! I am inferring that you think it is important or desirable to have all RV's output objects of the same length. Is that correct? If so, why? For example, why would you want the overdispersion parameter of a observation process to have the same length as the number of latent infections? Is the idea that all of these parameters could or should be time-varying? Quotes that I read that gave me this impression:
|
Great question, @damonbayer! That would be my first first approach. As we have done in other places with the model, matching lengths and padding with 0/NaN just works. We could try to be more fancy about it, but keeping all of the same length should make things easier, I believe. Let me think more about it. |
Happy to discuss more in detail, but, overall, I think some combination of "During model setup" and "During model run" will be the best approach. |
Thanks for today's discussion, @damonbayer and @dylanhmorris. This is what I got out of it:
|
As a note for our future selves: We decided to call these |
timeseries_*
attributes in RandomVariable
, remove padding whenever possible (hopefully everywhere).t_start
/t_unit
attributes in RandomVariable
, remove padding whenever possible (hopefully everywhere).
@dylanhmorris @gvegayon We did not address padding in #262 |
Closing this for now, as padding is not really used in any of the models, and I am not convinced it should be. |
Implementation notes
The main question right now is when to apply the time adjustments (i.e., padding, shifting, etc.). Note that it is not possible to request all calls to
RandomVariable.sample()
to output data of the same length as the observed data. Take the seeding process for example, which starts before the observation process, so it has a different length.During RV instantiation
We could have the RVs be time-aware right during instantiation. The following points should be considered for this approach:
The benefit: Mostly computational as we would reduce the number of calculations and may be able to pre-allocate some data (unsure about this last bit).
Requirement(s): We would need to add a new argument for all constructors, namely,
model_t_unit
andn_obs
, to adequately broadcast/aggregate the RV to match the model. So that way, allRandomVariable.sample()
calls would generate the same length array.Pitfall(s): You need to have a clear idea of the full-time range of the model before running the model. Adding a seeding process or extending a time series beyond observed data becomes problematic.
What happens if the length of the output of a call to
RandomVariable.sample()
is not easy to get before callingModel.run()
? Is this case relevant? I can't think of a specific method/model in which this could happen, but if it does, it would be a pain.During model setup
During the setup of the model, which is automatically called as a function of the
Model.run()
method, we could extractt_unit
andt_start
from all RVs and store that information for adjusting the RVs output as needed.The benefit: It would still be done as a init step, so it would save computations making the whole thing faster.
Requirement(s): Require a way to list all RVs instances within a model. This wouldn't need to be recursive. As long as RVs within the model
sample
method are identified, it should be OK. An alternative would be to add a way to record RVs for the model; this could be added to theModel.validate()
method.Using the retrieved information, the model could wrap
RandomVariable.sample
to broadcast/pad/summarize data as needed. This could be implemented via a decorator.Pitfall(s): Like the previous method, it would probably be complicated to anticipate the final size of all arrays involved in the process.
During model run
This would be the simplest way to approach the issue. During run time, the model is fully aware of the length of the RVs as we can directly inspect the resulting arrays by calling
RandomVariable.sample()
:Benefit(s): Less complicated implementation. We adjust/query the array sizes when needed. This is also less automagical, which could be an issue.
Requirement(s): The resulting tuples and namedtuples from the
RandomVariable.sample()
method are not time-aware. Therefore, if we wanted to sync these arrays, we would have to do so using both the RV member and the output from the sample function. Following thetime.qmd
vignette, the actual implementation would be along these lines (an awkward implementation):To avoid a clunky implementation, we may need to create a wrapper class of the output from
RandomVariable.sample()
. That way, the previous code would be reduced to:Pitfall(s): Even if we can live with that way of always passing both the RV and the array, we would still need to deal with potential computational bottle-necks, as the calculations adjusting size may be relatively inefficient.
The text was updated successfully, but these errors were encountered: