Cpp optimization #3

Open
nickbrazeau opened this issue Dec 12, 2022 · 0 comments
@nickbrazeau
Collaborator

From BV:

Just had a super interesting chat with Giovanni.

ccing in Jason because some of these points will also be relevant to our C++ work.

First of all, sounds like his package ("individual") is unlikely to have quite enough flexibility to handle the sorts of small-scale heterogeneities we talked about, e.g. social network structure. If you're doing anything simpler, such as a population or meta-population model, then well worth a look.

We also talked a bunch about which code structure designs are most efficient, which I have to admit has changed my outlook somewhat and I will try and take forward myself. He said that efficiency in C++ often boils down to two things: 1) not allocating any memory while the program is running, and 2) making as much use of the CPU cache as possible.

On the first point, this argues against any sort of scheduler object, because these grow and shrink dynamically and so carry large overheads. Fair enough.
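To make the first point concrete, here is a minimal sketch of one possible alternative to a dynamic scheduler: a fixed-size table with one slot per individual, allocated once at start-up. The names `FixedSchedule`, `NO_EVENT`, `schedule()` and `process()` are hypothetical and are not taken from "individual" or SIMPLEGEN; the sketch also assumes each individual has at most one pending event of a given type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel meaning "no event scheduled for this individual".
constexpr int NO_EVENT = -1;

// One slot per individual, sized once in the constructor.
// Nothing is allocated or freed after construction.
struct FixedSchedule {
    std::vector<int> recovery_day;

    explicit FixedSchedule(std::size_t n_individuals)
        : recovery_day(n_individuals, NO_EVENT) {}

    // Scheduling overwrites a slot; the vector never grows.
    void schedule(std::size_t id, int day) {
        assert(id < recovery_day.size());
        recovery_day[id] = day;
    }

    // Scan every slot, clear the events due today, and return how many fired.
    int process(int today) {
        int n_fired = 0;
        for (std::size_t i = 0; i < recovery_day.size(); ++i) {
            if (recovery_day[i] == today) {
                recovery_day[i] = NO_EVENT;
                ++n_fired;
            }
        }
        return n_fired;
    }
};
```

The cost is a full scan on every time step, but the memory footprint is fixed for the whole run, which is the property the first point is after.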

On the second point, he said loops are super fast as long as they are doing the same action repeatedly over memory that is as close to contiguously allocated as possible. For example, rather than having a class for each individual, which holds within it their infection status, immune status, age etc., you could break all of these things out into different vectors. Then you have some sort of update_age() function that you apply to every value in the age vector (via a tight loop), before moving on to the immune update etc. This explodes the idea of having a nice simple individual class, because looping over every individual and applying all updates for that individual will involve stepping over non-contiguous memory and also jumping around between different functions. I can foresee this having pretty negative consequences in terms of code structure and organisation, because it will probably be harder to keep track of what's going on, so there is likely to be a trade-off between code evaluation speed and debug time.
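A minimal sketch of this "one vector per attribute" layout, assuming ages are counted in time steps and immunity decays by a constant factor per step. Only `update_age()` is named in the discussion above; `Population`, `update_immunity()` and the `decay` parameter are hypothetical names used for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Each attribute lives in its own contiguous vector, indexed by individual ID,
// instead of being a member of a per-individual class.
struct Population {
    std::vector<int> age;                // age in time steps (assumption)
    std::vector<double> immunity;        // immunity level (assumption)
    std::vector<std::uint8_t> infected;  // 1 = infected, 0 = not infected

    explicit Population(std::size_t n)
        : age(n, 0), immunity(n, 0.0), infected(n, 0) {}
};

// Tight loop: the same operation applied across contiguous memory.
void update_age(Population& pop) {
    for (int& a : pop.age) {
        a += 1;
    }
}

// A second tight loop, run only after the age update has finished.
void update_immunity(Population& pop, double decay) {
    for (double& m : pop.immunity) {
        m *= decay;
    }
}
```

A time step would then call `update_age(pop)` followed by `update_immunity(pop, decay)`, so each pass touches a single vector from start to end rather than hopping between the fields of separate objects.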

As another example, imagine you want to keep track of all infected people in your population (this is something that I explicitly do in SIMPLEGEN). One way (the way I do it) would be to have a vector of IDs of the infected people, which is added to or chopped down as new people become infected/recover. My argument for doing this is that there is no point looping over uninfected people when it comes to things like sampling a host that infected a mosquito, or checking for recovery events. But he said a better thing to do is to have a "bitset" where there is a 1/0 flag denoting for each individual whether they are infected. Looping through this bitset is incredibly fast, even if you spend a lot of time stepping over 0s. The loss due to wasted attempts is small compared to the overhead of allocating/deallocating memory, so overall it's much faster. Essentially it boils down to how far the compiler is able to do clever optimisations by looking at the code.
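A minimal sketch of the bitset approach, using `std::bitset` from the C++ standard library. It assumes the population size is fixed at compile time (here a hypothetical `N = 1000`) and that each infected individual has a known recovery day; `InfectionState`, `infect()` and `recover_due()` are illustrative names, not SIMPLEGEN's.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical population size; std::bitset needs it at compile time.
constexpr std::size_t N = 1000;

struct InfectionState {
    std::bitset<N> infected;        // 1/0 flag per individual
    std::vector<int> recovery_day;  // allocated once, indexed by individual ID

    InfectionState() : recovery_day(N, 0) {}

    // Flip the flag on; no container grows.
    void infect(std::size_t id, int day_of_recovery) {
        infected.set(id);
        recovery_day[id] = day_of_recovery;
    }

    // Walk every flag, skipping the 0s, and clear those due to recover.
    // Returns the number of individuals who recovered.
    int recover_due(int today) {
        int n_recovered = 0;
        for (std::size_t i = 0; i < N; ++i) {
            if (infected[i] && recovery_day[i] <= today) {
                infected.reset(i);
                ++n_recovered;
            }
        }
        return n_recovered;
    }
};
```

Becoming infected or recovering only flips a bit, so the set of infected people changes without any allocation or deallocation; `infected.count()` gives the current number of infected individuals. If the population size is only known at run time, a preallocated `std::vector` of flags would be one way to get the same effect.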
