This repository has been archived by the owner on Mar 15, 2022. It is now read-only.
So I just saw the seeding behaviour in `_study.py`, e.g. at:

OpenFermion-Cirq/openfermioncirq/variational/study.py, line 365 in d2f5a87

the offending statement being the line:

```python
seeds[i] if seeds is not None else numpy.random.randint(2**16)
```

The issue with this line is that if a user runs multiple copies of their code in parallel on a cluster, those copies are often seeded from the system time when numpy is imported, which can leave their internal random number generators identical. This propagates through any subsequent seeding that draws on those generators, and ends up producing correlated data the user doesn't expect.

Since we give the user the option to specify their own seeds, they can certainly work around the issue themselves, but if they don't know about the problem it becomes a notoriously hard error to find and debug: it usually presents only as seemingly random correlations, or signal noise larger than expected, and it doesn't replicate easily.

I'm also slightly worried that passing seeds around and reseeding numpy.random with them can lead to some really funky behaviour when numpy is invoked separately in two files (at least, I've observed this in the past), namely that multiple internal rngs can end up hiding behind the scenes.

I don't know if there's a 'standard' method for fixing this, but I have two suggestions: firstly, emit a warning whenever we need to seed an rng and the user hasn't provided a seed; secondly, pass explicit `numpy.random.RandomState` objects around instead of seeds for numpy.random, as this makes it easier to keep track of which rngs we actually have.
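To make the second suggestion concrete, here is a minimal sketch of what passing `RandomState` objects instead of seeds might look like. The function name `run_trial` and its signature are hypothetical, purely for illustration; the two load-bearing facts are that `numpy.random.RandomState()` with no seed draws entropy from the OS (not the system time), and that one explicit `RandomState` passed through the call chain gives a single traceable stream:

```python
import warnings

import numpy as np


def run_trial(rng=None):
    """Hypothetical study step taking an explicit numpy RandomState.

    If no RandomState is supplied, warn (suggestion one) and fall back to
    a fresh RandomState seeded from OS entropy rather than the clock.
    """
    if rng is None:
        warnings.warn(
            "No RandomState provided; creating one from OS entropy. "
            "Parallel runs without explicit rngs may not be reproducible."
        )
        rng = np.random.RandomState()  # seed=None pulls from the OS, not time
    return rng.randint(2**16)


# One explicit RandomState makes every draw traceable to a single stream,
# and re-creating it with the same seed reproduces the run exactly:
master = np.random.RandomState(42)
draws = [run_trial(master) for _ in range(3)]
```

Separate workers on a cluster would then each construct their own `RandomState` from a distinct, user-visible seed, rather than all of them implicitly sharing whatever `numpy.random`'s global state happened to be.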