experience_replay_interface.py: terminal_actions defined as int, but de-facto float #83

Open
Wuodan opened this issue Aug 22, 2024 · 2 comments · May be fixed by #84

Comments

Wuodan commented Aug 22, 2024

There is an inconsistency in class Experience in experience_replay/experience_replay_interface.py.
terminal_actions is declared as int, while the documentation says it is int or math.inf (a float). In the code it is only filled by buffer_management.py, with this line:
terminal_actions = float((n_frames - 1) - i) if "race_time" in rollout_results else math.inf

So de-facto terminal_actions is always float, not just when it's math.inf.
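
To make the point concrete, here is a minimal sketch that mirrors the quoted line. The function name and the `finished` flag are hypothetical stand-ins for the `"race_time" in rollout_results` check; this is not the project's code.

```python
import math


def terminal_actions_current(n_frames: int, i: int, finished: bool) -> float:
    # `finished` stands in for `"race_time" in rollout_results` (assumption).
    # Both branches return a float: float(...) on one side, math.inf on the other.
    return float((n_frames - 1) - i) if finished else math.inf


finished_value = terminal_actions_current(n_frames=10, i=3, finished=True)
unfinished_value = terminal_actions_current(n_frames=10, i=3, finished=False)

print(type(finished_value).__name__, finished_value)
print(type(unfinished_value).__name__, unfinished_value)
```

Neither branch ever yields an int, which is why a type checker reports a mismatch against the int annotation.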

I tried replacing float() with int() and math.inf with sys.maxsize and got an error like "int cannot be converted to C long" or "int too large for C long". Other easy workarounds also failed.

Wuodan commented Aug 22, 2024

I tried changing the above line to
terminal_actions = n_frames - 1 - i if "race_time" in rollout_results else sys.maxsize
so that terminal_actions is always an int, but it fails after running for quite a while.
The error is:

OverflowError: Python int too large to convert to C long

I still think terminal_actions should be and can be int.

Full stack-trace on failure is:

All rollout queues were empty. Learner sleeps 1 second.
Race time ratio   3.518026737413334
 NMG=7992970 
Process Process-1:
Traceback (most recent call last):
  File "C:\Users\stefa\miniconda3\envs\linesight\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Users\stefa\miniconda3\envs\linesight\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\development\github\Linesight-RL\linesight\trackmania_rl\multiprocess\learner_process.py", line 494, in learner_process_fn
    loss, grad_norm = trainer.train_on_batch(buffer, do_learn=True)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\development\github\Linesight-RL\linesight\trackmania_rl\agents\iqn.py", line 231, in train_on_batch
    batch, batch_info = buffer.sample(self.batch_size, return_info=True)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stefa\miniconda3\envs\linesight\Lib\site-packages\torchrl\data\replay_buffers\replay_buffers.py", line 671, in sample
    ret = self._prefetch_queue.popleft().result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stefa\miniconda3\envs\linesight\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stefa\miniconda3\envs\linesight\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\stefa\miniconda3\envs\linesight\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stefa\miniconda3\envs\linesight\Lib\site-packages\torchrl\data\replay_buffers\utils.py", line 72, in decorated_fun
    output = fun(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stefa\miniconda3\envs\linesight\Lib\site-packages\torchrl\data\replay_buffers\replay_buffers.py", line 607, in _sample
    data = self._collate_fn(data)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\development\github\Linesight-RL\linesight\trackmania_rl\buffer_utilities.py", line 61, in buffer_collate_function
    ) = tuple(
        ^^^^^^
  File "C:\development\github\Linesight-RL\linesight\trackmania_rl\buffer_utilities.py", line 63, in <lambda>
    lambda attr_name: fast_collate_cpu(batch, attr_name),
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\development\github\Linesight-RL\linesight\trackmania_rl\buffer_utilities.py", line 38, in fast_collate_cpu
    buffer[:] = source[:]
    ~~~~~~^^^
OverflowError: Python int too large to convert to C long
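
One plausible reading of this trace, offered as an assumption rather than a confirmed diagnosis: the paths are Windows paths, and on 64-bit Windows a C long is 32 bits wide, while sys.maxsize is 2**63 - 1 on a 64-bit Python build. The sketch below uses only the standard library to check which values fit into the platform's C long; the helper names are hypothetical.

```python
import ctypes
import sys


def c_long_max() -> int:
    # Width of the platform's C long in bits (32 on 64-bit Windows, 64 on most 64-bit Unix systems).
    bits = 8 * ctypes.sizeof(ctypes.c_long)
    return 2 ** (bits - 1) - 1


def fits_in_c_long(value: int) -> bool:
    # True if `value` can be stored in a signed C long on this platform.
    limit = c_long_max()
    return -limit - 1 <= value <= limit


print("C long max:", c_long_max())
print("sys.maxsize fits:", fits_in_c_long(sys.maxsize))
print("2**31 - 1 fits:", fits_in_c_long(2**31 - 1))
```

If the second line prints False on the machine in question, any conversion of sys.maxsize to a C long is expected to raise an OverflowError there.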

Wuodan commented Aug 22, 2024

There seems to be no problem when using int and 2**31 - 1 instead of math.inf.
terminal_actions = n_frames - 1 - i if "race_time" in rollout_results else 2**31 - 1
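
A minimal sketch of this int-only variant follows. The constant name `NO_TERMINAL`, the function name, and the `finished` flag (standing in for `"race_time" in rollout_results`) are hypothetical. The standard-library `array` module with type code `"l"` (signed C long) is used only to illustrate that the values fit into a C long; it is not the buffer the project uses.

```python
import array

# Hypothetical sentinel replacing math.inf; it fits in a 32-bit signed integer.
NO_TERMINAL = 2**31 - 1


def terminal_actions_int(n_frames: int, i: int, finished: bool) -> int:
    # `finished` stands in for `"race_time" in rollout_results` (assumption).
    return n_frames - 1 - i if finished else NO_TERMINAL


values = [terminal_actions_int(10, i, finished=True) for i in range(3)]
values.append(terminal_actions_int(10, 0, finished=False))

# A C long is at least 32 bits on every platform, so this does not overflow.
packed = array.array("l", values)
print(list(packed))
```

One thing to check before adopting this: any code that compares terminal_actions against math.inf, or relies on infinity being larger than every other value, would need to compare against the sentinel instead.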

Wuodan added a commit to Wuodan/linesight that referenced this issue Aug 22, 2024
Warning was:
Expected type 'int', got 'float' instead

Fixes Linesight-RL#83
Wuodan linked a pull request Aug 22, 2024 that will close this issue
Wuodan added a commit to Wuodan/linesight that referenced this issue Aug 30, 2024
Warning was:
Expected type 'int', got 'float' instead

Fixes Linesight-RL#83