A3C vs PPO #15

ahmad-hl · 2022-01-19T09:56:09Z

What are the essential parts (in both the code and the paper) that transform Pensieve-A3C into Pensieve-PPO?

To my understanding, there are two: the adaptive entropy weight from "Suphx: Mastering Mahjong with Deep Reinforcement Learning", and clipping of the policy. These two points are the updates to the value function of A3C.
Am I right?
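
To make the question concrete, here is a minimal NumPy sketch of how I understand the two changes. It is not taken from the repository or from either paper: the function names, the step size, and the bounds are my own placeholders, and I am assuming the entropy weight is nudged toward a target entropy. The value (critic) loss is left out.

```python
import numpy as np

def entropy(probs):
    # Mean Shannon entropy of a batch of action distributions (each row sums to 1).
    return float(np.mean(-np.sum(probs * np.log(probs + 1e-8), axis=1)))

def a3c_policy_loss(new_logp, adv, probs, beta):
    # Plain policy-gradient loss with a fixed entropy weight beta.
    return float(-np.mean(new_logp * adv) - beta * entropy(probs))

def ppo_clip_policy_loss(new_logp, old_logp, adv, probs, beta, eps=0.2):
    # Probability ratio between the current policy and the policy that collected the data.
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Clipped surrogate: take the more pessimistic of the two estimates.
    surrogate = np.minimum(ratio * adv, clipped * adv)
    return float(-np.mean(surrogate) - beta * entropy(probs))

def update_entropy_weight(beta, probs, h_target, step=0.01, lo=1e-4, hi=1.0):
    # Assumed rule: raise beta when entropy is below the target, lower it otherwise.
    # step, lo, and hi are hypothetical values chosen for illustration.
    beta = beta + step * (h_target - entropy(probs))
    return float(np.clip(beta, lo, hi))
```

If this is roughly right, then the only differences from A3C in the policy loss are the `ratio`/`clipped` terms and the fact that `beta` changes during training instead of staying fixed.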

Could you please elaborate, with references to the paper (page) and the code (line)?