Dueling dqn equation #1

Open

HencyChen opened this issue Jan 15, 2018 · 1 comment

HencyChen commented Jan 15, 2018

Thanks for offering this wonderful code. But I have a question.

Why, in the combination part of the equation, does the advantage A need to have its average subtracted? I've already referred to the paper but still don't understand.

HareshKarnan commented Sep 28, 2020

^ Because more than one pair of V(s) and A(s,a) can satisfy the advantage equation. For example, for any constant c:

Q(s,a) = V(s) + A(s,a) = (V(s)+c) + (A(s,a)-c)

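So, to learn a unique V and A, you subtract the mean of the advantage over actions, which forces the advantages in each state to average to zero: Q(s,a) = V(s) + (A(s,a) - mean over a' of A(s,a')). V(s) is then pinned down as the mean of Q(s,a) over actions. (The paper's other variant subtracts the max instead of the mean; that is the one where the advantage of the best action is exactly 0.)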
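A small NumPy sketch of this, in case it helps. It is not taken from this repository; `naive_q`, `dueling_q`, and the numbers are made up for illustration. It checks that the plain sum cannot tell V and A apart, while the mean-subtracted version ignores a constant shift in A and lets V be read back as the mean of Q.

```python
import numpy as np

def naive_q(value, advantage):
    """Plain sum Q = V + A (hypothetical helper; V and A are not identifiable)."""
    value = np.asarray(value, dtype=np.float64)
    advantage = np.asarray(advantage, dtype=np.float64)
    return value[..., None] + advantage

def dueling_q(value, advantage):
    """Mean-subtracted aggregation: Q = V + (A - mean over actions of A)."""
    value = np.asarray(value, dtype=np.float64)
    advantage = np.asarray(advantage, dtype=np.float64)
    centered = advantage - advantage.mean(axis=-1, keepdims=True)
    return value[..., None] + centered

v = np.array([1.0])                # V(s) for a single state
a = np.array([[0.5, -0.5, 3.0]])   # A(s, a) for three actions
c = 10.0                           # arbitrary constant

# Moving c from A into V leaves the plain sum unchanged.
same_naive = np.allclose(naive_q(v, a), naive_q(v + c, a - c))

# With mean subtraction, a constant shift in A has no effect on Q,
# so the network cannot hide an offset in the advantage stream.
q = dueling_q(v, a)
shift_invariant = np.allclose(q, dueling_q(v, a - c))

# The centered advantages sum to zero, so V(s) is the mean of Q(s, .).
v_recovered = q.mean(axis=-1)

print(same_naive, shift_invariant, v_recovered)
```

Note that the highest-advantage action still has a positive centered advantage here; only the average over actions is zero.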
