Different DQN variations

Over the years, different variations of the classic DQN have appeared, each attempting to reduce the amount of data needed to learn (i.e. improve data efficiency) and to increase the overall performance measured against humans on the ATARI benchmarks. These variations are listed below.

Classic DQN
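
For reference, the standard formulation (following the original DQN paper, with $\theta^-$ the parameters of a periodically updated target network) minimises the squared TD-error

$$\mathcal{L}(\theta) = \mathbb{E}_{(s, a, r, s')}\left[\Big(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\Big)^2\right]$$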

Double DQN
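
In the standard formulation, Double DQN decouples action selection from action evaluation to reduce the over-estimation of Q-values: the online network picks the greedy action and the target network evaluates it,

$$y = r + \gamma\, Q\big(s', \operatorname*{arg\,max}_{a'} Q(s', a'; \theta);\ \theta^-\big)$$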

Prioritised Replay
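
In the standard formulation, transitions are sampled with probability proportional to their TD-error $\delta_i$ rather than uniformly,

$$P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}, \qquad p_i = |\delta_i| + \epsilon,$$

with importance-sampling weights $w_i = \big(N \cdot P(i)\big)^{-\beta}$ correcting the bias this sampling introduces.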

Dueling Networks

$$Q(s, a) = V(s) + A(s, a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s, a')$$

where $V(s)$ is the value function and $A(s, a)$ is the advantage function.
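
A minimal sketch of a dueling head in PyTorch, assuming a shared feature extractor already exists; the class name, hidden size and dimensions are illustrative, not from the original post.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Splits shared features into V(s) and A(s, a) streams and recombines
    them with the mean-subtracted aggregation shown above."""
    def __init__(self, feature_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_actions))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)                    # (batch, 1)
        a = self.advantage(features)                # (batch, num_actions)
        return v + a - a.mean(dim=1, keepdim=True)  # Q(s, a)
```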

Multi-step Returns
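
In the standard formulation, the single-step target is replaced by a truncated $n$-step return that bootstraps only after $n$ steps,

$$R_t^{(n)} = \sum_{k=0}^{n-1} \gamma^{k}\, r_{t+k+1} + \gamma^{n} \max_{a'} Q(s_{t+n}, a'; \theta^-)$$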

Distributional RL

$$\mathcal{L}(\theta) = D_{\mathrm{KL}}\!\left(\Phi\,\hat{\mathcal{T}} Z_{\theta^-}(s, a)\ \big\|\ Z_{\theta}(s, a)\right)$$

where $\Phi$ is the projection operator as explained in the original distributional RL paper. The cross-entropy is minimised here instead of the squared TD-error loss used in classic DQN.
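
A rough NumPy sketch of the projection step $\Phi$ onto a fixed categorical support, as described in the C51 paper; the function name, support range and atom count below are illustrative assumptions.

```python
import numpy as np

def project_distribution(rewards, dones, next_probs, gamma=0.99,
                         v_min=-10.0, v_max=10.0, num_atoms=51):
    """Project the Bellman-updated return distribution back onto the fixed
    support {z_0, ..., z_{N-1}} (the Phi operator)."""
    batch = rewards.shape[0]
    delta_z = (v_max - v_min) / (num_atoms - 1)
    z = np.linspace(v_min, v_max, num_atoms)             # support atoms

    # Bellman update of every atom, clipped to the support range.
    tz = np.clip(rewards[:, None] + gamma * (1.0 - dones[:, None]) * z[None, :],
                 v_min, v_max)                            # (batch, num_atoms)
    b = (tz - v_min) / delta_z                            # fractional atom index
    lower = np.floor(b).astype(int)
    upper = np.minimum(np.ceil(b).astype(int), num_atoms - 1)

    projected = np.zeros((batch, num_atoms))
    for i in range(batch):
        for j in range(num_atoms):
            if lower[i, j] == upper[i, j]:                # landed exactly on an atom
                projected[i, lower[i, j]] += next_probs[i, j]
            else:                                         # split mass between neighbours
                projected[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                projected[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
    return projected  # cross-entropy is then taken against Z_theta(s, a)
```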

It is important to remember that the discount factor $\gamma$ is usually fixed in these algorithms, although it can also be learnt separately for each time-step. For a fixed $\gamma$ the effective time-horizon can be computed as

$$\text{horizon} \approx \frac{1}{1 - \gamma}$$

Therefore, the effective time-horizon for $\gamma = 0.99$ is 100 time-steps.
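
A quick check of this formula for a few common discount factors (plain Python, values rounded):

```python
# Effective time-horizon 1 / (1 - gamma) for a few common discount factors.
for gamma in (0.9, 0.99, 0.999):
    print(f"gamma = {gamma}: horizon ≈ {1.0 / (1.0 - gamma):.0f} steps")
# gamma = 0.9: horizon ≈ 10 steps
# gamma = 0.99: horizon ≈ 100 steps
# gamma = 0.999: horizon ≈ 1000 steps
```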

Written on September 10, 2017