Most grasping attempts in computer vision and robotics have primarily been based on RGB images. Cameras are cheap, and if you have decent infrastructure for collecting datasets (either in the real world or in simulation), you are in a very good position to experiment with vision-based robotic grasping. Grasping is a challenging problem, yet humans are very good at determining an appropriate grasp location to hold an object or to perform complicated manipulation. This is largely due to the superior tactile sensing and extreme dexterity that the human hand offers.
Over the years, different variations of the classic DQN have appeared, each attempting to reduce the amount of data needed to learn (i.e., improve data efficiency) and to increase overall performance measured against humans on the ATARI benchmarks. These variations are listed below.
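For context, all of these variants build on the same temporal-difference target that the classic DQN optimizes. A minimal sketch of that target computation (the function name and NumPy-based setup are illustrative assumptions, not from any particular implementation):

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute the classic DQN TD targets: r + gamma * max_a' Q_target(s', a').

    rewards:       (batch,) array of rewards
    next_q_values: (batch, n_actions) Q-values from the target network at s'
    dones:         (batch,) 1.0 where the episode ended, else 0.0
    gamma:         discount factor
    """
    # Greedy bootstrap value from the target network; zeroed at terminal states.
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q
```

The variants below mostly differ in how this target is formed (e.g., which network selects vs. evaluates the bootstrap action) or in how transitions are sampled from the replay buffer.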
An important step towards photo-immersive virtual reality (computer-generated, photo-realistic renderings convincing enough that our visual system is easily duped into believing they are real) is the ability to approximate the light transport happening in the real world. The real world is chaotic, and various complicated phenomena occur as photons travel from one medium to another in space. Our visual system cannot sample the world at much beyond 60 Hz, so we can never directly observe how photons move around, bounce off surfaces, and interact with their surroundings. However, given simple laws of optics, we can write down the light transport equation to a degree of accuracy sufficient to create a faithful representation of the real world via ray tracing on a computer. The classic rendering equation lies at the heart of all ray tracers; each one simply uses a different way to approximate and solve it. In its simplest form, the rendering equation is the following
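Written out in its standard hemispherical form (Kajiya's formulation, with conventional symbol names), it reads:

```latex
L_o(\mathbf{x}, \omega_o) = L_e(\mathbf{x}, \omega_o)
  + \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\,
    L_i(\mathbf{x}, \omega_i)\, (\omega_i \cdot \mathbf{n})\, \mathrm{d}\omega_i
```

Here $L_o$ is the radiance leaving point $\mathbf{x}$ in direction $\omega_o$, $L_e$ is the radiance emitted at $\mathbf{x}$, and the integral gathers incoming radiance $L_i$ over the hemisphere $\Omega$ around the surface normal $\mathbf{n}$, weighted by the BRDF $f_r$ and the cosine foreshortening term $\omega_i \cdot \mathbf{n}$.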