3.9.1. Whole task on GPU

The necessity of the latter is clear from the unacceptable cost that would be imposed on the CPU if the many check_ready() calls from 5120 GPU cores were executed on the CPU. If, for example, a Volta GPU is paired with a 48-core CPU, each CPU core would have to handle the check_ready() testing for roughly 100 GPU cores (5120/48 ≈ 107), and even though the GPU cores are slower, the cost would still be unacceptable.

Fortunately, the features of the Volta V100 GPU are perfectly suited to handle all aspects of the task update cycle. The new capability for asynchronous thread evolution within a block, illustrated on pages PP-PP of the nVidia Volta Whitepaper, allows even the requeuing of tasks to take place on the GPU.
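
As a rough illustration of a task update cycle that never leaves the GPU, the following CUDA sketch lets each thread pop a task, run check_ready() on it, and either execute it or requeue it. Everything except the name check_ready() is an assumption made for this example: the TaskQueue ring buffer, the push/pop helpers, the chain-shaped dependency (task i waits for task i-1), and the run_chain_demo() driver are hypothetical and are not taken from the design described here. The spin-waits rely on threads of one warp making progress independently, so the sketch assumes a Volta-class device (compiled with, e.g., -arch=sm_70) and a single resident block.

```cuda
#include <cuda_runtime.h>

constexpr int N_TASKS = 256;   // hypothetical number of tasks
constexpr int CAP     = 512;   // ring capacity: power of two, >= N_TASKS
constexpr int EMPTY   = -1;    // marks a free slot / "no task"

struct TaskQueue {             // hypothetical GPU-resident ring buffer
    unsigned head;             // next pop ticket
    unsigned tail;             // next push ticket
    int slots[CAP];            // task ids, EMPTY when free
};

// Requeue a task: take a ticket, then wait until that slot is free.
__device__ void push(TaskQueue* q, int task) {
    unsigned ticket = atomicAdd(&q->tail, 1u);
    int* slot = &q->slots[ticket % CAP];
    while (atomicCAS(slot, EMPTY, task) != EMPTY) { /* spin */ }
}

// Take a ticket and spin on its slot; give up once every task has completed.
__device__ int pop(TaskQueue* q, int* completed) {
    unsigned ticket = atomicAdd(&q->head, 1u);
    int* slot = &q->slots[ticket % CAP];
    for (;;) {
        int task = atomicExch(slot, EMPTY);
        if (task != EMPTY) return task;
        if (atomicAdd(completed, 0) >= N_TASKS) return EMPTY;
    }
}

// Assumed meaning of check_ready(): no unfinished dependencies remain.
__device__ bool check_ready(int* deps, int task) {
    return atomicAdd(&deps[task], 0) == 0;
}

__global__ void task_cycle(TaskQueue* q, int* deps, volatile int* value,
                           int* completed) {
    for (;;) {
        int task = pop(q, completed);
        if (task == EMPTY) return;                              // all tasks done
        if (!check_ready(deps, task)) { push(q, task); continue; }  // requeue
        // "Execute" the task: extend the chain by one.
        value[task] = (task == 0 ? 0 : value[task - 1]) + 1;
        __threadfence();                                        // publish result first
        if (task + 1 < N_TASKS) atomicSub(&deps[task + 1], 1);  // release successor
        atomicAdd(completed, 1);
    }
}

// Host driver: returns the value computed by the last task in the chain.
int run_chain_demo() {
    TaskQueue* q; int *deps, *value, *completed;
    cudaMallocManaged(&q, sizeof(TaskQueue));
    cudaMallocManaged(&deps, N_TASKS * sizeof(int));
    cudaMallocManaged(&value, N_TASKS * sizeof(int));
    cudaMallocManaged(&completed, sizeof(int));

    q->head = 0;
    q->tail = N_TASKS;
    // Enqueue in reverse order so that most tasks start out not ready.
    for (int i = 0; i < CAP; ++i)
        q->slots[i] = (i < N_TASKS) ? N_TASKS - 1 - i : EMPTY;
    for (int i = 0; i < N_TASKS; ++i) { deps[i] = (i > 0) ? 1 : 0; value[i] = 0; }
    *completed = 0;

    task_cycle<<<1, 64>>>(q, deps, value, completed);  // one block, two warps
    cudaDeviceSynchronize();

    int result = value[N_TASKS - 1];
    cudaFree(q); cudaFree(deps); cudaFree(value); cudaFree(completed);
    return result;
}
```

If the dependencies are respected, run_chain_demo() should return N_TASKS, since each task adds one to its predecessor's value. The CPU only fills the initial queue and launches the kernel; every check_ready() call and every requeue happens on the device, which is the property the text argues for. A real scheduler would need a less contended queue and a richer dependency structure than this single chain.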