Data locality is the key to the performance of a code on a GPU (Graphics Processing Unit). For the PIC (particle-in-cell) method this means that all the particles belonging to one cell must be stored close together in memory. During the particle push, particles may move to other cells and must then be moved to a different place in memory (the region assigned to their new cell). This is called particle reordering.
A particle reordering technique is proposed that, for the first time, involves no critical sections, semaphores, mutexes, atomic operations, or similar synchronization mechanisms. This results in an almost tenfold reduction of the reordering time compared to the straightforward reordering algorithm.
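To make the notion of particle reordering concrete, the following is a minimal serial sketch of grouping particles by cell with a counting sort. It is not the technique proposed here, whose details are not given in this passage, and it is not necessarily the straightforward algorithm used as the baseline. The 1D periodic domain and the names `cell_index`, `reorder_by_cell`, `cell_width`, and `n_cells` are assumptions made for illustration only.

```python
def cell_index(x, cell_width, n_cells):
    """Map a 1D position to a cell index (periodic domain is an assumption)."""
    return int(x // cell_width) % n_cells


def reorder_by_cell(positions, cell_width, n_cells):
    """Return particles grouped by cell, plus the start offset of each cell.

    Hypothetical helper: particles of cell c end up in
    out[offsets[c]:offsets[c + 1]], i.e. contiguous in memory.
    """
    # Step 1: every particle computes its own cell (independent per particle).
    cells = [cell_index(x, cell_width, n_cells) for x in positions]

    # Step 2: count particles per cell. Done in parallel, these shared
    # counters are where atomic increments would typically be needed.
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1

    # Step 3: exclusive prefix sum turns counts into per-cell start offsets.
    offsets = [0] * (n_cells + 1)
    for c in range(n_cells):
        offsets[c + 1] = offsets[c] + counts[c]

    # Step 4: scatter each particle to the next free slot of its cell.
    # The per-cell cursor is again shared state in a parallel version.
    cursor = offsets[:n_cells]  # slice copy, so offsets stays unchanged
    out = [0.0] * len(positions)
    for x, c in zip(positions, cells):
        out[cursor[c]] = x
        cursor[c] += 1
    return out, offsets


if __name__ == "__main__":
    out, offsets = reorder_by_cell([2.5, 0.1, 3.7, 0.9, 2.2, 1.5], 1.0, 4)
    print(out)      # particles grouped by cell
    print(offsets)  # start index of each cell in the reordered array
```

In a parallel version of this sketch, the shared counters in steps 2 and 4 are the points where many threads would write to the same location, which is why a direct port would usually rely on atomic operations or locks.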