
Offload thrust::sort to GPU with use of cached allocator

As promised, I've worked a bit on enabling thrust::sort on the GPU with a cached allocator (as suggested by @lmosimann) to avoid repeated cudaMalloc calls, which can be a measurable overhead. The cached allocator could potentially be improved further, e.g. by periodically freeing unused blocks; this is just an initial implementation to give an idea of how something like this could work. Offloading the sorting to the GPU yields a significant speedup in multiple parts, so 🚀
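For illustration, here is a minimal sketch of the caching idea (not the code in this MR): freed blocks are kept in a free list keyed by size and reused on the next allocation instead of going back to the driver. Plain `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree`, and the class name `cached_allocator` is hypothetical.

```cpp
#include <cstdlib>
#include <map>

// Sketch of a cached allocator: thrust::sort requests temporary scratch
// space on every call, so instead of allocating and freeing each time,
// deallocated blocks are parked in a free list and handed back out when
// a later request fits. malloc/free stand in for cudaMalloc/cudaFree.
class cached_allocator {
public:
  void* allocate(std::size_t n) {
    // Reuse the smallest cached block of at least n bytes, if any.
    auto it = free_blocks_.lower_bound(n);
    if (it != free_blocks_.end()) {
      void* p = it->second;
      allocated_[p] = it->first;  // remember the block's true size
      free_blocks_.erase(it);
      return p;
    }
    void* p = std::malloc(n);  // real version: cudaMalloc
    allocated_[p] = n;
    return p;
  }

  void deallocate(void* p) {
    // Return the block to the cache instead of freeing it.
    auto it = allocated_.find(p);
    free_blocks_.emplace(it->second, p);
    allocated_.erase(it);
  }

  ~cached_allocator() {
    // Release everything at shutdown (real version: cudaFree).
    for (auto& kv : free_blocks_) std::free(kv.second);
    for (auto& kv : allocated_) std::free(kv.first);
  }

  std::size_t cached_blocks() const { return free_blocks_.size(); }

private:
  std::multimap<std::size_t, void*> free_blocks_;  // size -> parked block
  std::map<void*, std::size_t> allocated_;         // live allocations
};
```

In the Thrust setting, an allocator like this is typically wrapped to satisfy Thrust's allocator requirements and passed through an execution policy (e.g. `thrust::cuda::par(alloc)`), so consecutive sorts reuse the same scratch buffer.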
