This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.7.0
Summary
CUB 1.7.0 brings support for CUDA 9.0 and SM7x (Volta) GPUs. It is compatible with independent thread scheduling. It was incorporated into Thrust 1.9.2.
Breaking Changes
- Remove `cub::WarpAll` and `cub::WarpAny`. These functions emulated `__all` and `__any` functionality for SM1x devices, which did not have those operations. However, SM1x devices are now deprecated in CUDA, and the interfaces of these two functions lack the lane mask that collectives need in order to run on SM7x and newer GPUs, which have independent thread scheduling.
Other Enhancements
- Remove any assumptions of implicit warp synchronization, for compatibility with the independent thread scheduling of SM7x (Volta) GPUs.
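To illustrate what dropping implicit warp synchronization means in practice, here is a minimal sketch of a warp-level sum through shared memory. It is not CUB's implementation; `warp_sum_kernel` and `warp_sum` are hypothetical names, and the sketch assumes a launch with one full warp of 32 threads. Pre-Volta code often omitted the barriers and relied on the lanes of a warp executing in lockstep; under independent thread scheduling that is no longer guaranteed, so each round ends with an explicit `__syncwarp()` (available since CUDA 9.0).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: sums 32 ints with a shared-memory tree reduction.
// Assumes exactly one full warp (32 threads) per launch.
__global__ void warp_sum_kernel(const int* in, int* out)
{
    __shared__ int smem[32];
    const unsigned lane = threadIdx.x;

    smem[lane] = in[lane];
    __syncwarp();   // make every lane's store visible before any lane reads

    for (unsigned offset = 16; offset > 0; offset /= 2) {
        // Lanes below `offset` fold in the upper half of the active range.
        if (lane < offset) {
            smem[lane] += smem[lane + offset];
        }
        // Explicit barrier, reached by all 32 lanes: the next round must not
        // read smem until this round's writes have completed.
        __syncwarp();
    }

    if (lane == 0) {
        *out = smem[0];
    }
}

// Hypothetical host helper: returns the sum of 32 ints computed on the device.
int warp_sum(const int* host_in)
{
    int *d_in = nullptr, *d_out = nullptr;
    int result = 0;
    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host_in, 32 * sizeof(int), cudaMemcpyHostToDevice);

    warp_sum_kernel<<<1, 32>>>(d_in, d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

The barrier sits outside the `if` so that all lanes named by the default full mask execute it, as `__syncwarp()` requires.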
Bug Fixes
- #86: Incorrect results with reduce-by-key.