Sorting-Free GPU Kernels for LLM Sampling

Shanli Xing (UW), Zihao Ye (UW, NVIDIA), Bohan Hou (CMU), Luis Ceze (UW, NVIDIA), Tianqi Chen (CMU, NVIDIA)
MLSys