DEEPSEEK
NVIDIA Enhances GEMM Kernel Tuning with Heuristics and CUTLASS 4.2
NVIDIA introduces nvMatmulHeuristics to streamline GEMM kernel tuning, cutting tuning time and improving performance on GPUs. It is integrated with CUTLASS 4.2.
NVIDIA's CUTLASS 3.x Enhances GEMM Kernel Design with Modular Abstractions
NVIDIA's CUTLASS 3.x introduces a modular, hierarchical system for GEMM kernel design, improving code readability and extending support to newer architectures like Hopper and Blackwell.
NVIDIA Unveils Grouped GEMM APIs in cuBLAS 12.5 to Boost DL and HPC Performance
NVIDIA's cuBLAS 12.5 introduces Grouped GEMM APIs to improve the performance of deep learning and HPC workloads.