AMD ROCm 6.1 Enhances AI and HPC Performance with New Capabilities

Luisa Crawford  Jun 20, 2024 20:09 (12:09 UTC)


AMD has unveiled ROCm 6.1, the latest iteration of its open-source software platform designed to maximize the performance of AMD Instinct™ accelerators. According to AMD.com, the update brings a host of new features and enhancements aimed at AI and high-performance computing (HPC) developers.

Enhanced GPU Support and Ecosystem Expansion

ROCm 6.1 expands support for AMD Instinct™ and Radeon™ GPUs. The release includes optimizations across a range of computational domains and broadens ecosystem support to keep pace with fast-moving AI frameworks. These changes aim to improve application stability and performance for AI and HPC developers.

New Video Decoding Capabilities

ROCm 6.1 introduces rocDecode, a new library for high-performance video decoding directly on the GPU using the Video Core Next (VCN) engines built into AMD GPUs. rocDecode decodes compressed video straight into video memory, so decoded frames no longer have to be copied across the PCIe bus, removing a common bottleneck in video processing. This matters for real-time pipelines in which decoded frames are then scaled, color-converted, or augmented on the GPU, steps that are common in video analytics, inference, and machine learning training.
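To see why decoding into video memory helps, the Python sketch below compares the PCIe traffic of CPU decoding (every decoded frame is uploaded to the GPU) with GPU decoding (only the compressed bitstream is copied). It does not call rocDecode; the NV12 output format, the 1080p30 stream and the 8 Mbit/s bitrate are assumptions chosen purely for illustration.

```python
# Back-of-envelope estimate of PCIe traffic for two decode pipelines.
# Illustrative sketch only: not the rocDecode API, and all stream
# parameters below are assumed values, not measurements.


def raw_frame_bytes(width: int, height: int) -> int:
    """Size of one decoded NV12 frame: a full-resolution luma plane plus a
    half-resolution interleaved chroma plane, i.e. 1.5 bytes per pixel."""
    return width * height * 3 // 2


def pcie_bytes_per_second(width: int, height: int, fps: int,
                          bitrate_bps: int, decode_on_gpu: bool) -> int:
    if decode_on_gpu:
        # Only the compressed bitstream is copied to the GPU; decoded
        # frames stay in video memory.
        return bitrate_bps // 8
    # CPU decode: every decoded frame must be uploaded to the GPU.
    return raw_frame_bytes(width, height) * fps


# Hypothetical 1080p30 stream at an assumed 8 Mbit/s.
cpu = pcie_bytes_per_second(1920, 1080, 30, 8_000_000, decode_on_gpu=False)
gpu = pcie_bytes_per_second(1920, 1080, 30, 8_000_000, decode_on_gpu=True)
print(f"CPU decode: {cpu / 1e6:.1f} MB/s, GPU decode: {gpu / 1e6:.1f} MB/s")
```

Under these assumptions the script prints roughly 93.3 MB/s for CPU decoding versus 1.0 MB/s for GPU decoding; real figures depend on codec, resolution, frame rate and bitrate.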

Advanced Model Inference with MIGraphX

MIGraphX, AMD's graph inference engine, receives significant updates in ROCm 6.1. The engine now supports Flash Attention, which improves the memory efficiency of attention-based models such as BERT, GPT, and Stable Diffusion. In addition, the new Torch-MIGraphX library integrates MIGraphX directly into PyTorch workflows and supports data types including FP32, FP16, and INT8.
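The memory saving behind Flash Attention comes from processing keys and values in blocks while keeping only running softmax statistics, so the full matrix of attention scores is never stored at once. The pure-Python sketch below illustrates that tiling and online-softmax idea and checks it against a naive reference; it is not MIGraphX's implementation, and the function names and block size are hypothetical.

```python
import math
import random

# Illustrative sketch of the online-softmax tiling idea behind Flash
# Attention; not MIGraphX code. Function names are hypothetical.


def naive_attention(q, k, v):
    """Reference attention: computes every score for a query up front."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block=2):
    """Processes keys/values one block at a time, keeping only a running
    max, running normalizer and running weighted sum per query."""
    d, dv = len(q[0]), len(v[0])
    out = []
    for qi in q:
        m = -math.inf      # running maximum score
        l = 0.0            # running sum of exp(score - m)
        acc = [0.0] * dv   # running unnormalized output
        for start in range(0, len(k), block):
            kb, vb = k[start:start + block], v[start:start + block]
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in kb]
            m_new = max(m, max(scores))
            scale = math.exp(m - m_new)  # rescale old statistics to the new max
            l *= scale
            acc = [a * scale for a in acc]
            for s, vj in zip(scores, vb):
                w = math.exp(s - m_new)
                l += w
                acc = [a + w * x for a, x in zip(acc, vj)]
            m = m_new
        out.append([a / l for a in acc])
    return out


random.seed(0)
q = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
k = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
v = [[random.gauss(0, 1) for _ in range(2)] for _ in range(5)]
ref = naive_attention(q, k, v)
tiled = tiled_attention(q, k, v, block=2)
diff = max(abs(a - b) for ra, rb in zip(ref, tiled) for a, b in zip(ra, rb))
print(diff)  # on the order of floating-point rounding error
```

Both functions produce the same result; the tiled version only ever holds one block of scores per query, which is what lets real implementations keep the working set in fast on-chip memory.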

Improved Deep Learning with MIOpen

MIOpen, AMD's open-source library of deep-learning primitives, also gains notable improvements. ROCm 6.1 introduces Find 2.0 fusion plans to optimize inference and updates convolution kernels for the NHWC (channels-last) tensor layout. Fusing operations reduces memory-bandwidth pressure and GPU kernel-launch overhead, both of which are important for efficient deep learning.
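The NHWC and NCHW formats differ only in how a 4D tensor is laid out in memory. The Python sketch below shows the flat-offset formulas for each layout and converts a small tensor from NCHW to NHWC, making it visible that NHWC stores all channels of one pixel contiguously. It illustrates the layouts only; the helper names are hypothetical and nothing here is MIOpen API.

```python
# Illustrative sketch of NCHW vs NHWC memory layouts; not MIOpen API.
# Helper names are hypothetical.


def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in an NCHW (channels-first) tensor."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in an NHWC (channels-last) tensor."""
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(data, N, C, H, W):
    """Reorders a flat NCHW buffer into a flat NHWC buffer."""
    out = [None] * (N * C * H * W)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    out[offset_nhwc(n, c, h, w, C, H, W)] = \
                        data[offset_nchw(n, c, h, w, C, H, W)]
    return out


# A 1x3x2x2 tensor whose values equal their NCHW offsets.
N, C, H, W = 1, 3, 2, 2
nchw = list(range(N * C * H * W))
nhwc = nchw_to_nhwc(nchw, N, C, H, W)
print(nhwc)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

In the output, each consecutive triple holds the three channels of one pixel, which is the access pattern channels-last convolution kernels are built around.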

Composable Kernel and hipSPARSELt Enhancements

The Composable Kernel (CK) library in ROCm 6.1 now supports stochastic rounding for FP8 in place of the traditional rounding logic. Because stochastic rounding is unbiased on average, rounding errors do not accumulate in one direction, which can improve model convergence. Additionally, hipSPARSELt introduces support for structured sparsity matrices, improving the flexibility and performance of sparse matrix-matrix multiplication (SpMM).
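The benefit of stochastic rounding shows up when many small updates are added to a low-precision value. The Python sketch below compares round-to-nearest with stochastic rounding on a uniform grid of spacing `step`. That grid is a simplifying assumption standing in for FP8, whose spacing actually varies with the exponent, and the function names are hypothetical; this is not CK's implementation.

```python
import math
import random

# Illustrative sketch of stochastic rounding on a uniform grid (a stand-in
# for FP8, whose spacing varies with the exponent). Not CK code; function
# names are hypothetical.


def round_nearest(x: float, step: float) -> float:
    """Deterministic round-to-nearest (Python's round uses ties-to-even)."""
    return round(x / step) * step


def round_stochastic(x: float, step: float, rng: random.Random) -> float:
    """Rounds down or up to a neighbouring grid point, choosing the upper
    one with probability equal to the fractional distance to it, so the
    expected value of the result equals x."""
    lower = math.floor(x / step)
    frac = x / step - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


rng = random.Random(42)
step = 1.0

# Accumulate 1000 increments of 0.1 (exact sum: 100), rounding after each add.
total_rn = 0.0
total_sr = 0.0
for _ in range(1000):
    total_rn = round_nearest(total_rn + 0.1, step)
    total_sr = round_stochastic(total_sr + 0.1, step, rng)
print(total_rn, total_sr)  # round-to-nearest is stuck at 0.0

# On average, stochastic rounding reproduces the original value.
samples = [round_stochastic(0.3, step, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to 0.3
```

With round-to-nearest, each 0.1 increment is rounded away and the total never leaves zero; with stochastic rounding the total lands near the exact sum of 100, which is the effect that helps training converge at low precision.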

Advanced Tensor Operations with hipTensor

hipTensor, AMD's dedicated C++ library for accelerating tensor operations, adds support for 4D tensor permutation and contraction. This broadens the set of operations hipTensor can accelerate, which is useful for workloads such as neural network training and scientific simulation.
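For readers unfamiliar with the two operations, the pure-Python sketch below defines a 4D permutation (reordering a tensor's axes) and a 4D contraction (summing products over shared modes). The helper names, the nested-list representation and the mode labels m, n, u, v, h, k are hypothetical choices for illustration; this is not the hipTensor API, which runs these operations on the GPU.

```python
from itertools import product

# Illustrative sketch of 4D permutation and contraction on nested lists.
# Not hipTensor API; helper names and mode labels are hypothetical.


def zeros(shape):
    """Nested-list tensor of the given shape filled with 0.0."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


def dims(t):
    """Shape of a nested-list tensor."""
    out = []
    while isinstance(t, list):
        out.append(len(t))
        t = t[0]
    return tuple(out)


def permute4(a, perm):
    """4D permutation: output axis k is input axis perm[k]."""
    src = dims(a)
    b = zeros(tuple(src[p] for p in perm))
    for i in product(*(range(n) for n in src)):
        j = tuple(i[p] for p in perm)
        b[j[0]][j[1]][j[2]][j[3]] = a[i[0]][i[1]][i[2]][i[3]]
    return b


def contract4(a, b):
    """C[m][n][u][v] = sum over h, k of A[m][n][h][k] * B[u][v][h][k]."""
    M, N, H, K = dims(a)
    U, V, H2, K2 = dims(b)
    assert (H, K) == (H2, K2), "contracted modes must match"
    c = zeros((M, N, U, V))
    for m, n, u, v in product(range(M), range(N), range(U), range(V)):
        c[m][n][u][v] = sum(a[m][n][h][k] * b[u][v][h][k]
                            for h in range(H) for k in range(K))
    return c


# Illustrative data: each element of A equals the sum of its indices.
a = zeros((2, 3, 4, 5))
for i in product(range(2), range(3), range(4), range(5)):
    a[i[0]][i[1]][i[2]][i[3]] = float(sum(i))

p = permute4(a, (3, 1, 0, 2))
print(dims(p))  # (5, 3, 2, 4)

b = zeros((2, 2, 4, 5))
for i in product(range(2), range(2), range(4), range(5)):
    b[i[0]][i[1]][i[2]][i[3]] = 1.0
c = contract4(a, b)
print(dims(c))  # (2, 3, 2, 2)
```

Libraries such as hipTensor exist because these loop nests grow quickly with tensor size; the sketch only pins down what the results should be.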

Overall, ROCm 6.1 aims to give AI and HPC developers more capable tools, with each enhancement targeting better performance, simpler workflows, and more efficient use of AMD hardware.
