Oracle Expands NVIDIA GPU Instances on OCI for AI and Digital Twins

Terrill Dicki  Aug 01, 2024 10:03  UTC 02:03


Oracle Cloud Infrastructure (OCI) has announced the availability of NVIDIA L40S GPU bare-metal instances, according to the NVIDIA Blog. The expansion aims to meet growing demand for workloads such as generative AI, large language models (LLMs), and digital twins.

NVIDIA L40S Now Available to Order on OCI

The NVIDIA L40S GPU is designed to deliver multi-workload acceleration for various applications, including generative AI, graphics, and video. It features fourth-generation Tensor Cores and supports the FP8 data format, making it ideal for training and fine-tuning small- to mid-size LLMs and performing inference across a wide range of use cases.

For instance, a single L40S GPU can generate up to 1.4 times more tokens per second than a single NVIDIA A100 Tensor Core GPU for Llama 3 8B with NVIDIA TensorRT-LLM. The L40S also excels in graphics and media acceleration, making it suitable for advanced visualization and digital twin applications. It delivers up to 3.8 times the real-time ray-tracing performance of its predecessor and supports NVIDIA DLSS 3 for faster rendering and smoother frame rates.

OCI will offer the L40S GPU in its BM.GPU.L40S.4 bare-metal compute shape, which features four NVIDIA L40S GPUs, each with 48GB of GDDR6 memory. The shape also includes 7.38TB of local NVMe drive capacity, 112 cores of 4th Gen Intel Xeon CPUs, and 1TB of system memory. Because these are bare-metal instances, they avoid virtualization overhead, which benefits high-throughput and latency-sensitive AI and machine learning workloads.
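To put the BM.GPU.L40S.4 figures in context, the minimal sketch below records the shape's quoted specs and does two pieces of back-of-the-envelope arithmetic: total GPU memory across the four GPUs, and a weights-only memory estimate for an 8B-parameter model such as Llama 3 8B at FP8 (1 byte per parameter) versus FP16 (2 bytes). The `ComputeShape` class and helper names are hypothetical illustrations, not an OCI API, and the estimate deliberately ignores KV cache, activations, and optimizer state, so real requirements are higher.

```python
from dataclasses import dataclass

# Bytes needed to store one model parameter at a given precision.
BYTES_PER_PARAM = {"fp8": 1, "fp16": 2, "fp32": 4}


@dataclass(frozen=True)
class ComputeShape:
    """Hypothetical record of a compute shape, using figures quoted in the article."""
    name: str
    gpu_count: int
    gpu_memory_gb: int  # memory per GPU, in GB
    cpu_cores: int
    nvme_tb: float

    def total_gpu_memory_gb(self) -> int:
        # Aggregate GPU memory across all GPUs in the shape.
        return self.gpu_count * self.gpu_memory_gb

    def cores_per_gpu(self) -> float:
        # CPU cores available per GPU if cores are split evenly.
        return self.cpu_cores / self.gpu_count


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Weights-only estimate in GB (1e9 bytes); excludes KV cache and activations."""
    return params_billion * BYTES_PER_PARAM[precision]


l40s_shape = ComputeShape("BM.GPU.L40S.4", gpu_count=4, gpu_memory_gb=48,
                          cpu_cores=112, nvme_tb=7.38)

print(l40s_shape.total_gpu_memory_gb())  # 192
print(l40s_shape.cores_per_gpu())        # 28.0

# Does the weights-only footprint of an 8B model fit on one 48GB L40S?
for precision in ("fp8", "fp16"):
    gb = weight_footprint_gb(8, precision)
    print(precision, gb, gb <= l40s_shape.gpu_memory_gb)
```

Halving bytes per parameter is one reason FP8 support matters for fitting and serving small- to mid-size LLMs on a single GPU, though the headroom left over is what actually determines batch size and context length.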

“We chose OCI AI infrastructure with bare-metal instances and NVIDIA L40S GPUs for 30% more efficient video encoding,” said Sharon Carmel, CEO of Beamr Cloud. “This will reduce storage and network bandwidth consumption by up to 50%, speeding up file transfers and increasing productivity for end users.”

Single-GPU H100 VMs Coming Soon on OCI

OCI will soon introduce the VM.GPU.H100.1 compute virtual machine shape, accelerated by a single NVIDIA H100 Tensor Core GPU. This new offering aims to provide cost-effective, on-demand access for enterprises looking to leverage the power of NVIDIA H100 GPUs for their generative AI and high-performance computing (HPC) workloads.

A single H100 GPU can generate more than 27,000 tokens per second for Llama 3 8B, offering up to four times the throughput of a single A100 GPU at FP16 precision. The VM.GPU.H100.1 shape includes 2×3.4TB of NVMe drive capacity, 13 cores of 4th Gen Intel Xeon processors, and 246GB of system memory, making it well-suited for a range of AI tasks.
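The throughput claims above can be turned into rough planning numbers. The sketch below, under loudly stated assumptions, derives the A100 baseline implied by the "up to four times" ratio and estimates how long a fixed token budget takes at 27,000 tokens per second. It assumes both figures describe the same Llama 3 8B configuration, which the article does not confirm; since "more than 27,000" is a lower bound and "up to four times" an upper bound, treat the results as order-of-magnitude estimates only. The function names are hypothetical.

```python
# Figures quoted in the article (Llama 3 8B on a single H100).
H100_TOKENS_PER_S = 27_000  # "more than 27,000 tokens per second"
H100_VS_A100_SPEEDUP = 4    # "up to four times" a single A100 at FP16

SECONDS_PER_DAY = 24 * 3600


def implied_baseline(throughput: float, speedup: float) -> float:
    """Baseline throughput implied by a speedup ratio (assumes same benchmark setup)."""
    return throughput / speedup


def seconds_for_tokens(n_tokens: int, tokens_per_s: float) -> float:
    """Idealized wall-clock time to generate n_tokens at a sustained rate."""
    return n_tokens / tokens_per_s


print(implied_baseline(H100_TOKENS_PER_S, H100_VS_A100_SPEEDUP))   # 6750.0
print(round(seconds_for_tokens(1_000_000, H100_TOKENS_PER_S), 1))  # 37.0
print(H100_TOKENS_PER_S * SECONDS_PER_DAY)                         # 2332800000
```

Real deployments rarely sustain peak throughput around the clock, so figures like tokens per day are best read as an upper envelope when comparing on-demand VM costs.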

GH200 Bare-Metal Instances Available for Validation

OCI has also made the BM.GPU.GH200 compute shape available for customer testing. This shape features the NVIDIA Grace Hopper Superchip and NVLink-C2C, which provides a high-bandwidth, cache-coherent 900GB/s connection between the NVIDIA Grace CPU and the Hopper GPU. This setup enables up to 10 times higher performance than the NVIDIA A100 GPU for applications processing terabytes of data.

Optimized Software for Enterprise AI

Maximizing the potential of GPU-accelerated compute instances requires an optimized software layer. NVIDIA NIM, part of the NVIDIA AI Enterprise software platform available on the OCI Marketplace, offers a set of microservices designed for secure, reliable deployment of high-performance AI model inference.

Optimized for NVIDIA GPUs, NIM pre-built containers offer improved cost of ownership, faster time to market, and enhanced security. These microservices can be easily deployed on OCI, enabling enterprises to develop world-class generative AI applications.

For more information, visit the NVIDIA Blog.


