Hugging Face Introduces Inference-as-a-Service with NVIDIA NIM for AI Developers

Timothy Morano  Jul 30, 2024 14:37  UTC 06:37

0 Min Read

Hugging Face, a leading AI community platform, is now offering developers Inference-as-a-Service powered by NVIDIA's NIM microservices, according to NVIDIA Blog. The service aims to boost token efficiency by up to five times with popular AI models and provide immediate access to NVIDIA DGX Cloud.

Enhanced AI Model Efficiency

This new service, announced at the SIGGRAPH conference, allows developers to rapidly deploy leading large language models, including the Llama 3 family and Mistral AI models. These models are optimized using NVIDIA NIM microservices running on NVIDIA DGX Cloud.

Developers can prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production seamlessly. Enterprise Hub users can leverage serverless inference for increased flexibility, minimal infrastructure overhead, and optimized performance.

Streamlined AI Development

The Inference-as-a-Service complements the existing Train on DGX Cloud service, which is already available on Hugging Face. This integration provides developers with a centralized hub to compare various open-source models, experiment, test, and deploy cutting-edge models on NVIDIA-accelerated infrastructure.

The tools are easily accessible through the “Train” and “Deploy” drop-down menus on Hugging Face model cards, enabling users to get started with just a few clicks.

NVIDIA NIM Microservices

NVIDIA NIM is a collection of AI microservices, including NVIDIA AI foundation models and open-source community models, optimized for inference using industry-standard APIs. NIM offers higher efficiency in processing tokens, improving the efficiency of the underlying NVIDIA DGX Cloud infrastructure and increasing the speed of critical AI applications.

For example, the 70-billion-parameter version of Llama 3 delivers up to 5x higher throughput when accessed as a NIM compared to off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems.

Accessible AI Acceleration

The NVIDIA DGX Cloud platform is purpose-built for generative AI, offering developers easy access to reliable accelerated computing infrastructure. This platform supports every step of AI development, from prototype to production, without requiring long-term AI infrastructure commitments.

Hugging Face’s Inference-as-a-Service on NVIDIA DGX Cloud, powered by NIM microservices, offers easy access to compute resources optimized for AI deployment. This enables users to experiment with the latest AI models in an enterprise-grade environment.

More Announcements at SIGGRAPH

At the SIGGRAPH conference, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework. This aims to accelerate developers’ abilities to build highly accurate virtual worlds for the next evolution of AI.

For more information, visit the official NVIDIA Blog.



Read More