NVIDIA has announced a collaboration with Hugging Face aimed at simplifying the deployment of generative AI models. This partnership leverages NVIDIA's NIM (NVIDIA Inference Microservices) technology to enhance the accessibility and efficiency of deploying AI models on Hugging Face, a leading platform for AI developers.
Enhancing AI Model Performance with NVIDIA NIM
As the demand for generative AI continues to grow, NVIDIA is optimizing foundational models to boost performance, reduce operational costs, and improve user experience. According to NVIDIA's official blog, NIM is designed to streamline and accelerate the deployment of generative AI models across various infrastructures including cloud, data centers, and workstations.
NIM utilizes the TensorRT-LLM inference optimization engine, industry-standard APIs, and prebuilt containers to provide low-latency and high-throughput AI inference. It supports a wide array of large language models (LLMs) such as Llama 3, Mixtral 8x22B, Phi-3, and Gemma, and offers optimizations for domain-specific applications in speech, image, video, and healthcare.
Deploying NIM on Hugging Face
NVIDIA's partnership with Hugging Face aims to make deploying these optimized models more accessible. Developers can now deploy models like Llama 3 8B and 70B directly on their preferred cloud service providers through Hugging Face, enabling enterprises to generate text up to 3x faster.
The deployment process is straightforward:
- Navigate to the Llama 3 model page on Hugging Face and select 'NVIDIA NIM Endpoints' from the deployment menu.
- Choose the preferred cloud service provider and instance type, such as A10G/A100 on AWS or A100/H100 on GCP.
- Select 'NVIDIA NIM' from the container type drop-down menu in the advanced configuration section and create the endpoint.
- Within minutes, an inference endpoint will be up and running, allowing developers to start making API calls to the model.
This collaboration ensures high throughput and near-100% utilization with multiple concurrent requests, significantly boosting enterprise revenue by increasing token processing efficiency.
Future Prospects
The integration of NVIDIA NIM with Hugging Face is expected to enhance the adoption of generative AI applications across various industries. With over 40 multimodal NIMs available, developers can prototype and deploy AI solutions more rapidly and cost-effectively.
To explore and prototype applications using NVIDIA's offerings, developers can visit ai.nvidia.com. The platform also offers free NVIDIA cloud credits for building and testing prototype applications, making it easier for developers to integrate NVIDIA-hosted API endpoints with minimal code.
Image source: Shutterstock