NVIDIA has announced its new NVIDIA AI Foundry service along with NVIDIA NIM™ inference microservices, both aimed at expanding generative AI capabilities for enterprises worldwide. The offering is built around Meta's Llama 3.1 collection of openly available models.
Custom AI Solutions for Enterprises
With NVIDIA AI Foundry, enterprises and nations can now build bespoke 'supermodels' tailored to their specific industry needs using Llama 3.1 and NVIDIA's technology. These models can be trained on proprietary data as well as synthetic data generated with Llama 3.1 405B and the NVIDIA Nemotron™ Reward model.
The AI Foundry is powered by the NVIDIA DGX™ Cloud AI platform, co-engineered with leading public cloud providers, offering scalable compute resources to meet evolving AI demands. This service aims to support enterprises and nations in developing sovereign AI strategies and custom large language models (LLMs) for domain-specific applications.
Key Industry Adoption
Accenture is the first to leverage NVIDIA AI Foundry to create custom Llama 3.1 models for its clients. Companies like Aramco, AT&T, and Uber are among the early adopters of the new Llama NVIDIA NIM microservices, indicating a strong interest across various industries.
“Meta’s openly available Llama 3.1 models mark a pivotal moment for the adoption of generative AI within the world’s enterprises,” said Jensen Huang, founder and CEO of NVIDIA. “Llama 3.1 opens the floodgates for every enterprise and industry to build state-of-the-art generative AI applications. NVIDIA AI Foundry has integrated Llama 3.1 throughout and is ready to help enterprises build and deploy custom Llama supermodels.”
Enhanced AI Capabilities
NVIDIA NIM inference microservices for Llama 3.1 are now available for download. According to NVIDIA, they deliver up to 2.5x higher throughput than running inference without NIM. Enterprises can also pair them with the new NVIDIA NeMo Retriever NIM microservices to create advanced AI retrieval pipelines for AI assistants and digital human avatars.
Accenture, utilizing its AI Refinery™ framework, is pioneering the use of NVIDIA AI Foundry to develop custom Llama 3.1 models. “The world’s leading enterprises see how generative AI is transforming every industry and are eager to deploy applications powered by custom models,” said Julie Sweet, chair and CEO of Accenture. “Accenture has been working with NVIDIA NIM inference microservices for our internal AI applications, and now, using NVIDIA AI Foundry, we can help clients quickly create and deploy custom Llama 3.1 models to power transformative AI applications for their own business priorities.”
Comprehensive AI Model Services
NVIDIA AI Foundry offers an end-to-end service that includes model curation, synthetic data generation, fine-tuning, retrieval, and evaluation. Enterprises can use Llama 3.1 models and the NVIDIA NeMo platform to create domain-specific models, with the option to generate synthetic data to enhance model accuracy.
NVIDIA and Meta have collaborated to provide a distillation recipe for Llama 3.1, enabling developers to build smaller, custom models suitable for a range of infrastructure, from AI workstations to laptops.
Leading companies across healthcare, energy, financial services, retail, transportation, and telecommunications are already integrating NVIDIA NIM microservices for Llama 3.1. The Llama 3.1 models themselves were trained on more than 16,000 NVIDIA H100 Tensor Core GPUs.
Future Prospects
Production support for Llama 3.1 NIM and NeMo Retriever NIM microservices is available through NVIDIA AI Enterprise. Additionally, members of the NVIDIA Developer Program will soon have free access to NIM microservices for research, development, and testing.
For more information, visit the NVIDIA Newsroom.