NVIDIA Unveils Llama Nemotron Super v1.5 for Enhanced AI Efficiency
NVIDIA introduces the Llama Nemotron Super v1.5, promising improved accuracy and efficiency in AI applications, particularly in reasoning and agentic tasks.
NVIDIA Surpasses 1,000 TPS/User with Llama 4 Maverick and Blackwell GPUs
NVIDIA achieves a world-record inference speed of over 1,000 tokens per second (TPS) per user on Llama 4 Maverick using Blackwell GPUs, setting a new standard for AI model performance.
NVIDIA Enhances Llama 3.3 70B Model Performance with TensorRT-LLM
Discover how NVIDIA's TensorRT-LLM boosts Llama 3.3 70B model inference throughput by 3x using advanced speculative decoding techniques.
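Speculative decoding speeds up generation by letting a small draft model propose several tokens that the large target model then verifies, accepting the longest agreeing prefix. The following is a toy, greedy illustration of that draft-and-verify loop only, not the TensorRT-LLM implementation or its API; both "models" are placeholder functions invented for the sketch.

# Toy sketch of greedy draft-and-verify speculative decoding.
# The two next-token functions are placeholders, not real models, and
# verification is done token by token here; a real implementation scores
# the whole draft in one forward pass of the target model.
import random

VOCAB = list("abcdefgh ")

def target_next(context: str) -> str:
    # Stand-in for the large, accurate (but slow) target model.
    return VOCAB[(len(context) * 7) % len(VOCAB)]

def draft_next(context: str) -> str:
    # Stand-in for the small, fast draft model: usually agrees with the
    # target, occasionally diverges.
    if random.random() < 0.8:
        return target_next(context)
    return random.choice(VOCAB)

def speculative_decode(prompt: str, new_tokens: int, k: int = 4) -> str:
    out = prompt
    while len(out) < len(prompt) + new_tokens:
        # 1) Draft model cheaply proposes k tokens.
        proposal, ctx = [], out
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx += t
        # 2) Target model verifies: accept the agreeing prefix, then
        #    emit one corrected (or bonus) target token.
        ctx = out
        for t in proposal:
            if target_next(ctx) == t:
                out += t
                ctx += t
            else:
                out += target_next(ctx)
                break
        else:
            out += target_next(ctx)
    return out

print(speculative_decode("hello ", new_tokens=12))

The speedup comes from the target model checking several drafted tokens per verification step instead of generating one token at a time.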
AMD Ryzen AI 300 Series Enhances llama.cpp Performance in Consumer Applications
AMD's Ryzen AI 300 series processors boost the performance of llama.cpp in consumer applications, improving throughput and reducing latency for language models.
NVIDIA GH200 Superchip Boosts Llama Model Inference by 2x
The NVIDIA GH200 Grace Hopper Superchip accelerates inference on Llama models by 2x, enhancing user interactivity without compromising system throughput, according to NVIDIA.
Harnessing AMD Radeon GPUs for Efficient Llama 3 Fine-Tuning
Explore the innovative methods for fine-tuning Llama 3 on AMD Radeon GPUs, focusing on reducing computational costs and enhancing model efficiency.
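One common way to cut fine-tuning cost is parameter-efficient training with LoRA adapters. The sketch below uses the Hugging Face peft library and assumes a ROCm build of PyTorch (under which a Radeon GPU is exposed through the usual torch.cuda interface) plus access to the gated Llama 3 checkpoint; the model name is an assumption, and this is not the specific method from the article.

# Hedged LoRA fine-tuning setup sketch (assumes ROCm PyTorch on Radeon).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint name
device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm GPUs map to "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Train small low-rank adapters on the attention projections instead of
# all 8B weights, which sharply reduces memory and compute requirements.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg).to(device)
model.print_trainable_parameters()  # typically well under 1% of total weights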
Boosting LLM Performance: llama.cpp on NVIDIA RTX Systems
NVIDIA enhances LLM performance on RTX GPUs with llama.cpp, offering efficient AI solutions for developers.
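For a sense of how llama.cpp is typically used on an RTX system, here is a minimal sketch with the llama-cpp-python bindings. It assumes the package was installed with CUDA support so layers can be offloaded to the GPU, and the GGUF file path is a placeholder, not a file referenced by the article.

# Minimal llama-cpp-python sketch with GPU offload (CUDA build assumed).
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.1-8b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the RTX GPU
    n_ctx=4096,       # context window size
)

result = llm("Q: What does llama.cpp do? A:", max_tokens=64, stop=["\n"])
print(result["choices"][0]["text"])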
Ollama Enables Local Running of Llama 3.2 on AMD GPUs
Ollama makes it easier to run Meta's Llama 3.2 model locally on AMD GPUs, offering support for both Linux and Windows systems.
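Running the model locally then comes down to a few lines against the Ollama server. The sketch below uses the official ollama Python client and assumes the server is running locally with the Llama 3.2 model already pulled (for example via "ollama pull llama3.2"); it is a generic usage sketch, not AMD-specific.

# Short sketch using the ollama Python client against a local server.
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Summarize what Ollama does in one sentence."}],
)
print(response["message"]["content"])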
NVIDIA Unveils Llama 3.1-Nemotron-51B: A Leap in Accuracy and Efficiency
NVIDIA's Llama 3.1-Nemotron-51B sets new benchmarks in AI with superior accuracy and efficiency, enabling demanding inference workloads to run on a single GPU.
NVIDIA Enhances Llama 3.1 405B Performance with TensorRT Model Optimizer
NVIDIA's TensorRT Model Optimizer significantly boosts the performance of Meta's Llama 3.1 405B large language model on H200 GPUs.