INFERENCE News - Blockchain.News

CRYPTOCURRENCY

Perplexity AI Leverages NVIDIA Inference Stack to Handle 435 Million Monthly Queries

Perplexity AI utilizes NVIDIA's inference stack, including H100 Tensor Core GPUs and Triton Inference Server, to manage over 435 million search queries monthly, optimizing performance and reducing costs.

AWS Expands NVIDIA NIM Microservices for Enhanced AI Inference

AWS and NVIDIA enhance AI inference capabilities by expanding NIM microservices across AWS platforms, boosting efficiency and reducing latency for generative AI applications.

NVIDIA's TensorRT-LLM Multiblock Attention Enhances AI Inference on HGX H200

NVIDIA's TensorRT-LLM introduces multiblock attention, boosting AI inference throughput by up to 3.5x on the HGX H200 and addressing the challenges of long sequence lengths.
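The core idea behind this style of partitioned attention (as in multiblock attention and flash-decoding) is to split the long key/value sequence into blocks, compute partial softmax statistics per block in parallel, and then merge them with a cheap log-sum-exp rescaling step. The sketch below illustrates the math with scalar values per position for brevity; it is an assumed simplification, not TensorRT-LLM's actual kernel, which partitions work across GPU streaming multiprocessors over full head dimensions.

```python
import math

def attention_output(scores, values):
    """Reference: full softmax-weighted sum over all KV positions."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def multiblock_attention(scores, values, block_size):
    """Split the KV sequence into blocks, compute partial softmax
    statistics per block, then merge with a log-sum-exp rescale."""
    partials = []  # (block_max, block_exp_sum, block_weighted_sum)
    for start in range(0, len(scores), block_size):
        s = scores[start:start + block_size]
        v = values[start:start + block_size]
        m_b = max(s)
        w = [math.exp(x - m_b) for x in s]
        partials.append((m_b, sum(w), sum(wi * vi for wi, vi in zip(w, v))))
    # Merge step: rescale each block's statistics to a common max.
    m = max(m_b for m_b, _, _ in partials)
    denom = sum(l * math.exp(m_b - m) for m_b, l, _ in partials)
    numer = sum(o * math.exp(m_b - m) for m_b, _, o in partials)
    return numer / denom
```

Because each block is independent until the final merge, the blocks can be processed concurrently, which is what recovers GPU utilization when decoding with very long contexts.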

Accelerating Causal Inference with NVIDIA RAPIDS and cuML

Discover how NVIDIA RAPIDS and cuML enhance causal inference by leveraging GPU acceleration for large datasets, offering significant speed improvements over traditional CPU-based methods.

NVIDIA GH200 Superchip Boosts Llama Model Inference by 2x

The NVIDIA GH200 Grace Hopper Superchip accelerates inference on Llama models by 2x, enhancing user interactivity without compromising system throughput, according to NVIDIA.

Enhancing AI Inference with NVIDIA NIM and Google Kubernetes Engine

NVIDIA collaborates with Google Cloud to integrate NVIDIA NIM with Google Kubernetes Engine, offering scalable AI inference solutions through Google Cloud Marketplace.

NVIDIA Triton Inference Server Excels in MLPerf Inference 4.1 Benchmarks

NVIDIA Triton Inference Server achieves exceptional performance in MLPerf Inference 4.1 benchmarks, demonstrating its capabilities in AI model deployment.

Strategies to Optimize Large Language Model (LLM) Inference Performance

NVIDIA experts share strategies to optimize large language model (LLM) inference performance, focusing on hardware sizing, resource optimization, and deployment methods.
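A typical starting point for the hardware-sizing part of such guidance is estimating KV-cache memory, since it often dominates GPU memory at long context lengths and determines achievable batch size. The back-of-envelope formula below is a common sizing sketch, not taken from the article; the example model configuration (32 layers, 32 KV heads, head dimension 128, FP16) is an assumption chosen to resemble a 7B-class Llama model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    """Estimate KV-cache size: two tensors (K and V) per layer, each of
    shape [batch, kv_heads, seq_len, head_dim] at the given precision."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Assumed 7B-class config at FP16, one 4096-token sequence:
# 2 * 32 * 32 * 128 * 4096 * 1 * 2 bytes = 2 GiB per sequence.
gib = kv_cache_bytes(32, 32, 128, 4096, 1) / 2**30
```

Dividing the GPU memory left after model weights by this per-sequence figure gives a first estimate of the maximum concurrent batch size, which then feeds decisions about batching strategy and deployment topology.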

Hugging Face Introduces Inference-as-a-Service with NVIDIA NIM for AI Developers

Hugging Face and NVIDIA collaborate to offer Inference-as-a-Service, enhancing AI model efficiency and accessibility for developers.

Together AI Unveils Inference Engine 2.0 with Turbo and Lite Endpoints

Together AI launches Inference Engine 2.0, offering Turbo and Lite endpoints for enhanced performance, quality, and cost-efficiency.
