AutoJudge Revolutionizes LLM Inference with Enhanced Token Processing
AutoJudge introduces a novel method that accelerates large language model inference by optimizing token processing and reducing the need for human annotation, with minimal loss of accuracy.
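The technique builds on speculative decoding: a learned judge decides whether a drafted token that mismatches the target model's choice actually matters for the final answer, so harmless mismatches can be accepted instead of discarded. A toy sketch of such a relaxed verification step, assuming that reading; the model and judge functions below are illustrative stand-ins, not AutoJudge's actual interfaces:

```python
import random

random.seed(0)
VOCAB = list(range(100))

def draft_next(ctx):
    # Stand-in for a small, fast draft model: propose one token.
    return random.choice(VOCAB)

def target_next(ctx):
    # Stand-in for the large target model: its preferred next token.
    return random.choice(VOCAB)

def judge_accepts(ctx, draft_tok, target_tok):
    # Stand-in for a learned judge: decide whether the mismatch between
    # draft_tok and target_tok is "unimportant", i.e. unlikely to change
    # the final answer. Here a coin flip serves as a placeholder.
    return random.random() < 0.5

def relaxed_verify(ctx, k=8):
    """Draft k tokens, then keep the longest prefix whose tokens either
    match the target exactly or are judged unimportant mismatches."""
    drafts, c = [], list(ctx)
    for _ in range(k):
        t = draft_next(c)
        drafts.append(t)
        c.append(t)
    accepted, c = [], list(ctx)
    for d in drafts:
        t = target_next(c)  # in practice: one batched target pass over all drafts
        if d == t or judge_accepts(c, d, t):
            accepted.append(d)
            c.append(d)
        else:
            accepted.append(t)  # fall back to the target token and stop
            break
    return accepted

print(relaxed_verify([1, 2, 3]))
```

The more mismatches the judge can safely wave through, the more drafted tokens survive each verification pass, which is where the speedup over exact-match speculative decoding would come from.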
Together AI Sets New Benchmark with Fastest Inference for Open-Source Models
Together AI reports record inference speeds for open-source models, leveraging GPU optimization and quantization techniques on the NVIDIA Blackwell architecture to outperform competing providers.
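Quantization speeds up inference by storing weights in lower-precision integers, cutting memory footprint and bandwidth so more of the model fits close to the compute units. A minimal sketch of symmetric per-tensor int8 weight quantization, for illustration only; production stacks typically use per-channel or FP8 schemes with calibration:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean abs error: {err:.5f}, "
      f"memory: {w.nbytes // 2**20} MB -> {q.nbytes // 2**20} MB")
```

The 4x memory reduction (float32 to int8) is exact; the accuracy cost depends on the weight distribution, which is why real deployments calibrate scales per channel or per group.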
NVIDIA's Breakthrough: 4x Faster Inference in Math Problem Solving with Advanced Techniques
NVIDIA achieves 4x faster inference on complex math problems using NeMo-Skills, TensorRT-LLM, and ReDrafter, optimizing large language models for efficient scaling.
NVIDIA Grove Simplifies AI Inference on Kubernetes
NVIDIA introduces Grove, a Kubernetes API that streamlines complex AI inference workloads, enhancing scalability and orchestration of multi-component systems.
NVIDIA Enhances AI Inference with Dynamo and Kubernetes Integration
NVIDIA's Dynamo platform now integrates with Kubernetes to streamline AI inference management, promising improved performance and reduced costs for data centers.
NVIDIA Blackwell Shines in InferenceMAX™ v1 Benchmarks
NVIDIA's Blackwell architecture demonstrates significant performance and efficiency gains in SemiAnalysis's InferenceMAX™ v1 benchmarks, setting new standards for AI hardware.
NVIDIA Blackwell Dominates InferenceMAX Benchmarks with Unmatched AI Efficiency
NVIDIA's Blackwell platform excels in the latest InferenceMAX v1 benchmarks, showcasing superior AI performance and efficiency, promising significant return on investment for AI factories.
Enhancing LLM Inference with NVIDIA Run:ai and Dynamo Integration
NVIDIA's Run:ai v2.23 integrates with Dynamo to address large language model inference challenges, offering gang scheduling and topology-aware placement for efficient, scalable deployments.
NVIDIA Dynamo Tackles KV Cache Bottlenecks in AI Inference
NVIDIA Dynamo introduces KV cache offloading to address memory bottlenecks in AI inference, enhancing efficiency and reducing costs for large language models.
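The KV cache grows with context length and batch size, so long-context serving can exhaust GPU memory well before compute is saturated. Offloading keeps hot cache blocks on the GPU and pages cold blocks out to host memory, fetching them back on demand. A toy two-tier cache illustrating the idea, with plain dicts standing in for GPU and host memory; this is not Dynamo's actual API:

```python
from collections import OrderedDict

class TwoTierKVCache:
    """Keep the most recently used KV blocks in a small 'GPU' tier;
    evict least-recently-used blocks to a larger 'host' tier."""

    def __init__(self, gpu_capacity: int):
        self.gpu = OrderedDict()   # block_id -> kv data (hot tier)
        self.host = {}             # block_id -> kv data (cold tier)
        self.gpu_capacity = gpu_capacity

    def put(self, block_id, kv):
        self.gpu[block_id] = kv
        self.gpu.move_to_end(block_id)
        while len(self.gpu) > self.gpu_capacity:
            old_id, old_kv = self.gpu.popitem(last=False)  # evict LRU block
            self.host[old_id] = old_kv                     # offload to host RAM

    def get(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        kv = self.host.pop(block_id)  # page the block back in on a cache miss
        self.put(block_id, kv)
        return kv

cache = TwoTierKVCache(gpu_capacity=2)
for i in range(4):
    cache.put(i, f"kv-block-{i}")
print(sorted(cache.gpu), sorted(cache.host))  # [2, 3] [0, 1]
print(cache.get(0))  # block 0 paged back from host, evicting block 2
```

In a real serving stack the transfers are asynchronous DMA copies over PCIe or NVLink, and the win comes from reusing offloaded prefixes instead of recomputing them.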
Reducing AI Inference Latency with Speculative Decoding
Explore how speculative decoding techniques, including EAGLE-3, reduce latency and enhance efficiency in AI inference, optimizing large language model performance on NVIDIA GPUs.
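Speculative decoding uses a small draft model to propose several tokens ahead, which the large target model verifies in a single forward pass; every accepted token then costs a fraction of a full target step. A minimal greedy-verification sketch with toy stand-in models (EAGLE-3 additionally reuses the target model's internal features for drafting, which is not shown here):

```python
import random

random.seed(0)

def draft_greedy(ctx):
    # Stand-in for the small draft model's greedy next token.
    return (sum(ctx) * 31 + len(ctx)) % 50

def target_greedy(ctx):
    # Stand-in for the large target model's greedy next token,
    # made to agree with the draft most of the time, as in practice.
    t = draft_greedy(ctx)
    return t if random.random() < 0.8 else (t + 1) % 50

def speculative_step(ctx, k=4):
    """One decoding step: draft k tokens, verify with the target,
    and accept the longest matching prefix plus one corrected token."""
    drafts, c = [], list(ctx)
    for _ in range(k):
        t = draft_greedy(c)
        drafts.append(t)
        c.append(t)
    accepted, c = [], list(ctx)
    for d in drafts:
        t = target_greedy(c)  # in practice: one batched pass verifies all k drafts
        accepted.append(t)
        c.append(t)
        if t != d:            # first mismatch: discard the remaining drafts
            break
    return accepted

ctx = [7, 3]
for _ in range(3):
    out = speculative_step(ctx)
    ctx += out
    print(f"accepted {len(out)} token(s): {out}")
```

Because the target model always has the final say at each position, greedy speculative decoding is lossless: the output matches what the target would have produced alone, only faster when the draft's acceptance rate is high.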