NVIDIA Run:ai v2.24 Tackles GPU Scheduling Fairness for AI Workloads

Caroline Bishop   Jan 29, 2026, 17:39 UTC

NVIDIA has released Run:ai v2.24 with a time-based fairshare scheduling mode that addresses a persistent headache for organizations running AI workloads on shared GPU clusters: teams with smaller, frequent jobs starving out teams that need burst capacity for larger training runs.

The feature, built on NVIDIA's open-source KAI Scheduler, gives the scheduling system memory. Rather than making allocation decisions based solely on what's happening right now, the scheduler tracks historical resource consumption and adjusts queue priorities accordingly. Teams that have been hogging resources get deprioritized; teams that have been waiting get bumped up.

Why This Matters for AI Operations

The problem sounds technical but has real business consequences. Picture two ML teams sharing a 100-GPU cluster. Team A runs continuous computer vision training jobs. Team B occasionally needs 60 GPUs for post-training runs after analyzing customer feedback. Under traditional fair-share scheduling, Team B's large job can sit in queue indefinitely—every time resources free up, Team A's smaller jobs slot in first because they fit within the available capacity.
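To make that dynamic concrete, here is a toy Python model (not Run:ai or KAI Scheduler code; the cluster size and job sizes are assumed for illustration) of why the large job never finds a gap under snapshot-only scheduling:

```python
# Toy illustration: a 60-GPU job waits behind a steady stream of 8-GPU jobs,
# because freed capacity is always small enough for another small job to fit.
CLUSTER_GPUS = 100

running = [8] * 12          # Team A: twelve 8-GPU jobs occupy 96 GPUs
team_b_request = 60         # Team B: one burst post-training job
rounds_waited = 0

for _ in range(1_000):
    running.pop()           # one Team A job finishes, freeing 8 GPUs
    free = CLUSTER_GPUS - sum(running)
    if free >= team_b_request:
        print(f"Team B scheduled after {rounds_waited} rounds")
        break
    running.append(8)       # a new small job fits first; Team B keeps waiting
    rounds_waited += 1
else:
    print(f"Team B still waiting after {rounds_waited} rounds")  # this is what prints
```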

The timing aligns with broader industry trends. According to recent Kubernetes predictions for 2026, AI workloads are becoming the primary driver of Kubernetes growth, with cloud-native job queueing systems like Kueue expected to see major adoption increases. GPU scheduling and distributed training operators rank among the key updates shaping the ecosystem.

How It Works

Time-based fairshare calculates each queue's effective weight using three inputs: the configured weight (what a team should get), actual usage over a configurable window (default: one week), and a K-value that determines how aggressively the system corrects imbalances.

When a queue has consumed more than its proportional share, its effective weight drops. When it's been starved, the weight gets boosted. Guaranteed quotas—the resources each team is entitled to regardless of what others are doing—remain protected throughout.

A few implementation details worth noting: usage is measured against total cluster capacity, not against what other teams consumed. This prevents penalizing teams for using GPUs that would otherwise sit idle. Priority tiers still function normally, with high-priority queues getting resources before lower-priority ones regardless of historical usage.
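The article names the inputs but not the exact formula, so the following is a minimal Python sketch under stated assumptions: a linear correction of the configured weight by the gap between a queue's measured share of total cluster capacity and its proportional share, scaled by K. The function name, numbers, and adjustment rule are illustrative, not the KAI Scheduler's actual implementation.

```python
def effective_weight(configured: float, used_gpu_hours: float,
                     cluster_gpu_hours: float, total_weight: float,
                     k: float = 1.0) -> float:
    """Deprioritize queues that over-consumed their share; boost queues that were starved."""
    fair_share = configured / total_weight                 # proportional entitlement
    actual_share = used_gpu_hours / cluster_gpu_hours      # measured against total capacity
    imbalance = actual_share - fair_share                  # positive = over-consumption
    # K controls how aggressively history corrects the weight; never drop below zero.
    return max(configured - k * imbalance * total_weight, 0.0)

# Two queues with equal configured weights over a one-week window on a 100-GPU
# cluster (16,800 GPU-hours of capacity): the heavy user drops, the starved queue rises.
print(effective_weight(1.0, used_gpu_hours=12_000, cluster_gpu_hours=16_800, total_weight=2.0))  # ~0.57
print(effective_weight(1.0, used_gpu_hours=1_000,  cluster_gpu_hours=16_800, total_weight=2.0))  # ~1.88
```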

Configuration and Testing

Settings are configured per node-pool, letting administrators experiment on a dedicated pool without affecting production workloads. NVIDIA has also released an open-source time-based fairshare simulator for the KAI Scheduler, allowing teams to model queue allocations before deployment.
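As a rough stand-in for the kind of question such a simulator answers (this toy is not NVIDIA's tool, whose interface the article doesn't describe), the sketch below replays a week hour by hour with an always-busy queue and a bursty queue that only arrives mid-week, reusing the same assumed linear correction as the sketch above:

```python
CLUSTER_GPUS = 100
HOURS = 7 * 24                  # one-week usage window, matching the stated default
K = 1.0                         # assumed correction factor

configured = {"vision": 1.0, "post_training": 1.0}
usage = {"vision": 0.0, "post_training": 0.0}      # GPU-hours consumed so far

def effective(queue: str, elapsed_hours: int) -> float:
    """Assumed linear correction of configured weight by historical over/under-use."""
    total_w = sum(configured.values())
    fair = configured[queue] / total_w
    actual = usage[queue] / (CLUSTER_GPUS * elapsed_hours)
    return max(configured[queue] - K * (actual - fair) * total_w, 0.0)

for hour in range(1, HOURS + 1):
    eff = {q: effective(q, hour) for q in configured}
    share = {q: CLUSTER_GPUS * eff[q] / sum(eff.values()) for q in eff}
    # For simplicity, each queue consumes only its allocated share, even if the rest idles.
    usage["vision"] += share["vision"]              # continuous training, always busy
    if hour > HOURS // 2:                           # burst queue only appears mid-week
        usage["post_training"] += share["post_training"]

# By week's end the historically starved queue carries the higher effective weight.
print({q: round(effective(q, HOURS), 2) for q in configured})
```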

The feature ships with Run:ai v2.24 and is available through the platform UI. Organizations running the open-source KAI Scheduler can enable it via configuration steps in the project documentation.

For enterprises scaling AI infrastructure, the release addresses a genuine operational pain point. Whether it moves the needle for NVIDIA commercially depends on broader adoption patterns. But for ML platform teams tired of fielding complaints about stuck training jobs, it's a welcome fix.


