ai Priority 4/5 5/8/2026, 11:05:47 AM

ServiceNow AI Corrects vLLM V1 Reinforcement Learning Rollout Log Probability Calculations for Consistency

ServiceNow AI reported critical backend fixes in vLLM V1 to align its reinforcement learning performance with the established vLLM 0.8.5 reference. The investigation revealed that the V1 engine, which involved a significant rewrite of the original codebase, produced inconsistent rollout log probabilities. These discrepancies directly impacted online reinforcement learning systems that rely on accurate log probability targets for optimization.
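To make the dependence on log probabilities concrete, here is a minimal sketch of the per-token importance ratio that PPO-style objectives compute from trainer and rollout log probabilities. It is an illustration only and not code from vLLM or ServiceNow AI: the function names, the example values, and the clip range of 0.2 are all assumptions.

```python
import math

def importance_ratios(trainer_logprobs, rollout_logprobs):
    """Per-token ratio exp(logp_trainer - logp_rollout)."""
    return [math.exp(t - r) for t, r in zip(trainer_logprobs, rollout_logprobs)]

def clipped_fraction(ratios, eps=0.2):
    """Fraction of tokens whose ratio falls outside [1 - eps, 1 + eps]."""
    clipped = sum(1 for x in ratios if x < 1 - eps or x > 1 + eps)
    return clipped / len(ratios)

# Hypothetical log probabilities for three sampled tokens.
trainer = [-0.50, -1.20, -2.30]
rollout_exact = [-0.50, -1.20, -2.30]    # rollout engine agrees with trainer
rollout_drifted = [-0.50, -1.45, -2.30]  # one token reported 0.25 nats too low

exact_ratios = importance_ratios(trainer, rollout_exact)
drifted_ratios = importance_ratios(trainer, rollout_drifted)

print(exact_ratios, clipped_fraction(exact_ratios))
print(drifted_ratios, clipped_fraction(drifted_ratios))
```

With matching log probabilities every ratio is exactly 1. In the drifted case one token's ratio moves outside the clip range even though the policy weights are unchanged, which is the kind of spurious signal an inconsistent rollout engine can introduce.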

#vllm #huggingface #reinforcement-learning #ai #backend

Comparison

Aspect | Before / Alternative | After / This
Log Probability Processing | Inconsistent rollout values in early V1 | Aligned with vLLM 0.8.5 reference
LM Head Precision | Variable precision depending on runtime | Enforced fp32 lm_head for final projections
Runtime Defaults | V1-specific legacy defaults | Standardized defaults for RL consistency
Weight Updates | Potential lags in in-flight updates | Corrected in-flight weight update path
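The LM head precision row can be illustrated with a small numerical experiment. The sketch below is pure Python and does not call vLLM. It rounds a set of made-up logits to bfloat16 and to fp32, then compares the resulting log probabilities. The choice of bfloat16 as the lower-precision dtype is an assumption (the summary does not name it), and rounding finished logits is only a stand-in for running the projection itself at lower precision.

```python
import math
import struct

def to_bf16(x):
    """Round a float to bfloat16 (round-to-nearest-even) and return it as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def to_fp32(x):
    """Round a Python float (fp64) to fp32 and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

# Hypothetical logits for a four-token vocabulary.
logits = [12.3456789, 11.9876543, 3.1415926, -2.7182818]

lp_fp32 = log_softmax([to_fp32(v) for v in logits])
lp_bf16 = log_softmax([to_bf16(v) for v in logits])

# Largest per-token log probability difference caused by the rounding alone.
max_gap = max(abs(a - b) for a, b in zip(lp_fp32, lp_bf16))
print(f"max |logprob gap| = {max_gap:.4f}")
```

For these values the gap is on the order of 1e-2 nats per token, which is large enough to shift importance ratios in an online RL objective.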

Action Checklist

  1. Upgrade vLLM to version 0.18.1 or higher. This version includes the core backend fixes for reinforcement learning.
  2. Verify GSPO training metrics against vLLM V0 benchmarks. Ensure that rollout log probabilities match between the old and new engines.
  3. Validate fp32 precision for the lm_head projection. Check that the final layer is using 32-bit floats to avoid rounding errors.
  4. Test online RL pipelines, including PPO and GRPO. These algorithms are highly sensitive to sampling and log probability discrepancies.
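The comparison in step 2 could be automated along the following lines. This is a hedged sketch, not a vLLM utility: `compare_logprobs` and the tolerance of 1e-3 are hypothetical, and the two input lists stand for per-token log probabilities that you have already recorded for the same prompt and sampled tokens from the reference engine and the upgraded engine.

```python
def compare_logprobs(reference, candidate, atol=1e-3):
    """Summarize per-token differences between two log probability sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must cover the same tokens")
    if not reference:
        raise ValueError("sequences must not be empty")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "within_tolerance": max(diffs) <= atol,
    }

# Hypothetical recorded values for the same three sampled tokens.
reference_run = [-0.10, -2.00, -0.75]
candidate_ok = [-0.10, -2.0005, -0.75]
candidate_bad = [-0.10, -2.05, -0.75]

report_ok = compare_logprobs(reference_run, candidate_ok)
report_bad = compare_logprobs(reference_run, candidate_bad)
print(report_ok)
print(report_bad)
```

A suitable tolerance depends on the model, dtype, and sampling settings, so the default here should be treated as a placeholder to tune against your own reference runs.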

Source: Hugging Face Blog

This page summarizes the original source. Check the source for full details.
