ai Priority 4/5 5/8/2026, 11:05:47 AM

ServiceNow AI Corrects vLLM V1 Reinforcement Learning Rollout Log Probability Calculations for Consistency

ServiceNow AI reported critical backend fixes in vLLM V1 that bring its reinforcement learning behavior in line with the established vLLM 0.8.5 reference. The investigation found that the V1 engine, a significant rewrite of the original codebase, produced inconsistent rollout log probabilities. These discrepancies directly affected online reinforcement learning systems, which rely on accurate log probability targets for optimization.

Comparison

Aspect	Before (early V1)	After (with fixes)
Log Probability Processing	Inconsistent rollout values in early V1	Aligned with vLLM 0.8.5 reference
LM Head Precision	Variable precision depending on runtime	Enforced fp32 lm_head for final projections
Runtime Defaults	V1-specific legacy defaults	Standardized defaults for RL consistency
Weight Updates	Potential lags in in-flight updates	Corrected in-flight weight update path
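
To make the LM Head Precision row concrete, here is a toy sketch of how rounding logits to a lower-precision format shifts the resulting log probabilities. It is not vLLM code: the choice of bfloat16 as the reduced format, the helper names to_bf16 and max_logprob_gap, and the example logits are all assumptions made for illustration.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (assumed low-precision format)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 bits that bfloat16 drops from float32.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def max_logprob_gap(logits):
    """Largest per-token log probability shift caused by rounding the logits."""
    full = log_softmax(logits)
    rounded = log_softmax([to_bf16(v) for v in logits])
    return max(abs(a - b) for a, b in zip(full, rounded))


# Hypothetical logits for a four-token vocabulary.
example_logits = [12.3456789, 11.9876543, 3.1415926, -7.7654321]
print(f"max log probability gap: {max_logprob_gap(example_logits):.4f}")
```

Even on this tiny vocabulary the gap is far larger than floating-point noise, which suggests why an RL pipeline that compares rollout and training log probabilities would want the final projection kept in fp32.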

Action Checklist

Upgrade vLLM to version 0.18.1 or higher. This version includes the core backend fixes for reinforcement learning.
Verify GSPO training metrics against vLLM V0 benchmarks. Ensure that rollout log probabilities match between the old and new engines.
Validate fp32 precision for the lm_head projection. Check that the final layer is using 32-bit floats to avoid rounding errors.
Test online RL pipelines including PPO and GRPO. These algorithms are highly sensitive to sampling and log probability discrepancies.
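
The log probability check in this list could be sketched roughly as follows. This is a minimal example, assuming the per-token log probabilities for the same prompt and sampled tokens have already been exported from both engines as plain lists. The function name compare_logprobs and the 1e-3 tolerance are hypothetical and not taken from the source.

```python
import math
from typing import Sequence


def compare_logprobs(
    reference: Sequence[float],
    candidate: Sequence[float],
    atol: float = 1e-3,  # assumed tolerance; tune for your model and dtype
) -> dict:
    """Summarize per-token differences between two log probability sequences."""
    if len(reference) != len(candidate):
        raise ValueError("both sequences must cover the same tokens")
    if not reference:
        raise ValueError("sequences must not be empty")

    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    # PPO-style objectives use exp(logp_a - logp_b) as an importance ratio,
    # so a log probability mismatch shows up as a ratio away from 1.0.
    ratios = [math.exp(c - r) for r, c in zip(reference, candidate)]

    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_ratio_deviation": max(abs(x - 1.0) for x in ratios),
        "within_tolerance": max(diffs) <= atol,
    }


# Hypothetical values standing in for logs from the old and new engines.
old_engine = [-0.51, -1.24, -2.03]
new_engine = [-0.51, -1.25, -2.03]
print(compare_logprobs(old_engine, new_engine))
```

Running a comparison like this over a fixed set of prompts before and after an upgrade gives a quick signal on whether the two engines still agree.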

Source: Hugging Face Blog

This page summarizes the original source. Check the source for full details.
