cloud Priority 4/5 4/24/2026, 11:05:37 AM

AWS Parallel Computing Service Adds Support for Slurm 25.11 and New Monitoring Endpoints

AWS Parallel Computing Service (PCS) now supports Slurm 25.11, improving workload orchestration and observability for high-performance computing clusters. The release exposes a Prometheus-compatible OpenMetrics endpoint, so administrators can collect cluster metrics directly with standard monitoring tools instead of custom scripts or log parsing. It also adds dedicated scheduler audit logs and an expedited re-queue capability. Together, these additions give deeper visibility into scheduler performance and job lifecycle management.
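As a rough illustration of what consuming a Prometheus-compatible endpoint involves, the sketch below parses a small payload in the Prometheus text exposition format into Python tuples. The payload is embedded in the code rather than fetched, and the metric names (`example_jobs_pending`, `example_nodes_idle`) and their values are invented for the example; the source does not list the metrics PCS actually exposes. The parser is deliberately minimal and assumes label values contain no `}` characters.

```python
import re

# Hypothetical payload: metric names and values are made up for illustration
# and are NOT taken from AWS PCS or Slurm documentation.
SAMPLE_PAYLOAD = """\
# HELP example_jobs_pending Jobs waiting in the queue (hypothetical metric)
# TYPE example_jobs_pending gauge
example_jobs_pending{partition="compute"} 12
example_jobs_pending{partition="gpu"} 3
# TYPE example_nodes_idle gauge
example_nodes_idle 7
# EOF
"""

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines (# HELP, # TYPE, # EOF).
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def total(samples, name):
    """Sum the values of every sample with the given metric name."""
    return sum(value for metric, _, value in samples if metric == name)


if __name__ == "__main__":
    parsed = parse_metrics(SAMPLE_PAYLOAD)
    print("pending jobs across partitions:", total(parsed, "example_jobs_pending"))
```

In practice a Prometheus server does this parsing itself once the endpoint is added as a scrape target; a hand-rolled parser like this is mainly useful for quick inspection or ad hoc scripts.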

Comparison

Aspect	Before (or alternative)	After (this release)
Slurm Version	Older versions such as 23.11 or 24.05	Version 25.11
Metric Collection	Custom scripts or indirect log parsing	Native Prometheus-compatible OpenMetrics endpoint
Audit Capabilities	Standard job and system logging	Dedicated scheduler audit logs for compliance
Job Handling	Standard re-queueing mechanisms	Introduction of expedited re-queue functionality

Action Checklist

Verify compatibility of existing Slurm plugins with version 25.11. Custom plugins may require recompilation or code updates.
Configure the OpenMetrics endpoint in your Prometheus monitoring stack. Ensure network access between the Prometheus server and the PCS controller.
Enable scheduler audit logs in the AWS Management Console. Check CloudWatch Logs storage costs for high-volume clusters.
Test job re-queueing behavior in a development environment. Validate how the expedited re-queue affects priority scheduling.
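
The network-access item in the checklist can be spot-checked from the Prometheus host with a plain TCP connection test. The sketch below is one way to do that using only the Python standard library. The host name and port in the usage example are placeholders; the source does not state which address or port the PCS metrics endpoint listens on, so substitute the values from your own cluster configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder values: replace with your controller's address and metrics port.
    HOST = "pcs-controller.example.internal"
    PORT = 9090
    status = "reachable" if can_connect(HOST, PORT) else "NOT reachable"
    print(f"{HOST}:{PORT} is {status}")
```

A successful connection only shows that routing and security groups allow the traffic. It does not confirm that the endpoint serves metrics, so a real scrape from Prometheus remains the final check.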

Source: AWS What's New

This page summarizes the original source. Check the source for full details.
