cloud Priority 4/5 4/24/2026, 11:05:37 AM

AWS Parallel Computing Service Adds Support for Slurm 25.11 and New Monitoring Endpoints

The latest update to AWS Parallel Computing Service (PCS) adds support for Slurm 25.11, improving workload orchestration and observability for high-performance computing clusters. The release exposes a Prometheus-compatible OpenMetrics endpoint, so administrators can collect cluster metrics directly with standard monitoring tools instead of custom scripts or log parsing. It also introduces dedicated scheduler audit logs and an expedited re-queue option. Together, these additions give deeper visibility into scheduler performance and the job lifecycle.

#aws #cloud #official #marketing:marchitecture/compute

Comparison

Aspect | Before / Alternative | After / This
Slurm Version | Older versions such as 23.11 or 24.05 | Version 25.11
Metric Collection | Custom scripts or indirect log parsing | Native Prometheus-compatible OpenMetrics endpoint
Audit Capabilities | Standard job and system logging | Dedicated scheduler audit logs for compliance
Job Handling | Standard re-queueing mechanisms | Expedited re-queue functionality

Action Checklist

  1. Verify compatibility of existing Slurm plugins with version 25.11. Custom plugins may require recompilation or code updates.
  2. Configure the OpenMetrics endpoint in your Prometheus monitoring stack. Ensure network access between the Prometheus server and the PCS controller.
  3. Enable scheduler audit logs in the AWS Management Console. Check CloudWatch Logs storage costs for high-volume clusters.
  4. Test job re-queueing behavior in a development environment. Validate how the expedited re-queue affects priority scheduling.
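
Before wiring the endpoint into Prometheus (item 2), it can help to confirm that the metrics text parses as expected. The following is a minimal Python sketch, assuming the endpoint returns the standard Prometheus/OpenMetrics text exposition format. The metric names in SAMPLE (example_jobs_pending, example_nodes_idle) are hypothetical placeholders, not names documented for PCS or Slurm, and the endpoint URL is left as a parameter because the source does not state it.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# Hypothetical payload for illustration only; real metric names will differ.
SAMPLE = """\
# HELP example_jobs_pending Hypothetical gauge: jobs waiting in the queue.
# TYPE example_jobs_pending gauge
example_jobs_pending{partition="compute"} 12
example_jobs_pending{partition="gpu"} 3
# TYPE example_nodes_idle gauge
example_nodes_idle 40
# EOF
"""

# metric_name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> List[Sample]:
    """Parse exposition text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (# HELP, # TYPE, # EOF).
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total(samples: List[Sample], name: str) -> float:
    """Sum one metric across all of its label combinations."""
    return sum(value for n, _, value in samples if n == name)


def fetch_metrics(url: str, timeout: float = 10.0) -> List[Sample]:
    """Fetch and parse a metrics endpoint; the caller supplies the URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

Calling fetch_metrics from the host that will run Prometheus also checks the network path mentioned in item 2: a timeout or connection error there points to a routing or security-group problem rather than a scrape configuration issue.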

Source: AWS What's New

This page summarizes the original source. Check the source for full details.