Google Cloud Announces General Availability of NVIDIA L4 GPUs on Cloud Run for AI Inference

Google Cloud has officially announced the general availability of NVIDIA L4 GPU support within its Cloud Run serverless platform. This update allows developers to run high-performance workloads such as generative AI inference, video processing, and large-scale data transformations without managing underlying virtual machine infrastructure. The NVIDIA L4 GPU, powered by the Ada Lovelace architecture, provides superior energy efficiency and performance compared to its predecessors, making it ideal for modern serverless AI applications.

The service maintains the core benefits of Cloud Run, including the ability to scale to zero when there is no traffic and rapid scaling in response to incoming requests. Developers no longer need to configure complex GPU node pools or handle manual driver installations, as the platform manages these aspects automatically. By simplifying the stack, teams can deploy containerized AI models more quickly while utilizing the pay-as-you-go pricing model to optimize operational expenses.

This release significantly impacts how organizations approach AI deployment by lowering the barrier to entry for GPU-accelerated computing. With the integration of sidecar containers and persistent volume support, Cloud Run now offers a robust environment for enterprise-grade AI services. This change is particularly relevant for engineers looking to integrate large language models or computer vision into existing web services with minimal architectural overhead and maximum cost predictability.
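As a rough sketch, attaching an L4 GPU to a Cloud Run service can be done in a single deploy command. The service name, image path, and region below are placeholders, and the flag names should be verified against the current `gcloud run deploy` reference before use:

```shell
# Deploy a container to Cloud Run with one NVIDIA L4 GPU attached.
# Service name, image, and region are placeholders -- substitute your own.
gcloud run deploy my-inference-service \
  --image=us-docker.pkg.dev/my-project/my-repo/inference:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=4 \
  --memory=16Gi \
  --max-instances=3   # cap scale-out to keep GPU spend predictable
```

Because billing stops when the service scales to zero, capping `--max-instances` is mainly a guard against unexpected burst costs rather than a steady-state expense control.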
Comparison
| Aspect | Before / Alternative | After / This |
|---|---|---|
| Infrastructure Management | Manual GPU node pool configuration in GKE or VM management | Fully managed serverless environment with zero node management |
| Scaling Behavior | Pre-provisioned VMs or slow-scaling Kubernetes nodes | Fast scaling with scale-to-zero capabilities for cost savings |
| Cost Model | Hourly billing for running instances regardless of load | Pay-as-you-go billing for actual GPU usage time, with no charge while scaled to zero |
Action Checklist
- Select the NVIDIA L4 GPU option in Cloud Run service settings; ensure your region supports L4 GPU instances.
- Configure container resources to request GPU allocation; a minimum of 4 vCPUs and 16 GB of RAM is typically required for GPU usage.
- Optimize container startup time for better scaling performance; use smaller base images or lazy loading for large model weights.
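The resource settings from the checklist above can also be expressed declaratively in a Cloud Run service manifest. The sketch below assumes the selector and resource keys from Cloud Run's published YAML schema; the service name and image are placeholders, so check the current documentation before applying it:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-inference-service            # placeholder service name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "3"   # cap scale-out for cost control
    spec:
      nodeSelector:
        run.googleapis.com/accelerator: nvidia-l4   # request an L4 GPU
      containers:
        - image: us-docker.pkg.dev/my-project/my-repo/inference:latest  # placeholder image
          resources:
            limits:
              cpu: "4"                  # minimum typically required with a GPU
              memory: 16Gi
              nvidia.com/gpu: "1"       # one GPU per instance
```

Applying the manifest with `gcloud run services replace service.yaml` keeps the GPU configuration in version control alongside the rest of the service definition.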
Source: Google Cloud Blog
This page summarizes the original source. Check the source for full details.
