
PyTorch 2.5: Speeding things up

The release of PyTorch 2.5 introduces significant enhancements, including a new cuDNN backend for scaled dot product attention (SDPA) and optimizations in the TorchInductor CPU backend. These updates aim to improve performance and streamline the user experience across a range of machine learning tasks.

Key Features and Insights

Beta Features

  • CuDNN Backend for SDPA: The new cuDNN backend for scaled dot product attention is enabled by default on NVIDIA H100 and newer GPUs and can be up to 75% faster than FlashAttentionV2 (see the sketch after this list).
  • Regional Compilation: torch.compile can now compile a repeated nn.Module (for example, a transformer layer) once and reuse it across identical instances without recompilation, cutting cold-start compile times with only a small performance trade-off versus full-model compilation (see the sketch after this list).
  • TorchInductor CPU Backend Optimization: Enhancements include CPP backend code generation and support for various data types, delivering consistent performance improvements across multiple benchmark suites.
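
As a rough illustration of the first two items, the sketch below forces SDPA onto the new cuDNN backend and compiles a repeated block regionally. It is a minimal sketch, not the official recipe: the tensor shapes and the toy Block module are invented, an H100-class GPU is assumed, and the sdpa_kernel / SDPBackend names come from the torch.nn.attention module.

    # Minimal sketch: cuDNN-backed SDPA plus regional compilation of a
    # repeated block. Shapes, the Block module, and the CUDA/H100 assumption
    # are illustrative only.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.nn.attention import SDPBackend, sdpa_kernel

    q, k, v = (torch.randn(8, 16, 1024, 128, dtype=torch.float16, device="cuda")
               for _ in range(3))

    # Restrict SDPA to the cuDNN backend (on H100-class GPUs it is already
    # chosen by default when the inputs are eligible).
    with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
        out = F.scaled_dot_product_attention(q, k, v)

    # Regional compilation: compile the repeated block rather than the whole
    # model, so identical instances can reuse the compiled code instead of
    # paying the full cold-start cost.
    class Block(nn.Module):
        def __init__(self, dim=2048):
            super().__init__()
            self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

        def forward(self, x):
            return x + self.ff(x)

    model = nn.Sequential(*[Block() for _ in range(12)]).cuda()
    for block in model:
        block.compile()  # in-place torch.compile on each repeated instance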

Prototype Features

  • FlexAttention: A flexible API for implementing diverse attention mechanisms with improved memory efficiency (see the sketch after this list).
  • Compiled Autograd: Captures the entire backward pass, with tracing deferred until backward execution so that it is resilient to graph breaks in the forward pass.
  • Flight Recorder: A debugging tool that helps identify issues in stuck jobs by capturing runtime information.
  • Max-autotune Support: Profiles multiple implementations of GEMM operations at compile time and selects the best-performing one (see the sketch after this list).
  • Enhanced Intel GPU Support: Improved support for Intel GPUs to accelerate machine learning workflows on both Data Center and Client GPUs.
  • FP16 Support: Float16 is now supported on CPU paths for both eager mode and TorchInductor, enhancing performance in neural network tasks.
  • Autoload Device Extension: Streamlines integration of out-of-tree device extensions by automating loading processes.
  • TorchInductor on Windows: The Inductor CPU backend is now compatible with Windows environments, supporting multiple compilers.
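
To make FlexAttention and max-autotune more concrete, here is a small sketch. The import path torch.nn.attention.flex_attention and the score_mod signature follow the prototype API, while the rel_bias function and tensor shapes are assumptions made for illustration.

    # Minimal sketch of the FlexAttention prototype plus max-autotune mode.
    # The rel_bias function and tensor shapes are illustrative only.
    import torch
    from torch.nn.attention.flex_attention import flex_attention

    q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda") for _ in range(3))

    # score_mod lets user code rewrite individual attention scores; here it
    # adds a simple relative-position bias.
    def rel_bias(score, b, h, q_idx, kv_idx):
        return score + (kv_idx - q_idx) * 0.01

    out = flex_attention(q, k, v, score_mod=rel_bias)

    # Max-autotune: at compile time Inductor benchmarks several candidate
    # implementations (e.g. for the underlying GEMMs) and keeps the fastest.
    fast_attn = torch.compile(flex_attention, mode="max-autotune")
    out = fast_attn(q, k, v, score_mod=rel_bias)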

This release comprises 4095 commits from 504 contributors, reflecting the PyTorch community's ongoing commitment to improving the framework.

To read the full article and explore everything new in this release, check the official PyTorch blog post.