ML-Guided Cold Plate Design and Thermal Analysis for Liquid-Cooled HPC Servers
Abstract
Efficient thermal management is a major bottleneck in scaling high-performance computing (HPC) systems, where cooling accounts for a substantial share of total energy use. Liquid-cooled cold plates are increasingly adopted in data centers and power electronics, yet optimizing their design remains costly because computational fluid dynamics (CFD) simulations are expensive and the geometric design space is high-dimensional. We introduce a physics-informed neural network (PINN) framework for rapid thermal analysis and design exploration of parameterized cold plates. Our approach jointly solves the incompressible Navier–Stokes and conjugate heat transfer equations, using a two-stage curriculum that first stabilizes learning of the liquid flow field and then introduces thermal coupling. Once trained, the model produces physically consistent predictions, with inference that is orders of magnitude faster than conventional CFD solvers. We demonstrate the framework across multiple cold plate topologies, capturing design-dependent flow patterns and thermal gradients that inform geometry–performance trade-offs. These results establish PINNs as a promising surrogate modeling tool for accelerating liquid-cooling design workflows, with implications for reducing the energy and carbon footprint of HPC infrastructure.
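
For concreteness, the coupled system and the two-stage curriculum named in the abstract can be sketched as below. This is an illustrative formulation under assumptions the abstract does not state: steady flow, constant fluid and solid properties, a volumetric heat source $q$ in the solid, and hypothetical symbols $\lambda_T$ (thermal loss weight) and $e_0$ (epoch at which thermal coupling is switched on). The formulation actually used may differ.

```latex
% Assumed steady, constant-property form of the governing equations.
% Subscripts f and s denote fluid and solid; \Gamma is the fluid-solid interface.
\begin{aligned}
\nabla \cdot \mathbf{u} &= 0, \\
\rho\,(\mathbf{u}\cdot\nabla)\,\mathbf{u} &= -\nabla p + \mu\,\nabla^{2}\mathbf{u}, \\
\rho\,c_p\,\mathbf{u}\cdot\nabla T_f &= k_f\,\nabla^{2} T_f, \\
k_s\,\nabla^{2} T_s + q &= 0, \\
T_f = T_s, \qquad k_f\,\frac{\partial T_f}{\partial n} &= k_s\,\frac{\partial T_s}{\partial n}
\quad \text{on } \Gamma .
\end{aligned}

% Hypothetical curriculum: the thermal residual is weighted by \lambda_T(e),
% which is zero during stage 1 (flow only) and positive from epoch e_0 onward.
\mathcal{L}(\theta; e) = \mathcal{L}_{\text{flow}}(\theta)
  + \mathcal{L}_{\text{BC}}(\theta)
  + \lambda_T(e)\,\mathcal{L}_{\text{thermal}}(\theta),
\qquad
\lambda_T(e) =
\begin{cases}
0, & e < e_0, \\
\lambda_T^{\ast} > 0, & e \ge e_0 .
\end{cases}
```

Here $\mathcal{L}_{\text{flow}}$ collects the continuity and momentum residuals, $\mathcal{L}_{\text{thermal}}$ collects the energy and interface residuals, and $\mathcal{L}_{\text{BC}}$ collects the boundary conditions. Holding $\lambda_T$ at zero in the first stage lets the velocity and pressure fields settle before the advection term $\mathbf{u}\cdot\nabla T_f$ enters the optimization.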