As technology advances and AI and machine learning applications proliferate, IT equipment draws more power and produces higher heat loads. The resulting energy efficiency problem compounds itself: chips require more power and generate more heat, and removing that heat requires additional power for cooling.
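To make the compounding effect concrete, here is a minimal sketch of the arithmetic. It assumes cooling power can be modeled as a fixed fraction of IT power. The function names and the 0.4 overhead figure are hypothetical values chosen for illustration, not measurements from any facility.

```python
def cooling_power_kw(it_power_kw: float, cooling_overhead: float) -> float:
    """Power spent on cooling, modeled as a fixed fraction of IT power.

    cooling_overhead is a hypothetical ratio (cooling kW per IT kW).
    """
    return it_power_kw * cooling_overhead


def facility_power_kw(it_power_kw: float, cooling_overhead: float) -> float:
    """Total power: the IT load plus the power needed to remove its heat."""
    return it_power_kw + cooling_power_kw(it_power_kw, cooling_overhead)


if __name__ == "__main__":
    overhead = 0.4  # illustrative assumption only
    for it_kw in (100.0, 200.0):
        cooling = cooling_power_kw(it_kw, overhead)
        total = facility_power_kw(it_kw, overhead)
        print(f"IT {it_kw:.0f} kW -> cooling {cooling:.0f} kW, total {total:.0f} kW")
```

Under this simplified model, every added kilowatt of chip power brings a proportional amount of cooling power with it, so lowering the overhead ratio reduces the total for any IT load.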
Liquid cooling is vital for scaling AI as it effectively manages the heat produced by high-performance computing systems. This approach improves reliability, lowers energy usage and accommodates the heavy computational requirements of AI tasks. By bringing cooling elements closer to the heat source, direct-to-chip liquid cooling solutions enhance heat dissipation efficiency and enable more precise temperature control.
In addition to improving the overall performance and reliability of servers, direct-to-chip liquid cooling allows data centers to operate at higher power densities.
Crafting the Chip
Testing is an important part of the chip manufacturing process. Chip makers need to make sure that the chips they produce are ready to operate at maximum capacity with 24/7 uptime, which requires intense testing before the chips leave the manufacturing facility. Because chips need to be cooled during these tests, manufacturers need strong cooling infrastructure on site. Historically, chip manufacturers were able to rely mostly on air cooling for tests: even chips that end customers would liquid cool could be tested using less efficient cooling. However, new AI and high-performance chips cannot be switched on unless they are liquid cooled, because air cooling cannot handle the heat load without damage to the chip.
Because of this, liquid cooling must be deployed in factory environments from day one. Manufacturers need to ensure the chips can run successfully at maximum power in the field, so the chips need to be tested under extreme operating conditions. End users can also benefit by deploying in the field the same liquid cooling infrastructure used in factories, which gives chips a consistent cooling infrastructure.
Helium Leak Testing
A key aspect of chip manufacturing with liquid cooling is helium leak testing, which is used to find even the smallest leaks in liquid cooling infrastructure. It is crucial for chip manufacturers to make sure that all liquid cooling components from their suppliers, such as manifolds and coolant distribution units (CDUs), undergo helium leak testing. These tests help maintain integrity and reliability, enhancing the overall quality of the liquid cooling infrastructure that supports high-performance IT. By following this practice, chip manufacturers can be confident that their products are produced and tested without leak-related damage or failure.
nVent is a leader and innovator in liquid cooling with a strong track record of solving the toughest cooling challenges for global cloud service providers and collaborating to support the future of AI computing. nVent provides standard and custom solutions for the next generation of computing supported by its global scale and capacity. It is well-positioned to deliver resilient and sustainable liquid cooling and power solutions to support AI and high-performance computing around the globe.
It is important for chip manufacturers and end users to work with cooling infrastructure providers that understand how to properly design, install and service cooling architecture. Learn more about nVent’s comprehensive portfolio and expertise, including adaptable, modular and scalable data center solutions: Data Centers and Networking | nVent DATA-SOLUTIONS