Deep Learning for Real-Time Chip Temperature and Power Predictions

Event Date: May 31, 2023
Authors: M. Bhatasana and A. Marconnet
Journal: 2023 Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), Orlando, FL, May 30 - June 2, 2023.

Deep learning is a subset of machine learning that focuses on complex non-linear processing of data. These frameworks are central to emerging technologies like automated driving and medical imaging, but they could also be applied to thermal management challenges to reduce computational time and enable real-time predictions of temperature and power during operation of electronic devices. In this paper, we leverage convolutional neural network (CNN) frameworks to (1) predict the temperature map given the power distribution on the heated surface (i.e., the forward problem), and (2) predict the power distribution given a temperature map of the exposed surface in a silicon die (i.e., the inverse problem). The forward problem is solved using two CNN architectures. For a given power map, the first CNN predicts the range of temperatures on the heated surface (that is, the hottest and coolest temperatures), while the second CNN predicts the normalized spatial temperature distribution across this surface. This normalized distribution is then scaled using the temperatures estimated by the first CNN to predict the absolute temperature map. The predicted minimum and maximum temperatures have a mean absolute error (MAE) of less than 0.5°C, and the combined framework predicts temperature distributions with an MAE of less than 1°C. The inverse problem is solved using a modified U-Net architecture that uses a popular pre-trained encoder, MobileNetV2, and decoder blocks from the pix2pix framework. The MAE between the input temperature maps and those resulting from the predicted power maps is 1.4°C, with an error in the normalized distribution of only 2%. With inference times of 5 milliseconds (forward problem) and 14 milliseconds (inverse problem) on a commercial processor, this analysis shows potential for deployment on-chip for real-time temperature distribution predictions or for integration with inverse algorithms to predict power distributions.
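
The scaling step in the forward problem can be illustrated with a short sketch. The abstract does not give the normalization formula, so this example assumes min-max normalization, in which the second CNN outputs values in [0, 1] and the first CNN supplies the minimum and maximum temperatures. The function names and all numerical values are hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def rescale_temperature_map(normalized: Grid, t_min: float, t_max: float) -> Grid:
    """Map a normalized temperature map to absolute temperatures.

    Assumption: the normalized map uses min-max scaling, so 0 corresponds
    to t_min and 1 corresponds to t_max.
    """
    if t_max < t_min:
        raise ValueError("t_max must be greater than or equal to t_min")
    span = t_max - t_min
    return [[t_min + span * value for value in row] for row in normalized]


def mean_absolute_error(predicted: Grid, reference: Grid) -> float:
    """Average absolute difference over all grid points of two equal-size maps."""
    errors = [
        abs(p - r)
        for p_row, r_row in zip(predicted, reference)
        for p, r in zip(p_row, r_row)
    ]
    return sum(errors) / len(errors)


# Hypothetical 2x2 example: a normalized map standing in for the second CNN's
# output, and a temperature range standing in for the first CNN's output (in °C).
normalized_map = [[0.0, 0.5], [0.25, 1.0]]
absolute_map = rescale_temperature_map(normalized_map, t_min=40.0, t_max=80.0)

# Hypothetical reference map used only to show how an MAE would be computed.
reference_map = [[41.0, 60.0], [50.0, 79.0]]
mae = mean_absolute_error(absolute_map, reference_map)
```

Under this assumption, splitting the prediction into a range and a normalized shape lets each network learn a quantity on a fixed scale, and the absolute map is recovered with one multiply and one add per grid point.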